[Gluster-users] RDMA Problems with GlusterFS 3.1.1

Jeremy Stout stout.jeremy at gmail.com
Fri Dec 3 02:38:00 UTC 2010


I'm currently using OFED 1.5.2.

For the sake of testing, I just compiled GlusterFS 3.1.1 from source,
without any modifications, on two systems that have a 2.6.33.7 kernel
and OFED 1.5.2 built from source. Here are the results:

Server:
[2010-12-02 21:17:55.886563] I
[glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd:
Received start vol req for volume testdir
[2010-12-02 21:17:55.886597] I [glusterd-utils.c:232:glusterd_lock]
glusterd: Cluster lock held by 7dd23af5-277e-4ea1-a495-2a9d882287ec
[2010-12-02 21:17:55.886607] I
[glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired
local lock
[2010-12-02 21:17:55.886628] I
[glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock
req to 0 peers
[2010-12-02 21:17:55.887031] I
[glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req
to 0 peers
[2010-12-02 21:17:56.60427] I
[glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to
start glusterfs for brick submit-1:/mnt/gluster
[2010-12-02 21:17:56.104896] I
[glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req
to 0 peers
[2010-12-02 21:17:56.104935] I
[glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent
unlock req to 0 peers
[2010-12-02 21:17:56.104953] I
[glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared
local lock
[2010-12-02 21:17:56.114764] I
[glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null)
on port 24009

Client:
[2010-12-02 21:17:25.503395] W [io-stats.c:1644:init] testdir:
dangling volume. check volfile
[2010-12-02 21:17:25.503434] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-02 21:17:25.503447] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-02 21:17:25.543409] E [rdma.c:2066:rdma_create_cq]
rpc-transport/rdma: testdir-client-0: creation of send_cq failed
[2010-12-02 21:17:25.543660] E [rdma.c:3771:rdma_get_device]
rpc-transport/rdma: testdir-client-0: could not create CQ
[2010-12-02 21:17:25.543725] E [rdma.c:3957:rdma_init]
rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-02 21:17:25.543812] E [rdma.c:4789:init] testdir-client-0:
Failed to initialize IB Device
[2010-12-02 21:17:25.543830] E
[rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
initialization failed

Thank you for the help so far.

On Thu, Dec 2, 2010 at 8:02 PM, Craig Carl <craig at gluster.com> wrote:
> Jeremy -
>   What version of OFED are you running? Would you mind installing version
> 1.5.2 from source? We have seen this resolve several issues of this type.
> http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/
>
>
> Thanks,
>
> Craig
>
> --
> Craig Carl
> Senior Systems Engineer
> Gluster
>
>
> On 12/02/2010 10:05 AM, Jeremy Stout wrote:
>>
>> As another follow-up, I tested several compilations today with
>> different values for send/receive count. I found the maximum value I
>> could use for both variables was 127. With a value of 127, GlusterFS
>> did not produce any errors. However, when I changed the value back to
>> 128, the RDMA errors appeared again.
>>
>> I also tried setting soft/hard "memlock" to unlimited in the
>> limits.conf file, but still ran into RDMA errors on the client side
>> when the count variables were set to 128.
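
The locked-memory limit in question can be checked and raised as follows; a minimal sketch, assuming a PAM-based system where /etc/security/limits.conf is honored (scope and values below are assumptions, not output from the thread):

```shell
# Check the per-process locked-memory limit (in kB). RDMA buffers must be
# registered (pinned) memory, so a small default such as 64 kB is quickly
# exhausted by large send/receive queues.
ulimit -l

# To raise the limit persistently, entries like these would go in
# /etc/security/limits.conf (a re-login is needed for them to apply):
#   *    soft    memlock    unlimited
#   *    hard    memlock    unlimited
```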
>>
>> On Thu, Dec 2, 2010 at 9:04 AM, Jeremy Stout<stout.jeremy at gmail.com>
>>  wrote:
>>>
>>> Thank you for the response. I've been testing GlusterFS 3.1.1 on two
>>> different OpenSUSE 11.3 systems. Since both systems generated the same
>>> error messages, I'll include the output for both.
>>>
>>> System #1:
>>> fs-1:~ # cat /proc/meminfo
>>> MemTotal:       16468756 kB
>>> MemFree:        16126680 kB
>>> Buffers:           15680 kB
>>> Cached:           155860 kB
>>> SwapCached:            0 kB
>>> Active:            65228 kB
>>> Inactive:         123100 kB
>>> Active(anon):      18632 kB
>>> Inactive(anon):       48 kB
>>> Active(file):      46596 kB
>>> Inactive(file):   123052 kB
>>> Unevictable:        1988 kB
>>> Mlocked:            1988 kB
>>> SwapTotal:             0 kB
>>> SwapFree:              0 kB
>>> Dirty:             30072 kB
>>> Writeback:             4 kB
>>> AnonPages:         18780 kB
>>> Mapped:            12136 kB
>>> Shmem:               220 kB
>>> Slab:              39592 kB
>>> SReclaimable:      13108 kB
>>> SUnreclaim:        26484 kB
>>> KernelStack:        2360 kB
>>> PageTables:         2036 kB
>>> NFS_Unstable:          0 kB
>>> Bounce:                0 kB
>>> WritebackTmp:          0 kB
>>> CommitLimit:     8234376 kB
>>> Committed_AS:     107304 kB
>>> VmallocTotal:   34359738367 kB
>>> VmallocUsed:      314316 kB
>>> VmallocChunk:   34349860776 kB
>>> HardwareCorrupted:     0 kB
>>> HugePages_Total:       0
>>> HugePages_Free:        0
>>> HugePages_Rsvd:        0
>>> HugePages_Surp:        0
>>> Hugepagesize:       2048 kB
>>> DirectMap4k:        9856 kB
>>> DirectMap2M:     3135488 kB
>>> DirectMap1G:    13631488 kB
>>>
>>> fs-1:~ # uname -a
>>> Linux fs-1 2.6.32.25-November2010 #2 SMP PREEMPT Mon Nov 1 15:19:55
>>> EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> fs-1:~ # ulimit -l
>>> 64
>>>
>>> System #2:
>>> submit-1:~ # cat /proc/meminfo
>>> MemTotal:       16470424 kB
>>> MemFree:        16197292 kB
>>> Buffers:           11788 kB
>>> Cached:            85492 kB
>>> SwapCached:            0 kB
>>> Active:            39120 kB
>>> Inactive:          76548 kB
>>> Active(anon):      18532 kB
>>> Inactive(anon):       48 kB
>>> Active(file):      20588 kB
>>> Inactive(file):    76500 kB
>>> Unevictable:           0 kB
>>> Mlocked:               0 kB
>>> SwapTotal:      67100656 kB
>>> SwapFree:       67100656 kB
>>> Dirty:                24 kB
>>> Writeback:             0 kB
>>> AnonPages:         18408 kB
>>> Mapped:            11644 kB
>>> Shmem:               184 kB
>>> Slab:              34000 kB
>>> SReclaimable:       8512 kB
>>> SUnreclaim:        25488 kB
>>> KernelStack:        2160 kB
>>> PageTables:         1952 kB
>>> NFS_Unstable:          0 kB
>>> Bounce:                0 kB
>>> WritebackTmp:          0 kB
>>> CommitLimit:    75335868 kB
>>> Committed_AS:     105620 kB
>>> VmallocTotal:   34359738367 kB
>>> VmallocUsed:       76416 kB
>>> VmallocChunk:   34359652640 kB
>>> HardwareCorrupted:     0 kB
>>> HugePages_Total:       0
>>> HugePages_Free:        0
>>> HugePages_Rsvd:        0
>>> HugePages_Surp:        0
>>> Hugepagesize:       2048 kB
>>> DirectMap4k:        7488 kB
>>> DirectMap2M:    16769024 kB
>>>
>>> submit-1:~ # uname -a
>>> Linux submit-1 2.6.33.7-November2010 #1 SMP PREEMPT Mon Nov 8 13:49:00
>>> EST 2010 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> submit-1:~ # ulimit -l
>>> 64
>>>
>>> I retrieved the memory information on each machine after starting the
>>> glusterd process.
>>>
>>> On Thu, Dec 2, 2010 at 3:51 AM, Raghavendra G<raghavendra at gluster.com>
>>>  wrote:
>>>>
>>>> Hi Jeremy,
>>>>
>>>> can you also get the output of,
>>>>
>>>> #uname -a
>>>>
>>>> #ulimit -l
>>>>
>>>> regards,
>>>> ----- Original Message -----
>>>> From: "Raghavendra G"<raghavendra at gluster.com>
>>>> To: "Jeremy Stout"<stout.jeremy at gmail.com>
>>>> Cc: gluster-users at gluster.org
>>>> Sent: Thursday, December 2, 2010 10:20:04 AM
>>>> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>>>>
>>>> Hi Jeremy,
>>>>
>>>> In order to diagnose why completion queue creation is failing (as
>>>> indicated by the logs), we would like to know how much free memory was
>>>> available on your system when glusterfs was started.
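
One way to capture those figures immediately after starting the glusterd process (a minimal sketch; these two fields are the most relevant ones from /proc/meminfo):

```shell
# Snapshot the memory figures right after starting glusterd.
grep -E 'MemTotal|MemFree' /proc/meminfo

# The locked-memory limit also matters for RDMA memory registration.
ulimit -l
```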
>>>>
>>>> regards,
>>>> ----- Original Message -----
>>>> From: "Raghavendra G"<raghavendra at gluster.com>
>>>> To: "Jeremy Stout"<stout.jeremy at gmail.com>
>>>> Cc: gluster-users at gluster.org
>>>> Sent: Thursday, December 2, 2010 10:11:18 AM
>>>> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>>>>
>>>> Hi Jeremy,
>>>>
>>>> Yes, there might be some performance decrease, but it should not
>>>> affect the working of rdma.
>>>>
>>>> regards,
>>>> ----- Original Message -----
>>>> From: "Jeremy Stout"<stout.jeremy at gmail.com>
>>>> To: gluster-users at gluster.org
>>>> Sent: Thursday, December 2, 2010 8:30:20 AM
>>>> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>>>>
>>>> As an update to my situation, I think I have GlusterFS 3.1.1 working
>>>> now. I was able to create and mount RDMA volumes without any errors.
>>>>
>>>> To fix the problem, I had to make the following changes on lines 3562
>>>> and 3563 in rdma.c:
>>>> options->send_count = 32;
>>>> options->recv_count = 32;
>>>>
>>>> Originally, the values were set to 128.
>>>>
>>>> I'll run some tests tomorrow to verify that it is working correctly.
>>>> Assuming it does, what would be the expected side-effect of changing
>>>> the values from 128 to 32? Will there be a decrease in performance?
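
The edit described above can also be scripted before building. A hedged sketch, demonstrated on a stand-in snippet: the real file path (rpc/rpc-transport/rdma/src/rdma.c in the 3.1.1 tree) and the exact text of the original lines are assumptions based on this thread, so adjust both to match your source.

```shell
# Stand-in for the two lines changed in rdma.c; in a real tree the same
# sed would patch the file in place before running ./configure && make.
cat > /tmp/rdma-snippet.c <<'EOF'
options->send_count = 128;
options->recv_count = 128;
EOF

# Lower both work-request counts from 128 to 32, as described above.
sed -i \
    -e 's/send_count = 128/send_count = 32/' \
    -e 's/recv_count = 128/recv_count = 32/' \
    /tmp/rdma-snippet.c

cat /tmp/rdma-snippet.c
```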
>>>>
>>>>
>>>> On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout<stout.jeremy at gmail.com>
>>>>  wrote:
>>>>>
>>>>> Here are the results of the test:
>>>>> submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs #
>>>>> ibv_srq_pingpong
>>>>>  local address:  LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>>>>>  local address:  LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>>>>>  remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
>>>>> 8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
>>>>> 1000 iters in 0.01 seconds = 11.07 usec/iter
>>>>>
>>>>> fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
>>>>> submit-1
>>>>>  local address:  LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>>>>>  local address:  LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>>>>>  remote address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
>>>>> 8192000 bytes in 0.01 seconds = 7423.65 Mbit/sec
>>>>> 1000 iters in 0.01 seconds = 8.83 usec/iter
>>>>>
>>>>> Based on the output, I believe it ran correctly.
>>>>>
>>>>> On Wed, Dec 1, 2010 at 9:51 AM, Anand Avati<anand.avati at gmail.com>
>>>>>  wrote:
>>>>>>
>>>>>> Can you verify that ibv_srq_pingpong works from the server where this
>>>>>> log
>>>>>> file is from?
>>>>>>
>>>>>> Thanks,
>>>>>> Avati
>>>>>>
>>>>>> On Wed, Dec 1, 2010 at 7:44 PM, Jeremy Stout<stout.jeremy at gmail.com>
>>>>>>  wrote:
>>>>>>>
>>>>>>> Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses
>>>>>>> RDMA, I'm seeing the following error messages in the log file on the
>>>>>>> server:
>>>>>>> [2010-11-30 18:37:53.51270] I [nfs.c:652:init] nfs: NFS service
>>>>>>> started
>>>>>>> [2010-11-30 18:37:53.51362] W [dict.c:1204:data_to_str] dict:
>>>>>>> @data=(nil)
>>>>>>> [2010-11-30 18:37:53.51375] W [dict.c:1204:data_to_str] dict:
>>>>>>> @data=(nil)
>>>>>>> [2010-11-30 18:37:53.59628] E [rdma.c:2066:rdma_create_cq]
>>>>>>> rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>>>>>> [2010-11-30 18:37:53.59851] E [rdma.c:3771:rdma_get_device]
>>>>>>> rpc-transport/rdma: testdir-client-0: could not create CQ
>>>>>>> [2010-11-30 18:37:53.59925] E [rdma.c:3957:rdma_init]
>>>>>>> rpc-transport/rdma: could not create rdma device for mthca0
>>>>>>> [2010-11-30 18:37:53.60009] E [rdma.c:4789:init] testdir-client-0:
>>>>>>> Failed to initialize IB Device
>>>>>>> [2010-11-30 18:37:53.60030] E
>>>>>>> [rpc-transport.c:971:rpc_transport_load]
>>>>>>> rpc-transport: 'rdma' initialization failed
>>>>>>>
>>>>>>> On the client, I see:
>>>>>>> [2010-11-30 18:43:49.653469] W [io-stats.c:1644:init] testdir:
>>>>>>> dangling volume. check volfile
>>>>>>> [2010-11-30 18:43:49.653573] W [dict.c:1204:data_to_str] dict:
>>>>>>> @data=(nil)
>>>>>>> [2010-11-30 18:43:49.653607] W [dict.c:1204:data_to_str] dict:
>>>>>>> @data=(nil)
>>>>>>> [2010-11-30 18:43:49.736275] E [rdma.c:2066:rdma_create_cq]
>>>>>>> rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>>>>>> [2010-11-30 18:43:49.736651] E [rdma.c:3771:rdma_get_device]
>>>>>>> rpc-transport/rdma: testdir-client-0: could not create CQ
>>>>>>> [2010-11-30 18:43:49.736689] E [rdma.c:3957:rdma_init]
>>>>>>> rpc-transport/rdma: could not create rdma device for mthca0
>>>>>>> [2010-11-30 18:43:49.736805] E [rdma.c:4789:init] testdir-client-0:
>>>>>>> Failed to initialize IB Device
>>>>>>> [2010-11-30 18:43:49.736841] E
>>>>>>> [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
>>>>>>> initialization failed
>>>>>>>
>>>>>>> This results in an unsuccessful mount.
>>>>>>>
>>>>>>> I created the mount using the following commands:
>>>>>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume create testdir
>>>>>>> transport rdma submit-1:/exports
>>>>>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume start testdir
>>>>>>>
>>>>>>> To mount the directory, I use:
>>>>>>> mount -t glusterfs submit-1:/testdir /mnt/glusterfs
>>>>>>>
>>>>>>> I don't think it is an Infiniband problem since GlusterFS 3.0.6 and
>>>>>>> GlusterFS 3.1.0 worked on the same systems. For GlusterFS 3.1.0, the
>>>>>>> commands listed above produced no error messages.
>>>>>>>
>>>>>>> If anyone can provide help with debugging these error messages, it
>>>>>>> would be appreciated.
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>>>>
>>>>
>
>


