[Gluster-users] RDMA Problems with GlusterFS 3.1.1

Jeremy Stout stout.jeremy at gmail.com
Wed Dec 1 15:07:30 UTC 2010


Here are the results of the test:
submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
  local address:  LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
  local address:  LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
  local address:  LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
  local address:  LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
  local address:  LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
  local address:  LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
  local address:  LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
  local address:  LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
  local address:  LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
  local address:  LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
  local address:  LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
  local address:  LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
  local address:  LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
  local address:  LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
  local address:  LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
  local address:  LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
  remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
  remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
1000 iters in 0.01 seconds = 11.07 usec/iter

fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
  local address:  LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
  local address:  LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
  local address:  LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
  local address:  LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
  local address:  LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
  local address:  LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
  local address:  LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
  local address:  LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
  local address:  LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
  local address:  LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
  local address:  LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
  local address:  LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
  local address:  LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
  local address:  LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
  local address:  LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
  local address:  LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
  remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
  remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
  remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
  remote address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
  remote address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
  remote address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
  remote address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
  remote address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
  remote address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
  remote address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
  remote address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
  remote address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
  remote address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
  remote address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
  remote address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
  remote address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
8192000 bytes in 0.01 seconds = 7423.65 Mbit/sec
1000 iters in 0.01 seconds = 8.83 usec/iter

Based on the output, I believe it ran correctly.
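
As a rough sanity check, the two summary lines of each run can be compared against each other. The sketch below assumes the tool's defaults of 4096-byte messages and 1000 iterations (no -s or -n was passed above) and assumes the byte count covers both directions, i.e. size * iters * 2. The function and variable names are my own, not part of ibv_srq_pingpong.

```python
# Cross-check the ibv_srq_pingpong summary lines pasted above.
# Assumptions (not shown in the output itself): default message size of
# 4096 bytes, 1000 iterations, and a byte count covering both directions.

MSG_SIZE = 4096   # assumed default message size in bytes
ITERS = 1000      # iteration count, as printed in the summary


def implied_usec_per_iter(total_bytes, mbit_per_sec, iters):
    """Derive the per-iteration latency implied by the reported throughput."""
    # bits / (Mbit/s) gives microseconds, because 1 Mbit/s == 1 bit/usec.
    elapsed_usec = total_bytes * 8.0 / mbit_per_sec
    return elapsed_usec / iters


# (host, reported bytes, reported Mbit/sec, reported usec/iter)
runs = [
    ("submit-1", 8192000, 5917.47, 11.07),
    ("fs-1", 8192000, 7423.65, 8.83),
]

for host, total_bytes, mbit, reported_usec in runs:
    # The byte count should match size * iters * 2 under the assumed defaults.
    bytes_ok = total_bytes == MSG_SIZE * ITERS * 2
    implied = implied_usec_per_iter(total_bytes, mbit, ITERS)
    # Allow for the two-decimal rounding in the printed figures.
    latency_ok = abs(implied - reported_usec) < 0.01
    print(host, "bytes consistent:", bytes_ok, "latency consistent:", latency_ok)
```

If both checks hold on each host, the throughput and latency figures describe the same elapsed time, which is what I would expect from a clean run.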

On Wed, Dec 1, 2010 at 9:51 AM, Anand Avati <anand.avati at gmail.com> wrote:
> Can you verify that ibv_srq_pingpong works from the server where this log
> file is from?
>
> Thanks,
> Avati
>
> On Wed, Dec 1, 2010 at 7:44 PM, Jeremy Stout <stout.jeremy at gmail.com> wrote:
>>
>> Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses
>> RDMA, I'm seeing the following error messages in the log file on the
>> server:
>> [2010-11-30 18:37:53.51270] I [nfs.c:652:init] nfs: NFS service started
>> [2010-11-30 18:37:53.51362] W [dict.c:1204:data_to_str] dict: @data=(nil)
>> [2010-11-30 18:37:53.51375] W [dict.c:1204:data_to_str] dict: @data=(nil)
>> [2010-11-30 18:37:53.59628] E [rdma.c:2066:rdma_create_cq]
>> rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>> [2010-11-30 18:37:53.59851] E [rdma.c:3771:rdma_get_device]
>> rpc-transport/rdma: testdir-client-0: could not create CQ
>> [2010-11-30 18:37:53.59925] E [rdma.c:3957:rdma_init]
>> rpc-transport/rdma: could not create rdma device for mthca0
>> [2010-11-30 18:37:53.60009] E [rdma.c:4789:init] testdir-client-0:
>> Failed to initialize IB Device
>> [2010-11-30 18:37:53.60030] E [rpc-transport.c:971:rpc_transport_load]
>> rpc-transport: 'rdma' initialization failed
>>
>> On the client, I see:
>> [2010-11-30 18:43:49.653469] W [io-stats.c:1644:init] testdir:
>> dangling volume. check volfile
>> [2010-11-30 18:43:49.653573] W [dict.c:1204:data_to_str] dict: @data=(nil)
>> [2010-11-30 18:43:49.653607] W [dict.c:1204:data_to_str] dict: @data=(nil)
>> [2010-11-30 18:43:49.736275] E [rdma.c:2066:rdma_create_cq]
>> rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>> [2010-11-30 18:43:49.736651] E [rdma.c:3771:rdma_get_device]
>> rpc-transport/rdma: testdir-client-0: could not create CQ
>> [2010-11-30 18:43:49.736689] E [rdma.c:3957:rdma_init]
>> rpc-transport/rdma: could not create rdma device for mthca0
>> [2010-11-30 18:43:49.736805] E [rdma.c:4789:init] testdir-client-0:
>> Failed to initialize IB Device
>> [2010-11-30 18:43:49.736841] E
>> [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
>> initialization failed
>>
>> This results in an unsuccessful mount.
>>
>> I created the volume using the following commands:
>> /usr/local/glusterfs/3.1.1/sbin/gluster volume create testdir
>> transport rdma submit-1:/exports
>> /usr/local/glusterfs/3.1.1/sbin/gluster volume start testdir
>>
>> To mount the directory, I use:
>> mount -t glusterfs submit-1:/testdir /mnt/glusterfs
>>
>> I don't think it is an InfiniBand problem, since GlusterFS 3.0.6 and
>> GlusterFS 3.1.0 worked on the same systems. With GlusterFS 3.1.0, the
>> commands listed above produced no error messages.
>>
>> If anyone can provide help with debugging these error messages, it
>> would be appreciated.


