[Gluster-users] RDMA Client Hang Problem

Raghavendra Gowdappa rgowdapp at redhat.com
Thu Apr 26 01:14:57 UTC 2018


+Amar, +Rafi - Other maintainers and Peers of transport/rdma

* Can you attach logs from client and brick? Please set
diagnostics.client-log-level and diagnostics.brick-log-level to TRACE
before starting your tests.
* Does fuse client recover from hang?
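
As a minimal sketch of the log-level change requested above: the two option names are the ones named in this mail and the volume name `db` comes from the original report. The helper name `set_trace_logging` and the `GLUSTER_CMD` override are hypothetical, added only so the commands can be previewed without touching the cluster.

```shell
# Set client and brick log levels to TRACE on one volume.
# GLUSTER_CMD is a hypothetical override for dry runs; it defaults to the real CLI.
set_trace_logging() {
    volume="$1"
    for opt in diagnostics.client-log-level diagnostics.brick-log-level; do
        # Stop at the first failure so a half-applied change is noticed.
        ${GLUSTER_CMD:-gluster} volume set "$volume" "$opt" TRACE || return 1
    done
}
```

Running `set_trace_logging db` on one of the servers should apply both options. `GLUSTER_CMD=echo set_trace_logging db` only prints the arguments that would be passed. TRACE is very verbose, so it is probably worth restoring the previous levels once the hang has been reproduced.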

I think we might not be handling the poll_err path correctly. The fact that
the issues appear only after a brick reboots makes me suspect the error
path.
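
For reading the `wc.status` value in the log line quoted further down, here is a small lookup sketch. It assumes the number is printed straight from `enum ibv_wc_status` in libibverbs, where 5 is `IBV_WC_WR_FLUSH_ERR` (the work request was flushed because the queue pair had already entered the error state). The function name `wc_status_name` is hypothetical, and only a few codes are listed.

```shell
# Map a few libibverbs work-completion status codes to their enum names.
# Deliberately partial: anything else is reported as unknown.
wc_status_name() {
    case "$1" in
        0)  echo IBV_WC_SUCCESS ;;
        5)  echo IBV_WC_WR_FLUSH_ERR ;;
        12) echo IBV_WC_RETRY_EXC_ERR ;;
        13) echo IBV_WC_RNR_RETRY_EXC_ERR ;;
        *)  echo "unknown ($1); see enum ibv_wc_status in verbs.h" ;;
    esac
}
```

If that assumption holds, `wc_status_name 5` names a flush error. That would fit a connection whose peer went away during the reboot, but it does not by itself explain why the client hangs.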

regards,
Raghavendra

On Wed, Apr 25, 2018 at 6:05 PM, Necati E. SISECI <siseci at gmail.com> wrote:

> Thank you for your mail.
>
> ibv_rc_pingpong seems to be working between the servers and the client.
> udaddy, ucmatose, rping, etc. are also working.
>
> root@gluster1:~# ibv_rc_pingpong -d mlx5_0 -g 0
>   local address:  LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID
> fe80::ee0d:9aff:fec0:1dc8
>   remote address: LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID
> fe80::ee0d:9aff:fec0:1b14
> 8192000 bytes in 0.01 seconds = 7964.03 Mbit/sec
> 1000 iters in 0.01 seconds = 8.23 usec/iter
>
> root@cinder:~# ibv_rc_pingpong -g 0 -d mlx5_0 gluster1
>   local address:  LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID
> fe80::ee0d:9aff:fec0:1b14
>   remote address: LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID
> fe80::ee0d:9aff:fec0:1dc8
> 8192000 bytes in 0.01 seconds = 8424.73 Mbit/sec
> 1000 iters in 0.01 seconds = 7.78 usec/iter
>
>
> Thank you.
>
> Necati.
>
>
> On 25-04-2018 12:27, Raghavendra Gowdappa wrote:
>
> Is InfiniBand itself working? You can run tools like ibv_rc_pingpong
> to find out.
>
> On Wed, Apr 25, 2018 at 12:23 PM, Necati E. SISECI <siseci at gmail.com>
> wrote:
>
>> Dear Gluster-Users,
>>
>> I am experiencing RDMA problems.
>>
>> I have installed Ubuntu 16.04.4 (kernel 4.15.0-13-generic) and
>> MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64 on 4 different servers.
>> All of them have Mellanox ConnectX-4 LX dual-port NICs. The four servers
>> are connected via a Mellanox SN2100 switch.
>>
>> I have installed GlusterFS Server v3.10 (from the Ubuntu PPA) on 3 of the
>> servers. These 3 boxes run as a Gluster cluster. Additionally, I have
>> installed the GlusterFS client on the fourth one.
>>
>> I created the Gluster volume with this command:
>>
>> # gluster volume create db transport rdma replica 3 arbiter 1
>> gluster1:/storage/db/ gluster2:/storage/db/ cinder:/storage/db force
>>
>> (network.ping-timeout is 3)
>>
>> Then I mounted this volume using the mount command below.
>>
>> mount -t glusterfs -o transport=rdma gluster1:/db /db
>>
>> After mounting "/db", I can access the files.
>>
>> The problem is that when I reboot one of the cluster nodes, the fuse
>> client logs the error below and hangs.
>>
>> [2018-04-17 07:42:55.506422] W [MSGID: 103070]
>> [rdma.c:4284:gf_rdma_handle_failed_send_completion]
>> 0-rpc-transport/rdma: *send work request on `mlx5_0' returned error
>> wc.status = 5, wc.vendor_err = 245, post->buf = 0x7f8b92016000, wc.byte_len
>> = 0, post->reused = 135*
>>
>> When I change the transport mode from rdma to tcp, the fuse client works
>> well, with no hangs.
>>
>> I also tried Gluster 3.8, 3.10, 4.0.0 and 4.0.1 (from Ubuntu PPAs) on
>> Ubuntu 16.04.4 and CentOS 7.4, but the results were the same.
>>
>> Thank you.
>> Necati.
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>