[Gluster-users] RDMA Client Hang Problem
rgowdapp at redhat.com
Thu Apr 26 01:14:57 UTC 2018
+Amar, +Rafi - Other maintainers and Peers of transport/rdma
* Can you attach logs from client and brick? Please set
diagnostics.client-log-level and diagnostics.brick-log-level to TRACE
before starting your tests.
* Does fuse client recover from hang?
I think we might not be handling the poll_err path correctly. The fact that
we see issues only after brick reboots makes me suspect the error path.
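For reference, the TRACE log levels requested above can be set per volume with the gluster CLI. The following is only a sketch: the volume name "db" is taken from the original report, and the DRY_RUN switch and the run/set_trace_logging helpers are hypothetical names added here so the commands can be reviewed before they touch a live cluster.

```shell
#!/bin/sh
# Sketch: raise client and brick log levels to TRACE before reproducing the hang.
# VOLUME defaults to "db" (the volume named in the original report).
# DRY_RUN is a hypothetical helper switch: 1 prints the commands, 0 executes them.
VOLUME="${VOLUME:-db}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set both diagnostics options mentioned in the request for logs.
set_trace_logging() {
    run gluster volume set "$VOLUME" diagnostics.client-log-level TRACE
    run gluster volume set "$VOLUME" diagnostics.brick-log-level TRACE
}

set_trace_logging
```

Run it with DRY_RUN=0 on one of the server nodes to apply the settings. TRACE is very verbose, so the options are best returned to their defaults (for example with "gluster volume reset") once the logs have been collected.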
On Wed, Apr 25, 2018 at 6:05 PM, Necati E. SISECI <siseci at gmail.com> wrote:
> Thank you for your mail.
> ibv_rc_pingpong seems to be working between the servers and the client.
> udaddy, ucmatose, rping, etc. are also working.
> root at gluster1:~# ibv_rc_pingpong -d mlx5_0 -g 0
> local address: LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID
> remote address: LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID
> 8192000 bytes in 0.01 seconds = 7964.03 Mbit/sec
> 1000 iters in 0.01 seconds = 8.23 usec/iter
> root at cinder:~# ibv_rc_pingpong -g 0 -d mlx5_0 gluster1
> local address: LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID
> remote address: LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID
> 8192000 bytes in 0.01 seconds = 8424.73 Mbit/sec
> 1000 iters in 0.01 seconds = 7.78 usec/iter
> Thank you.
> On 25-04-2018 12:27, Raghavendra Gowdappa wrote:
> Is infiniband itself working fine? You can run tools like ibv_rc_pingpong
> to find out.
> On Wed, Apr 25, 2018 at 12:23 PM, Necati E. SISECI <siseci at gmail.com> wrote:
>> Dear Gluster-Users,
>> I am experiencing RDMA problems.
>> I have installed Ubuntu 16.04.4 with the 4.15.0-13-generic kernel and
>> MLNX_OFED_LINUX-4.3-18.104.22.168-ubuntu16.04-x86_64 on 4 different servers.
>> All of them have Mellanox ConnectX-4 LX dual port NICs. These four servers
>> are connected via a Mellanox SN2100 switch.
>> I have installed GlusterFS Server v3.10 (from Ubuntu PPA) on 3 servers.
>> These 3 boxes are running as a gluster cluster. Additionally, I have
>> installed the GlusterFS client on the last one.
>> I have created Gluster Volume with this command:
>> # gluster volume create db transport rdma replica 3 arbiter 1
>> gluster1:/storage/db/ gluster2:/storage/db/ cinder:/storage/db force
>> (network.ping-timeout is 3)
>> Then I have mounted this volume using mount command below.
>> mount -t glusterfs -o transport=rdma gluster1:/db /db
>> After mounting "/db", I can access the files.
>> The problem is, when I reboot one of the cluster nodes, the fuse client
>> logs the error below and hangs.
>> [2018-04-17 07:42:55.506422] W [MSGID: 103070]
>> 0-rpc-transport/rdma: *send work request on `mlx5_0' returned error
>> wc.status = 5, wc.vendor_err = 245, post->buf = 0x7f8b92016000, wc.byte_len
>> = 0, post->reused = 135*
>> When I change transport mode from rdma to tcp, fuse client works well. No
>> I also tried Gluster 3.8, 3.10, 4.0.0 and 4.0.1 (from Ubuntu PPAs) on
>> Ubuntu 16.04.4 and CentOS 7.4, but the results were the same.
>> Thank you.