[Gluster-users] RDMA Client Hang Problem

Wed Apr 25 06:53:50 UTC 2018

Dear Gluster-Users,

I am experiencing a problem with the RDMA transport: the FUSE client 
hangs when one of the cluster nodes is rebooted.

I have installed Ubuntu 16.04.4 (kernel 4.15.0-13-generic) and 
MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64 on four servers. 
All of them have Mellanox ConnectX-4 LX dual-port NICs, and they are 
connected through a Mellanox SN2100 switch.

I have installed GlusterFS Server v3.10 (from the Ubuntu PPA) on three 
of the servers, which run as the Gluster cluster. On the fourth server 
I have installed the GlusterFS client.

I created the Gluster volume with this command:

# gluster volume create db transport rdma replica 3 arbiter 1 \
    gluster1:/storage/db/ gluster2:/storage/db/ cinder:/storage/db force

(network.ping-timeout is set to 3 seconds.)

Then I mounted the volume on the client with the following command:

mount -t glusterfs -o transport=rdma gluster1:/db /db

After mounting "/db", I can access the files.

The problem is that when I reboot one of the cluster nodes, the FUSE 
client logs the following warning and then hangs:

[2018-04-17 07:42:55.506422] W [MSGID: 103070] 
[rdma.c:4284:gf_rdma_handle_failed_send_completion] 
0-rpc-transport/rdma: send work request on `mlx5_0' returned error 
wc.status = 5, wc.vendor_err = 245, post->buf = 0x7f8b92016000, 
wc.byte_len = 0, post->reused = 135
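
For reference, the wc.status number in the warning above can be mapped 
to the libibverbs enum ibv_wc_status. The following is only a minimal 
sketch: the function names wc_status_name and extract_wc_status are 
made up for illustration, and only a few enum values are listed. 
wc.vendor_err is vendor-specific and is not decoded here.

```shell
#!/bin/sh
# Map a numeric wc.status from a GlusterFS rdma log line to the matching
# libibverbs "enum ibv_wc_status" name. Only a few values are listed;
# anything else falls through to a generic message.
wc_status_name() {
    case "$1" in
        0)  echo "IBV_WC_SUCCESS" ;;
        5)  echo "IBV_WC_WR_FLUSH_ERR" ;;
        12) echo "IBV_WC_RETRY_EXC_ERR" ;;
        13) echo "IBV_WC_RNR_RETRY_EXC_ERR" ;;
        *)  echo "status $1 (see enum ibv_wc_status in infiniband/verbs.h)" ;;
    esac
}

# Read log text on stdin and print the number that follows "wc.status = ".
extract_wc_status() {
    sed -n 's/.*wc\.status = \([0-9][0-9]*\).*/\1/p'
}

# The relevant fragment of the warning quoted above.
line='wc.status = 5, wc.vendor_err = 245, post->buf = 0x7f8b92016000,'
status=$(printf '%s\n' "$line" | extract_wc_status)
wc_status_name "$status"
```

For the line above this prints IBV_WC_WR_FLUSH_ERR. That status 
generally means the queue pair was already in the error state when the 
work request was processed, which would be consistent with the peer 
going away during the reboot. It does not by itself explain the hang.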

When I change the transport from rdma to tcp, the FUSE client works 
well and does not hang.
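
To compare the two transports, a hung mount can be told apart from a 
working one with a time-limited stat. This is a rough sketch only: the 
helper name check_mount and the 5-second limit are assumptions, and it 
assumes the blocked stat process can still be killed by SIGKILL.

```shell
#!/bin/sh
# check_mount PATH [SECONDS]
# Run stat on PATH under a time limit and report what happened.
check_mount() {
    path=$1
    limit=${2:-5}
    # coreutils timeout sends SIGKILL to stat if it has not returned in time.
    timeout -s KILL "$limit" stat "$path" >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0)       echo "responsive: $path" ;;
        # 124 is the usual timeout code; 137 (128+9) is reported when
        # the command had to be killed with SIGKILL.
        124|137) echo "hung (no reply within ${limit}s): $path" ;;
        *)       echo "error (stat exit code $rc): $path" ;;
    esac
}

# /db is the mount point used in this post.
check_mount /db 5
```

Running it once before and once after rebooting a cluster node, for 
both the rdma and the tcp mount, shows whether the client recovers.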

I also tried Gluster 3.8, 3.10, 4.0.0 and 4.0.1 (from the Ubuntu PPAs) 
on Ubuntu 16.04.4 and CentOS 7.4, but the results were the same.

Thank you.

Necati.