[Gluster-users] RDMA Client Hang Problem
Necati E. SISECI
siseci at gmail.com
Wed Apr 25 12:35:11 UTC 2018
Thank you for your mail.
ibv_rc_pingpong seems to be working between the servers and the client.
udaddy, ucmatose, rping, etc. also work.
root@gluster1:~# ibv_rc_pingpong -d mlx5_0 -g 0
local address: LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID fe80::ee0d:9aff:fec0:1dc8
remote address: LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID fe80::ee0d:9aff:fec0:1b14
8192000 bytes in 0.01 seconds = 7964.03 Mbit/sec
1000 iters in 0.01 seconds = 8.23 usec/iter
root@cinder:~# ibv_rc_pingpong -g 0 -d mlx5_0 gluster1
local address: LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID fe80::ee0d:9aff:fec0:1b14
remote address: LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID fe80::ee0d:9aff:fec0:1dc8
8192000 bytes in 0.01 seconds = 8424.73 Mbit/sec
1000 iters in 0.01 seconds = 7.78 usec/iter
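As a quick sanity check on the two runs above, the reported throughput can be recomputed from the byte count and the per-iteration time. This is only a sketch: the function name mbit_per_sec is made up for illustration, and the small difference from the printed figures is assumed to come from the usec/iter value being rounded to two decimals.

```python
# Cross-check the figures printed by ibv_rc_pingpong.
# The numbers are copied from the two runs above; the helper name is hypothetical.

def mbit_per_sec(total_bytes: int, iters: int, usec_per_iter: float) -> float:
    """Throughput implied by the byte count and the per-iteration time."""
    elapsed_sec = iters * usec_per_iter * 1e-6
    return total_bytes * 8 / elapsed_sec / 1e6

runs = {
    "gluster1": (8192000, 1000, 8.23),
    "cinder": (8192000, 1000, 7.78),
}

for host, (nbytes, iters, usec) in runs.items():
    # Prints a value within about 1 Mbit/sec of what the tool reported.
    print(f"{host}: {mbit_per_sec(nbytes, iters, usec):.0f} Mbit/sec")
```

Both values agree with the tool's output to within rounding, so the raw RC path between the two hosts looks healthy.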
Thank you.
Necati.
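
For reference, the wc.status value in the client log quoted below can be mapped to its name in the ibv_wc_status enum from <infiniband/verbs.h>. The following is a minimal sketch, not part of GlusterFS: the helper decode_wc_status and the partial table are illustrative assumptions, and only a subset of the enum is listed.

```python
import re

# Subset of the ibv_wc_status enum from <infiniband/verbs.h>.
# The table is deliberately partial; unlisted codes are reported as such.
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

def decode_wc_status(log_line: str) -> str:
    """Hypothetical helper: extract wc.status from a log line and name it."""
    match = re.search(r"wc\.status\s*=\s*(\d+)", log_line)
    if match is None:
        return "no wc.status field found"
    code = int(match.group(1))
    return IBV_WC_STATUS.get(code, f"unlisted status {code}")

# Message text copied from the quoted client log.
line = (
    "send work request on `mlx5_0' returned error wc.status = 5, "
    "wc.vendor_err = 245, post->buf = 0x7f8b92016000, "
    "wc.byte_len = 0, post->reused = 135"
)
print(decode_wc_status(line))
```

Status 5 is IBV_WC_WR_FLUSH_ERR, which is reported for work requests that were outstanding when the queue pair moved to the error state. That is consistent with a peer disappearing during a reboot, but it does not by itself explain why the client hangs afterwards.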
On 25-04-2018 12:27, Raghavendra Gowdappa wrote:
> Is infiniband itself working fine? You can run tools like
> ibv_rc_pingpong to find out.
>
> On Wed, Apr 25, 2018 at 12:23 PM, Necati E. SISECI <siseci at gmail.com> wrote:
>
> Dear Gluster-Users,
>
> I am experiencing RDMA problems.
>
> I have installed Ubuntu 16.04.4 with the 4.15.0-13-generic kernel
> and MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64 on 4 different
> servers. All of them have Mellanox ConnectX-4 LX dual-port NICs.
> The four servers are connected via a Mellanox SN2100 switch.
>
> I have installed GlusterFS Server v3.10 (from the Ubuntu PPA) on 3
> of the servers. These 3 boxes run as a Gluster cluster.
> Additionally, I have installed the GlusterFS client on the last one.
>
> I created the Gluster volume with this command:
>
> # gluster volume create db transport rdma replica 3 arbiter 1
> gluster1:/storage/db/ gluster2:/storage/db/ cinder:/storage/db force
>
> (network.ping-timeout is 3)
>
> Then I mounted the volume with the mount command below:
>
> mount -t glusterfs -o transport=rdma gluster1:/db /db
>
> After mounting "/db", I can access the files.
>
> The problem is that when I reboot one of the cluster nodes, the FUSE
> client logs the error below and hangs.
>
> [2018-04-17 07:42:55.506422] W [MSGID: 103070]
> [rdma.c:4284:gf_rdma_handle_failed_send_completion]
> 0-rpc-transport/rdma: send work request on `mlx5_0' returned
> error wc.status = 5, wc.vendor_err = 245, post->buf =
> 0x7f8b92016000, wc.byte_len = 0, post->reused = 135
>
> When I change the transport mode from rdma to tcp, the FUSE client
> works well, with no hangs.
>
> I also tried Gluster 3.8, 3.10, 4.0.0 and 4.0.1 (from Ubuntu PPAs)
> on Ubuntu 16.04.4 and CentOS 7.4, but the results were the same.
>
> Thank you.
>
> Necati.