[Gluster-users] RDMA inline threshold?
dan at redhat.com
Wed May 30 00:47:37 UTC 2018
Sounds like a brick process is not running. I have noticed some strangeness
in my lab when using RDMA; I often have to forcibly restart the brick
process, often as in every single time I do a major operation: add a new
volume, remove a volume, stop a volume, etc.
gluster volume status <vol>
Do any of the self-heal daemons show N/A? If that's the case, try forcing
a restart of the volume:
gluster volume start <vol> force
This would also explain why your volumes aren't being replicated properly.
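The check above can be scripted. A minimal sketch of detecting offline self-heal daemons (the status text below is a hypothetical captured sample of `gluster volume status` output, with made-up hostnames and pids; in practice you would pipe the live output of `gluster volume status <vol>` into the same grep pipeline):

```shell
# Sample of the tabular output `gluster volume status <vol>` prints;
# offline self-heal daemons show N/A in the port and pid columns.
STATUS='Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/bricks/b1                      49152     49153      Y       1234
Self-heal Daemon on localhost               N/A       N/A        N       N/A'

# Flag any self-heal daemon line that reports N/A.
if printf '%s\n' "$STATUS" | grep 'Self-heal Daemon' | grep -q 'N/A'; then
    echo 'self-heal daemon offline; run: gluster volume start <vol> force'
fi
```

This only inspects the text; the actual fix is still the `gluster volume start <vol> force` above.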
On Tue, May 29, 2018 at 5:20 PM, Stefan Solbrig <stefan.solbrig at ur.de> wrote:
> Dear all,
> I faced a problem with a glusterfs volume (pure distributed, _not_
> dispersed) over RDMA transport. One user had a directory with a large
> number of files (50,000 files), and just doing an "ls" in this directory
> yields a "Transport endpoint not connected" error. The effect is that "ls"
> only shows some of the files, but not all of them.
> The respective log file shows this error message:
> [2018-05-20 20:38:25.114978] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk]
> 0-glurch-client-0: remote operation failed [Transport endpoint is not
> [2018-05-20 20:38:27.732796] W [MSGID: 103046]
> [rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (
> 10.100.245.18:49153), couldn't encode or decode the msg properly or write
> chunks were not provided for replies that were bigger than
> RDMA_INLINE_THRESHOLD (2048)
> [2018-05-20 20:38:27.732844] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk]
> 0-glurch-client-3: remote operation failed [Transport endpoint is not
> [2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk]
> 0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not
> I already set the memlock limit for glusterd to unlimited, but the problem
> persisted. Only going from RDMA transport to TCP transport solved it. (I'm
> running the volume now in mixed mode, config.transport=tcp,rdma.) Mounting
> with transport=rdma shows this error; mounting with transport=tcp is fine.
> However, this problem arises only on some large directories, not on all.
> I haven't recognized a pattern yet.
> I'm using glusterfs v3.12.6 on the servers, with QDR Infiniband HCAs.
> Is this a known issue with RDMA transport?
> best wishes,
> Gluster-users mailing list
> Gluster-users at gluster.org