[Gluster-users] RDMA inline threshold?

Dan Lavu dan at redhat.com
Wed May 30 00:47:37 UTC 2018


Stefan,

Sounds like a brick process is not running. I have noticed some strangeness
in my lab when using RDMA; I often have to forcibly restart the brick
processes, practically every time I do a major operation: adding a new
volume, removing a volume, stopping a volume, etc.

gluster volume status <vol>

Do any of the self-heal daemons show N/A? If that's the case, try forcing
a restart on the volume.

gluster volume start <vol> force

This would also explain why your volumes aren't being replicated properly.
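
For example, a quick check-and-restart sequence ("myvol" is just a placeholder
volume name here):

gluster volume status myvol          # look for bricks or self-heal daemons showing N/A
gluster volume start myvol force     # force-restart the volume's processes
gluster volume status myvol          # verify everything now shows a port and PID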

On Tue, May 29, 2018 at 5:20 PM, Stefan Solbrig <stefan.solbrig at ur.de>
wrote:

> Dear all,
>
> I faced a problem with a glusterfs volume (pure distributed, _not_
> dispersed) over RDMA transport.  One user had a directory with a large
> number of files (50,000 files) and just doing an "ls" in this directory
> yields a "Transport endpoint is not connected" error. The effect is that "ls"
> shows only some files, but not all.
>
> The respective log file shows this error message:
>
> [2018-05-20 20:38:25.114978] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk]
> 0-glurch-client-0: remote operation failed [Transport endpoint is not
> connected]
> [2018-05-20 20:38:27.732796] W [MSGID: 103046]
> [rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (
> 10.100.245.18:49153), couldn't encode or decode the msg properly or write
> chunks were not provided for replies that were bigger than
> RDMA_INLINE_THRESHOLD (2048)
> [2018-05-20 20:38:27.732844] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk]
> 0-glurch-client-3: remote operation failed [Transport endpoint is not
> connected]
> [2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk]
> 0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not
> connected)
>
> I already set the memlock limit for glusterd to unlimited, but the problem
> persists.
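>
> (For reference, a minimal sketch of how the memlock limit can be raised for a
> systemd-managed glusterd; the drop-in file name below is only an example:)
>
>     # /etc/systemd/system/glusterd.service.d/limits.conf
>     [Service]
>     LimitMEMLOCK=infinity
>
>     # apply it:
>     # systemctl daemon-reload && systemctl restart glusterd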
>
> Only switching from RDMA transport to TCP transport solved the problem.  (I'm
> running the volume now in mixed mode, config.transport=tcp,rdma.)  Mounting
> with transport=rdma shows this error, mounting with transport=tcp is fine.
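>
> (As a sketch, the commands involved look roughly like the following; the
> volume name "glurch" is inferred from the client name in the log above, and
> the server and mount-point names are placeholders.  Changing config.transport
> may require stopping and restarting the volume first:)
>
>     # allow both transports on the volume
>     gluster volume set glurch config.transport tcp,rdma
>
>     # mount over TCP (works) vs. over RDMA (shows the error above)
>     mount -t glusterfs -o transport=tcp  gfs-server:/glurch /mnt/glurch
>     mount -t glusterfs -o transport=rdma gfs-server:/glurch /mnt/glurch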
>
> However, this problem does not arise on all large directories, only on some.
> I haven't recognized a pattern yet.
>
> I'm using glusterfs v3.12.6 on the servers, with QDR InfiniBand HCAs.
>
> Is this a known issue with RDMA transport?
>
> best wishes,
> Stefan
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>