[Bugs] [Bug 1532842] New: Large directories in disperse volumes with rdma transport can't be accessed with ls

bugzilla at redhat.com
Tue Jan 9 21:42:42 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1532842

            Bug ID: 1532842
           Summary: Large directories in disperse volumes with rdma
                    transport can't be accessed with ls
           Product: GlusterFS
           Version: 3.13
         Component: rdma
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: shane at axiomalaska.com
                CC: bugs at gluster.org



Created attachment 1379248
  --> https://bugzilla.redhat.com/attachment.cgi?id=1379248&action=edit
Script to replicate disperse rdma bug

Description of problem:

In disperse volumes using the rdma transport, large directories (containing 617
or more files) can't be listed with `ls`. Attempts to do so fail with a
"Transport endpoint is not connected" error, and the following messages appear
in the mount log:

[2018-01-09 21:33:15.186370] W [MSGID: 103046] [rdma.c:3604:gf_rdma_decode_header] 0-rpc-transport/rdma: received a msg of type RDMA_ERROR
[2018-01-09 21:33:15.186411] W [MSGID: 103046] [rdma.c:4057:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (10.4.1.60:49152), couldn't encode or decode the msg properly or write chunks were not provided for replies that were bigger than RDMA_INLINE_THRESHOLD (2048)
[2018-01-09 21:33:15.186435] W [MSGID: 114031] [client-rpc-fops.c:2577:client3_3_readdirp_cbk] 0-erasure-client-0: remote operation failed [Transport endpoint is not connected]
[2018-01-09 21:33:15.186503] W [fuse-bridge.c:2897:fuse_readdirp_cbk] 0-glusterfs-fuse: 74631173: READDIRP => -1 (Transport endpoint is not connected)

Repeated attempts to ls the directory name a different peer in the cluster in
the log message each time, which indicates that the problem is not a single
misconfigured peer.
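To check this across repeated attempts, the peers named in the
gf_rdma_process_recv warnings can be tallied from the mount log. The following
is a minimal Python sketch, assuming each log entry occupies a single line and
that the peer address appears as "peer (host:port)" in that warning, as in the
reported log. The name rdma_error_peers is hypothetical and not part of
GlusterFS.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the "host:port" inside text such as "peer (10.4.1.60:49152)".
PEER_RE = re.compile(r"peer \((\S+?)\)")


def rdma_error_peers(log_lines: Iterable[str]) -> Counter:
    """Count how often each peer is named in gf_rdma_process_recv warnings.

    Assumption: one log entry per line, with the peer address carried by the
    gf_rdma_process_recv message (as in the log excerpt in this report).
    """
    peers: Counter = Counter()
    for line in log_lines:
        # Skip every entry that is not the gf_rdma_process_recv warning.
        if "gf_rdma_process_recv" not in line:
            continue
        match = PEER_RE.search(line)
        if match:
            peers[match.group(1)] += 1
    return peers
```

Passing an open file handle for the client mount log (typically under
/var/log/glusterfs/, named after the mount point) to rdma_error_peers returns
per-peer counts; more than one distinct peer after several ls attempts is
consistent with the observation above.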

Files in the problem directories can be accessed directly as normal (ls, cat,
etc. work fine on full file paths within the large directories).

Changing the transport type of the disperse volume to tcp and restarting the
volume makes the problem directories accessible again. The issue also does not
occur with distributed volumes; it affects only disperse volumes.

Version-Release number of selected component (if applicable):

3.13.1

How reproducible:

Extremely.

Steps to Reproduce:

The general approach is outlined below. See the attached
gluster-disperse-rdma-bug.sh for a working script that reproduces the bug.

1. Create and start a disperse volume with the rdma transport
2. Mount the disperse volume
3. Create a directory in the mounted volume and create 616 empty files in it
4. Verify that the directory can be listed with ls
5. Create the 617th file in the test directory
6. Verify that the directory can no longer be listed with ls
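Steps 3 to 6 can be sketched in Python against an already-mounted volume. This
is a hypothetical helper, not the attached gluster-disperse-rdma-bug.sh; the
directory name rdma_ls_test and all function names are illustrative
assumptions. It also assumes os.listdir exercises the same directory-read path
as ls (the kernel FUSE module decides whether READDIR or READDIRP is issued).

```python
import errno
import os


def create_empty_files(directory: str, start: int, stop: int) -> None:
    """Create empty files named file_<n> for every n in [start, stop)."""
    for n in range(start, stop):
        # Mode "a" creates the file if it is missing and never truncates it.
        with open(os.path.join(directory, f"file_{n}"), "a"):
            pass


def can_list(directory: str) -> bool:
    """Return True if the directory lists, False on ENOTCONN."""
    try:
        os.listdir(directory)
        return True
    except OSError as exc:
        # ENOTCONN is "Transport endpoint is not connected", the reported error.
        if exc.errno == errno.ENOTCONN:
            return False
        raise


def reproduce(mount_point: str, threshold: int = 617) -> tuple[bool, bool]:
    """Return (listable with threshold-1 files, listable with threshold files)."""
    # Hypothetical test directory name inside the mounted volume.
    test_dir = os.path.join(mount_point, "rdma_ls_test")
    os.makedirs(test_dir, exist_ok=True)
    create_empty_files(test_dir, 0, threshold - 1)           # step 3: 616 files
    before = can_list(test_dir)                               # step 4
    create_empty_files(test_dir, threshold - 1, threshold)    # step 5: 617th file
    after = can_list(test_dir)                                # step 6
    return before, after
```

On an affected rdma disperse mount, reproduce("/mnt/erasure") (a hypothetical
mount point) would be expected to return (True, False); on a tcp volume or a
local directory it returns (True, True).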


Actual results:

The large directory cannot be listed with ls.

Expected results:

The large directory should be listable with ls.
