[Gluster-users] Crashing applications, RDMA_ERROR in logs

Anatoliy Dmytriyev tolid at tolid.eu.org
Fri May 4 11:43:45 UTC 2018

Hello gluster users and professionals,

We are running a Gluster 3.10.10 distributed volume (9 nodes) using the RDMA transport.

From time to time, applications crash with I/O errors (can't access file),
and in the client logs we see messages like:

[2018-05-04 10:00:43.467490] W [MSGID: 114031] 
[client-rpc-fops.c:2640:client3_3_readdirp_cbk] 0-gv0-client-2: remote 
operation failed [Transport endpoint is not connected]
[2018-05-04 10:00:43.467585] W [MSGID: 103046] 
[rdma.c:3603:gf_rdma_decode_header] 0-rpc-transport/rdma: received a msg 
of type RDMA_ERROR
[2018-05-04 10:00:43.467601] W [MSGID: 103046] 
[rdma.c:4055:gf_rdma_process_recv] 0-rpc-transport/rdma: peer 
(, couldn't encode or decode the msg properly or 
write chunks were not provided for replies that were bigger than 

At the same time, the brick logs on the gluster nodes show:
[2018-05-04 10:00:43.468470] W [MSGID: 103027] 
[rdma.c:2498:__gf_rdma_send_reply_type_nomsg] 0-rpc-transport/rdma: 
encoding write chunks failed
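
To line up client and brick messages by timestamp and MSGID, something like the following Python sketch could be used. It assumes the log header layout seen in the excerpts above ([timestamp] level [MSGID: n] [file:line:function]); the names HEADER, SAMPLE and parse_entries are made up for illustration and are not part of any Gluster tooling.

```python
import re
from collections import Counter

# Assumed header layout, taken from the excerpts in this mail:
# [timestamp] LEVEL [MSGID: n] [file:line:function] message...
HEADER = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<src>[^\]]+)\]"
)

# Three of the entries quoted above, still wrapped the way the mail wrapped them.
SAMPLE = """\
[2018-05-04 10:00:43.467490] W [MSGID: 114031]
[client-rpc-fops.c:2640:client3_3_readdirp_cbk] 0-gv0-client-2: remote
operation failed [Transport endpoint is not connected]
[2018-05-04 10:00:43.467585] W [MSGID: 103046]
[rdma.c:3603:gf_rdma_decode_header] 0-rpc-transport/rdma: received a msg
of type RDMA_ERROR
[2018-05-04 10:00:43.468470] W [MSGID: 103027]
[rdma.c:2498:__gf_rdma_send_reply_type_nomsg] 0-rpc-transport/rdma:
encoding write chunks failed
"""


def parse_entries(text):
    """Return one dict per log header found in text."""
    # Collapse all whitespace so that wrapped log lines match as one entry.
    flat = " ".join(text.split())
    entries = []
    for m in HEADER.finditer(flat):
        src_file, line_no, func = m.group("src").split(":", 2)
        entries.append({
            "timestamp": m.group("ts"),
            "level": m.group("level"),
            "msgid": m.group("msgid"),
            "file": src_file,
            "line": int(line_no),
            "function": func,
        })
    return entries


if __name__ == "__main__":
    entries = parse_entries(SAMPLE)
    # Count how often each MSGID occurs, then list entries in time order.
    print(Counter(e["msgid"] for e in entries))
    for e in sorted(entries, key=lambda e: e["timestamp"]):
        print(e["timestamp"], e["msgid"], e["function"])
```

Feeding it the full client and brick logs instead of SAMPLE would show whether every RDMA_ERROR on the client has a matching "encoding write chunks failed" on a brick within the same millisecond or so.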

The gluster volume is mounted with options 

The same applications run perfectly on non-Gluster file systems. Could you
please help us debug and fix this?

# gluster volume status gv0
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  
Brick cn01-ib:/gfs/gv0/brick1/brick         0         49152      Y       
Brick cn02-ib:/gfs/gv0/brick1/brick         0         49152      Y       
Brick cn03-ib:/gfs/gv0/brick1/brick         0         49152      Y       
Brick cn04-ib:/gfs/gv0/brick1/brick         0         49152      Y       
Brick cn05-ib:/gfs/gv0/brick1/brick         0         49152      Y       
Brick cn06-ib:/gfs/gv0/brick1/brick         0         49152      Y       
Brick cn07-ib:/gfs/gv0/brick1/brick         0         49152      Y       
Brick cn08-ib:/gfs/gv0/brick1/brick         0         49152      Y       
Brick cn09-ib:/gfs/gv0/brick1/brick         0         49152      Y       

Task Status of Volume gv0
There are no active volume tasks

# gluster volume info gv0

Volume Name: gv0
Type: Distribute
Volume ID: 5ee4b6a4-b8d2-4795-919f-c992b95d6221
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: rdma
Brick1: cn01-ib:/gfs/gv0/brick1/brick
Brick2: cn02-ib:/gfs/gv0/brick1/brick
Brick3: cn03-ib:/gfs/gv0/brick1/brick
Brick4: cn04-ib:/gfs/gv0/brick1/brick
Brick5: cn05-ib:/gfs/gv0/brick1/brick
Brick6: cn06-ib:/gfs/gv0/brick1/brick
Brick7: cn07-ib:/gfs/gv0/brick1/brick
Brick8: cn08-ib:/gfs/gv0/brick1/brick
Brick9: cn09-ib:/gfs/gv0/brick1/brick
Options Reconfigured:
performance.cache-size: 1GB
server.event-threads: 8
client.event-threads: 8
cluster.nufa: on
performance.readdir-ahead: on
performance.parallel-readdir: on
nfs.disable: on

Best regards,

