[Gluster-users] Crashing applications, RDMA_ERROR in logs
Anatoliy Dmytriyev
tolid at tolid.eu.org
Fri May 4 11:43:45 UTC 2018
Hello gluster users and professionals,
We are running a Gluster 3.10.10 distributed volume (9 nodes) over the RDMA
transport.
From time to time applications crash with I/O errors (they cannot access a
file), and in the client logs we see messages like:
[2018-05-04 10:00:43.467490] W [MSGID: 114031]
[client-rpc-fops.c:2640:client3_3_readdirp_cbk] 0-gv0-client-2: remote
operation failed [Transport endpoint is not connected]
[2018-05-04 10:00:43.467585] W [MSGID: 103046]
[rdma.c:3603:gf_rdma_decode_header] 0-rpc-transport/rdma: received a msg
of type RDMA_ERROR
[2018-05-04 10:00:43.467601] W [MSGID: 103046]
[rdma.c:4055:gf_rdma_process_recv] 0-rpc-transport/rdma: peer
(192.168.2.104:49152), couldn't encode or decode the msg properly or
write chunks were not provided for replies that were bigger than
RDMA_INLINE_THRESHOLD (2048)
At the same time on gluster nodes in brick logs:
[2018-05-04 10:00:43.468470] W [MSGID: 103027]
[rdma.c:2498:__gf_rdma_send_reply_type_nomsg] 0-rpc-transport/rdma:
encoding write chunks failed
The volume is mounted with the options
"backupvolfile-server=cn03-ib,transport=rdma,log-level=WARNING".
The same applications run perfectly on a non-Gluster file system. Could you
please help us debug and fix this?
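In case it helps anyone looking at this, here is what we can run to gather more detail; this is only a sketch of diagnostic steps (the volume name gv0 is from the output below, device names and the peer address 192.168.2.104 are from our setup and will differ elsewhere):

```shell
# Raise client-side log verbosity so the next failure leaves a fuller
# trace in the client log (DEBUG is chatty; lower it again afterwards).
gluster volume set gv0 diagnostics.client-log-level DEBUG

# Check the InfiniBand link on each node: ports should report
# state PORT_ACTIVE and the MTU/rate should match across the cluster.
ibv_devinfo | grep -E 'hca_id|state|active_mtu'

# Verify basic RDMA connectivity from a client to a brick host
# (start "ibv_rc_pingpong" with no argument on the brick node first).
ibv_rc_pingpong 192.168.2.104
```

These commands require the Gluster CLI and the InfiniBand userspace tools (libibverbs utilities) on the nodes, so they are meant to be run on the cluster itself.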
# gluster volume status gv0
Status of volume: gv0
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick cn01-ib:/gfs/gv0/brick1/brick          0         49152      Y       3984
Brick cn02-ib:/gfs/gv0/brick1/brick          0         49152      Y       3352
Brick cn03-ib:/gfs/gv0/brick1/brick          0         49152      Y       3333
Brick cn04-ib:/gfs/gv0/brick1/brick          0         49152      Y       3079
Brick cn05-ib:/gfs/gv0/brick1/brick          0         49152      Y       3093
Brick cn06-ib:/gfs/gv0/brick1/brick          0         49152      Y       3148
Brick cn07-ib:/gfs/gv0/brick1/brick          0         49152      Y       2995
Brick cn08-ib:/gfs/gv0/brick1/brick          0         49152      Y       3107
Brick cn09-ib:/gfs/gv0/brick1/brick          0         49152      Y       3014
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
# gluster volume info gv0
Volume Name: gv0
Type: Distribute
Volume ID: 5ee4b6a4-b8d2-4795-919f-c992b95d6221
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: rdma
Bricks:
Brick1: cn01-ib:/gfs/gv0/brick1/brick
Brick2: cn02-ib:/gfs/gv0/brick1/brick
Brick3: cn03-ib:/gfs/gv0/brick1/brick
Brick4: cn04-ib:/gfs/gv0/brick1/brick
Brick5: cn05-ib:/gfs/gv0/brick1/brick
Brick6: cn06-ib:/gfs/gv0/brick1/brick
Brick7: cn07-ib:/gfs/gv0/brick1/brick
Brick8: cn08-ib:/gfs/gv0/brick1/brick
Brick9: cn09-ib:/gfs/gv0/brick1/brick
Options Reconfigured:
performance.cache-size: 1GB
server.event-threads: 8
client.event-threads: 8
cluster.nufa: on
performance.readdir-ahead: on
performance.parallel-readdir: on
nfs.disable: on
--
Best regards,
Anatoliy