[Bugs] [Bug 1387767] New: nfs hangs / rpc issues
bugzilla at redhat.com
Fri Oct 21 19:56:23 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1387767
Bug ID: 1387767
Summary: nfs hangs / rpc issues
Product: GlusterFS
Version: 3.7.11
Component: nfs
Assignee: bugs at gluster.org
Reporter: jbarfknecht at verisign.com
CC: bugs at gluster.org
Description of problem:
A Gluster NFSv3 mount unpredictably stops responding for one client. The client
logs the following in its /var/log/messages:
Oct 21 19:17:25 testvol kernel: nfs: server x.x.x.x not responding, timed out
Oct 21 19:17:28 testvol kernel: nfs: server x.x.x.x not responding, timed out
Oct 21 19:17:31 testvol kernel: nfs: server x.x.x.x not responding, timed out
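The spacing of these client-side timeouts can be measured with a short script. This is only an illustrative sketch, not part of the original report: the function name `timeout_gaps` is hypothetical, the year is passed in explicitly because syslog timestamps omit it (2016 is assumed from the date of this mail), and `%b` assumes an English locale.

```python
import re
from datetime import datetime

# Matches kernel NFS timeout lines as they appear in /var/log/messages.
TIMEOUT = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"nfs:\s+server\s+\S+\s+not responding",
    re.MULTILINE,
)

# The three lines quoted in the report.
SAMPLE_MESSAGES = """\
Oct 21 19:17:25 testvol kernel: nfs: server x.x.x.x not responding, timed out
Oct 21 19:17:28 testvol kernel: nfs: server x.x.x.x not responding, timed out
Oct 21 19:17:31 testvol kernel: nfs: server x.x.x.x not responding, timed out
"""


def timeout_gaps(messages, year):
    """Return the seconds elapsed between consecutive timeout messages."""
    stamps = [
        datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        for m in TIMEOUT.finditer(messages)
    ]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


print(timeout_gaps(SAMPLE_MESSAGES, 2016))
```

For the three lines quoted above, the gaps are 3 seconds each.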
The Gluster server only shows the following error messages in the nfs.log:
[2016-10-21 19:17:44.843790] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0xe100e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.843806] E [MSGID: 112074]
[nfs3.c:615:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2016-10-21 19:17:44.843919] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x8301e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844055] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844174] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x5201e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844268] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3401e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844334] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x2a01e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844393] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x8e01e094, Program: NFS3,
ProgVers: 3, Proc: 1) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844438] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x8f01e094, Program: NFS3,
ProgVers: 3, Proc: 1) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.051784] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052042] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x8301e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052202] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3401e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052356] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x5201e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
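For triage, the failed replies above can be tallied by NFSv3 procedure and by XID. The following is a minimal sketch and not part of the original report: the function name `tally_failures` is hypothetical, and the procedure names are taken from RFC 1813 (Proc 1 is GETATTR, Proc 7 is WRITE). The pattern tolerates the line wrapping seen in this mail.

```python
import re
from collections import Counter

# NFSv3 procedure numbers per RFC 1813 (only a subset is listed here).
NFS3_PROCS = {
    0: "NULL", 1: "GETATTR", 2: "SETATTR", 3: "LOOKUP",
    4: "ACCESS", 5: "READLINK", 6: "READ", 7: "WRITE",
}

# \s* between fields lets the pattern match entries wrapped across lines.
FAILED_SUBMIT = re.compile(
    r"failed\s+to\s+submit\s+message\s+\(XID:\s*(0x[0-9a-fA-F]+),\s*"
    r"Program:\s*NFS3,\s*ProgVers:\s*3,\s*Proc:\s*(\d+)\)"
)

# Three entries copied from the nfs.log excerpt in the report.
SAMPLE_LOG = """\
[2016-10-21 19:17:44.844055] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844393] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x8e01e094, Program: NFS3,
ProgVers: 3, Proc: 1) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.051784] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
"""


def tally_failures(log_text):
    """Count failed reply submissions per procedure name and per XID."""
    by_proc = Counter()
    by_xid = Counter()
    for xid, proc in FAILED_SUBMIT.findall(log_text):
        by_proc[NFS3_PROCS.get(int(proc), f"proc {proc}")] += 1
        by_xid[xid] += 1
    return by_proc, by_xid


by_proc, by_xid = tally_failures(SAMPLE_LOG)
print(dict(by_proc))
print({xid: n for xid, n in by_xid.items() if n > 1})  # XIDs seen more than once
```

In the full excerpt above, 10 of the 12 failures are Proc 7 (WRITE) and 2 are Proc 1 (GETATTR), and four XIDs appear twice, which would be consistent with the client retransmitting the same requests.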
volume info:
Volume Name: testvol
Type: Distribute
Volume ID: 1a149875-b248-4330-ae70-0238820d7bad
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.171.156.220:/gluster/testvol/brick1
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: on
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
auth.allow: x.x.x.x
nfs.disable: off
nfs.addr-namelookup: off
nfs.acl: off
nfs.rpc-auth-allow: x.x.x.x
nfs.trusted-sync: on
When this persists for an extended time, the client is unable to keep
the share mounted, and eventually the client's application locks up. This only
happens with one volume on this system; all others are able to access the share
at the time of these events. This client is more active than others, but the
system is not under heavy load (always about 75% CPU idle, 50% free RAM, disk
IO rarely rises above 20%). My network team has also ruled out network
connectivity. I gave them a new share with a single brick to rule out
a lot of other possibilities.
Version-Release number of selected component (if applicable):
Gluster 3.7.11-2 Running on CentOS 7.1.1503
Client is RHEL 6u6
How reproducible:
The timing is unpredictable on my side, but the problem always recurs with this
host (just a matter of time).
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
NFS session should stay established
Additional info:
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.