[Bugs] [Bug 1387767] New: nfs hangs / rpc issues

bugzilla at redhat.com bugzilla at redhat.com
Fri Oct 21 19:56:23 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1387767

            Bug ID: 1387767
           Summary: nfs hangs / rpc issues
           Product: GlusterFS
           Version: 3.7.11
         Component: nfs
          Assignee: bugs at gluster.org
          Reporter: jbarfknecht at verisign.com
                CC: bugs at gluster.org



Description of problem:
A Gluster NFSv3 mount unpredictably stops responding for one client. The client
logs the following in /var/log/messages:


Oct 21 19:17:25 testvol kernel: nfs: server x.x.x.x not responding, timed out
Oct 21 19:17:28 testvol kernel: nfs: server x.x.x.x not responding, timed out
Oct 21 19:17:31 testvol kernel: nfs: server x.x.x.x not responding, timed out

The Gluster server shows only the following error messages in nfs.log:

[2016-10-21 19:17:44.843790] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0xe100e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.843806] E [MSGID: 112074]
[nfs3.c:615:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2016-10-21 19:17:44.843919] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x8301e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844055] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844174] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x5201e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844268] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3401e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844334] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x2a01e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844393] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x8e01e094, Program: NFS3,
ProgVers: 3, Proc: 1) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844438] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x8f01e094, Program: NFS3,
ProgVers: 3, Proc: 1) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.051784] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052042] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x8301e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052202] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3401e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052356] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x5201e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
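
For triage, the wrapped nfs.log entries above can be tallied by NFSv3
procedure. The following is a minimal Python sketch and not part of the
original report: the function name summarize and the SAMPLE excerpt are
illustrative assumptions, and the procedure names come from the NFSv3
specification (RFC 1813), where procedure 1 is GETATTR and procedure 7 is WRITE.

```python
import re
from collections import Counter

# Subset of NFSv3 procedure numbers as defined in RFC 1813.
NFS3_PROCS = {0: "NULL", 1: "GETATTR", 3: "LOOKUP", 4: "ACCESS",
              6: "READ", 7: "WRITE", 21: "COMMIT"}

# Matches the fields of a "failed to submit message" entry once the
# line wrapping introduced by the mail has been flattened.
ENTRY = re.compile(
    r"XID:\s*(0x[0-9a-fA-F]+),\s*Program:\s*(\w+),"
    r"\s*ProgVers:\s*(\d+),\s*Proc:\s*(\d+)"
)

# Illustrative excerpt copied from the log above (hypothetical variable name).
SAMPLE = """\
[2016-10-21 19:17:44.843790] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0xe100e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844055] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844393] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x8e01e094, Program: NFS3,
ProgVers: 3, Proc: 1) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.051784] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
"""


def summarize(log_text):
    """Return (failures per procedure name, sorted list of XIDs seen more than once)."""
    flat = " ".join(log_text.split())  # undo the mail's line wrapping
    procs = Counter()
    xids = Counter()
    for xid, program, _vers, proc in ENTRY.findall(flat):
        if program != "NFS3":
            continue
        procs[NFS3_PROCS.get(int(proc), f"PROC_{proc}")] += 1
        xids[xid] += 1
    repeated = sorted(x for x, n in xids.items() if n > 1)
    return procs, repeated


if __name__ == "__main__":
    per_proc, repeated_xids = summarize(SAMPLE)
    print(dict(per_proc))
    print(repeated_xids)
```

Run against the full excerpt above, this counts 10 failed WRITE replies and 2
failed GETATTR replies, with four XIDs (0x3c01e094, 0x8301e094, 0x3401e094,
0x5201e094) each appearing twice. A repeated XID may indicate a client
retransmission, though the log alone does not confirm that.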

volume info:
Volume Name: testvol
Type: Distribute
Volume ID: 1a149875-b248-4330-ae70-0238820d7bad
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.171.156.220:/gluster/testvol/brick1
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: on
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
auth.allow: x.x.x.x
nfs.disable: off
nfs.addr-namelookup: off
nfs.acl: off
nfs.rpc-auth-allow: x.x.x.x
nfs.trusted-sync: on



When this persists for an extended period, the client is unable to keep the
share mounted, and eventually the client's application locks up. This happens
with only one volume on this system; all others are able to access the share
at the time of these events. This client is more active than others, but the
system is not under heavy load (always about 75% CPU idle, 50% free RAM, and
disk I/O rarely rises above 20%). My network team has also ruled out network
connectivity. I gave them a new share with a single brick to rule out many
other possibilities.

Version-Release number of selected component (if applicable):
Gluster 3.7.11-2 running on CentOS 7.1.1503
Client is RHEL 6u6

How reproducible:

I cannot predict when it will occur, but it always recurs with this host (it is
just a matter of time).

Steps to Reproduce:
1. 
2. 
3.

Actual results:


Expected results:
NFS session should stay established

Additional info:
