[Bugs] [Bug 1272872] New: nfs-ganesha: the nfs-ganesha server is not responding even though the server is alive

Mon Oct 19 06:27:21 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1272872

            Bug ID: 1272872
           Summary: nfs-ganesha: the nfs-ganesha server is not responding
                    even though the server is alive
           Product: GlusterFS
           Version: 3.7.5
         Component: ganesha-nfs
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: saujain at redhat.com

Description of problem:
I had a test setup with nfs-ganesha running on 4 nodes with HA capabilities.
I started I/O from one of the nodes and found that the I/O stops moving ahead
after some time. This is because the nfs-ganesha process stops responding
while the server process is still running, and hence the failover does not
happen either.

Altogether the I/O gets stuck.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-1.el7.x86_64
nfs-ganesha-2.3-0.rc6.el7.centos.x86_64

How reproducible:
Happens on the first run.

Steps to Reproduce:
1. setup a 4 node cluster of glusterfs and 4 node nfs-ganesha front end
2. mount the volume over nfs-ganesha with vers=4 on a client
3. start executing arequal tool on the mount-point

Actual results:
At step 3 the I/O gets stuck and nfs-ganesha stops responding.
Even a showmount on the nfs-ganesha server results in an RPC timeout:
# showmount -e localhost
rpc mount export: RPC: Timed out

strace on the nfs-ganesha process shows a single futex wait and no further calls:
# strace -p 19773
Process 19773 attached
futex(0x7efde8b819d0, FUTEX_WAIT, 19803, NULL^CProcess 19773 detached
 <detached ...>

The ganesha.log says,
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main]
nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main] nfs_start
:NFS STARTUP :EVENT :-------------------------------------------------
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main] nfs_start
:NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main] nfs_start
:NFS STARTUP :EVENT :-------------------------------------------------
17/10/2015 01:49:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[reaper]
nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
17/10/2015 01:52:01 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat]
glusterfs_create_export :FSAL :EVENT :Volume vol2 exported at : '/'
17/10/2015 03:49:17 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat]
dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending
heartbeat
17/10/2015 03:50:33 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat]
dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending
heartbeat
17/10/2015 03:53:48 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat]
dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending
heartbeat
17/10/2015 03:55:03 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat]
dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending
heartbeat

In the end, the failover does not happen and the I/O does not move ahead.
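
To show how the heartbeat warnings in the log above could be picked out automatically, here is a minimal Python sketch. It assumes the record layout seen in this report (a `DD/MM/YYYY HH:MM:SS : epoch ...` prefix) and that wrapped lines belong to the preceding record. The function name `unhealthy_timestamps` and the `SAMPLE` text are illustrative only and are not part of nfs-ganesha.

```python
import re
from datetime import datetime

# Start of a ganesha.log record as it appears in this report, e.g.
# "17/10/2015 03:49:17 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[...]"
RECORD_START = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) : epoch ")
UNHEALTHY = "Health status is unhealthy"


def unhealthy_timestamps(log_text):
    """Return timestamps of records that carry the unhealthy-heartbeat warning."""
    records = []  # each entry is [timestamp_string, full_record_text]
    for line in log_text.splitlines():
        match = RECORD_START.match(line)
        if match:
            records.append([match.group(1), line])
        elif records:
            # Assumption: a line without a timestamp is a wrapped continuation.
            records[-1][1] += " " + line.strip()
    return [
        datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        for stamp, text in records
        if UNHEALTHY in text
    ]


# Excerpt copied from the log in this report, including the mail line wrapping.
SAMPLE = """\
17/10/2015 01:52:01 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat]
glusterfs_create_export :FSAL :EVENT :Volume vol2 exported at : '/'
17/10/2015 03:49:17 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat]
dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending
heartbeat
17/10/2015 03:50:33 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat]
dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending
heartbeat
"""

for stamp in unhealthy_timestamps(SAMPLE):
    print(stamp.isoformat(sep=" "))
```

Such a scan only reports that the server flagged itself as unhealthy. It does not explain why the process stopped responding.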

Expected results:
nfs-ganesha should either respond or get killed. Once the process is no longer
running, the HA capabilities can take over and the I/O can move ahead.
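
The gap described here is that a "process is running" check passes while the service itself is hung. As an illustration only, and not how the HA scripts shipped with glusterfs or nfs-ganesha work, the Python sketch below classifies a probe command by whether it answers in time. The names `probe`, `needs_failover` and `ganesha_state` and the 10 second default are assumptions. The only command taken from this report is `showmount -e localhost`.

```python
import subprocess


def probe(cmd, timeout_s):
    """Run cmd and classify it as 'responsive', 'failed', or 'hung'."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return "hung"
    except FileNotFoundError:
        # The probe command itself is not installed.
        return "failed"
    return "responsive" if result.returncode == 0 else "failed"


def needs_failover(state):
    """Treat anything other than a clean answer as a reason to fail over."""
    return state != "responsive"


def ganesha_state(timeout_s=10.0):
    """Probe the local server with the command used in this report."""
    return probe(["showmount", "-e", "localhost"], timeout_s)
```

A monitor built this way would call `needs_failover(ganesha_state())` periodically. In the situation reported here the probe would come back as "hung" or "failed" even though the PID still exists, which is the condition the failover logic currently misses.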

Additional info:
