[Gluster-users] NFS to Gluster Hangs

Santosh Pradhan spradhan at redhat.com
Wed Jun 11 06:05:37 UTC 2014


These checks may be helpful while the hang is seen:

1. Was there a lot of I/O going on before you noticed the hang?
2. Please capture the "nfsstat -c" output (on the NFS client machine), to
   check if there is any RPC retransmission.
3. Collect the Gluster NFS server log.
4. Capture the 'top' output for the NFS server process, to check the
   memory footprint of the server. (If memory growth is the problem,
   turning off the DRC may help, as mentioned by Niels.)
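For item 2, the retransmission check can be scripted; this is a minimal sketch, and the `check_retrans` helper name and threshold logic are illustrative, not part of nfsstat itself. It assumes the usual "calls / retrans / authrefrsh" layout of `nfsstat -c` client RPC stats.

```shell
# check_retrans: read "nfsstat -c"-style output on stdin and report
# whether the client has recorded any RPC retransmissions. A growing
# retrans count while the mount hangs suggests requests are being lost
# or the server has stalled.
check_retrans() {
    awk '/^calls/ { getline; r = $2 }
         END { if (r + 0 > 0) print "WARNING: " r " RPC retransmissions"
               else           print "OK: no retransmissions" }'
}

# Typical use on the NFS client:
#   nfsstat -c | check_retrans
```

Run it twice a few minutes apart; a count that keeps rising during the hang is the interesting signal, not a one-off nonzero value.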

Best R,
Santosh

On 06/11/2014 02:34 AM, Gene Liverman wrote:
>
> No firewalls in this case...
>
> --
> Gene Liverman
> Systems Administrator
> Information Technology Services
> University of West Georgia
> gliverma at westga.edu <mailto:gliverma at westga.edu>
>
> On Jun 10, 2014 12:57 PM, "Paul Robert Marino" <prmarino1 at gmail.com 
> <mailto:prmarino1 at gmail.com>> wrote:
>
>     I've also seen this happen when there is a firewall in the middle and
>     nfslockd malfunctioned because of it.
>
>
>     On Tue, Jun 10, 2014 at 12:20 PM, Gene Liverman
>     <gliverma at westga.edu <mailto:gliverma at westga.edu>> wrote:
>     > Thanks! I turned off drc as suggested and will have to wait and
>     > see how that works. Here are the packages I have installed via yum:
>     > # rpm -qa |grep -i gluster
>     > glusterfs-cli-3.5.0-2.el6.x86_64
>     > glusterfs-libs-3.5.0-2.el6.x86_64
>     > glusterfs-fuse-3.5.0-2.el6.x86_64
>     > glusterfs-server-3.5.0-2.el6.x86_64
>     > glusterfs-3.5.0-2.el6.x86_64
>     > glusterfs-geo-replication-3.5.0-2.el6.x86_64
>     >
>     > The nfs server service was showing to be running even when stuff
>     > wasn't working.  This is from while it was broken:
>     >
>     > # gluster volume status
>     > Status of volume: gv0
>     > Gluster process                                    Port   Online  Pid
>     > ------------------------------------------------------------------------------
>     > Brick eapps-gluster01.my.domain:/export/sdb1/gv0   49152  Y       39593
>     > Brick eapps-gluster02.my.domain:/export/sdb1/gv0   49152  Y       2472
>     > Brick eapps-gluster03.my.domain:/export/sdb1/gv0   49152  Y       1866
>     > NFS Server on localhost                            2049   Y       39603
>     > Self-heal Daemon on localhost                      N/A    Y       39610
>     > NFS Server on eapps-gluster03.my.domain            2049   Y       35125
>     > Self-heal Daemon on eapps-gluster03.my.domain      N/A    Y       35132
>     > NFS Server on eapps-gluster02.my.domain            2049   Y       37103
>     > Self-heal Daemon on eapps-gluster02.my.domain      N/A    Y       37110
>     >
>     > Task Status of Volume gv0
>     > ------------------------------------------------------------------------------
>     >
>     >
>     > Running 'service glusterd restart' on the NFS server made things
>     > start working again after this.
>     >
>     >
>     > -- Gene
>     >
>     >
>     >
>     >
>     > On Tue, Jun 10, 2014 at 12:10 PM, Niels de Vos
>     <ndevos at redhat.com <mailto:ndevos at redhat.com>> wrote:
>     >>
>     >> On Tue, Jun 10, 2014 at 11:32:50AM -0400, Gene Liverman wrote:
>     >> > Twice now I have had my nfs connection to a replicated gluster
>     >> > volume stop responding. On both servers that connect to the
>     >> > system I have the following symptoms:
>     >> >
>     >> >    1. Accessing the mount with the native client is still
>     >> >       working fine (the volume is mounted both that way and via
>     >> >       nfs. One app requires the nfs version)
>     >> >    2. The logs have messages stating the following: "kernel:
>     >> >       nfs: server my-servers-name not responding, still trying"
>     >> >
>     >> > How can I fix this?
>     >>
>     >> You should check if the NFS-server (a glusterfs process) is still
>     >> running:
>     >>
>     >>     # gluster volume status
>     >>
>     >> If the NFS-server is not running anymore, you can start it with:
>     >>
>     >>     # gluster volume start $VOLUME force
>     >>     (you only need to do that for one volume)
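The check-then-force-start sequence above can be sketched as a small script. The `nfs_online` helper is hypothetical, the volume name `gv0` comes from Gene's output later in the thread, and the parser assumes the status layout shown there (Online flag in the second-to-last column of the "NFS Server on localhost" line).

```shell
# nfs_online: read "gluster volume status"-style output on stdin and
# succeed (exit 0) only if the "NFS Server on localhost" line reports
# Online = Y. Column positions are an assumption based on the output
# format shown in this thread.
nfs_online() {
    awk '/^NFS Server on localhost/ { online = $(NF-1) }
         END { if (online == "Y") exit 0; exit 1 }'
}

# Sketch of the sequence (requires a live Gluster cluster):
#   if ! gluster volume status gv0 | nfs_online; then
#       gluster volume start gv0 force
#   fi
```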
>     >>
>     >>
>     >> In case this is with GlusterFS 3.5, you may be hitting a memory
>     >> leak in the DRC (Duplicate Request Cache) implementation of the
>     >> NFS-server. You can disable DRC with this:
>     >>
>     >>     # gluster volume set $VOLUME nfs.drc off
>     >>
>     >> In glusterfs-3.5.1, DRC will be disabled by default; there have
>     >> been too many issues with DRC to enable it for everyone. We need
>     >> to do more tests and fix DRC in the current development (master)
>     >> branch.
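After setting the option, the new value can be confirmed from `gluster volume info`, which lists changed options under "Options Reconfigured". This `drc_state` helper is a sketch under that assumption; the helper name is made up for illustration.

```shell
# drc_state: read "gluster volume info"-style output on stdin and print
# the configured nfs.drc value, or "unset" if the option never appears
# (i.e. the volume is still on the default).
drc_state() {
    awk -F': *' '/^nfs\.drc:/ { v = $2 }
         END { if (v == "") print "unset"; else print v }'
}

# Sketch (requires a live Gluster cluster; volume name gv0 assumed):
#   gluster volume set gv0 nfs.drc off
#   gluster volume info gv0 | drc_state
```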
>     >>
>     >> HTH,
>     >> Niels
>     >
>     >
>     >
>     > _______________________________________________
>     > Gluster-users mailing list
>     > Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>     > http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
>
