[Gluster-devel] [RFC] What if client fuse process crash?

Niels de Vos ndevos at redhat.com
Tue Aug 6 07:50:30 UTC 2019


On Tue, Aug 06, 2019 at 03:14:46PM +0800, Changwei Ge wrote:
> > On 2019/8/6 2:57 PM, Ravishankar N wrote:
> > 
> > On 06/08/19 11:44 AM, Changwei Ge wrote:
> > > Hi Ravishankar,
> > > 
> > > 
> > > Thanks for sharing; it is very useful to me.
> > > 
> > > I have been setting up a glusterfs storage cluster recently, and the
> > > umount/mount recovery process has been bothering me.
> > Hi Changwei,
> > Why do you need to do frequent remounts? If your gluster fuse client
> > is crashing frequently, that should be investigated and fixed. If you
> > have a reproducer, please raise a bug with all the details, such as the
> > glusterfs version, core files and log files.
> 
> 
> Hi Ravi,
> 
> Actually, the glusterfs client fuse process runs well in my environment.
> But high availability and fault tolerance are also big concerns of mine.
> 
> So I killed the fuse process to see what would happen. AFAIK, userspace
> processes can be killed, or can crash, at any time, which is not under
> our control. :-(
> 
> Another scenario is *software upgrade*, since we have to upgrade the
> glusterfs client version to gain new features and bug fixes. It would be
> friendly to applications if the upgrade were transparent.

Open files have state associated with them, and that state is lost
when the fuse process exits. A restarted fuse process would then need
to restore the state of the open files (and caches, and more). This is
not trivial, and I do not think any work on this has been done yet.
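To make the failure mode concrete: once the fuse process is gone, any
operation against the mount point fails with ENOTCONN ("Transport
endpoint is not connected"). A minimal sketch of how an application
could probe for this; the helper name is mine, not a Gluster API:

```python
import errno
import os

def mount_is_dead(path):
    # A dead FUSE daemon surfaces as ENOTCONN ("Transport endpoint
    # is not connected") on any operation against the mount point.
    try:
        os.stat(path)
        return False
    except OSError as e:
        return e.errno == errno.ENOTCONN
```

Detecting the condition is the easy part; the hard part is restoring
the lost open-file state afterwards.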

Some users take an alternative route. Mounted filesystems do indeed
have issues with online updating, so maybe you do not need to mount the
filesystem at all. Depending on the needs of your applications, using
glusterfs-coreutils instead of a FUSE (or NFS) mount might be an option
for you. Its short-lived processes connect to the Gluster volume when
needed and do not keep a connection open. Updating userspace tools is
much simpler than updating long-running processes that are hooked into
the kernel.

See https://github.com/gluster/glusterfs-coreutils for details.
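As an illustration of the pattern (the gfcat invocation and glfs:// URI
form are my assumptions based on the project README, not verified
against a cluster), a wrapper that pays the connection cost per call
instead of holding a mount:

```python
import subprocess

def gf_uri(server, volume, path):
    # Build a glfs:// URI for the gf* tools (assumed form:
    # glfs://<server>/<volume>/<path>).
    return "glfs://{}/{}/{}".format(server, volume, path.lstrip("/"))

def gf_read(server, volume, path):
    # Each call runs a short-lived gfcat process that connects to the
    # volume via libgfapi and exits; there is no long-running fuse
    # daemon to crash or to block an upgrade.
    return subprocess.run(["gfcat", gf_uri(server, volume, path)],
                          capture_output=True, check=True).stdout
```

Upgrading the tools between two such calls is invisible to the
application, which is exactly the transparency a long-lived fuse mount
cannot offer.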

HTH,
Niels


> 
> 
> Thanks,
> 
> Changwei
> 
> 
> > Regards,
> > Ravi
> > > 
> > > 
> > > I happened to find some patches[1] on the internet aiming to address
> > > this problem, but I have no idea why they were never merged into the
> > > glusterfs mainline.
> > > 
> > > Do you know why?
> > > 
> > > 
> > > Thanks,
> > > 
> > > Changwei
> > > 
> > > 
> > > [1]:
> > > 
> > > https://review.gluster.org/#/c/glusterfs/+/16843/
> > > 
> > > https://github.com/gluster/glusterfs/issues/242
> > > 
> > > 
> > > On 2019/8/6 1:12 PM, Ravishankar N wrote:
> > > > On 05/08/19 3:31 PM, Changwei Ge wrote:
> > > > > Hi list,
> > > > > 
> > > > > If, somehow, the glusterfs client fuse process dies, all
> > > > > subsequent file operations will fail with a 'no connection'
> > > > > error.
> > > > > 
> > > > > I am curious whether the only way to recover is to umount and
> > > > > mount again.
> > > > Yes, this is pretty much the case with all FUSE-based file
> > > > systems. You can use -o auto_unmount
> > > > (https://review.gluster.org/#/c/17230/) to clean up automatically
> > > > without having to unmount manually.
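For reference, a mount invocation with that option might look like the
following; the server, volume and mountpoint names are placeholders,
not taken from this thread:

```python
import subprocess

# Hypothetical placeholders: server1, gv0 and /mnt/gv0 are examples.
# auto_unmount cleans up the mountpoint when the fuse process exits,
# so no manual `umount` is needed afterwards.
cmd = ["mount", "-t", "glusterfs", "-o", "auto_unmount",
       "server1:/gv0", "/mnt/gv0"]
# subprocess.run(cmd, check=True)  # requires root and a reachable volume
```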
> > > > > 
> > > > > If so, that means all processes working on top of glusterfs
> > > > > have to close their files, which is sometimes hard to
> > > > > accept.
> > > > 
> > > > There is
> > > > https://research.cs.wisc.edu/wind/Publications/refuse-eurosys11.html,
> > > > which claims to provide a framework for transparent failovers. I
> > > > can't find any publicly available code though.
> > > > 
> > > > Regards,
> > > > Ravi
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Changwei
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > 
> > > > > Community Meeting Calendar:
> > > > > 
> > > > > APAC Schedule -
> > > > > Every 2nd and 4th Tuesday at 11:30 AM IST
> > > > > Bridge: https://bluejeans.com/836554017
> > > > > 
> > > > > NA/EMEA Schedule -
> > > > > Every 1st and 3rd Tuesday at 01:00 PM EDT
> > > > > Bridge: https://bluejeans.com/486278655
> > > > > 
> > > > > Gluster-devel mailing list
> > > > > Gluster-devel at gluster.org
> > > > > https://lists.gluster.org/mailman/listinfo/gluster-devel
> > > > > 

