[Gluster-users] Issue with server reboot / shutdown

Wed Nov 12 08:56:28 UTC 2014

On Wed, Nov 12, 2014 at 09:07:46AM +0100, Florian Knorn wrote:
> Hi,
> 
> I have an issue where I can’t reboot or shutdown my server with
> gluster running. The setup:
> 
> Debian 7.7, GlusterFS 3.5.2, 1 volume with 1 brick mounted via iSCSI,
> using multipath-tools.
> 
> On shutdown / reboot, the system hangs at “Unmounting local
> filesystems”, see this screenshot:
> 
> https://www.dropbox.com/s/7g22330382mvkm6/shutdown_issue.jpg?dl=0
> 
> While testing, I noticed that if I run “service glusterfs-server
> stop; umount /path/to/brick”, umount reports that the path is still
> in use. However, if I run “gluster volume stop THEVOLUME; service
> glusterfs-server stop; umount /path/to/brick”, it works.
> 
> Similarly, if I manually stop the volume before a shutdown / reboot,
> everything goes through.
> 
> So it seems to me that even though the service is stopped, the gluster
> server still has some files on the brick open, which prevents the
> unmount and locks up the system on reboot.

I think that "service glusterfs-server stop" does not stop the brick
processes (glusterfsd). These are stopped by "gluster volume
stop ...", but that should not be needed for a reboot/shutdown. Stopping
the volume will also take down the brick processes on the other storage
servers.
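
To see which processes still keep the brick busy after "service
glusterfs-server stop", "fuser -vm /path/to/brick" is the usual tool. A
rough Python equivalent is sketched here, assuming Linux /proc and root
privileges (otherwise other users' fd directories are unreadable and get
skipped); procs_using_path is a hypothetical helper, not part of GlusterFS.

```python
import os


def procs_using_path(root: str) -> dict[int, list[str]]:
    """Map pid -> paths under `root` that the process has open (or uses as cwd).

    Linux-only: reads the /proc/<pid>/fd/* and /proc/<pid>/cwd symlinks.
    Processes we may not inspect (not root, or already exited) are skipped.
    """
    root = os.path.realpath(root)
    prefix = root.rstrip("/") + "/"

    def under_root(target: str) -> bool:
        return target == root or target.startswith(prefix)

    users: dict[int, list[str]] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/mounts
        pid = int(entry)
        links = []
        try:
            fd_dir = f"/proc/{pid}/fd"
            links = [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # permission denied or process gone; still try cwd
        links.append(f"/proc/{pid}/cwd")
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # fd closed or process exited meanwhile
            if under_root(target):
                users.setdefault(pid, []).append(target)
    return users
```

Calling procs_using_path("/path/to/brick") right after stopping the
service should show whether glusterfsd is among the remaining users.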

I do not know which service scripts Debian uses, but there should be an
option that gracefully kills the brick processes on shutdown.
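
The usual pattern for "gracefully kills" in a shutdown hook is: send
SIGTERM, give the process some time to flush and exit, and only then fall
back to SIGKILL. A minimal Python sketch of that pattern, assuming Linux
/proc, root privileges, and that glusterfsd exits cleanly on SIGTERM; the
helper names and the 10 second default timeout are hypothetical choices,
not taken from the Debian or GlusterFS scripts.

```python
import os
import signal
import time


def pid_alive(pid: int) -> bool:
    """True if the pid exists and is not a zombie (Linux /proc only)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except OSError:
        return False  # process already gone
    # The state letter is the first field after the "(comm)" field.
    state = stat.rsplit(")", 1)[1].split()[0]
    return state != "Z"


def find_pids(name: str) -> list[int]:
    """Return the pids whose /proc/<pid>/comm equals `name`, e.g. "glusterfsd"."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # exited while we were scanning
    return pids


def terminate_gracefully(pids, timeout: float = 10.0, poll: float = 0.1) -> list[int]:
    """SIGTERM every pid, wait up to `timeout` seconds, then SIGKILL the rest.

    Returns the pids that were still running at the last check and got SIGKILL.
    """
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass
    remaining = list(pids)
    deadline = time.monotonic() + timeout
    while remaining and time.monotonic() < deadline:
        remaining = [pid for pid in remaining if pid_alive(pid)]
        if remaining:
            time.sleep(poll)
    for pid in remaining:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
    return remaining
```

For example, terminate_gracefully(find_pids("glusterfsd")) would run
before the bricks are unmounted; a non-empty return value means some brick
process had to be killed hard and may not have flushed everything.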

> Any pointers? Or does this have to do with multipath? I believe the
> issue started after I began using it.

It could also be related to the fact that you store the bricks on an
iSCSI disk. It is common for distributions to wait until unmounting has
finished: some processes may still be flushing their data, and you do
not want them to abort writing it out. However, if the iSCSI service or
the network has already been stopped, those processes cannot write out
their data, and the unmount may hang. It is tricky to get the shutdown
procedure right; something in this order (the reverse of the start-up
order) should work:

   4. stop glusterd, self-heal, quota, .. and glusterfsd processes
   3. unmount bricks
   2. stop iscsi
   1. stop network
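
The order above is the start-up dependency chain walked backwards: each
layer has to be stopped before the layer it depends on. A small Python
sketch of that idea using graphlib from the standard library (Python
3.9+); the dependency table is a hypothetical simplification of this
setup, not something the init system reads.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified start-up dependencies for this setup:
# each key can only start once everything in its set is running.
STARTUP_DEPS = {
    "iscsi": {"network"},
    "brick mounts": {"iscsi"},
    "gluster processes": {"brick mounts"},
}


def shutdown_order(deps: dict[str, set[str]]) -> list[str]:
    """Stop services in the reverse of a valid start-up order."""
    startup = list(TopologicalSorter(deps).static_order())
    return startup[::-1]


if __name__ == "__main__":
    for step, service in enumerate(shutdown_order(STARTUP_DEPS), start=1):
        print(step, service)
```

On a sysvinit system the same constraint is normally expressed through
the Required-Stop / Should-Stop fields in the LSB headers of the init
scripts, so those headers in /etc/init.d/ are worth checking for the
glusterfs, iSCSI and multipath scripts.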

Maybe with these details you can identify where things go wrong? Please
keep us informed about the results you get.

Thanks,
Niels