[Gluster-users] systemd kill mode

Strahil Nikolov hunter86_bg at yahoo.com
Wed Sep 2 19:30:33 UTC 2020


Hi,

you shouldn't do that, as it is intentional - glusterd is just a management layer, and you might need to restart it in order to reconfigure a node. You don't want to kill your bricks just to introduce a change, right?
For details, you can check https://access.redhat.com/solutions/1313303 (you can obtain a subscription from developers.redhat.com).
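
For example, restarting glusterd on a node leaves the running brick processes alone - a quick sanity check would look roughly like this (PIDs and output will of course differ on your system):

pgrep -a glusterfsd            # note the PIDs of the running bricks
systemctl restart glusterd     # only the management daemon is restarted (KillMode=process)
pgrep -a glusterfsd            # the same brick PIDs are still there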

In CentOS there is a dedicated service that takes care of shutting down all brick processes and avoids such a freeze.
Here it is, in case your distro doesn't provide it:

user at system:~/Gluster/usr/lib/systemd/system> cat glusterfsd.service
[Unit]
Description=GlusterFS brick processes (stopping only)
After=network.target glusterd.service


[Service]
Type=oneshot
# glusterd starts the glusterfsd processes on-demand
# /bin/true will mark this service as started, RemainAfterExit keeps it active
ExecStart=/bin/true
RemainAfterExit=yes
# if there are no glusterfsd processes, a stop/reload should not give an error
ExecStop=/bin/sh -c "/bin/killall --wait glusterfsd || /bin/true"
ExecReload=/bin/sh -c "/bin/killall -HUP glusterfsd || /bin/true"


[Install]
WantedBy=multi-user.target
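
If you need to add it yourself, a minimal sketch (assuming the standard systemd paths and that you saved the unit above as /etc/systemd/system/glusterfsd.service) would be:

systemctl daemon-reload
systemctl enable --now glusterfsd.service
systemctl status glusterfsd.service    # shows 'active (exited)' because of RemainAfterExit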


Of course, you can also use '/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh' to prevent the freeze: it kills all gluster processes on the node (including FUSE mounts on the system), which allows the FUSE clients accessing the bricks' processes and the rest of the TSP (Trusted Storage Pool) to react accordingly.
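
For a planned reboot of a brick node you can simply run it by hand first, for example:

/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh    # as root
systemctl reboot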

Both the glusterfsd.service unit and the stop-all-gluster-processes.sh script are provided by the glusterfs-server package.


Best Regards,
Strahil Nikolov




On Wednesday, 2 September 2020 at 21:59:45 GMT+3, Ward Poelmans <wpoely86 at gmail.com> wrote:

Hi,

I've been playing with glusterfs on a couple of VMs to get some feeling for
it. The setup is 2 bricks with replication and a thin arbiter. I've
noticed something 'odd' in the systemd unit file for glusterd. It has
KillMode=process
which means that on a 'systemctl stop glusterd' it will only kill the
glusterd daemon and not any of the subprocesses started by glusterd
(like glusterfs and glusterfsd).
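
You can check this yourself with something like:

systemctl show glusterd -p KillMode    # prints KillMode=process
systemctl status glusterd              # the CGroup section still lists the glusterfs/glusterfsd children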

Does anyone know the reason for this? The git history of the file
doesn't help. It was added in 2013 but the commit doesn't mention
anything about it.

The reason I'm asking is that I noticed a write hanging when I rebooted
one of the brick VMs: a client was doing 'dd if=/dev/zero of=/some/file'
on gluster when I did a clean shutdown of one of the brick VMs. This
caused the dd to hang for the duration of network.ping-timeout
(42 seconds by default). When I changed the kill mode to 'control-group'
(which also kills all processes started by glusterd), this didn't happen
any more.
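
(One way to change that without touching the packaged unit is a systemd drop-in, roughly:

# /etc/systemd/system/glusterd.service.d/killmode.conf   <- file name is just an example
[Service]
KillMode=control-group

followed by a 'systemctl daemon-reload'.)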

I was not expecting any 'hangs' on a proper shutdown of one of the
bricks when replication is used. Is this a bug, or is something wrong
with my setup?

Ward