[Bugs] [Bug 1491060] New: PID File handling: self-heal-deamon pid file leaves stale pid and indiscriminately kills pid when glusterd is started
bugzilla at redhat.com
Tue Sep 12 23:32:01 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1491060
Bug ID: 1491060
Summary: PID File handling: self-heal-deamon pid file leaves
stale pid and indiscriminately kills pid when glusterd
is started
Product: GlusterFS
Version: 3.10
Component: glusterd
Severity: high
Assignee: bugs at gluster.org
Reporter: ben at apcera.com
CC: bugs at gluster.org
Description of problem:
The self-heal daemon (shd) pid file is left behind with a stale pid, and
glusterd indiscriminately kills that pid when it is started. Pid files are
stored in `/var/lib/glusterd`, which persists across reboots. When glusterd is
started (or restarted, or the host is rebooted), any process whose pid matches
the one in the shd pid file is killed.
Version-Release number of selected component (if applicable):
3.10.4 from ppa:gluster/glusterfs-3.10
How reproducible:
1 to 1
Steps to Reproduce:
1. Create a volume.
2. Enable the Self-Heal Daemon.
3. Check the pid status
find /var/lib/glusterd/ -name '*pid' -exec tail -v {} \;
==> /var/lib/glusterd/glustershd/run/glustershd.pid <==
11642
==> /var/lib/glusterd/vols/vol0/run/172.28.128.5-data-brick0.pid <==
11169
4. killall -w glusterfs
5. create a process, background it, record the pid
sleep infinity & pid=$!
[1] 11669
6. put the pid of the process into the pid file
echo $pid > /var/lib/glusterd/glustershd/run/glustershd.pid
7. confirm above
find /var/lib/glusterd/ -name '*pid' -exec tail -v {} \;
==> /var/lib/glusterd/glustershd/run/glustershd.pid <==
11669
==> /var/lib/glusterd/vols/vol0/run/172.28.128.5-data-brick0.pid <==
11169
8. restart glusterfs-server
service glusterfs-server restart
glusterfs-server stop/waiting
glusterfs-server start/running, process 11687
9. shell notifies that the background process was terminated
[1]+ Terminated sleep infinity
10. shd starts, but kills a process other than glusterfs
gluster v status
Status of volume: vol0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 172.28.128.5:/data/brick0 49152 0 Y 11169
Brick 172.28.128.6:/data/brick0 49152 0 Y 11023
Self-heal Daemon on localhost N/A N/A Y 12023
Self-heal Daemon on 172.28.128.6 N/A N/A Y 11044
Task Status of Volume vol0
------------------------------------------------------------------------------
There are no active volume tasks
Note: In some cases shd fails to start.
Note2: In one case I saw the same pid listed for the brick and shd. In this
case the brick was terminated when shd started.
find /var/lib/glusterd/ -name '*pid' -exec tail -v {} \;
==> /var/lib/glusterd/vols/apcfs-default/run/172.27.0.19-data-brick0.pid <==
1468
==> /var/lib/glusterd/glustershd/run/glustershd.pid <==
1468
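The collision in Note2 can be spotted before restarting glusterd. The following is a minimal sketch, not part of glusterd: `duplicate_pids` is a hypothetical helper name, and it assumes each pid file holds the pid on its first line.

```shell
#!/bin/sh
# Sketch only: report pids that are recorded in more than one pid file.
# duplicate_pids is a hypothetical helper, not a gluster command.

# duplicate_pids DIR: print each pid that appears in two or more *pid files
# found under DIR.
duplicate_pids() {
    # awk prints the first field of the first non-empty first line of each
    # file; sort | uniq -d keeps only values that occur more than once.
    find "$1" -type f -name '*pid' \
        -exec awk 'FNR == 1 && NF { print $1 }' {} + | sort | uniq -d
}
```

Usage: `duplicate_pids /var/lib/glusterd` prints nothing when every pid file records a distinct pid.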
Actual results:
1. The pid file /var/lib/glusterd/glustershd/run/glustershd.pid remains after
shd is stopped.
2. glusterd kills whatever process has the pid recorded in the stale pid file.
Expected results:
1. The shd pid file should be cleaned up.
2. glusterd should only kill instances of the glusterfs process.
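The second expectation can be illustrated with a small shell sketch. This is not how glusterd is implemented. It only shows the kind of check being asked for: confirm that the recorded pid names a `glusterfs` process before signalling it. The function names `pid_is_glusterfs` and `safe_kill_from_pidfile` are hypothetical, and the check relies on the Linux-specific `/proc/<pid>/comm` file.

```shell
#!/bin/sh
# Sketch only: verify that the pid recorded in a pid file belongs to a
# glusterfs process before signalling it. Linux-specific (uses /proc).
# Both function names are hypothetical, not part of gluster.

# pid_is_glusterfs PIDFILE: succeed only if the pid on the first line of
# PIDFILE is a running process whose command name is "glusterfs".
pid_is_glusterfs() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    # Reject empty or non-numeric content.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # No /proc entry means the process is gone, so the pid is stale.
    [ -r "/proc/$pid/comm" ] || return 1
    [ "$(cat "/proc/$pid/comm")" = "glusterfs" ]
}

# safe_kill_from_pidfile PIDFILE: send SIGTERM only when the check passes.
safe_kill_from_pidfile() {
    if pid_is_glusterfs "$1"; then
        kill "$(head -n 1 "$1" | tr -d '[:space:]')"
    else
        echo "refusing to kill: $1 does not name a glusterfs process" >&2
        return 1
    fi
}
```

With a check like this, the `sleep infinity` process from the reproduction steps would have been left alone, because its command name is `sleep`. A check on the command name alone still cannot tell one glusterfs process from another, so it would not prevent the brick/shd collision described in Note2.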
Additional info:
OS is Ubuntu Trusty
Workaround:
In our automation, when we stop all gluster processes (reboot, upgrade, etc.),
we ensure all processes are stopped and then clean up the pid files with
`find /var/lib/glusterd/ -name '*pid' -delete`.
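A more selective variant of this cleanup deletes only the pid files that no longer point at a gluster process. It is a sketch under stated assumptions: `remove_stale_pidfiles` is a hypothetical name, the check uses the Linux-specific `/proc/<pid>/comm`, and it assumes gluster processes appear there as `glusterfs` or `glusterfsd`.

```shell
#!/bin/sh
# Sketch only: remove pid files that do not point at a running gluster
# process. remove_stale_pidfiles is a hypothetical helper. Assumes Linux
# /proc and that gluster processes are named glusterfs or glusterfsd.

# remove_stale_pidfiles DIR: delete every *pid file under DIR whose recorded
# pid is missing, malformed, dead, or owned by a non-gluster process.
remove_stale_pidfiles() {
    find "$1" -type f -name '*pid' | while IFS= read -r f; do
        pid=$(head -n 1 "$f" | tr -d '[:space:]')
        comm=
        case $pid in
            ''|*[!0-9]*) ;;   # empty or non-numeric: treat as stale
            *)
                if [ -r "/proc/$pid/comm" ]; then
                    comm=$(cat "/proc/$pid/comm")
                fi
                ;;
        esac
        case $comm in
            glusterfs|glusterfsd) ;;   # still a gluster process: keep the file
            *) rm -f -- "$f" ;;
        esac
    done
}
```

Usage: `remove_stale_pidfiles /var/lib/glusterd`, run as root after the gluster processes have been stopped.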
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.