[Bugs] [Bug 1491059] New: PID File handling: brick pid file leaves stale pid and brick fails to start when glusterd is started
bugzilla at redhat.com
Tue Sep 12 23:31:57 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1491059
Bug ID: 1491059
Summary: PID File handling: brick pid file leaves stale pid and
brick fails to start when glusterd is started
Product: GlusterFS
Version: 3.10
Component: glusterd
Severity: high
Assignee: bugs at gluster.org
Reporter: ben at apcera.com
CC: bugs at gluster.org
Description of problem:
The brick pid file is left behind with a stale pid, and the brick then fails to
start when glusterd is started. pid files are stored in `/var/lib/glusterd`,
which persists across reboots. When glusterd is started (or restarted, or the
host is rebooted) and the pid recorded in the brick pid file matches the pid of
any running process, the brick fails to start.
Version-Release number of selected component (if applicable):
3.10.4 from ppa:gluster/glusterfs-3.10
How reproducible:
1 to 1
Steps to Reproduce:
1. Create a volume.
2. Enable Self-Heal Daemon
3. pid status
==> /var/lib/glusterd/glustershd/run/glustershd.pid <==
1398
==> /var/lib/glusterd/vols/vol0/run/172.28.128.5-data-brick0.pid <==
1407
4. killall -w glusterfsd
5. sleep infinity & pid=$!
6. echo $pid >/var/lib/glusterd/vols/vol0/run/172.28.128.5-data-brick0.pid
7. service glusterfs-server restart
glusterfs-server stop/waiting
glusterfs-server start/running, process 1548
8. gluster v status
Status of volume: vol0
Gluster process                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.28.128.5:/data/brick0       N/A       N/A        N       N/A
Brick 172.28.128.6:/data/brick0       49152     0          Y       11023
Self-heal Daemon on localhost         N/A       N/A        Y       1684
Self-heal Daemon on 172.28.128.6      N/A       N/A        Y       11044

Task Status of Volume vol0
------------------------------------------------------------------------------
There are no active volume tasks
Workaround:
9. rm /var/lib/glusterd/vols/vol0/run/172.28.128.5-data-brick0.pid
10. service glusterfs-server restart
glusterfs-server stop/waiting
glusterfs-server start/running, process 1743
11. gluster v status
Status of volume: vol0
Gluster process                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.28.128.5:/data/brick0       49152     0          Y       1888
Brick 172.28.128.6:/data/brick0       49152     0          Y       11023
Self-heal Daemon on localhost         N/A       N/A        Y       1879
Self-heal Daemon on 172.28.128.6      N/A       N/A        Y       11044

Task Status of Volume vol0
------------------------------------------------------------------------------
There are no active volume tasks
Actual results:
1. brick pid file(s) remain after brick is stopped
2. glusterd fails to start brick when the pid in the pid file matches any
process
Expected results:
1. brick pid file(s) should be cleaned up when the brick is stopped gracefully
2. glusterd should start the brick when the process in the pid file is not a
glusterfsd process
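To illustrate expected result 2, here is a minimal sketch of a pid file check
that treats the file as live only when the recorded pid belongs to a running
glusterfsd process. This is not glusterd's actual code. The function name
`pidfile_is_live_brick` and the `PROC_ROOT` override are hypothetical, and the
sketch assumes a Linux host where `/proc/<pid>/comm` holds the command name.

```shell
#!/bin/sh
# Sketch only: decide whether a brick pid file points at a live glusterfsd.
# PROC_ROOT is a hypothetical override that exists to make the check testable.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Returns 0 if the pid file names a running process whose command name is
# "glusterfsd"; returns 1 if the file is missing, malformed, or stale.
pidfile_is_live_brick() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1

    # Take the first line and strip whitespace.
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;        # empty or not a number
    esac

    comm_file="$PROC_ROOT/$pid/comm"
    [ -r "$comm_file" ] || return 1     # no process with that pid

    # A pid reused by an unrelated process (e.g. sleep) is treated as stale.
    [ "$(cat "$comm_file")" = "glusterfsd" ]
}
```

With the pid file from step 6, which holds the pid of `sleep infinity`, this
check would report the file as stale, so the brick could be started anyway.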
Additional info:
OS is Ubuntu Trusty
Workaround:
In our automation, whenever we stop all gluster processes (reboot, upgrade,
etc.), we ensure every process has stopped and then clean up the pid files with
`find /var/lib/glusterd/ -name '*pid' -delete`.
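The cleanup step could be wrapped roughly as follows. This is a sketch, not our
actual automation: the function name `cleanup_pidfiles` is hypothetical, the
list of process names it checks is an assumption, and it relies on `pgrep` and
on a `find` that supports `-delete`.

```shell
#!/bin/sh
# Sketch only: remove leftover pid files once no gluster process is running.
# The directory argument defaults to the location named in this report.
cleanup_pidfiles() {
    dir="${1:-/var/lib/glusterd}"

    # Refuse to delete anything while a gluster process is still alive.
    # The process names below are assumed, not taken from glusterd itself.
    for name in glusterd glusterfsd glusterfs; do
        if pgrep -x "$name" >/dev/null 2>&1; then
            echo "$name is still running; not removing pid files" >&2
            return 1
        fi
    done

    # Same pattern as the one-liner above, restricted to regular files.
    find "$dir" -type f -name '*pid' -print -delete
}
```

Calling `cleanup_pidfiles` with no argument after stopping the services has the
same effect as the `find` one-liner, with the added guard against deleting the
pid file of a process that is still running.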
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.