[Bugs] [Bug 1491059] New: PID File handling: brick pid file leaves stale pid and brick fails to start when glusterd is started

bugzilla at redhat.com
Tue Sep 12 23:31:57 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1491059

            Bug ID: 1491059
           Summary: PID File handling: brick pid file leaves stale pid and
                    brick fails to start when glusterd is started
           Product: GlusterFS
           Version: 3.10
         Component: glusterd
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: ben at apcera.com
                CC: bugs at gluster.org



Description of problem:

The brick pid file can be left holding a stale pid, and the brick then fails
to start when glusterd is started. Pid files are stored under
`/var/lib/glusterd`, which persists across reboots. When glusterd is started
(or restarted, or the host is rebooted) and the pid recorded in the brick pid
file matches the pid of any running process, the brick fails to start.

Version-Release number of selected component (if applicable):


3.10.4 from ppa:gluster/glusterfs-3.10

How reproducible:

1 to 1

Steps to Reproduce:
1. Create a volume. 
2. Enable Self-Heal Daemon
3. Check the pid files:
==> /var/lib/glusterd/glustershd/run/glustershd.pid <==
1398
==> /var/lib/glusterd/vols/vol0/run/172.28.128.5-data-brick0.pid <==
1407
4. killall -w glusterfsd
5. sleep infinity & pid=$!
6. echo $pid >/var/lib/glusterd/vols/vol0/run/172.28.128.5-data-brick0.pid
7. service glusterfs-server restart
glusterfs-server stop/waiting
glusterfs-server start/running, process 1548
8. gluster v status
Status of volume: vol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.28.128.5:/data/brick0             N/A       N/A        N       N/A  
Brick 172.28.128.6:/data/brick0             49152     0          Y       11023
Self-heal Daemon on localhost               N/A       N/A        Y       1684 
Self-heal Daemon on 172.28.128.6            N/A       N/A        Y       11044

Task Status of Volume vol0
------------------------------------------------------------------------------
There are no active volume tasks

Workaround:
9. rm /var/lib/glusterd/vols/vol0/run/172.28.128.5-data-brick0.pid
10. service glusterfs-server restart
glusterfs-server stop/waiting
glusterfs-server start/running, process 1743
11. gluster v status
Status of volume: vol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.28.128.5:/data/brick0             49152     0          Y       1888 
Brick 172.28.128.6:/data/brick0             49152     0          Y       11023
Self-heal Daemon on localhost               N/A       N/A        Y       1879 
Self-heal Daemon on 172.28.128.6            N/A       N/A        Y       11044

Task Status of Volume vol0
------------------------------------------------------------------------------
There are no active volume tasks


Actual results:
1. brick pid file(s) remain after brick is stopped
2. glusterd fails to start brick when the pid in the pid file matches any
process

Expected results:
1. brick pid file(s) should be cleaned up when the brick is stopped gracefully
2. glusterd should start the brick when the process in the pid file is not a
glusterfsd process
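
The second expectation can be illustrated with a small sketch. This is not
glusterd's code; it only shows the kind of check being asked for, assuming a
Linux host where `/proc/<pid>/comm` holds the command name of a process. The
function names and the `proc_root` parameter are hypothetical and exist only
for this example.

```python
from pathlib import Path


def read_pid(pid_file):
    """Return the pid stored in pid_file, or None if it is missing or malformed."""
    try:
        pid = int(Path(pid_file).read_text().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def pid_belongs_to(pid, expected_name, proc_root="/proc"):
    """True only if a process with this pid exists and has the expected command name."""
    try:
        comm = Path(proc_root, str(pid), "comm").read_text().strip()
    except OSError:
        # No such process, or its entry is unreadable: treat as not running.
        return False
    return comm == expected_name


def brick_is_running(pid_file, expected_name="glusterfsd", proc_root="/proc"):
    """Treat the pid file as stale unless its pid names a live glusterfsd process."""
    pid = read_pid(pid_file)
    if pid is None:
        return False
    return pid_belongs_to(pid, expected_name, proc_root)
```

With a check like this, the `sleep infinity` process from step 5 would not be
mistaken for a running brick, because its command name is `sleep` rather than
`glusterfsd`.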

Additional info:
OS is Ubuntu Trusty

Workaround:

In our automation, when we stop all gluster processes (reboot, upgrade, etc.),
we ensure all processes are stopped and then clean up the pid files with:

  find /var/lib/glusterd/ -name '*pid' -delete
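
For automation written in Python rather than shell, the same cleanup could be
sketched roughly as follows. The function name `remove_pid_files` is
hypothetical, and unlike `find ... -delete` this version only removes regular
files. It should only be run once every gluster process has been stopped.

```python
from pathlib import Path


def remove_pid_files(root="/var/lib/glusterd"):
    """Delete every regular file under root whose name ends in 'pid'.

    Returns the sorted list of removed paths as strings.
    """
    removed = []
    # Materialise the matches first so that deleting does not disturb the walk.
    for path in list(Path(root).rglob("*pid")):
        if path.is_file():
            path.unlink()
            removed.append(str(path))
    return sorted(removed)
```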

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

