[Bugs] [Bug 1175735] [USS]: snapd process is not killed once the glusterd comes back

bugzilla at redhat.com bugzilla at redhat.com
Tue Jan 6 10:39:06 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1175735

Raghavendra Bhat <rabhat at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|POST                        |MODIFIED
                 CC|                            |rabhat at redhat.com



--- Comment #5 from Raghavendra Bhat <rabhat at redhat.com> ---
Description of problem:
=======================

When USS is enabled, glusterd starts a snapd process on all the machines in the
cluster. If the user disables USS while glusterd happens to be down on some of
those machines, USS gets disabled but the snapd process stays alive on the
machines whose glusterd went down; that much is expected. However, even after
glusterd comes back up, snapd keeps running although USS is disabled.
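
A condensed sketch of the sequence (node names and the service command here are
placeholders and assumptions, not taken from the transcripts below):

# On node1:
gluster volume set vol3 uss on        # snapd starts on every node in the cluster
# On node2..node4, around the time USS is disabled from node1:
service glusterd stop
# On node1:
gluster volume set vol3 uss off       # snapd stops only where glusterd is running
# On node2..node4:
service glusterd start
ps -eaf | grep snapd                  # the snapd started by "uss on" is still alive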

For example:
============

USS is disabled and no snapd process is running on any machine:
===============================================================

[root at inception ~]# gluster v i vol3 | grep uss
features.uss: off
[root at inception ~]# ps -eaf | grep snapd
root      2299 26954  0 18:05 pts/0    00:00:00 grep snapd
[root at inception ~]# 


Enable USS; snapd should now run on all the machines:
=====================================================

[root at inception ~]# gluster v set vol3 uss on
volume set: success
[root at inception ~]# gluster v i vol3 | grep uss
features.uss: on
[root at inception ~]#
[root at inception ~]# gluster v status vol3 | grep -i "snapshot daemon"
Snapshot Daemon on localhost                49158    Y    2322
Snapshot Daemon on hostname1                49157    Y    3868
Snapshot Daemon on hostname2                49157    Y    3731
Snapshot Daemon on hostname3                49157    Y    3265
[root at inception ~]# 

Now, disable USS and, at the same time, stop glusterd on multiple machines:
===========================================================================

[root at inception ~]# gluster v set vol3 uss off
volume set: success
[root at inception ~]# gluster v status vol3 | grep -i "snapshot daemon"
[root at inception ~]# gluster v status vol3
Status of volume: vol3
Gluster process                                Port    Online    Pid
------------------------------------------------------------------------------
Brick hostname1:/rhs/brick4/b4                 49155   Y         32406
NFS Server on localhost                        2049    Y         2431
Self-heal Daemon on localhost                  N/A     Y         2202

Task Status of Volume vol3
------------------------------------------------------------------------------
There are no active volume tasks

[root at inception ~]# 
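
The glusterd stop on the peer nodes is not captured in the transcript above; it
was done on hostname1, hostname2 and hostname3 at roughly the same time as the
"uss off", with something along these lines (the exact service command is an
assumption for this setup):

service glusterd stop    # run on hostname1, hostname2 and hostname3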


snapd is no longer running on the machine where glusterd is up, but it is
still running on the machines where glusterd is down:
==========================================================================

Node1:
======

[root at inception ~]# ps -eaf | grep snapd
root      2501 26954  0 18:11 pts/0    00:00:00 grep snapd
[root at inception ~]# 

Node2:
======

[root at rhs-arch-srv2 ~]# ps -eaf | grep snapd
root      3868     1  0 12:36 ?        00:00:00 /usr/sbin/glusterfsd -s
localhost --volfile-id snapd/vol3 -p
/var/lib/glusterd/vols/vol3/run/vol3-snapd.pid -l
/var/log/glusterfs/vol3-snapd.log --brick-name snapd-vol3 -S
/var/run/c01a04ffff6172926bfc0364bd457af3.socket --brick-port 49157
--xlator-option vol3-server.listen-port=49157
root      4163  5023  0 12:41 pts/0    00:00:00 grep snapd
[root at rhs-arch-srv2 ~]# 


Node3:
======

[root at rhs-arch-srv3 ~]# ps -eaf | grep snapd
root      3731     1  0 12:35 ?        00:00:00 /usr/sbin/glusterfsd -s
localhost --volfile-id snapd/vol3 -p
/var/lib/glusterd/vols/vol3/run/vol3-snapd.pid -l
/var/log/glusterfs/vol3-snapd.log --brick-name snapd-vol3 -S
/var/run/79af174d6c9c86897e0ff72f002994f2.socket --brick-port 49157
--xlator-option vol3-server.listen-port=49157
root      4028  5029  0 12:40 pts/0    00:00:00 grep snapd
[root at rhs-arch-srv3 ~]# 


Node4:
=======

[root at rhs-arch-srv4 ~]# ps -eaf | grep snapd
root      3265     1  0 12:36 ?        00:00:00 /usr/sbin/glusterfsd -s
localhost --volfile-id snapd/vol3 -p
/var/lib/glusterd/vols/vol3/run/vol3-snapd.pid -l
/var/log/glusterfs/vol3-snapd.log --brick-name snapd-vol3 -S
/var/run/4bd0ff786ad2fc2b7e504182d985b723.socket --brick-port 49157
--xlator-option vol3-server.listen-port=49157
root      3587  4733  0 12:41 pts/0    00:00:00 grep snapd
[root at rhs-arch-srv4 ~]# 



Start glusterd on the machines where it was stopped and look for the snapd
process: it is still running.

The same case was also run with a different scenario: stopping the volume and,
at the same time, bringing down glusterd. In that case, once glusterd comes
back online, the brick process does get killed.
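
On each of those nodes the check amounts to the following (the service command
is an assumption; the pid that shows up is the one from the earlier "uss on"):

service glusterd start
gluster v i vol3 | grep uss      # still reports features.uss: off
ps -eaf | grep snapd             # the old snapd process is still listed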


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.6.1


How reproducible:
=================

always


Actual results:
===============

The snapd process is still online even though, from the user's point of view,
USS is off.


Expected results:
=================

The snapd process should be killed once glusterd comes back up.
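
Until this is fixed, a manual workaround (not part of the original report) would
be to kill the stale daemon on the affected node via its pid file, which is
visible in the ps output above:

kill "$(cat /var/lib/glusterd/vols/vol3/run/vol3-snapd.pid)"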

--- Additional comment from Rahul Hinduja on 2014-10-30 08:51:20 EDT ---

Additional info:
================

If USS is now enabled again on the same volume, the ports are shown as N/A for
all the servers that were brought back online. Note that the pids reported
below are those of the stale snapd processes shown above.

[root at inception ~]# gluster v status vol3 | grep -i "snapshot daemon"
Snapshot Daemon on localhost                49159    Y    2716
Snapshot Daemon on hostname1                N/A      Y    3265
Snapshot Daemon on hostname2                N/A      Y    3868
Snapshot Daemon on hostname3                N/A      Y    3731
[root at inception ~]#
