[Bugs] [Bug 1223205] New: [Snapshot] Scheduled job is not processed when one of the nodes of the shared storage volume is down

bugzilla at redhat.com bugzilla at redhat.com
Wed May 20 06:06:41 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1223205

            Bug ID: 1223205
           Summary: [Snapshot] Scheduled job is not processed when one of
                    the nodes of the shared storage volume is down
           Product: Red Hat Gluster Storage
           Version: 3.0
         Component: gluster-snapshot
          Keywords: Triaged
          Severity: urgent
          Assignee: rjoseph at redhat.com
          Reporter: asengupt at redhat.com
        QA Contact: storage-qa-internal at redhat.com
                CC: ashah at redhat.com, bugs at gluster.org,
                    gluster-bugs at redhat.com, rjoseph at redhat.com,
                    senaik at redhat.com
        Depends On: 1218573



+++ This bug was initially created as a clone of Bug #1218573 +++

Description of problem:

The scheduler does not pick up scheduled jobs when one of the storage nodes of
the shared storage volume is down.

Version-Release number of selected component (if applicable):

[root@localhost glusterfs]# rpm -qa | grep glusterfs
glusterfs-debuginfo-3.7.0alpha0-0.9.git989bea3.el7.centos.x86_64
glusterfs-libs-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-fuse-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-extra-xlators-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-geo-replication-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-cli-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-api-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-server-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-devel-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64


How reproducible:

100%

Steps to Reproduce:

1. Create a 2x2 distributed-replicate volume.

2. Create a replicated shared storage volume on storage nodes that are not part
of the volume whose snapshots are scheduled, and mount it on each storage node
at /var/run/gluster/shared_storage
3. Initialize the scheduler on each storage node, e.g. run the
snap_scheduler.py init command
4. Enable the scheduler on the storage nodes, e.g. run snap_scheduler.py enable
5. Add a job to create snapshots of the volume at an interval of 5 minutes (see
the consolidated shell sketch after these steps), e.g.
snap_scheduler.py add job1 "*/5 * * * *" testvol
6. Bring down both shared storage nodes.
7. Bring up any one of the shared storage nodes.
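
For reference, a consolidated shell sketch of steps 1-5. The hostnames
(node1-node4 for the data volume, meta1/meta2 for the shared storage nodes) and
brick paths are placeholders for illustration, not the actual test setup:

# step 1: 2x2 distributed-replicate data volume
gluster volume create testvol replica 2 \
    node1:/rhs/brick1/b1 node2:/rhs/brick1/b2 \
    node3:/rhs/brick1/b3 node4:/rhs/brick1/b4
gluster volume start testvol

# step 2: replicated shared storage volume on nodes outside testvol,
# mounted on every storage node at the path used by the scheduler
gluster volume create meta replica 2 meta1:/rhs/brick1/b1 meta2:/rhs/brick1/b2
gluster volume start meta
mount -t glusterfs meta1:/meta /var/run/gluster/shared_storage   # on each node

# steps 3-5: initialize and enable the scheduler, then add a 5-minute job
snap_scheduler.py init       # run on every storage node
snap_scheduler.py enable     # run once, after init has completed on all nodes
snap_scheduler.py add job1 "*/5 * * * *" testvol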

Actual results:

The scheduled job is not picked up by the scheduler.

Expected results:

The scheduler should pick up the scheduled jobs and create snapshots at the
scheduled times.
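
One way to verify (a sketch, using the job and volume names from the steps above):

snap_scheduler.py list           # job1 should be listed with its "*/5 * * * *" schedule
gluster snapshot list testvol    # new scheduled snapshots should appear every 5 minutes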


Additional info:

[root@localhost glusterfs]# gluster v info testvol

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: f5eed851-6f24-4cde-903e-7669f5437bc9
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.143:/rhs/brick1/b1
Brick2: 10.70.47.145:/rhs/brick1/b2
Brick3: 10.70.47.150:/rhs/brick1/b3
Brick4: 10.70.47.151:/rhs/brick1/b4
Options Reconfigured:
features.quota: on
features.quota-deem-statfs: on
features.uss: enable
features.barrier: disable
====================================
Shared storage volume

[root@localhost ~]# gluster v info meta

Volume Name: meta
Type: Replicate
Volume ID: b07daf4e-891d-4022-972a-af181250dc07
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.46.248:/rhs/brick1/b1
Brick2: 10.70.46.251:/rhs/brick1/b2

--- Additional comment from  on 2015-05-08 05:45:30 EDT ---

Version : glusterfs 3.7.0beta1 built on May  7 2015
=======

Another scenario where jobs are not picked up:

1) Create a dist-rep volume and mount it.

2) Create a shared storage volume and mount it.

Enable the scheduler and schedule jobs on the volumes:
snap_scheduler.py add "A1"  "*/5 * * * * " "vol1"
snap_scheduler: Successfully added snapshot schedule

snap_scheduler.py add "A2"  "*/10 * * * * " "vol2"
snap_scheduler: Successfully added snapshot schedule

3) Take a snapshot of the shared storage:
gluster snapshot create MV_Snap gluster_shared_storage
snapshot create: success: Snap MV_Snap_GMT-2015.05.08-09.20.26 created successfully

4) Add some more jobs, A3 and A4.

5) Stop the volume and observe that no job is picked up at the next scheduled
time.

6) Restore the shared storage to the snapshot taken and start the volume.

7) After the restore, the scheduler lists the A1 and A2 jobs, but none of them
are picked up (see the sketch after these steps).
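
For reference, a sketch of the stop/restore/start sequence in steps 5-6,
assuming "the volume" in step 5 is the shared storage volume (gluster snapshot
restore requires the origin volume to be stopped) and using the snapshot name
from step 3:

gluster volume stop gluster_shared_storage
gluster snapshot restore MV_Snap_GMT-2015.05.08-09.20.26
gluster volume start gluster_shared_storage
snap_scheduler.py list   # A1 and A2 are listed again, but no job is picked up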


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1218573
[Bug 1218573] [Snapshot] Scheduled job is not processed when one of the
nodes of the shared storage volume is down