[Bugs] [Bug 1600941] [geo-rep]: geo-replication scheduler is failing due to unsuccessful umount
bugzilla at redhat.com
Fri Jul 13 12:44:47 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1600941
Kotresh HR <khiremat at redhat.com> changed:
What      |Removed             |Added
----------------------------------------------------------------------------
Status    |NEW                 |ASSIGNED
Assignee  |bugs at gluster.org    |khiremat at redhat.com
--- Comment #1 from Kotresh HR <khiremat at redhat.com> ---
Description of problem:
=======================
What's broken:
--------------
schedule_georep.py fails to complete the transition.
What this tool does:
--------------------
schedule_georep.py is a tool to run Geo-replication on demand. It can be
used to schedule Geo-replication to run once a day with a cron entry such as:
# Run daily at 08:30pm
30 20 * * * root python /usr/share/glusterfs/scripts/schedule_georep.py \
--no-color gv1 fvm1 gv2 >> /var/log/glusterfs/schedule_georep.log 2>&1
This tool does the following (sketched in code after the list):
1. Stop Geo-replication if Started
2. Start Geo-replication
3. Set Checkpoint
4. Check the Status and wait until the Checkpoint is Complete (loop)
5. Once the Checkpoint is Complete, Stop Geo-replication
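For orientation, here is a rough Python sketch of that sequence, following the
order shown in the console output below (stop, set checkpoint, start, watch,
stop). It is not the actual schedule_georep.py code: it assumes the standard
"gluster volume geo-replication <master> <slavehost>::<slavevol> ..." CLI
syntax, and the checkpoint-completion check is a placeholder for the tool's
real status parsing.

import subprocess
import time

def georep(mastervol, slavehost, slavevol, *action):
    # Build "gluster volume geo-replication <master> <slavehost>::<slavevol> <action...>"
    return ["gluster", "volume", "geo-replication", mastervol,
            "%s::%s" % (slavehost, slavevol)] + list(action)

def run_once(mastervol, slavehost, slavevol):
    subprocess.call(georep(mastervol, slavehost, slavevol, "stop"))       # Stop if Started
    subprocess.call(georep(mastervol, slavehost, slavevol,
                           "config", "checkpoint", "now"))                # Set Checkpoint
    subprocess.call(georep(mastervol, slavehost, slavevol, "start"))      # Start
    while True:                                                           # Watch Status
        out = subprocess.check_output(
            georep(mastervol, slavehost, slavevol, "status", "detail"))
        if b"Yes" in out:   # placeholder for the checkpoint-completed column
            break
        time.sleep(60)
    subprocess.call(georep(mastervol, slavehost, slavevol, "stop"))       # Stop once complete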
Actual error:
-------------
[root at dhcp41-227 ~]# python /usr/share/glusterfs/scripts/schedule_georep.py
vol0 10.70.42.9 vol1
[ OK] Stopped Geo-replication
[ OK] Set Checkpoint
[ OK] Started Geo-replication and watching Status for Checkpoint completion
[NOT OK] Unable to Remove temp directory /tmp/georepsetup_xO3cms
rmdir: failed to remove ‘/tmp/georepsetup_xO3cms’: Device or resource busy
[root at dhcp41-227 ~]#
df doesn't show the mount
-------------------------
[root at dhcp41-227 ~]# df
Filesystem                            1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhgs-root                  17811456 2572524  15238932  15% /
devtmpfs                                3992712       0   3992712   0% /dev
tmpfs                                   4004780       0   4004780   0% /dev/shm
tmpfs                                   4004780   17124   3987656   1% /run
tmpfs                                   4004780       0   4004780   0% /sys/fs/cgroup
/dev/sda1                               1038336  163040    875296  16% /boot
tmpfs                                    800956       0    800956   0% /run/user/0
/dev/mapper/RHS_vg1-RHS_lv1             8330240   33524   8296716   1% /rhs/brick1
/dev/mapper/RHS_vg2-RHS_lv2             8330240   33524   8296716   1% /rhs/brick2
/dev/mapper/RHS_vg3-RHS_lv3             8330240   33524   8296716   1% /rhs/brick3
10.70.41.227:/gluster_shared_storage   17811456 2756096  15055360  16% /run/gluster/shared_storage
[root at dhcp41-227 ~]#
glusterfs process still lists it as mounted
--------------------------------------------
[root at dhcp41-227 ~]# ps -eaf | grep glusterfs | grep tmp
root 21976 1 0 11:23 ? 00:00:00 /usr/sbin/glusterfs --volfile-server localhost --volfile-id vol0 -l /var/log/glusterfs/geo-replication/schedule_georep.mount.log /tmp/georepsetup_xO3cms
root 22096 1 0 11:23 ? 00:00:00 /usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-replication/vol0/ssh%3A%2F%2Froot%4010.70.42.9%3Agluster%3A%2F%2F127.0.0.1%3Avol1.%2Frhs%2Fbrick3%2Fb8.gluster.log --volfile-server=localhost --volfile-id=vol0 --client-pid=-1 /tmp/gsyncd-aux-mount-gnFTXn
root 22098 1 0 11:23 ? 00:00:00 /usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-replication/vol0/ssh%3A%2F%2Froot%4010.70.42.9%3Agluster%3A%2F%2F127.0.0.1%3Avol1.%2Frhs%2Fbrick2%2Fb5.gluster.log --volfile-server=localhost --volfile-id=vol0 --client-pid=-1 /tmp/gsyncd-aux-mount-vce8hM
root 22112 1 0 11:23 ? 00:00:00 /usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-replication/vol0/ssh%3A%2F%2Froot%4010.70.42.9%3Agluster%3A%2F%2F127.0.0.1%3Avol1.%2Frhs%2Fbrick1%2Fb2.gluster.log --volfile-server=localhost --volfile-id=vol0 --client-pid=-1 /tmp/gsyncd-aux-mount-CX9Ct9
[root at dhcp41-227 ~]#
Manual umount also fails
------------------------
[root at dhcp41-227 ~]# umount /tmp/georepsetup_xO3cms
umount: /tmp/georepsetup_xO3cms: not mounted
[root at dhcp41-227 ~]# rmdir /tmp/georepsetup_xO3cms
rmdir: failed to remove ‘/tmp/georepsetup_xO3cms’: Device or resource busy
[root at dhcp41-227 ~]# umount /tmp/georepsetup_xO3cms
umount: /tmp/georepsetup_xO3cms: not mounted
[root at dhcp41-227 ~]#
[root at dhcp41-227 ~]# echo $?
32
[root at dhcp41-227 ~]#
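As a side note, one quick way to cross-check what umount and rmdir are each
seeing is to compare os.path.ismount() with the kernel's view in
/proc/self/mounts. This is only a diagnostic sketch, not part of the tool; the
path is the temp directory from the failed run above.

import os

path = "/tmp/georepsetup_xO3cms"   # temp mount directory from the run above

# umount claims "not mounted"; see whether Python agrees ...
print("os.path.ismount():", os.path.ismount(path))

# ... and whether any mount table entry still references the path.
with open("/proc/self/mounts") as mounts:
    hits = [line.strip() for line in mounts if path in line]
print("/proc/self/mounts entries:", hits or "none")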
Additional information:
-----------------------
1. The script checks the umount for failure before calling rmdir; that check
passes, yet the subsequent rmdir still fails with "Device or resource busy".
2. A manual umount of the directory also fails. However, if the script is
re-executed, the earlier directory is removed successfully, but the same
failure recurs for the new mount directory (see the cleanup sketch below).
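For illustration only, a more defensive cleanup along the following lines
(retry the rmdir and fall back to a lazy unmount) would tolerate a window
where the glusterfs client has not yet released the mount. This is a sketch
under those assumptions, not the actual fix for this bug.

import errno
import os
import subprocess
import time

def cleanup_mount(mnt, attempts=5, delay=1):
    # Unmount, then try to remove the temp directory; if rmdir still reports
    # "Device or resource busy" (EBUSY), fall back to a lazy unmount and retry.
    subprocess.call(["umount", mnt])
    for _ in range(attempts):
        try:
            os.rmdir(mnt)
            return True
        except OSError as err:
            if err.errno != errno.EBUSY:
                raise
            subprocess.call(["umount", "-l", mnt])   # lazy unmount
            time.sleep(delay)
    return False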
Version-Release number of selected component (if applicable):
==============================================================
mainline
How reproducible:
=================
Always on CentOS 7
Steps to Reproduce:
===================
1. Set up geo-replication between the master and slave volumes
2. Run the tool with the master volume, slave host and slave volume as parameters
Actual results:
===============
The tool doesn't complete the transition from "touch mount" to "status complete".