[Gluster-devel] Help needed to implement Pause and Resume feature in Geo-replication

Aravinda avishwan at redhat.com
Wed Apr 2 09:12:21 UTC 2014

Hi All,

We are trying to implement a pause/resume feature for GlusterFS 
geo-replication, to be used before taking a GlusterFS snapshot 
(pause geo-rep, take snapshot, resume geo-rep).

Geo-replication involves:
1. Crawling (xtime based and changelog based) and identifying changes.
2. Processing the changes and queuing them for:
         a) Entry operations on the slave, to keep the same GFID on replicated files.
         b) Rsync or tarssh to sync files to the slave.

As of now, the idea is to stop processing on receiving the pause signal 
(entry ops and rsync will stop eventually, since processing is stopped), 
while crawling and identifying changes continue. I have sent an initial 
patch (http://review.gluster.org/#/c/7322/) for this.
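
To make the intended behaviour concrete, here is a minimal sketch of 
"processing stops, crawling continues". The names (ChangeQueue, record, 
process) are hypothetical and are not taken from the geo-rep code or 
from the patch; the real sync step is replaced by appending to a list.

```python
from collections import deque


class ChangeQueue:
    """Hypothetical stand-in for the geo-rep change processing stage."""

    def __init__(self):
        self.pending = deque()   # changes identified by the crawler
        self.synced = []         # stand-in for entry ops + rsync/tarssh
        self.paused = False

    def record(self, change):
        # Crawler side: keeps identifying changes even while paused.
        self.pending.append(change)

    def process(self):
        # Processing side: does nothing while paused, so entry ops and
        # file sync stop once already-queued work has drained.
        if self.paused:
            return 0
        count = 0
        while self.pending:
            self.synced.append(self.pending.popleft())
            count += 1
        return count
```

While paused, record() still accumulates changes, so nothing found by 
the crawl is lost; the first process() call after resume picks up the 
backlog.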

The gluster CLI will send SIGUSR1 to the geo-rep monitor process, and 
the monitor will then send SIGUSR1 to all the worker processes. 
The worker processes use os.pipe() and select to handle the signal 
received from the monitor.
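
For reference, the os.pipe() + select approach in the workers can be 
sketched roughly as below. This is not the code from the patch: the 
class name PauseToggle, the poll() method and the choice to toggle the 
paused state on every SIGUSR1 are assumptions made for illustration, 
and the sketch targets Python 3 (os.set_blocking).

```python
import os
import select
import signal


class PauseToggle:
    """Hypothetical helper: each SIGUSR1 flips a paused flag."""

    def __init__(self, signum=signal.SIGUSR1):
        self.paused = False
        self._rfd, self._wfd = os.pipe()
        # Non-blocking write end, so the handler can never hang.
        os.set_blocking(self._wfd, False)
        signal.signal(signum, self._on_signal)

    def _on_signal(self, signum, frame):
        # Keep the handler tiny: write one byte to wake up select().
        try:
            os.write(self._wfd, b"\0")
        except BlockingIOError:
            pass  # pipe is full; that wakeup byte is dropped

    def poll(self, timeout=0.0):
        # Wait up to `timeout` seconds for a wakeup byte from the handler.
        ready, _, _ = select.select([self._rfd], [], [], timeout)
        if ready:
            data = os.read(self._rfd, 4096)
            # One byte per handled signal; an odd count flips the state.
            if len(data) % 2 == 1:
                self.paused = not self.paused
        return self.paused
```

A worker loop would call poll() between batches and skip processing 
while it returns True. Using separate signals for pause and resume 
would avoid relying on the toggle count.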

Signal handling is not working in the monitor (no error/traceback). It 
looks like a Python limitation (http://bugs.python.org/issue5315).

Alternate solution (involves a lot of changes to existing geo-rep code):
Move crawling to a separate process (outside the monitor process group); 
the gluster CLI sends SIGSTOP to the monitor pid group to pause and 
SIGCONT to the monitor pid group to resume.
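
A rough sketch of the process-group variant, assuming the monitor and 
its workers share one process group and the crawler lives outside it. 
The function names pause_group and resume_group are hypothetical; 
os.killpg and the signal constants are standard Python.

```python
import os
import signal


def pause_group(pgid):
    # SIGSTOP cannot be caught or ignored, so every process in the
    # group is suspended without needing any Python signal handler.
    os.killpg(pgid, signal.SIGSTOP)


def resume_group(pgid):
    # SIGCONT lets all stopped processes in the group continue.
    os.killpg(pgid, signal.SIGCONT)
```

The caller would pass the monitor's process group id, e.g. 
os.getpgid(monitor_pid). Since the processes are frozen wherever they 
happen to be, an rsync in flight is suspended mid-transfer rather than 
completed.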

Please suggest what can be done to effectively handle signal or 


