[Gluster-devel] Help needed to implement Pause and Resume feature in Geo-replication
Aravinda
avishwan at redhat.com
Wed Apr 2 09:12:21 UTC 2014
Hi All,
We are trying to implement a pause/resume feature for GlusterFS
geo-replication, to be used before taking a GlusterFS
snapshot (pause geo-rep, take snapshot, resume geo-rep).
Geo-replication involves:
1. Crawling (xtime based and changelog based) and identifying changes.
2. Processing the changes and queuing them for
a) Entry operations on the slave, to keep the same GFID on replicated files.
b) Rsync or tarssh to sync files to the slave.
As of now, the idea is to stop processing on receiving the pause signal (entry
ops and rsync will stop eventually since processing is stopped), while
crawling and identifying changes continue. I have sent an initial
patch (http://review.gluster.org/#/c/7322/) for this.
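To make the idea concrete, here is a minimal sketch of "processing stops, crawling continues". It is not the code from the patch; PausableProcessor and its method names are made up for illustration, and a plain in-memory queue stands in for the real change queue.

```python
import queue
import threading


class PausableProcessor:
    """Hypothetical stand-in for a geo-rep worker's processing stage."""

    def __init__(self):
        self.changes = queue.Queue()       # filled by the crawler
        self.processed = []                # stands in for entry ops / rsync
        self._running = threading.Event()
        self._running.set()                # start in the "not paused" state

    def record_change(self, change):
        # Crawler side: never blocked by pause, changes just accumulate.
        self.changes.put(change)

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def process_pending(self):
        """Process queued changes until the queue is empty or we are paused."""
        count = 0
        while self._running.is_set():
            try:
                change = self.changes.get_nowait()
            except queue.Empty:
                break
            self.processed.append(change)
            count += 1
        return count
```

While paused, record_change() keeps queuing and process_pending() does nothing; after resume() the backlog is drained in order.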
/Plan:/
The gluster CLI will send SIGUSR1 to the geo-rep monitor process, and the monitor
will then send SIGUSR1 to all the worker processes.
Worker processes use os.pipe() and select to handle the signal received
from the monitor.
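A rough sketch of the worker side of this plan, using the usual "self-pipe" pattern: the signal handler only writes a byte to a pipe, and the main loop notices it through select. The names here (pause_requested and friends) are hypothetical and not from the geo-rep source, and the sketch assumes Python 3 (os.set_blocking, select retried after a signal).

```python
import os
import select
import signal

# Hypothetical names for illustration; not taken from the geo-rep code.
_read_fd, _write_fd = os.pipe()
os.set_blocking(_write_fd, False)  # the handler must never block


def _on_sigusr1(signum, frame):
    # Keep the handler tiny: just wake up the select() loop.
    try:
        os.write(_write_fd, b"\0")
    except BlockingIOError:
        pass  # pipe already full, so a wakeup is already pending


# Must be called from the main thread of the worker.
signal.signal(signal.SIGUSR1, _on_sigusr1)


def pause_requested(timeout):
    """Return True if SIGUSR1 arrived before `timeout` seconds elapsed."""
    readable, _, _ = select.select([_read_fd], [], [], timeout)
    if not readable:
        return False
    os.read(_read_fd, 4096)  # drain pending wakeup bytes
    return True
```

The worker's main loop would call pause_requested() between batches and stop handing out work when it returns True. How resume is signalled is left open here, since the plan above only specifies SIGUSR1.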
/Problem:/
Signal handling is not working in the monitor (no error or traceback). It looks
like a Python limitation (http://bugs.python.org/issue5315).
/Alternate solution (involves a lot of changes in existing geo-rep code):/
Move crawling into a separate process (outside the monitor process group). The
gluster CLI then sends SIGSTOP to the monitor process group to pause and SIGCONT
to the monitor process group to resume.
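For reference, a small sketch of what stopping and continuing a whole process group looks like from Python. A sleeping child that leads its own process group stands in for the monitor; pause_group, resume_group and demo are hypothetical helpers, and none of this is existing geo-rep code.

```python
import os
import signal
import subprocess
import sys


def pause_group(pgid):
    # SIGSTOP cannot be caught or ignored, so no handler is needed.
    os.killpg(pgid, signal.SIGSTOP)


def resume_group(pgid):
    os.killpg(pgid, signal.SIGCONT)


def demo():
    # Stand-in for the monitor: a child that leads its own process group.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=True,
    )
    pgid = os.getpgid(child.pid)
    try:
        pause_group(pgid)
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        resume_group(pgid)
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
    finally:
        os.killpg(pgid, signal.SIGKILL)  # clean up the stand-in process
        child.wait()
    return stopped, continued
```

The attraction of this route is that nothing inside the group has to cooperate; the cost is that anything that must keep running during the pause (crawling, in this proposal) has to live outside the group.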
Please suggest what can be done to handle the signal, or pause/resume in
general, effectively.
--
regards
Aravinda