[Gluster-devel] gsyncd deadlocks in log_raise_exception

蒋凯 jiangkai at 360.cn
Sun Jan 26 10:46:41 UTC 2014


Hi,


Generally, when gsyncd encounters an exception, it logs the exception and restarts. But in some cases it deadlocks instead. This happens in my environment about once a week. Replication stops, but the geo-replication status command still shows OK.

I checked the processes on the master. The gsyncd process hangs with the backtrace below, and the ssh subprocess can’t terminate. When I manually kill the ssh subprocess with signal 9 (SIGKILL), geo-replication exits and restarts.

#3 file '/usr/lib64/python2.6/subprocess.py', in '_eintr_retry_call'
#7 file '/usr/lib64/python2.6/subprocess.py', in 'wait'
#11 file '/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py', in 'log_raise_exception'
#14 file '/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py', in 'twrap'
#19 file '/usr/lib64/python2.6/threading.py', in 'run'
#22 file '/usr/lib64/python2.6/threading.py', in '__bootstrap_inner'
#25 file '/usr/lib64/python2.6/threading.py', in '__bootstrap'


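I think the problem is that log_raise_exception uses Popen.wait, which can deadlock if the child's output is larger than the pipe buffer: the child blocks writing to the full pipe while the parent blocks in wait, so neither makes progress. See the documentation at http://docs.python.org/2/library/subprocess.html, which recommends using Popen.communicate instead.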
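
To illustrate the suggestion, here is a minimal sketch. It is not the actual gsyncd code: the helper name run_and_drain and the CHILD command are hypothetical, and the child just writes 1 MiB to stderr as a stand-in for an ssh subprocess that produces more output than the pipe can hold. With stdout/stderr set to PIPE, calling wait() on such a child can hang, while communicate() keeps reading the pipes until the child exits.

```python
import subprocess
import sys


def run_and_drain(argv):
    """Run argv and return (returncode, stdout_bytes, stderr_bytes).

    communicate() reads both pipes until EOF and then reaps the child,
    so the child can never block on a full pipe while we wait for it.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err


# Hypothetical stand-in for the ssh subprocess: writes 1 MiB to stderr,
# which is more than a typical OS pipe buffer can hold.
CHILD = [sys.executable, "-c", "import sys; sys.stderr.write('x' * (1 << 20))"]

if __name__ == "__main__":
    rc, out, err = run_and_drain(CHILD)
    print("exit code:", rc, "stderr bytes:", len(err))
```

If the output is not needed at all, another option would be to not attach a pipe in the first place, so that there is nothing to fill up.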



Thanks.

