[Gluster-devel] Single-process (server and client) AFR problems

Gordan Bobic gordan at bobich.net
Mon May 19 20:26:51 UTC 2008


Hi,

I'm having rather major problems getting single-process AFR to work 
between two servers. When both servers come up, the GlusterFS on both 
locks up pretty solid. The processes that try to access the FS 
(including ls) seem to get nowhere for a few minutes, and then complete. 
But something gets stuck, and glusterfs cannot be killed even with -9!

Another worrying thing is that the fuse kernel module ends up holding a 
reference count even after the glusterfs process gets killed (sometimes 
killing the remote process that isn't locked up on its host can break 
the locked-up operations and allow the local glusterfs process to be 
killed). So fuse then cannot be unloaded.

This error seems to come up in the logs all the time:
2008-05-19 20:57:17 E [afr.c:1985:afr_selfheal] home: none of the 
children are up for locking, returning EIO
2008-05-19 20:57:17 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse: 
63: (12) /test => -1 (5)

This implies some kind of locking issue, but the same error and 
conditions also arise when the posix locking module is removed.
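
For reference, the trailing (5) in the fuse_fd_cbk line above is a numeric 
errno, which matches the EIO reported by afr_selfheal. A minimal sketch for 
decoding such values, assuming Linux errno numbering (the helper name 
describe_errno is made up for illustration and is not part of GlusterFS):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value.

    Hypothetical helper for reading log lines such as "=> -1 (5)".
    The mapping is platform-dependent; Linux numbering is assumed here.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 5 is the value in parentheses at the end of the fuse_fd_cbk log line.
    name, message = describe_errno(5)
    print(name, "-", message)
```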

The configs for the two servers are attached. They are almost identical 
to the examples on the glusterfs wiki:

http://www.gluster.org/docs/index.php/AFR_single_process

What am I doing wrong? Have I run into another bug?

Gordan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: home.vol.1
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20080519/64579cf6/attachment-0006.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: home.vol.2
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20080519/64579cf6/attachment-0007.ksh>

