[Gluster-users] Unexpected behaviour during replication heal

Wed Jun 29 10:53:57 UTC 2011

Interesting development....

In my previous tests I must have given up on the hard-locked dd process just short of 30 minutes. I've just tested it again and waited longer for things to fix themselves...

Exactly 30 minutes after the last write to the file on the cluster, the process starts being able to write data again.
The previously disconnected server suddenly syncs its partial file up to the point the second server has reached, and the client begins throwing data at both servers again.

So.... where is this 30 minute timeout coming from? My glusterd.vol file has a keepalive-time of 10 and a keepalive-interval of 2. Are there any other settings I can change to reduce the delay before the client can talk to the servers again?

If there are no settings that affect that delay, is this still a bug that needs to be investigated?

Cheers,
Darren.

-- 
Darren Austin - Systems Administrator, Widgit Software.
Tel: +44 (0)1926 333680.    Web: http://www.widgit.com/
26 Queen Street, Cubbington, Warwickshire, CV32 7NA.