[Gluster-users] Unexpected behaviour during replication heal

Tue Jun 28 10:41:47 UTC 2011

On Tue, Jun 28, 2011 at 3:19 PM, Darren Austin <darren-lists at widgit.com> wrote:

> ----- Original Message -----
> > It looks like the disconnection happened in the middle of a write
> > transaction (after the lock phase, before the unlock phase). And the
>
> The server was deliberately disconnected after the write had begun, in
> order to test what would happen in that situation and to document a recovery
> procedure for it.
>
> > server's detection of client disconnection (via TCP_KEEPALIVE) seems to
> > have not happened before the client reconnected.
>
> I've not configured any special keep alive setting for the server or
> clients - the configuration was an out of the box glusterd.vol file, and a
> "volume create" sequence with standard params (no special settings or
> options applied).
>
> The disconnected server was also in that state for approx 10 minutes - not
> seconds.
>
> I assume the "default" set up is not to hold on to a locked file for over
> 10 minutes when in a disconnected state?
> Surely it shouldn't hold onto a lock *at all* once it's out of the cluster?

The problem here is that the server hasn't even detected the disconnection.
The client has ping-timeout logic that checks for inactivity while there are
pending fops and forcibly disconnects from an unresponsive server. On the
server side, the connection is torn down only when the server encounters a
TCP RST or FIN, or when TCP keepalive kicks in. This is the behavior today.
The default TCP keepalive can possibly take over 10 minutes.
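
As a rough illustration of why the wait can be that long: keepalive
detection time is approximately the idle time plus the probe interval
multiplied by the probe count. The sketch below only does that arithmetic.
The function name is made up for this example, and the numbers are the stock
Linux kernel defaults (net.ipv4.tcp_keepalive_time, tcp_keepalive_intvl,
tcp_keepalive_probes), assumed untouched; they are not read from any Gluster
configuration.

```python
def keepalive_detection_seconds(idle, interval, probes):
    """Approximate time for TCP keepalive to declare a silent peer dead.

    idle:     seconds of inactivity before the first probe is sent
    interval: seconds between unanswered probes
    probes:   number of unanswered probes before the connection is dropped
    """
    return idle + interval * probes


# Stock Linux kernel defaults (assumption: the sysctls were left untouched).
linux_default = keepalive_detection_seconds(idle=7200, interval=75, probes=9)
print(linux_default)  # 7875
```

With those kernel defaults a socket that relies purely on system keepalive
would not notice a silent peer for a little over two hours, which is
consistent with a 10 minute outage going undetected.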

> > The client, having witnessed the reconnection has assumed the locks have
> > been relinquished by the server. The server, however, having noticed the
> > same client reconnection before breakage of the original connection has
> > not released the held locks.
>
> But why is the server still holding the locks WAY past the time it should
> be?
>

Locks are associated with "connections", not with a time. The server is
holding on because it believes the client is still connected (it hasn't
witnessed a socket error yet).
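
To make the sequence concrete, here is a toy model of connection-scoped
locks. It is not GlusterFS code: the class, method names, and connection ids
are all hypothetical, and it only shows how a lock owned by a connection that
the server still considers alive blocks the same client after it reconnects.

```python
class LockTable:
    """Toy model: a lock is owned by a connection and is freed only when
    that connection is torn down. All names here are hypothetical."""

    def __init__(self):
        self._owner = {}  # path -> id of the connection holding the lock

    def lock(self, conn_id, path):
        holder = self._owner.get(path)
        if holder is not None and holder != conn_id:
            return False  # held by a different connection
        self._owner[path] = conn_id
        return True

    def on_disconnect(self, conn_id):
        # Called only when the server sees a socket error on conn_id.
        for path in [p for p, c in self._owner.items() if c == conn_id]:
            del self._owner[path]


table = LockTable()
table.lock("conn-1", "/file")            # write begins, lock taken
# Network drops; the server sees no RST/FIN, so on_disconnect("conn-1")
# is not called yet.
blocked = table.lock("conn-2", "/file")  # same client reconnects as conn-2
print(blocked)                           # False: conn-1 still owns the lock
table.on_disconnect("conn-1")            # keepalive finally reports the dead socket
print(table.lock("conn-2", "/file"))     # True
```

No timer appears anywhere in the model; the lock goes away only when the
owning connection does.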

> We're not talking seconds here, we're talking minutes of disconnection.
>
> And why, when it is reconnected will it not sync that file back from the
> other servers that have a full copy of it?
>
> > Tuning the server side tcp keepalive to a smaller value should fix
> > this problem. Can you please verify?
>
> Are you talking about the GlusterFS keep alive setting in the vol file, or
> changing the actual TCP keepalive settings for the *whole* server?
> Changing the server TCP keepalive is not an option, since it has
> ramifications on other things - and it shouldn't be necessary to solve what
> is, really, a GlusterFS bug...
>
>
Of course not the system TCP keepalive. I was only talking about Gluster's
TCP keepalive. It should have kicked in after about 40 secs of inactivity. Can
you look at the server (brick) logs to check the order of the detected
disconnection and the new connection/reconnection from the client?
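
For reference, keepalive can be tuned on a single socket without touching the
system-wide settings. The sketch below shows the general mechanism using the
standard socket options in Python. It is not Gluster's implementation, the
helper name is made up, and the values (20 s idle, 2 s interval, 9 probes) are
illustrative assumptions chosen only because they add up to roughly the 40
secs mentioned above.

```python
import socket


def enable_keepalive(sock, idle=20, interval=2, probes=9):
    """Turn on TCP keepalive for this one socket only.

    The default values are illustrative assumptions, not Gluster's settings.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific (Linux has all three);
    # skip any that this platform does not expose.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
sock.close()
```

Other sockets on the same machine keep the kernel defaults, so nothing else
on the server is affected.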

Avati