[Gluster-users] AFR locking options

Tue Jul 20 20:10:39 UTC 2010

On Jul 20, 2010, at 12:17 PM, Richard de Vries wrote:

> Dear All,

> Is it advisable to increase the number of lock servers to ensure that
> when the first node in the subvolume fails the other node can continue
> to work correctly even if the first node comes back?

No. You don't need to increase the number of lock servers just to deal with this case. As explained below, the second node can still write even if the first node failed while holding a lock.

Increasing the number of lock servers is advised if you have more than one client writing to the same region of the same file. Having more than one lock server eliminates a (small) race in which the following happens:

1) Server 1 goes down.
2) Client 2 holds a lock on the (only remaining) server 2 and starts writing.
3) Server 1 comes back up.
4) Client 1 holds a lock on server 1 (the lock servers are chosen starting from the "first subvolume that is up" and continuing from there).
5) Client 1 also writes (because it has acquired a lock).

The danger here is that the writes in (2) and (5) can be applied in different orders on servers 1 and 2, leaving them inconsistent with each other. As you can see, the window is quite small, and it only comes into play if you have two clients writing to the same region of the same file.
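
To make the ordering problem in steps (1)-(5) concrete, here is a small Python sketch. It is only a toy model, not GlusterFS code: the function name apply_writes, the buffer contents and the write payloads are all made up for illustration. It assumes each client's write reaches both servers, but that nothing forces the two servers to apply the writes in the same order, since each client locked a different server.

```python
def apply_writes(initial, writes):
    """Apply (offset, data) writes, in order, to a copy of the buffer."""
    buf = bytearray(initial)
    for offset, data in writes:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


# Hypothetical file contents and two overlapping writes (same region).
initial = b"........"
write_client1 = (0, b"AAAA")  # client 1, lock held only on server 1
write_client2 = (0, b"BBBB")  # client 2, lock held only on server 2

# No common lock: each server may see the two writes in a different order.
server1 = apply_writes(initial, [write_client2, write_client1])
server2 = apply_writes(initial, [write_client1, write_client2])
replicas_diverged = server1 != server2

# With a lock both clients agree on, the writes are serialized, so both
# servers apply them in the same order and end up identical.
common_order = [write_client2, write_client1]
locked_server1 = apply_writes(initial, common_order)
locked_server2 = apply_writes(initial, common_order)
replicas_consistent = locked_server1 == locked_server2
```

In the first case the two replicas end up with different bytes in the overlapping region; in the second they match, whichever client happens to go first.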

> 2nd question:
> What happens if the first node holds a lock on a file and fails (power
> down or kernel panic)? What if the second node now wants to modify
> this file?

The lock will automatically be released when the first client fails. The second node can then acquire the lock and continue with its write.
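
As a rough illustration of that behaviour, here is a toy lock table in Python. This is a sketch under assumptions, not how the GlusterFS locks translator is implemented: the class LockTable, its methods, the client names and the path are all hypothetical. It only shows the idea that locks are tied to the client that holds them, so they go away when that client does.

```python
class LockTable:
    """Toy per-file lock table; each lock is owned by one client."""

    def __init__(self):
        self._owner = {}  # path -> id of the client holding the lock

    def acquire(self, path, client):
        # Succeeds if the file is unlocked or already locked by this client.
        if self._owner.get(path, client) != client:
            return False
        self._owner[path] = client
        return True

    def on_disconnect(self, client):
        # Drop every lock held by a client that has failed or gone away.
        for path in [p for p, c in self._owner.items() if c == client]:
            del self._owner[path]


table = LockTable()
got_first = table.acquire("/vol/file", "client-1")   # first node locks
blocked = table.acquire("/vol/file", "client-2")     # second node must wait
table.on_disconnect("client-1")                      # first node fails
got_second = table.acquire("/vol/file", "client-2")  # lock is now free
```

Here the second client is refused while the first holds the lock, and succeeds once the first client's failure has been noticed.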

------------------------------
Vikas Gorur
Engineer - Gluster, Inc.
------------------------------