[Gluster-users] POSIX locks and disconnections between clients and bricks

Wed Mar 27 09:53:35 UTC 2019

On 3/27/19 12:55 PM, Xavi Hernandez wrote:
> Hi Raghavendra,
> 
> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa 
> <rgowdapp at redhat.com> wrote:
> 
>     All,
> 
>     Glusterfs cleans up POSIX locks held on an fd when the client/mount
>     through which those locks are held disconnects from bricks/server.
>     This helps Glusterfs avoid stale locks later (e.g., if the
>     application unlocks while the connection is still down).
>     However, this means the lock is no longer exclusive as other
>     applications/clients can acquire the same lock. To communicate that
>     locks are no longer valid, we are planning to mark the fd (which has
>     POSIX locks) bad on a disconnect so that any future operations on
>     that fd will fail, forcing the application to re-open the fd and
>     re-acquire locks it needs [1].
> 
> 
> Wouldn't it be better to retake the locks when the brick is reconnected 
> if the lock is still in use ?
> 
> BTW, the referenced bug is not public. Should we open another bug to 
> track this ?
> 
> 
>     Note that with AFR/replicate in the picture, we can prevent errors
>     from reaching the application as long as a Quorum number of
>     children "never ever" lost connection with bricks after the locks
>     were acquired. I am using the term "never ever" because locks are
>     not healed back after re-connection, so the first disconnect would
>     have marked the fd bad and the fd remains so even after
>     re-connection happens. So, it's not just a Quorum number of
>     children "currently online", but a Quorum number of children
>     "never having disconnected from bricks after locks are acquired".
> 
> 
> I think this requirement is not feasible. In a distributed file system, 
> sooner or later all bricks will be disconnected. It could be because of 
> failures or because an upgrade is done, but it will happen.
> 
> The difference here is how long fds are kept open. If applications open 
> and close files frequently enough (i.e. the fd is not kept open longer 
> than it takes to have more than Quorum bricks disconnected) then 
> there's no problem. The problem can only appear on applications that 
> open files for a long time and also use posix locks. In this case, the 
> only good solution I see is to retake the locks on brick reconnection.
> 
> 
>     However, this use case is not affected if the application doesn't
>     acquire any POSIX locks. So, I am interested in knowing
>     * Do your use cases use POSIX locks?
>     * Is it feasible for your application to re-open fds and re-acquire
>     locks on seeing EBADFD errors?
> 
> 
> I think that many applications are not prepared to handle that.

+1 to all the points mentioned by Xavi. This has been a day-1 issue for 
all the applications using locks (like NFS-Ganesha and Samba). Not many 
applications re-open and re-acquire the locks. On receiving EBADFD, that 
error is most likely propagated to application clients.
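
To make concrete what "re-open and re-acquire" would ask of an application, here is a minimal Python sketch. It is only an illustration under assumptions: the class name LockedFile, its reopen_count field and the write_fn parameter are hypothetical and not part of Gluster or any existing application, and the numeric fallback for EBADFD assumes the Linux value.

```python
import errno
import fcntl
import os

# EBADFD is Linux-specific; the fallback value 77 is an assumption for
# platforms where Python does not define it.
EBADFD = getattr(errno, "EBADFD", 77)


class LockedFile:
    """Hypothetical wrapper: a file held open with an exclusive POSIX lock."""

    def __init__(self, path):
        self.path = path
        self.fd = None
        self.reopen_count = 0
        self._open_and_lock()

    def _open_and_lock(self):
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # Blocking exclusive lock on the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX)
        except OSError:
            os.close(fd)
            raise
        self.fd = fd

    def write(self, data, write_fn=os.write):
        """Write once; on EBADFD re-open, re-lock and retry a single time."""
        try:
            return write_fn(self.fd, data)
        except OSError as exc:
            if exc.errno != EBADFD:
                raise
        # The fd was marked bad, so the lock must be assumed lost.
        os.close(self.fd)
        self._open_and_lock()
        self.reopen_count += 1
        # NOTE: another client may have held the lock in between, so a real
        # application would have to re-validate its view of the file here.
        return write_fn(self.fd, data)

    def close(self):
        if self.fd is not None:
            os.close(self.fd)
            self.fd = None
```

Even this small wrapper shows the catch: the retry itself is easy, but re-validating whatever state the lock was protecting is application-specific, which is why most applications simply pass the error up.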

Agree with Xavi that it's better to heal/re-acquire the locks on brick 
reconnects before the brick accepts any fresh requests. I also suggest 
making this healing mechanism generic enough (if possible) to heal any 
server-side state (like upcall, leases, etc.).
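
The heal-on-reconnect idea can be sketched with a toy in-memory model. This is not Gluster code and does not reflect its internals: Brick, Client, held and bad are hypothetical names, and locks are keyed by an exact (path, start, end) tuple, so overlapping ranges are not handled. The client remembers the locks it was granted, the brick drops them on disconnect, and on reconnect the client replays them before issuing anything new. A lock that another client took in the meantime cannot be healed and is marked bad.

```python
class Brick:
    """Toy brick: maps a lock key to the name of the client holding it."""

    def __init__(self):
        self.locks = {}

    def lock(self, owner, key):
        holder = self.locks.get(key)
        if holder is not None and holder != owner:
            return False  # conflicting lock held by another client
        self.locks[key] = owner
        return True

    def drop_client(self, owner):
        # Cleanup on disconnect, as described at the start of the thread.
        self.locks = {k: o for k, o in self.locks.items() if o != owner}


class Client:
    """Toy client that records granted locks so they can be replayed."""

    def __init__(self, name, brick):
        self.name = name
        self.brick = brick
        self.connected = True
        self.held = set()  # locks the client believes it holds
        self.bad = set()   # locks that could not be healed

    def lock(self, key):
        if not self.connected:
            raise ConnectionError("brick is not reachable")
        if self.brick.lock(self.name, key):
            self.held.add(key)
            return True
        return False

    def disconnect(self):
        self.connected = False
        self.brick.drop_client(self.name)

    def reconnect(self):
        self.connected = True
        # Replay recorded locks before any fresh request is sent.
        for key in sorted(self.held):
            if not self.brick.lock(self.name, key):
                self.held.discard(key)
                self.bad.add(key)
```

In this model only the contended case ends up as a bad lock. An uncontended disconnect and reconnect is invisible to the application.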

Thanks,
Soumya

> 
> Xavi
> 
> 
>     [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
> 
>     regards,
>     Raghavendra
> 
>     _______________________________________________
>     Gluster-users mailing list
>     Gluster-users at gluster.org
>     https://lists.gluster.org/mailman/listinfo/gluster-users
> 