[Gluster-devel] [Gluster-users] POSIX locks and disconnections between clients and bricks

Wed Mar 27 10:52:45 UTC 2019

On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez <jahernan at redhat.com> wrote:

> Hi Raghavendra,
>
> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <rgowdapp at redhat.com>
> wrote:
>
>> All,
>>
>> Glusterfs cleans up POSIX locks held on an fd when the client/mount
>> through which those locks are held disconnects from bricks/server. This
>> helps Glusterfs to not run into a stale lock problem later (e.g., if the
>> application unlocks while the connection is still down). However, this
>> means the lock is no longer exclusive, as other applications/clients can
>> acquire the same lock. To communicate that locks are no longer valid, we
>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so
>> that any future operations on that fd will fail, forcing the application to
>> re-open the fd and re-acquire the locks it needs [1].
>>
>
> Wouldn't it be better to retake the locks when the brick is reconnected, if
> the lock is still in use?
>

There is also a possibility that clients may never reconnect. That's the
primary reason why bricks assume the worst (the client will not reconnect)
and clean up the locks.

> BTW, the referenced bug is not public. Should we open another bug to track
> this?
>

I've just opened up the comment to give enough context. I'll open a bug
upstream too.

>
>
>>
>> Note that with AFR/replicate in the picture we can prevent errors to the
>> application as long as a Quorum number of children "never ever" lost
>> connection with bricks after locks have been acquired. I am using the term
>> "never ever" as locks are not healed back after re-connection, and hence
>> the first disconnect would've marked the fd bad and the fd remains so even
>> after re-connection happens. So, it's not just a Quorum number of children
>> "currently online", but a Quorum number of children "never having
>> disconnected from bricks after locks are acquired".
>>
>
> I think this requisite is not feasible. In a distributed file system,
> sooner or later all bricks will be disconnected. It could be because of
> failures or because an upgrade is done, but it will happen.
>
> The difference here is how long fds are kept open. If applications open
> and close files frequently enough (i.e. the fd is not kept open longer
> than it takes to have more than Quorum bricks disconnected) then there's no
> problem. The problem can only appear in applications that keep files open
> for a long time and also use POSIX locks. In this case, the only good
> solution I see is to retake the locks on brick reconnection.
>

Agreed. But lock-healing should be done only by HA layers like AFR/EC, as
only they know whether there were enough online bricks to have prevented any
conflicting lock. Protocol/client itself doesn't have enough information to
do that. If it's a plain distribute volume, I don't see a way to heal locks
without losing the exclusivity of locks.
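
As a rough illustration of the "never ever" quorum condition from my original
mail, here is a minimal sketch. It is not AFR code: the class and method names
are hypothetical, and quorum is assumed to be a simple majority of bricks
unless given explicitly.

```python
class LockedFdState:
    """Tracks, per brick, whether a lock taken on an fd is still intact.

    Hypothetical model for illustration only, not GlusterFS code.
    """

    def __init__(self, bricks, quorum=None):
        self.bricks = list(bricks)
        # Assumption: quorum defaults to a strict majority of the bricks.
        self.quorum = quorum if quorum is not None else len(self.bricks) // 2 + 1
        self.intact = set()

    def lock_acquired(self):
        # The lock is held on every brick at the time it is granted.
        self.intact = set(self.bricks)

    def on_disconnect(self, brick):
        # The brick cleans up the lock, so this child's copy is gone.
        self.intact.discard(brick)

    def on_reconnect(self, brick):
        # Without lock healing, reconnecting does not restore the lock.
        pass

    def fd_is_good(self):
        # Good only while a quorum of children never lost the lock.
        return len(self.intact) >= self.quorum


state = LockedFdState(["b1", "b2", "b3"])
state.lock_acquired()
state.on_disconnect("b1")
state.on_reconnect("b1")
state.on_disconnect("b2")
# b1 and b3 are currently online, but only b3 never disconnected.
print(state.fd_is_good())
```

With lock healing in AFR/EC, on_reconnect() would be the place to re-take the
lock, and only while fd_is_good() still holds.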

What I proposed is a short-term solution. The mid- to long-term solution
should be a lock-healing feature implemented in AFR/EC. In fact, I had this
conversation with +Karampuri, Pranith <pkarampu at redhat.com> before posting
this message to the ML.

>
>> However, this use case is not affected if the application doesn't acquire
>> any POSIX locks. So, I am interested in knowing:
>> * Do your use cases use POSIX locks?
>> * Is it feasible for your application to re-open fds and re-acquire locks
>> on seeing EBADFD errors?
>>
>
> I think that many applications are not prepared to handle that.
>

I suspected that too, and in fact I'm not too happy with the solution. But I
went ahead with this mail as I heard that implementing lock-heal in AFR will
take time, and hence there are no alternative short-term solutions.
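
For applications that can be changed, the re-open and re-acquire handling
could look roughly like the sketch below. It assumes that operations on a bad
fd fail with an OSError carrying errno EBADFD, and that the locked operation
is safe to restart from scratch. The helper name run_with_relock is
hypothetical.

```python
import errno
import fcntl
import os

# EBADFD is Linux-specific; fall back to its Linux value if it is missing.
EBADFD = getattr(errno, "EBADFD", 77)


def run_with_relock(path, operation, max_retries=3):
    """Open path, take an exclusive POSIX lock, then run operation(fd).

    If any step fails with EBADFD (assumed here to mean the fd was marked
    bad after a disconnect), close the fd, re-open the file, re-acquire the
    lock and restart the operation.
    """
    for attempt in range(max_retries + 1):
        fd = os.open(path, os.O_RDWR)
        try:
            # Blocking exclusive lock on the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX)
            return operation(fd)
        except OSError as exc:
            # Re-raise anything that is not EBADFD, or give up on the last try.
            if exc.errno != EBADFD or attempt == max_retries:
                raise
        finally:
            # Closing the fd also drops any lock still held through it.
            os.close(fd)
```

Note that the lock is not held between the failure and the retry, so another
client may have changed the file in the meantime. The operation has to
re-validate whatever state it read under the old lock.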

> Xavi
>
>
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
>>
>> regards,
>> Raghavendra
>>