[Gluster-devel] [Gluster-users] POSIX locks and disconnections between clients and bricks

Raghavendra Gowdappa rgowdapp at redhat.com
Wed Mar 27 10:54:12 UTC 2019


On Wed, Mar 27, 2019 at 4:22 PM Raghavendra Gowdappa <rgowdapp at redhat.com>
wrote:

>
>
> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez <jahernan at redhat.com>
> wrote:
>
>> Hi Raghavendra,
>>
>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <rgowdapp at redhat.com>
>> wrote:
>>
>>> All,
>>>
>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount
>>> through which those locks were taken disconnects from the bricks/server.
>>> This prevents Glusterfs from running into stale-lock problems later (for
>>> example, if the application unlocks while the connection is still down,
>>> the unlock never reaches the brick and the lock would stay held forever).
>>> However, it also means the lock is no longer exclusive, since other
>>> applications/clients can acquire the same lock once it has been cleaned
>>> up. To communicate that the locks are no longer valid, we are planning to
>>> mark the fd (which has POSIX locks) bad on a disconnect, so that any
>>> future operation on that fd fails, forcing the application to re-open the
>>> fd and re-acquire the locks it needs [1].
>>>
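>>> Roughly, the recovery expected from the application would look like the
>>> sketch below (an illustration only, not code from Glusterfs or any real
>>> application; the helper names are made up and the errno values checked
>>> are an assumption):
>>>
>>>     #include <errno.h>
>>>     #include <fcntl.h>
>>>     #include <unistd.h>
>>>
>>>     static int open_and_lock(const char *path)
>>>     {
>>>         struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
>>>                             .l_start = 0, .l_len = 0 };  /* whole file */
>>>         int fd = open(path, O_RDWR);
>>>         if (fd < 0)
>>>             return -1;
>>>         if (fcntl(fd, F_SETLKW, &fl) < 0) {  /* blocking lock request */
>>>             close(fd);
>>>             return -1;
>>>         }
>>>         return fd;
>>>     }
>>>
>>>     static ssize_t locked_write(int *fd, const char *path,
>>>                                 const void *buf, size_t len)
>>>     {
>>>         ssize_t ret = write(*fd, buf, len);
>>>         if (ret < 0 && (errno == EBADFD || errno == EBADF)) {
>>>             /* The fd was marked bad after a disconnect: the lock is
>>>              * gone, so re-open and re-acquire it before retrying.
>>>              * Any state protected by the old lock must be
>>>              * revalidated by the application. */
>>>             close(*fd);
>>>             *fd = open_and_lock(path);
>>>             if (*fd < 0)
>>>                 return -1;
>>>             ret = write(*fd, buf, len);
>>>         }
>>>         return ret;
>>>     }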
>>
>> Wouldn't it be better to retake the locks when the brick is reconnected,
>> if the lock is still in use?
>>
>
> There is also a possibility that clients may never reconnect. That's the
> primary reason why bricks assume the worst (that the client will not
> reconnect) and clean up the locks.
>
>
>> BTW, the referenced bug is not public. Should we open another bug to
>> track this?
>>
>
> I've just opened up the comment to give enough context. I'll open a bug
> upstream too.
>
>
>>
>>
>>>
>>> Note that with AFR/replicate in the picture we can prevent errors to the
>>> application as long as a quorum of children has "never ever" lost its
>>> connection to the bricks after the locks were acquired. I am using the
>>> term "never ever" because locks are not healed back after re-connection,
>>> so the first disconnect marks the fd bad and it stays bad even after the
>>> re-connection happens. So, it's not just a quorum of children "currently
>>> online", but a quorum of children that "never disconnected from their
>>> bricks after the locks were acquired". For example, on a replica-3 volume
>>> with quorum 2, if two of the three children have each seen at least one
>>> disconnect since the locks were taken (even at different times), the
>>> application will start seeing errors on that fd.
>>>
>>
>> I think this requirement is not feasible. In a distributed file system,
>> sooner or later all bricks will be disconnected. It could be because of
>> failures or because an upgrade is done, but it will happen.
>>
>> The difference here is how long fds are kept open. If applications open
>> and close files frequently enough (i.e. an fd is not kept open long enough
>> for enough bricks to have disconnected to break the quorum), then there's
>> no problem. The problem can only appear with applications that keep files
>> open for a long time and also use POSIX locks. In this case, the only good
>> solution I see is to retake the locks on brick reconnection.
>>
>
> Agreed. But lock-healing should be done only by HA layers like AFR/EC, as
> only they know whether enough bricks have stayed online to have prevented
> any conflicting lock. Protocol/client itself doesn't have enough
> information to decide that. With a plain distribute volume, I don't see a
> way to heal locks without losing the exclusivity property of the locks.
>
> What I proposed is a short-term solution. The mid-to-long-term solution
> should be a lock-healing feature implemented in AFR/EC. In fact, I had
> this conversation with +Karampuri, Pranith <pkarampu at redhat.com> before
> posting this message to the mailing list.
>
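> To make that concrete, the decision such a layer would have to take on a
> child's reconnect could be modelled roughly as below (a sketch only, not
> actual AFR/EC code; the structure and field names are made up):
>
>     #include <stdbool.h>
>
>     #define MAX_CHILDREN 16
>
>     struct fd_state {
>         int  child_count;                     /* replicas / EC bricks   */
>         int  quorum;                          /* e.g. 2 for replica 3   */
>         bool ever_disconnected[MAX_CHILDREN]; /* since locks were taken */
>         bool fd_bad;
>     };
>
>     /* Locks may be replayed on a reconnecting child only if a quorum of
>      * children kept the locks alive the whole time -- only then can no
>      * conflicting lock have been granted elsewhere. */
>     static bool may_heal_locks(const struct fd_state *st)
>     {
>         int never_lost = 0;
>         for (int i = 0; i < st->child_count; i++)
>             if (!st->ever_disconnected[i])
>                 never_lost++;
>         return never_lost >= st->quorum;
>     }
>
>     static void on_child_reconnect(struct fd_state *st, int child)
>     {
>         (void)child;            /* the actual lock replay is elided */
>         if (may_heal_locks(st)) {
>             /* re-send the saved lock requests to this child here */
>         } else {
>             st->fd_bad = true;  /* fall back to marking the fd bad */
>         }
>     }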
>
>>
>>> However, this use case is not affected if the application doesn't acquire
>>> any POSIX locks. So, I am interested in knowing:
>>> * whether your use cases use POSIX locks?
>>> * whether it is feasible for your application to re-open fds and
>>> re-acquire locks on seeing EBADFD errors?
>>>
>>
>> I think that many applications are not prepared to handle that.
>>
>
> I too suspected that, and in fact I'm not too happy with the solution. But
> I went ahead with this mail as I heard that implementing lock-heal in AFR
> will take time, and hence there is no alternative short-term solution.
>

Also, failing loudly is preferable to silently dropping locks.


>
>
>> Xavi
>>
>>
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
>>>
>>> regards,
>>> Raghavendra
>>>
>>
>>