[Gluster-devel] [Gluster-users] POSIX locks and disconnections between clients and bricks

Xavi Hernandez jahernan at redhat.com
Wed Mar 27 11:51:41 UTC 2019


On Wed, Mar 27, 2019 at 11:54 AM Raghavendra Gowdappa <rgowdapp at redhat.com>
wrote:

>
>
> On Wed, Mar 27, 2019 at 4:22 PM Raghavendra Gowdappa <rgowdapp at redhat.com>
> wrote:
>
>>
>>
>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez <jahernan at redhat.com>
>> wrote:
>>
>>> Hi Raghavendra,
>>>
>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <
>>> rgowdapp at redhat.com> wrote:
>>>
>>>> All,
>>>>
>>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount
>>>> through which those locks were acquired disconnects from the bricks/server.
>>>> This helps Glusterfs avoid stale lock problems later (for example, if the
>>>> application unlocks while the connection is still down). However, it also
>>>> means the lock is no longer exclusive, as other applications/clients can
>>>> acquire the same lock. To communicate that the locks are no longer valid,
>>>> we are planning to mark the fd (which has POSIX locks) bad on a disconnect,
>>>> so that any future operations on that fd will fail, forcing the application
>>>> to re-open the fd and re-acquire the locks it needs [1].
>>>>
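A minimal sketch of what this could look like from the application's side,
assuming a plain fcntl()-based POSIX lock and that the failure surfaces as
EBADF; the helper names and the single-retry policy are illustrative, not
part of any proposed Gluster API:

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Take a write lock on the whole file, blocking until granted. */
    static int acquire_lock(int fd)
    {
        struct flock fl = {
            .l_type   = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 0,      /* 0 means "to end of file" */
        };
        return fcntl(fd, F_SETLKW, &fl);
    }

    /* If the fd was marked bad after a disconnect, re-open the file,
     * re-acquire the lock and retry the failed write once. */
    static int write_with_relock(int *fd, const char *path,
                                 const void *buf, size_t len)
    {
        ssize_t ret = write(*fd, buf, len);

        if (ret < 0 && errno == EBADF) {
            close(*fd);                    /* drop the bad fd */
            *fd = open(path, O_RDWR);
            if (*fd < 0 || acquire_lock(*fd) < 0)
                return -1;                 /* could not regain the lock */
            ret = write(*fd, buf, len);    /* retry with a fresh lock */
        }
        return (ret < 0) ? -1 : 0;
    }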
>>>
>>> Wouldn't it be better to retake the locks when the brick is reconnected,
>>> if the lock is still in use?
>>>
>>
>> There is also a possibility that clients may never reconnect. That's the
>> primary reason why bricks assume the worst (that the client will not
>> reconnect) and clean up the locks.
>>
>>
>>> BTW, the referenced bug is not public. Should we open another bug to
>>> track this?
>>>
>>
>> I've just opened up the comment to give enough context. I'll open a bug
>> upstream too.
>>
>>
>>>
>>>
>>>>
>>>> Note that with AFR/replicate in the picture we can prevent errors to the
>>>> application as long as a quorum of children "never ever" lose their
>>>> connection to the bricks after the locks have been acquired. I am using the
>>>> term "never ever" because locks are not healed back after re-connection;
>>>> the first disconnect would've marked the fd bad, and the fd remains bad
>>>> even after re-connection happens. So, it's not just a quorum of children
>>>> "currently online", but a quorum of children "never having disconnected
>>>> from the bricks after the locks were acquired".
>>>>
>>>
>>> I think this requirement is not feasible. In a distributed file system,
>>> sooner or later all bricks will be disconnected. It could be because of
>>> failures or because an upgrade is done, but it will happen.
>>>
>>> The difference here is how long fds are kept open. If applications open
>>> and close files frequently enough (i.e. an fd is not kept open longer than
>>> it takes for more than a quorum of bricks to disconnect) then there's no
>>> problem. The problem can only appear with applications that keep files open
>>> for a long time and also use POSIX locks. In this case, the only good
>>> solution I see is to retake the locks on brick reconnection.
>>>
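A rough sketch of the kind of client-side bookkeeping this would need,
assuming the client remembers the locks it was granted on each fd and
replays them when a brick comes back; the structures and names are
illustrative, not actual Gluster internals:

    #include <fcntl.h>
    #include <stddef.h>

    /* One remembered lock, exactly as it was granted before the disconnect. */
    struct saved_lock {
        struct flock       fl;
        struct saved_lock *next;
    };

    /* Per-fd record of locks that must be replayed on reconnection. */
    struct tracked_fd {
        int                fd;
        struct saved_lock *locks;
    };

    /* Replay all remembered locks after a brick reconnects.  If any
     * re-acquisition fails, exclusivity may have been lost while the
     * brick was away, so the fd should still be marked bad. */
    static int reacquire_locks(struct tracked_fd *tfd)
    {
        for (struct saved_lock *l = tfd->locks; l != NULL; l = l->next) {
            if (fcntl(tfd->fd, F_SETLK, &l->fl) < 0)
                return -1;      /* a conflicting lock was taken meanwhile */
        }
        return 0;
    }

As the rest of the thread points out, only a layer like AFR/EC has enough
information (a quorum of never-disconnected bricks) to decide whether such a
replay is actually safe.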
>>
>> Agree. But lock healing should be done only by HA layers like AFR/EC, as
>> only they know whether there are enough online bricks to have prevented any
>> conflicting lock. Protocol/client itself doesn't have enough information to
>> do that. With a plain distribute volume, I don't see a way to heal locks
>> without losing the exclusivity of the locks.
>>
>> What I proposed is a short-term solution. The mid- to long-term solution
>> should be a lock-healing feature implemented in AFR/EC. In fact, I had this
>> conversation with +Karampuri, Pranith <pkarampu at redhat.com> before
>> posting this message to the ML.
>>
>>
>>>
>>>> However, this use case is not affected if the application doesn't acquire
>>>> any POSIX locks. So, I am interested in knowing:
>>>> * whether your use cases use POSIX locks?
>>>> * whether it is feasible for your application to re-open fds and
>>>> re-acquire locks on seeing EBADFD errors?
>>>>
>>>
>>> I think that many applications are not prepared to handle that.
>>>
>>
>> I too suspected that, and in fact I am not too happy with the solution. But
>> I went ahead with this mail as I heard that implementing lock healing in AFR
>> will take time, and hence there are no alternative short-term solutions.
>>
>
> Also, failing loudly is preferable to silently dropping locks.
>

Yes. Silently dropping locks can cause corruption, which is worse. However,
causing application failures doesn't improve the user experience either.

Unfortunately, I'm not aware of any other short-term solution right now.


>
>>
>>
>>> Xavi
>>>
>>>
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
>>>>
>>>> regards,
>>>> Raghavendra
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>