[Gluster-devel] Need review for client-reopen changes

Wed Jan 23 05:17:41 UTC 2013

On 01/09/2013 03:51 PM, Pranith Kumar K wrote:
> On 01/07/2013 04:46 PM, Raghavendra Gowdappa wrote:
>> Pranith,
>>
>> This comment is on the second patch. While the implementation looks 
>> fine, I have some concerns about the idea itself. Consider the 
>> following situation with a replicate volume of two subvolumes:
>>
>> 1. Process 1 (p1) acquires a mandatory lock.
>> 2. The first server is stopped and its disk is replaced.
>> 3. The reopen of the fd opened by p1 fails (since the file is not 
>> present).
>> 4. Self-heal completes. The parent is notified that the child is up. 
>> However, the fd is not opened yet.
>> 5. Now there is a possibility that another process p2 can 
>> successfully write to another fd opened on the same file (on 
>> server1), since the lock (from p1) is not yet acquired on server1.
>>
>> A similar situation can arise even without this patch, but only when 
>> p1 and p2 are not running on the same mount point. With this patch 
>> it can happen on a single mount point too. I am not sure whether we 
>> can ignore this corner case. Others, please let us know your opinion 
>> on this.
>>
>> regards,
>> Raghavendra.
>>
>> ----- Original Message -----
>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>> To: "devel" <gluster-devel at nongnu.org>
>>> Cc: "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Krishnan 
>>> Parthasarathi" <kparthas at redhat.com>, "Jeff Darcy"
>>> <jdarcy at redhat.com>, "Amar Tumballi" <atumball at redhat.com>
>>> Sent: Monday, January 7, 2013 10:15:36 AM
>>> Subject: Need review for client-reopen changes
>>>
>>> hi,
>>> http://review.gluster.org/#change,4357
>>> http://review.gluster.org/#change,4358
>>>
>>> are the changes I made to handle re-opens of files in the case where
>>> a disk is replaced while a brick is offline. The idea is to attempt
>>> re-opens after self-heal completes and the file can be opened.
>>> With these changes, readv/fxattrop/writev/finodelk for fds with
>>> remote-fd -1 are attempted using anon-fds, and if the fop succeeds,
>>> the re-open is attempted on every 1024th success. 1024 is an
>>> arbitrary number I used. The re-open of a file could fail because
>>> posix lock re-acquisition fails; that is why re-opens are
>>> attempted periodically (on every 1024th successful fop on that fd).
>>>
>>> I think the re-attempt logic could be better.
>>> For instance, we can attempt the re-open on the first success on the
>>> anon-fd instead of waiting for the 1024th success, and if this
>>> re-open fails we could fall back on 'periodic attempts', i.e. on
>>> every 1024th success on the anon-fd.
>>>
>>> Let me know your thoughts.
>>>
>>> Pranith
>>>
> Hi,
>          Thanks, everyone, for the code reviews. I will make the 
> suggested changes to the code. Before that, do you have any comments 
> on the re-open attempts? Are you OK with waiting for 1024 successes 
> every time? Is 1024 OK, or should it be more? I am not sure how to 
> arrive at a good number for this.
>
> Pranith.
>
Hi,
       I posted the patches for review with test cases. Please review 
the following:
http://review.gluster.com/4387
http://review.gluster.com/4386
http://review.gluster.com/4358
http://review.gluster.com/4357

Pranith.
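
For readers following the thread, the re-open policy proposed in the quoted
message (attempt a re-open on the first successful anon-fd fop, then fall
back to one attempt per 1024 successes) can be sketched roughly as below.
This is only an illustration in Python, not the code in the patches: the
names FdReopenState, on_anon_fop_success and try_reopen are hypothetical,
and it assumes the periodic attempts land on successes 1024, 2048, and so
on, which the thread does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# 1024 is the arbitrary interval mentioned in the thread.
REOPEN_INTERVAL = 1024


@dataclass
class FdReopenState:
    """Hypothetical per-fd bookkeeping; not the real GlusterFS client struct."""
    remote_fd: int = -1        # -1 means the fd is not open on the brick
    anon_successes: int = 0    # fops that succeeded through an anon-fd
    reopen_attempts: int = 0   # how many re-opens have been tried so far


def on_anon_fop_success(
    state: FdReopenState,
    try_reopen: Callable[[], Optional[int]],
    interval: int = REOPEN_INTERVAL,
) -> bool:
    """Record one successful anon-fd fop and maybe attempt a re-open.

    try_reopen is an assumed callback that returns the new remote fd, or
    None if the re-open failed (for example because posix lock
    re-acquisition failed). Returns True once the fd is open on the brick.
    """
    if state.remote_fd != -1:
        return True  # already re-opened, nothing left to do

    state.anon_successes += 1
    first_success = state.anon_successes == 1
    periodic = state.anon_successes % interval == 0

    # Try immediately on the first success, then once per `interval` successes.
    if first_success or periodic:
        state.reopen_attempts += 1
        fd = try_reopen()
        if fd is not None:
            state.remote_fd = fd

    return state.remote_fd != -1
```

With a callback that always fails, 2048 successful anon-fd fops lead to
three re-open attempts under this sketch (on the 1st, 1024th and 2048th
success). The sketch does nothing about the locking race raised earlier in
the thread; it only shows when the re-open is attempted.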