[Gluster-devel] fstat problems when killing with stat prefetch turned on

Thu May 4 13:29:09 UTC 2017

Sorry, some lines were missing from the attachment; the complete diff is attached.

On 05/04/2017 03:24 PM, Miklós Fokin wrote:
> Hello,
>
> I seem to have found the cause of half of the problem.
> I have updated the bug report with a more detailed description, but the
> short version is that the attached diff fixes the case where fstat
> returns a size of 0 after a brick is killed (it does so by not letting
> the first update to fsync come from an arbiter).
> My question is: should I submit it for review now, or should the
> further changes that are needed be investigated first?
>
> Best regards,
> Miklós
>
>
> On 04/26/2017 12:58 PM, Miklós Fokin wrote:
>> Thanks for the response.
>> We did not have the options set that the first two reviews are about.
>> The third is about changes to performance.readdir-ahead.
>> Today I turned that feature off on my computer while leaving
>> stat-prefetch on, and the bug still appeared, so I think that commit
>> would not fix it either.
>>
>> Best regards,
>> Miklós
>>
>>
>> On 04/25/2017 01:26 PM, Raghavendra Gowdappa wrote:
>>> We recently worked on some patches to ensure that correct stats are
>>> returned.
>>>
>>> https://review.gluster.org/15759
>>> https://review.gluster.org/15659
>>> https://review.gluster.org/16419
>>>
>>> Looking at these patches and the bugs associated with them might give
>>> you some insight into the nature of the problem. The major culprit
>>> was the interaction between readdir-ahead and stat-prefetch, so the
>>> issue you are seeing might be addressed by these patches.
>>>
>>> ----- Original Message -----
>>>> From: "Miklós Fokin" <miklos.fokin at appeartv.com>
>>>> To: gluster-devel at gluster.org
>>>> Sent: Tuesday, April 25, 2017 3:42:52 PM
>>>> Subject: [Gluster-devel] fstat problems when killing with stat prefetch turned on
>>>>
>>>> Hello,
>>>>
>>>> I tried reproducing the problem that Mateusz Slupny experienced
>>>> earlier (stat returning a bad st_size value during self-healing) on
>>>> my own computer, with only 3 bricks (one being an arbiter) on 3.10.0.
>>>> With such a small setup, the bug appeared both when killing a brick
>>>> and during the self-healing process, but only rarely (once in
>>>> hundreds of tries) and only with performance.stat-prefetch turned on.
>>>> This might be a completely different issue: on the setup Matt was
>>>> using, he could reproduce it with that option off, and it always
>>>> happened, but only during recovery, not after killing.
>>>> I did submit a bug report about this:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1444892.
>>>>
>>>> The problem, as Matt wrote, is that this causes data corruption if
>>>> one uses the returned size when writing.
>>>> Could I get some pointers as to which parts of the gluster code I
>>>> should be looking at to figure out what the problem might be?
>>>>
>>>> Thanks in advance,
>>>> Miklós
>>>>
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
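
The corruption risk described in the original report (using the size returned by fstat as the offset for the next write) can be illustrated with a small local sketch. It does not involve Gluster at all: it runs on an ordinary temporary file, and the value 0 is simply substituted by hand for the bad st_size. The helper names append_record and demo are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def append_record(fd, record, reported_size):
    """Write record at the offset an application would pick if it
    trusted reported_size as the current end of the file."""
    os.pwrite(fd, record, reported_size)
    return reported_size


def demo():
    """Return the file contents after one correct and one misplaced write."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.write(fd, b"AAAA")               # existing data, true size is 4

        good_size = os.fstat(fd).st_size    # correct size from a local fstat
        append_record(fd, b"BBBB", good_size)   # lands after the old data

        bad_size = 0                        # assumption: stands in for the bad st_size
        append_record(fd, b"CCCC", bad_size)    # overwrites the first record

        return os.pread(fd, 16, 0)


if __name__ == "__main__":
    print(demo())
```

The second write silently replaces the first four bytes instead of extending the file, which is the kind of damage an application suffers when it trusts a size of 0 for a file that actually holds data.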

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fsync_update_arbiter_check.diff
Type: text/x-patch
Size: 927 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-devel/attachments/20170504/8a3e7599/attachment-0001.bin>