[Gluster-devel] fstat problems when killing with stat prefetch turned on

Miklós Fokin miklos.fokin at appeartv.com
Thu May 4 13:24:40 UTC 2017


Hello,

I seem to have found the cause of half of the problem.
I updated the bug report with a more detailed description, but the 
short version is that the attached diff fixes the case where fstat 
returns a size of 0 after a brick is killed (by not letting the first 
update to fsync come from an arbiter).
My question is: should I submit it for review now, or should the 
remaining changes be investigated first?
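
To illustrate the idea of not taking the size from an arbiter's fsync 
reply, here is a minimal sketch. It is not the attached diff and not 
Gluster code: FsyncReply and pick_size are hypothetical names, and the 
sketch assumes that the arbiter reports a size of 0 because it stores 
no file data and that replies are listed in arrival order.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FsyncReply:
    """One brick's reply to an fsync (hypothetical structure)."""
    brick: int   # index of the replying brick
    ok: bool     # True if the fsync succeeded on that brick
    size: int    # file size carried in the reply's stat data


def pick_size(replies: Iterable[FsyncReply], arbiter: int) -> Optional[int]:
    """Return the size from the first successful non-arbiter reply.

    Returns None if no data brick replied successfully.
    """
    for reply in replies:
        if not reply.ok:
            continue  # failed replies carry no usable stat data
        if reply.brick == arbiter:
            continue  # assumed: the arbiter's size is always 0, so skip it
        return reply.size
    return None


# The arbiter (brick 2) answers first with size 0; a data brick follows.
replies = [
    FsyncReply(brick=2, ok=True, size=0),
    FsyncReply(brick=0, ok=True, size=4096),
    FsyncReply(brick=1, ok=False, size=0),
]
print(pick_size(replies, arbiter=2))
```

Taking the first successful reply without the arbiter check would 
return 0 in this example, which matches the symptom described above.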

Best regards,
Miklós


On 04/26/2017 12:58 PM, Miklós Fokin wrote:
> Thanks for the response.
> We didn't have the options set that the first two reviews were about.
> The third was about changes to performance.readdir-ahead.
> I turned that feature off today while leaving stat-prefetch on on my 
> computer, and the bug still appeared, so I don't think that commit 
> would fix it either.
>
> Best regards,
> Miklós
>
>
> On 04/25/2017 01:26 PM, Raghavendra Gowdappa wrote:
>> We recently worked on some patches to ensure that correct stats are 
>> returned.
>>
>> https://review.gluster.org/15759
>> https://review.gluster.org/15659
>> https://review.gluster.org/16419
>>
>> Referring to these patches and bugs associated with them might give 
>> you some insight into the nature of the problem. The major culprit 
>> was interaction between readdir-ahead and stat-prefetch. So, the 
>> issue you are seeing might be addressed by these patches.
>>
>> ----- Original Message -----
>>> From: "Miklós Fokin" <miklos.fokin at appeartv.com>
>>> To: gluster-devel at gluster.org
>>> Sent: Tuesday, April 25, 2017 3:42:52 PM
>>> Subject: [Gluster-devel] fstat problems when killing with stat prefetch turned on
>>>
>>> Hello,
>>>
>>> I tried reproducing the problem that Mateusz Slupny was experiencing
>>> before (stat returning a bad st_size value during self-healing) on my
>>> own computer with only 3 bricks (one being an arbiter) on 3.10.0.
>>> With such a small setup, the bug appeared both on killing a brick and
>>> during the self-healing process, but only rarely (once in hundreds of
>>> tries) and only with performance.stat-prefetch turned on.
>>> This might be a completely different issue: on the setup Matt was
>>> using, he could reproduce it with that option turned off, and it
>>> always happened, but only during recovery, not after killing.
>>> I did submit a bug report about this:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1444892.
>>>
>>> The problem, as Matt wrote, is that this causes data corruption if one
>>> uses the returned size when writing.
>>> Could I get some pointers as to what parts of the gluster code I should
>>> be looking at to figure out what the problem might be?
>>>
>>> Thanks in advance,
>>> Miklós
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fsync_update_arbiter_check.diff
Type: text/x-patch
Size: 854 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-devel/attachments/20170504/94f98a06/attachment.bin>

