[Gluster-users] file corruption on Gluster 3.5.1 and Ubuntu 14.04

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Sep 11 18:15:53 UTC 2014


On 09/11/2014 11:38 PM, mike wrote:
> Any more to this thread? I don't mean to nag, but this seems like a 
> pretty serious issue.
According to my tests, the issue is most probably in write-behind. The 
people who know that xlator are Avati, Raghavendra G, and Niels.
I have CCed all of them.

Pranith
>
> How can I help?
>
> On Sep 7, 2014, at 9:51 AM, mike <mike at luminatewireless.com> wrote:
>
>> I don't think I have these enabled. How can I confirm that?
>>
>> On Sep 7, 2014, at 12:57 AM, Anand Avati <avati at gluster.org> wrote:
>>
>>> The only reason O_APPEND would get stripped on the server side is 
>>> one of the following xlators:
>>>
>>> - stripe
>>> - quiesce
>>> - crypt
>>>
>>> If you have any of these, please try unloading/reconfiguring without 
>>> these features and try again.
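One way to check for the xlators listed above is to scan the volume's volfile for their `type` lines. The following is a minimal sketch, not an official tool: the `category/name` spellings in `SUSPECT_TYPES` are an assumption based on the GlusterFS source tree layout, and the sample volfile and its volume names are made up for illustration.

```python
# Assumed "category/name" type strings for the three xlators named above.
SUSPECT_TYPES = {"cluster/stripe", "features/quiesce", "encryption/crypt"}


def find_suspect_xlators(volfile_text):
    """Return (volume_name, xlator_type) pairs for suspect xlators."""
    found = []
    current_volume = None
    for line in volfile_text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue  # blank lines and "end-volume"
        if tokens[0] == "volume":
            current_volume = tokens[1]
        elif tokens[0] == "type" and tokens[1] in SUSPECT_TYPES:
            found.append((current_volume, tokens[1]))
    return found


# Hypothetical volfile fragment, used only to exercise the function.
SAMPLE_VOLFILE = """
volume testvol-posix
    type storage/posix
    option directory /bricks/b1
end-volume

volume testvol-stripe-0
    type cluster/stripe
    subvolumes testvol-client-0 testvol-client-1
end-volume
"""

hits = find_suspect_xlators(SAMPLE_VOLFILE)
print(hits)
```

An empty result for both the client and the brick volfiles would suggest none of the three xlators is loaded.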
>>>
>>> Thanks
>>>
>>>
>>> On Sat, Sep 6, 2014 at 3:31 PM, mike <mike at luminatewireless.com> wrote:
>>>
>>>     I was able to narrow it down to a smallish Python script.
>>>
>>>     I've attached that to the bug.
>>>
>>>     https://bugzilla.redhat.com/show_bug.cgi?id=1138970
>>>
>>>
>>>     On Sep 6, 2014, at 1:05 PM, Justin Clift <justin at gluster.org> wrote:
>>>
>>>     > Thanks Mike, this is good stuff. :)
>>>     >
>>>     > + Justin
>>>     >
>>>     >
>>>     > On 06/09/2014, at 8:19 PM, mike wrote:
>>>     >> I upgraded the client to Gluster 3.5.2, but there is no
>>>     difference.
>>>     >>
>>>     >> The bug is almost certainly in the FUSE client. If I remount
>>>     the filesystem with NFS, the problem is no longer observable.
>>>     >>
>>>     >> I spent a little time looking through the xlator/fuse-bridge
>>>     to see where the offsets are coming from, but I'm really not
>>>     familiar enough with the code, so it is slow going.
>>>     >>
>>>     >> Unfortunately, I'm still having trouble reproducing this in a
>>>     python script that could be readily attached to a bug report.
>>>     >>
>>>     >> I'll take a crack at that again, but I will file a bug
>>>     anyway for completeness.
>>>     >>
>>>     >> On Sep 5, 2014, at 7:10 PM, mike <mike at luminatewireless.com> wrote:
>>>     >>
>>>     >>> I have narrowed down the source of the bug.
>>>     >>>
>>>     >>> Here is an strace of glusterfsd
>>>     http://fpaste.org/131455/40996378/
>>>     >>>
>>>     >>> The first line represents a write that does *not* make it
>>>     into the underlying file.
>>>     >>>
>>>     >>> The last line is the write that stomps the earlier write.
>>>     >>>
>>>     >>> As I said, the client file is opened in O_APPEND mode, but
>>>     on the glusterfsd side, the file is just O_CREAT|O_WRONLY. That
>>>     means the offsets passed to pwrite() need to be valid.
>>>     >>>
>>>     >>> I correlated this to a tcpdump I took and I can see that in
>>>     fact, the RPCs being sent have the wrong offset.  Interestingly,
>>>     glusterfs.write-is-append = 0, which I wouldn't have expected.
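The overwrite described above can be illustrated on a local file, with no Gluster involved. This is a minimal sketch under stated assumptions: the file names and the reused offset 0 are hypothetical stand-ins for the wrong offsets seen in the RPCs, and it only shows why pwrite() without O_APPEND depends entirely on the offset the caller supplies.

```python
import os
import tempfile


def write_with_append(path, records):
    # O_APPEND: the kernel places every write at the current end of file.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_APPEND, 0o644)
    try:
        for record in records:
            os.write(fd, record)
    finally:
        os.close(fd)


def write_with_offsets(path, writes):
    # No O_APPEND: each (offset, data) pair lands exactly where the caller
    # says, so a stale offset overwrites data written there earlier.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        for offset, data in writes:
            os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def read_back(path):
    with open(path, "rb") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    append_path = os.path.join(tmp, "append.log")
    write_with_append(append_path, [b"first\n", b"second\n"])
    appended = read_back(append_path)

    offset_path = os.path.join(tmp, "offsets.log")
    # Hypothetical stale offset: the second write reuses offset 0.
    write_with_offsets(offset_path, [(0, b"first\n"), (0, b"second\n")])
    stomped = read_back(offset_path)

print(appended)
print(stomped)
```

In the second file the first record no longer exists, which matches the symptom of an earlier write being stomped by a later one.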
>>>     >>>
>>>     >>> I think the bug lies in the glusterfs fuse client.
>>>     >>>
>>>     >>> As to your question about Gluster 3.5.2, I may be able to do
>>>     that if I am unable to find the bug in the source.
>>>     >>>
>>>     >>> -Mike
>>>     >>>
>>>     >>> On Sep 5, 2014, at 6:16 PM, Justin Clift <justin at gluster.org> wrote:
>>>     >>>
>>>     >>>> On 06/09/2014, at 12:10 AM, mike wrote:
>>>     >>>>> I have found that the O_APPEND flag is key to this failure
>>>     - I had overlooked that flag when reading the strace and trying
>>>     to cobble up a minimal reproduction.
>>>     >>>>>
>>>     >>>>> I now have a small pair of python scripts that can
>>>     reliably reproduce this failure.
>>>     >>>>
>>>     >>>>
>>>     >>>> As a thought, is there a reasonable way you can test this
>>>     on GlusterFS 3.5.2?
>>>     >>>>
>>>     >>>> There were some important bug fixes in 3.5.2 (from 3.5.1).
>>>     >>>>
>>>     >>>> Note I'm not saying yours is one of them, I'm just asking
>>>     if it's
>>>     >>>> easy to test and find out. :)
>>>     >>>>
>>>     >>>> Regards and best wishes,
>>>     >>>>
>>>     >>>> Justin Clift
>>>     >>>>
>>>     >>>> --
>>>     >>>> GlusterFS - http://www.gluster.org
>>>     >>>>
>>>     >>>> An open source, distributed file system scaling to several
>>>     >>>> petabytes, and handling thousands of clients.
>>>     >>>>
>>>     >>>> My personal twitter: twitter.com/realjustinclift
>>>     >>>>
>>>     >>>
>>>     >>
>>>     >
>>>     >
>>>
>>>     _______________________________________________
>>>     Gluster-users mailing list
>>>     Gluster-users at gluster.org
>>>     http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>
>
