[Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume

Soumya Koduri skoduri at redhat.com
Mon Jul 21 16:37:57 UTC 2014



On 07/21/2014 07:33 PM, Anders Blomdell wrote:
> On 2014-07-21 14:36, Soumya Koduri wrote:
>>
>>
>> On 07/21/2014 05:35 PM, Xavier Hernandez wrote:
>>> On Monday 21 July 2014 13:53:19 Anders Blomdell wrote:
>>>> On 2014-07-21 13:49, Pranith Kumar Karampuri wrote:
>>>>> On 07/21/2014 05:17 PM, Anders Blomdell wrote:
>>>>>> On 2014-07-21 13:36, Pranith Kumar Karampuri wrote:
>>>>>>> On 07/21/2014 05:03 PM, Anders Blomdell wrote:
>>>>>>>> On 2014-07-19 04:43, Pranith Kumar Karampuri wrote:
>>>>>>>>> On 07/18/2014 07:57 PM, Anders Blomdell wrote:
>>>>>>>>>> During testing of a 3*4 gluster (from master as of
>>>>>>>>>> yesterday), I encountered two major weirdnesses:
>>>>>>>>>>
>>>>>>>>>> 1. A 'rm -rf <some_dir>' needed several invocations to
>>>>>>>>>>    finish, each time reporting a number of lines like
>>>>>>>>>>    these:
>>>>>>>>>>
>>>>>>>>>>      rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty
>>>>>>>>>>
>>>>>>>>>> 2. After having successfully deleted all files from the
>>>>>>>>>>    volume, I have a single directory that is duplicated
>>>>>>>>>>    in gluster-fuse, like this:
>>>>>>>>>>
>>>>>>>>>>      # ls -l /mnt/gluster
>>>>>>>>>>      total 24
>>>>>>>>>>      drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>>>>>>>>>>      drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>>>>>>>>>>
>>>>>>>>>> Any idea on how to debug this issue?
>>>>>>>>>
>>>>>>>>> What are the steps to recreate? We need to first find
>>>>>>>>> what led to this. Then probably which xlator leads to
>>>>>>>>> this.
>>>>>>>> Would a pcap network dump + the result from 'tar -c
>>>>>>>> --xattrs /brick/a/gluster' on all the hosts before and
>>>>>>>> after the following commands are run be of any help:
>>>>>>>>
>>>>>>>>   # mount -t glusterfs gluster-host:/test /mnt/gluster
>>>>>>>>   # mkdir /mnt/gluster/work2 ;
>>>>>>>>   # ls /mnt/gluster
>>>>>>>>   work2  work2
>>>>>>>
>>>>>>> Are you using ext4?
>>>>>>
>>>>>> Yes
>>>>>>
>>>>>>> Is this on latest upstream?
>>>>>>
>>>>>> kernel is 3.14.9-200.fc20.x86_64, if that is latest upstream,
>>>>>> I don't know. gluster is from master as of end of last week
>>>>>>
>>>>>> If there are known issues with ext4 I could switch to
>>>>>> something else, but during the last 15 years or so, I have
>>>>>> had very few problems with ext2/3/4; that's the reason for
>>>>>> choosing it.
>>>>>
>>>>> The problem is afrv2 + dht + ext4 offsets. Soumya and Xavier
>>>>> were working on it last I heard (CCed).
>>>> Should I switch to xfs or be guinea pig for testing a fixed
>>>> version?
>>>
>>> There is a patch for this [1]. It should work for this particular
>>> configuration, but there are some limitations in the general case,
>>> especially for future scalability, that we tried to solve but it
>>> seems quite difficult. Maybe Soumya has newer information about
>>> that.
>>>
>>> XFS should work without problems if you need it.
> As long as it does not start using 64-bit offsets as well :-)
> Sounds like I should go for XFS right now? Tell me if you need testers.

Sure, yes :)
XFS doesn't have this issue. It still seems to use 32-bit offsets.

>
>> That's right. This patch works fine with the current supported/limited
>> configuration. But we need a much more generalized approach or maybe
>> a design change as Xavi had suggested to make it more scalable.
> Is that the patch in [1] you are referring to?
Yes, [1] is a possible solution for the current issue. This change is
still under review.

>
>> The problem in short: ext4 uses large offsets, including the bits
>> that GlusterFS itself may need in order to store the subvol id along
>> with the offset. This could end up in a few offsets being modified
>> when given back to the filesystem, resulting in missing files etc.
>> Avati has proposed a solution to overcome this issue based on the
>> assumption that "both EXT4/XFS are tolerant in terms of the accuracy
>> of the value presented back in seekdir(), i.e., a seekdir(val)
>> actually seeks to the entry which has the 'closest' true offset."
>> For more info, please check http://review.gluster.org/#/c/4711/.
> This is AFAICT already in the version that failed, as commit
> e0616e9314c8323dc59fca7cad6972f08d72b936
>
That's right. This change was done by Anand Avati in the dht translator,
and it would work as expected had AFR not come into the picture. When the
same change was made in the AFR (v2) translator, it resulted in the loss
of the brick-id.
[1] is a potential fix for now; it had to change the transform logic in
these two translators.
But as Xavi mentioned, our goal is to come up with a solution that is
uniform across all the translators, without any loss of subvol-id, and
that keeps the offset gaps to a minimum.
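
As a rough illustration of the brick-id loss described above, here is a
toy Python model. It is not the actual dht/afr transform code: the
function names (encode, decode, round_trip), the 2-bit id widths, the
sample offset, and the choice of the low bits as the id field are all
assumptions made purely for illustration.

```python
MASK64 = (1 << 64) - 1  # directory offsets are treated as 64-bit values


def encode(offset, subvol_id, id_bits, shift=0):
    """Overwrite id_bits bits of offset, starting at bit `shift`, with subvol_id."""
    if not 0 <= subvol_id < (1 << id_bits):
        raise ValueError("subvol_id does not fit in id_bits")
    field = ((1 << id_bits) - 1) << shift
    return ((offset & ~field) | (subvol_id << shift)) & MASK64


def decode(value, id_bits, shift=0):
    """Return (offset with the id field zeroed, subvol_id)."""
    field = ((1 << id_bits) - 1) << shift
    return value & ~field & MASK64, (value & field) >> shift


def round_trip(brick_off, afr_id, dht_id, afr_shift):
    """Brick -> AFR -> DHT on the way up, then DHT -> AFR on the way down.

    Returns (dht id recovered, afr id recovered, offset error at the brick).
    """
    up = encode(encode(brick_off, afr_id, 2, afr_shift), dht_id, 2)
    for_afr, got_dht = decode(up, 2)
    for_brick, got_afr = decode(for_afr, 2, afr_shift)
    return got_dht, got_afr, brick_off - for_brick


if __name__ == "__main__":
    off = 0x7A3F19C455E2B6DF  # made-up large, ext4-style offset
    # Both layers use the same low bits: DHT overwrites the AFR brick id.
    print(round_trip(off, afr_id=2, dht_id=1, afr_shift=0))
    # Separate bit fields: both ids survive, but 4 offset bits are dropped.
    print(round_trip(off, afr_id=2, dht_id=1, afr_shift=2))
```

In this model the first call gets back brick id 0 instead of 2, because
the upper layer reused the same bits. The second call recovers both ids,
but the offset handed back to the brick is off by 15 instead of 3, which
is the widening gap that each additional id-storing layer would add.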

>> But this offset gap widens as and when more translators (which need
>> to store subvol-id) get added to the gluster stack, which may
>> eventually result in an issue similar to the one you are facing now.
>>
>> Thanks, Soumya
>>
>>> Xavi
>>>
>>> [1] http://review.gluster.org/8201/
>>>
> Thanks!
>
> /Anders
>

