[Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume

Mon Jul 21 12:36:24 UTC 2014

On 07/21/2014 05:35 PM, Xavier Hernandez wrote:
> On Monday 21 July 2014 13:53:19 Anders Blomdell wrote:
>> On 2014-07-21 13:49, Pranith Kumar Karampuri wrote:
>>> On 07/21/2014 05:17 PM, Anders Blomdell wrote:
>>>> On 2014-07-21 13:36, Pranith Kumar Karampuri wrote:
>>>>> On 07/21/2014 05:03 PM, Anders Blomdell wrote:
>>>>>> On 2014-07-19 04:43, Pranith Kumar Karampuri wrote:
>>>>>>> On 07/18/2014 07:57 PM, Anders Blomdell wrote:
>>>>>>>> During testing of a 3*4 gluster (from master as of yesterday), I
>>>>>>>> encountered two major weirdnesses:
>>>>>>>>
>>>>>>>>   1. A 'rm -rf <some_dir>' needed several invocations to finish,
>>>>>>>>      each time reporting a number of lines like these:
>>>>>>>>
>>>>>>>>        rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty
>>>>>>>>
>>>>>>>>   2. After having successfully deleted all files from the volume,
>>>>>>>>      I have a single directory that is duplicated in gluster-fuse,
>>>>>>>>      like this:
>>>>>>>>
>>>>>>>>        # ls -l /mnt/gluster
>>>>>>>>        total 24
>>>>>>>>        drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>>>>>>>>        drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>>>>>>>>
>>>>>>>> any idea on how to debug this issue?
>>>>>>>
>>>>>>> What are the steps to recreate? We need to first find what led to
>>>>>>> this. Then probably which xlator leads to this.
>>>>>> Would a pcap network dump + the result from 'tar -c --xattrs
>>>>>> /brick/a/gluster' on all the hosts before and after the following
>>>>>> commands are run be of any help:
>>>>>>      # mount -t glusterfs gluster-host:/test /mnt/gluster
>>>>>>      # mkdir /mnt/gluster/work2 ;
>>>>>>      # ls /mnt/gluster
>>>>>>      work2  work2
>>>>>
>>>>> Are you using ext4?
>>>>
>>>> Yes
>>>>
>>>>> Is this on latest upstream?
>>>>
>>>> kernel is 3.14.9-200.fc20.x86_64, if that is latest upstream, I don't
>>>> know.
>>>> gluster is from master as of end of last week
>>>>
>>>> If there are known issues with ext4 I could switch to something else, but
>>>> during the last 15 years or so, I have had very few problems with
>>>> ext2/3/4, that's the reason for choosing it.
>>>
>>> The problem is afrv2 + dht + ext4 offsets. Soumya and Xavier were working
>>> on it last I heard (CCed).
>> Should I switch to xfs or be guinea pig for testing a fixed version?
>
> There is a patch for this [1]. It should work for this particular
> configuration, but there are some limitations in the general case,
> especially for future scalability, that we tried to solve but it seems
> quite difficult. Maybe Soumya has newer information about that.
>
> XFS should work without problems if you need it.
>

That's right. This patch works fine with the currently supported (limited)
configuration. But we need a much more generalized approach, or maybe a
design change as Xavi had suggested, to make it more scalable.

The problem in short:
'ext4' uses large directory offsets, occupying the bits that GlusterFS
itself may need in order to store the subvol id along with the offset. This
can end up with a few offsets being modified when they are given back to the
filesystem, resulting in missing files etc.

Avati has proposed a solution to overcome this issue, based on the
assumption that both ext4 and XFS are tolerant in terms of the accuracy of
the value presented back in seekdir(), i.e., a seekdir(val) actually
seeks to the entry which has the "closest" true offset. For more info,
please check http://review.gluster.org/#/c/4711/.

But this offset gap widens as more translators (which need to store a
subvol-id) are added to the gluster stack, which may eventually result in
an issue similar to the one you are facing now.
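
To make the widening gap concrete, here is a rough Python sketch. It is a toy
model only, not the code from either patch: the function names, the choice of
putting the id in the top bits, the 2-bit id width and the sample offset are
all assumptions made for illustration. Each layer shifts the 64-bit offset
right to make room for its subvol id, so every extra layer drops more low
bits from the value that is eventually handed back to the filesystem.

```python
# Toy model only: NOT the GlusterFS code. All names here are hypothetical.

def encode(offset, subvol_id, id_bits):
    """Pack subvol_id into the top id_bits of a 64-bit directory offset.

    The offset is shifted right to make room, so its low id_bits are lost.
    """
    assert 0 <= offset < (1 << 64)
    assert 0 <= subvol_id < (1 << id_bits)
    return (subvol_id << (64 - id_bits)) | (offset >> id_bits)


def decode(encoded, id_bits):
    """Return (subvol_id, approximate offset) for a value made by encode()."""
    subvol_id = encoded >> (64 - id_bits)
    low_mask = (1 << (64 - id_bits)) - 1
    return subvol_id, (encoded & low_mask) << id_bits


def round_trip(offset, layers):
    """Encode through each (subvol_id, id_bits) layer, then decode back.

    layers is ordered bottom-up. Returns the recovered ids (bottom-up) and
    the offset that would be handed back to the backend filesystem.
    """
    value = offset
    for subvol_id, id_bits in layers:
        value = encode(value, subvol_id, id_bits)
    ids = []
    for _, id_bits in reversed(layers):
        subvol_id, value = decode(value, id_bits)
        ids.append(subvol_id)
    return ids[::-1], value


true_off = 0x5A5A5A5A5A5A5A5B                     # made-up large offset
_, one = round_trip(true_off, [(2, 2)])           # one translator
_, two = round_trip(true_off, [(2, 2), (3, 2)])   # two stacked translators
print(true_off - one, true_off - two)             # prints: 3 11
```

With one layer the offset that comes back is off by at most 3; with two
layers by at most 15. The subvol ids survive the round trip, but the
filesystem has to tolerate an increasingly inexact seekdir() value.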

Thanks,
Soumya

> Xavi
>
> [1] http://review.gluster.org/8201/
>