[Gluster-users] Hundreds of duplicate files
    Olav Peeters 
    opeeters at gmail.com
       
    Sun Feb 22 20:27:35 UTC 2015
    
    
  
Hi Joe,
I tried deleting both 0-byte versions of one of the duplicated files, like so:
[root at gluster01 ~]# getfattr -m . -d -e hex 
/export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file: 
export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-34=0x000000000000000000000000
trusted.afr.sr_vol01-client-35=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
[root at gluster03 ~]# getfattr -m . -d -e hex 
/export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file: 
export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-34=0x000000000000000000000000
trusted.afr.sr_vol01-client-35=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
[root at gluster01 ~]# rm -f \
    /export/brick14gfs01/.glusterfs/ae/fd/aefd1845-0841-4a8f-8408-f1ab8aa7a417 \
    /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
[root at gluster03 ~]# rm -f \
    /export/brick14gfs03/.glusterfs/ae/fd/aefd1845-0841-4a8f-8408-f1ab8aa7a417 \
    /export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
An "ls" showed that this was successful.
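For reference, the .glusterfs path used in the rm commands above follows from the trusted.gfid value: the first two hex digits, the next two, then the GFID written as a UUID. A minimal Python 3 sketch of that mapping (the function name gfid_backend_path is made up for illustration):

```python
import os
import uuid


def gfid_backend_path(brick_root, gfid_hex):
    """Return the .glusterfs hardlink path for a trusted.gfid value."""
    # Accept the "0x..." form printed by `getfattr -e hex`.
    digits = gfid_hex[2:] if gfid_hex.lower().startswith("0x") else gfid_hex
    gfid = str(uuid.UUID(hex=digits))  # canonical 8-4-4-4-12 form
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


print(gfid_backend_path("/export/brick14gfs01",
                        "0xaefd184508414a8f8408f1ab8aa7a417"))
# /export/brick14gfs01/.glusterfs/ae/fd/aefd1845-0841-4a8f-8408-f1ab8aa7a417
```

This only builds the path string. It does not touch the brick, so it can be used to print what a cleanup would remove before anything is deleted.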
Ten minutes later these deleted files were back (presumably after a
self-heal had run):
[root at gluster01 ~]# find /export/*/27* -size 0 \
    -name '3009f448-cf6e-413f-baec-c3b9f0cf9d72*' -exec ls -la {} \;
-rw-r--r--. 2 root root 0 Feb 18 00:51 
/export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
Notice how the last modification time is the same as that of the deleted
files (the moment I rebooted the machines and the duplication misery was
triggered). Do you have any idea what this means?
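As discussed earlier in the thread, the recreated file is mode 0644 with no trusted.glusterfs.dht.linkto xattr, so it is not a DHT link file (those are mode 1000, sticky bit only, and carry that xattr). A rough Python 3 sketch of that distinction follows. The function names and the three labels are invented for illustration, and classify_path assumes Linux and root, since trusted.* xattrs are not visible otherwise:

```python
import os
import stat

LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def classify_empty_file(mode, size, xattr_names):
    """Classify a brick file from its stat mode, size and xattr names."""
    if size != 0:
        return "data"
    perms = stat.S_IMODE(mode)
    # A DHT link file is empty, has only the sticky bit set, and names its target.
    if perms == stat.S_ISVTX and LINKTO_XATTR in xattr_names:
        return "dht-linkfile"
    return "plain-empty"


def classify_path(path):
    """Same check against a real file on a brick (Linux only, run as root)."""
    st = os.lstat(path)
    names = os.listxattr(path, follow_symlinks=False)
    return classify_empty_file(st.st_mode, st.st_size, names)


# The 0-byte, mode 0644 file from the listing above.
print(classify_empty_file(0o100644, 0, ["trusted.afr.dirty", "trusted.gfid"]))
# A hypothetical link file, for comparison.
print(classify_empty_file(0o101000, 0, [LINKTO_XATTR, "trusted.gfid"]))
```

The first call prints plain-empty and the second dht-linkfile. A find on "-size 0 -perm 1000" only matches the second kind, which is why it does not catch the files discussed here.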
About your blog post (http://joejulian.name/blog/dht-misses-are-expensive/), I
don't quite understand how I can use the hash:
[root at gluster01 ~]# python gf_dm_hash.py 3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
0x2b3634edL
.. to locate/identify the good replica pair of that file. I also still 
have these versions (with actual data):
[root at gluster01 ~]# getfattr -m . -d -e hex 
/export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file: 
export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-32=0x000000000000000000000000
trusted.afr.sr_vol01-client-33=0x000000000000000000000000
trusted.afr.sr_vol01-client-34=0x000000000000000000000000
trusted.afr.sr_vol01-client-35=0x000000010000000100000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
[root at gluster02 ~]# getfattr -m . -d -e hex 
/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file: 
export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-32=0x000000000000000000000000
trusted.afr.sr_vol01-client-33=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
[root at gluster02 ~]# getfattr -m . -d -e hex 
/export/brick15gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file: 
export/brick15gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-40=0x000000000000000000000000
trusted.afr.sr_vol01-client-41=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
[root at gluster03 ~]# getfattr -m . -d -e hex 
/export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file: 
export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-40=0x000000000000000000000000
trusted.afr.sr_vol01-client-41=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
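To make the trusted.afr lines above easier to read, here is a small Python 3 sketch that maps a client-N name to its brick number and splits the value into pending counters. It rests on two assumptions: that the 12-byte value holds three big-endian 32-bit counters in the order data, metadata, entry, and that consecutive bricks in "gluster volume info" form a replica set (replica 2 here). The helper names are invented:

```python
import re
import struct

AFR_KEY = re.compile(r"^trusted\.afr\.(?P<volume>.+)-client-(?P<index>\d+)$")
REPLICA = 2  # from "Number of Bricks: 21 x 2 = 42"


def decode_changelog(value_hex):
    """Split a 12-byte trusted.afr value into its three pending counters."""
    digits = value_hex[2:] if value_hex.lower().startswith("0x") else value_hex
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Assumed order: data, metadata, entry (big-endian 32-bit each).
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def describe(xattr_line):
    """Map one 'trusted.afr.<vol>-client-N=0x...' line to brick and counters."""
    key, _, value = xattr_line.strip().partition("=")
    match = AFR_KEY.match(key)
    if match is None:
        return None  # trusted.afr.dirty, trusted.gfid, security.selinux, ...
    index = int(match.group("index"))
    first = (index // REPLICA) * REPLICA
    return {
        "brick": index + 1,  # client-35 is Brick36
        "replica_set": [first + 1 + i for i in range(REPLICA)],
        "pending": decode_changelog(value),
    }


print(describe("trusted.afr.sr_vol01-client-35=0x000000010000000100000000"))
# {'brick': 36, 'replica_set': [35, 36], 'pending': {'data': 1, 'metadata': 1, 'entry': 0}}
```

Read this way, the copy on brick13gfs01 carries changelog entries for Brick35/Brick36 (brick14gfs01 and brick14gfs03, where the 0-byte files are), with one data and one metadata operation pending against Brick36.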
My bet would be that I can delete the first two of these files.
Otherwise they look identical:
[root at gluster01 ~]# ls -al 
/export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
-rw-r--r--. 2 root root 44332659200 Feb 17 23:55 
/export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
[root at gluster02 ~]# ls -al 
/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
-rw-r--r--. 2 root root 44332659200 Feb 17 23:55 
/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
[root at gluster02 ~]# ls -al 
/export/brick15gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
-rw-r--r--. 2 root root 44332659200 Feb 17 23:55 
/export/brick15gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
[root at gluster03 ~]# ls -al 
/export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
-rw-r--r--. 2 root root 44332659200 Feb 17 23:55 
/export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
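On the question of what to do with the hash from gf_dm_hash.py above: as far as I understand the blog post, each brick's copy of the parent directory has a trusted.glusterfs.dht xattr (readable with "getfattr -n trusted.glusterfs.dht -e hex <brick>/<dir>"), and the filename hash should fall inside the range stored there for exactly one replica pair. The sketch below assumes the value is 16 bytes with the range start and end as the last two big-endian 32-bit words. That layout may differ between versions, and the example values are HYPOTHETICAL, not read from this volume:

```python
import struct


def parse_dht_range(layout_hex):
    """Return (start, stop) from a trusted.glusterfs.dht value."""
    digits = layout_hex[2:] if layout_hex.lower().startswith("0x") else layout_hex
    raw = bytes.fromhex(digits)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    # Assumption: the last two 32-bit words are the inclusive hash range.
    start, stop = struct.unpack(">II", raw[8:])
    return start, stop


def find_owners(name_hash, layouts):
    """Return the labels whose hash range contains name_hash."""
    owners = []
    for label, layout_hex in layouts.items():
        start, stop = parse_dht_range(layout_hex)
        if start <= name_hash <= stop:
            owners.append(label)
    return owners


NAME_HASH = 0x2b3634ed  # output of gf_dm_hash.py for the .vhd file

# HYPOTHETICAL layout values, one per replica pair, for illustration only.
EXAMPLE_LAYOUTS = {
    "pair-A": "0x0000000100000000000000003fffffff",
    "pair-B": "0x0000000100000000400000007fffffff",
}

print(find_owners(NAME_HASH, EXAMPLE_LAYOUTS))
# ['pair-A']
```

With the real xattr values from each brick's 272b2366-dfbf-ad47-2a0f-5d5cc40863e3 directory in place of the examples, the pair that is returned would be the one that should hold the file. Since a rebalance has already run, the current ranges may no longer match the ones in effect when the file was written.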
Cheers,
Olav
On 21/02/15 01:37, Olav Peeters wrote:
> It looks even worse than I had feared. :-(
> This really is a crazy bug.
>
> If I understand you correctly, the only sane pairing of the xattrs is
> that of the two 0-byte files, since this is the full list of bricks:
>
> [root at gluster01 ~]# gluster volume info
>
> Volume Name: sr_vol01
> Type: Distributed-Replicate
> Volume ID: c6d6147e-2d91-4d98-b8d9-ba05ec7e4ad6
> Status: Started
> Number of Bricks: 21 x 2 = 42
> Transport-type: tcp
> Bricks:
> Brick1: gluster01:/export/brick1gfs01
> Brick2: gluster02:/export/brick1gfs02
> Brick3: gluster01:/export/brick4gfs01
> Brick4: gluster03:/export/brick4gfs03
> Brick5: gluster02:/export/brick4gfs02
> Brick6: gluster03:/export/brick1gfs03
> Brick7: gluster01:/export/brick2gfs01
> Brick8: gluster02:/export/brick2gfs02
> Brick9: gluster01:/export/brick5gfs01
> Brick10: gluster03:/export/brick5gfs03
> Brick11: gluster02:/export/brick5gfs02
> Brick12: gluster03:/export/brick2gfs03
> Brick13: gluster01:/export/brick3gfs01
> Brick14: gluster02:/export/brick3gfs02
> Brick15: gluster01:/export/brick6gfs01
> Brick16: gluster03:/export/brick6gfs03
> Brick17: gluster02:/export/brick6gfs02
> Brick18: gluster03:/export/brick3gfs03
> Brick19: gluster01:/export/brick8gfs01
> Brick20: gluster02:/export/brick8gfs02
> Brick21: gluster01:/export/brick9gfs01
> Brick22: gluster02:/export/brick9gfs02
> Brick23: gluster01:/export/brick10gfs01
> Brick24: gluster03:/export/brick10gfs03
> Brick25: gluster01:/export/brick11gfs01
> Brick26: gluster03:/export/brick11gfs03
> Brick27: gluster02:/export/brick10gfs02
> Brick28: gluster03:/export/brick8gfs03
> Brick29: gluster02:/export/brick11gfs02
> Brick30: gluster03:/export/brick9gfs03
> Brick31: gluster01:/export/brick12gfs01
> Brick32: gluster02:/export/brick12gfs02
> Brick33: gluster01:/export/brick13gfs01
> Brick34: gluster02:/export/brick13gfs02
> Brick35: gluster01:/export/brick14gfs01
> Brick36: gluster03:/export/brick14gfs03
> Brick37: gluster01:/export/brick15gfs01
> Brick38: gluster03:/export/brick15gfs03
> Brick39: gluster02:/export/brick14gfs02
> Brick40: gluster03:/export/brick12gfs03
> Brick41: gluster02:/export/brick15gfs02
> Brick42: gluster03:/export/brick13gfs03
>
>
> The two 0-byte files are on bricks 35 and 36, as the getfattr output
> correctly shows.
>
> Another sane pairing could be this (if the first file did not also 
> refer to client-34 and client-35):
>
> [root at gluster01 ~]# getfattr -m . -d -e hex 
> /export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> getfattr: Removing leading '/' from absolute path names
> # file: 
> export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
> trusted.afr.sr_vol01-client-35=0x000000010000000100000000
> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>
> [root at gluster02 ~]# getfattr -m . -d -e hex 
> /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> getfattr: Removing leading '/' from absolute path names
> # file: 
> export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>
> But why is the security.selinux value different?
>
>
> You mention hostname changes..
> I noticed that if I list the available shared storage on one of the
> XenServers, I get:
> uuid ( RO)                : 272b2366-dfbf-ad47-2a0f-5d5cc40863e3
>           name-label ( RW): gluster_store
>     name-description ( RW): NFS SR [gluster01.irceline.be:/sr_vol01]
>                 host ( RO): <shared>
>                 type ( RO): nfs
>         content-type ( RO):
>
>
> If I run the plain Linux mount command:
> [root at same_story_on_both_xenserver ~]# mount
> gluster02.irceline.be:/sr_vol01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3 
> on /var/run/sr-mount/272b2366-dfbf-ad47-2a0f-5d5cc40863e3 type nfs 
> (rw,soft,timeo=133,retrans=2147483647,tcp,noac,addr=192.168.0.72)
>
> Originally the mount was made to gluster01 (IP 192.168.0.71), as the
> name-description in the xe sr-list output indicates.
> It is as though, when gluster01 was unavailable for a couple of
> minutes, the NFS mount was somehow automatically reconfigured
> internally to gluster02, but as far as I know NFS cannot do this
> (unless there is some fail-over mechanism, and I never configured one).
> There is also no load balancing between client and server.
> If gluster01 is not available, the gluster volume should not have been
> available, end of story. But from the perspective of a client, the NFS
> mount could be to any one of the three gluster nodes. The client should
> see exactly the same data.
>
> So a rebalance in the current state could do more harm than good?
> I launched a second rebalance in the hope that the system would mend 
> itself after all...
>
> Thanks a million for your support in this darkest hour of my time as a 
> glusterfs user :-)
>
> Cheers,
> Olav
>
> On 20/02/15 23:10, Joe Julian wrote:
>>
>> On 02/20/2015 01:47 PM, Olav Peeters wrote:
>>> Thanks Joe,
>>> for the answers!
>>>
>>> I was apparently not clear enough about the set-up.
>>> The Gluster cluster consists of 3 nodes with 14 bricks each. The
>>> bricks are formatted as XFS and mounted locally. There is one
>>> volume, type: Distributed-Replicate (replica 2). The configuration
>>> is such that bricks are mirrored on two different nodes.
>>>
>>> The NFS mounts that were alive but not in use during the reboot when
>>> the problem started are from clients (2 XenServer machines configured
>>> as a pool, a shared storage set-up). The comparisons I give below are
>>> between (other) clients mounting via either glusterfs or NFS.
>>> The problem is similar on both, except that the first listing (via ls)
>>> after a fresh NFS mount actually does find the files with data.
>>> A second listing only finds the 0-byte files with the same name.
>>>
>>> So all the 0-byte files with mode 0644 can be safely removed?
>> Probably? Is it likely that you have any empty files? I don't know.
>>>
>>> Why do I see three files with the same name (and modification 
>>> timestamp etc.) via either a glusterfs or NFS mount from a client? 
>>> Deleting one of the three will probably not solve the issue either.. 
>>> this seems to me an indexing issue in the gluster cluster.
>> Very good question. I don't know. The xattrs tell a strange story 
>> that I haven't seen before. One legit file shows sr_vol01-client-32 
>> and 33. This would be normal, assuming the filename hash would put it 
>> on that replica pair (we can't tell since the rebalance has changed 
>> the hash map). Another file shows sr_vol01-client-32, 33, 34, and 35 
>> with pending updates scheduled for 35. I have no idea which brick
>> this is (see "gluster volume info" and map the digits (35) to the
>> bricks, offset by 1: client-35 is brick 36). The last one is on 40 and 41.
>>
>> I don't know how these files all got on different replica sets. My 
>> speculations include hostname changes, long-running net-split 
>> conditions with different dht maps (failed rebalances), moved bricks, 
>> load balancers between client and server, mercury in retrograde (lol)...
>>
>>> How do I get Gluster to replicate the files correctly, only 2 
>>> versions of the same file, not three, and on two bricks on different 
>>> machines?
>>>
>>
>> Identify which replica is correct by using the little python script 
>> at http://joejulian.name/blog/dht-misses-are-expensive/ to get the 
>> hash of the filename. Examine the dht map to see which replica pair 
>> *should* have that hash and remove the others (and their hardlink in 
>> .glusterfs). There is no 1-liner that's going to do this. I would 
>> probably script the logic in python, have it print out what it was 
>> going to do, check that for sanity and, if sane, execute it.
>>
>> But mostly figure out how Bricks 32 and/or 33 can become 34 and/or 35 
>> and/or 40 and/or 41. That's the root of the whole problem.
>>
>>> Cheers,
>>> Olav
>>>
>>>
>>>
>>>
>>> On 20/02/15 21:51, Joe Julian wrote:
>>>>
>>>> On 02/20/2015 12:21 PM, Olav Peeters wrote:
>>>>> Let's take one file (3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd) as 
>>>>> an example...
>>>>> On the 3 nodes, where all bricks are formatted as XFS and mounted
>>>>> in /export, and where 272b2366-dfbf-ad47-2a0f-5d5cc40863e3 is the
>>>>> mount point of an NFS shared storage connection from the XenServer
>>>>> machines:
>>>> Did I just read this correctly? Your bricks are NFS mounts? ie, 
>>>> GlusterFS Client <-> GlusterFS Server <-> NFS <-> XFS
>>>>>
>>>>> [root at gluster01 ~]# find 
>>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec 
>>>>> ls -la {} \;
>>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55 
>>>>> /export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> Supposedly, this is the actual file.
>>>>> -rw-r--r--. 2 root root 0 Feb 18 00:51 
>>>>> /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> This is not a linkfile. Note it's mode 0644. How it got there with 
>>>> those permissions would be a matter of history and would require 
>>>> information that's probably lost.
>>>>>
>>>>> [root at gluster02 ~]# find 
>>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec 
>>>>> ls -la {} \;
>>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55 
>>>>> /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>>
>>>>> [root at gluster03 ~]# find 
>>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec 
>>>>> ls -la {} \;
>>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55 
>>>>> /export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 2 root root 0 Feb 18 00:51 
>>>>> /export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> Same analysis as above.
>>>>>
>>>>> 3 files with data, 2 x a 0-byte file with the same name
>>>>>
>>>>> Checking the 0-byte files:
>>>>> [root at gluster01 ~]# getfattr -m . -d -e hex 
>>>>> /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: 
>>>>> export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-35=0x000000000000000000000000
>>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>>
>>>>> [root at gluster03 ~]# getfattr -m . -d -e hex 
>>>>> /export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: 
>>>>> export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-35=0x000000000000000000000000
>>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>>
>>>>> This is not a glusterfs link file since there is no 
>>>>> "trusted.glusterfs.dht.linkto", am I correct?
>>>> You are correct.
>>>>>
>>>>> And checking the "good" files:
>>>>>
>>>>> # file: 
>>>>> export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-35=0x000000010000000100000000
>>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>>
>>>>> [root at gluster02 ~]# getfattr -m . -d -e hex 
>>>>> /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: 
>>>>> export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
>>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>>
>>>>> [root at gluster03 ~]# getfattr -m . -d -e hex 
>>>>> /export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: 
>>>>> export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-40=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-41=0x000000000000000000000000
>>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>>
>>>>>
>>>>>
>>>>> Seen from a client via a glusterfs mount:
>>>>> [root at client ~]# ls -al 
>>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>>
>>>>>
>>>>>
>>>>> Via NFS (just after unmounting and mounting the volume again):
>>>>> [root at client ~]# ls -al 
>>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55 
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55 
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55 
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>>
>>>>> Doing the same list a couple of seconds later:
>>>>> [root at client ~]# ls -al 
>>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> And again, and again, and again:
>>>>> [root at client ~]# ls -al 
>>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>>
>>>>> This really seems odd. Why do we get to see the real data files
>>>>> only once?
>>>>>
>>>>> It seems more and more that this crazy file duplication (and the
>>>>> writing of sticky-bit files) was actually triggered by rebooting
>>>>> one of the three nodes while there was still an active NFS
>>>>> connection (even with no data exchange at all), since all 0-byte
>>>>> files (of the non-sticky-bit type) were created at either 00:51 or
>>>>> 00:41, the exact moment one of the three nodes in the cluster was
>>>>> rebooted. This would mean that replication with GlusterFS
>>>>> currently creates hardly any redundancy. Quite the opposite: if
>>>>> one of the machines goes down, all of your data gets seriously
>>>>> disorganised. I am busy configuring a test installation to see
>>>>> how this can best be reproduced for a bug report.
>>>>>
>>>>> Does anyone have a suggestion how to best get rid of the
>>>>> duplicates, or rather get this mess organised the way it should be?
>>>>> This is a cluster with millions of files. A rebalance does not fix
>>>>> the issue, nor does a rebalance fix-layout help. Since this is
>>>>> a replicated volume all files should be there 2x, not 3x. Can I
>>>>> safely just remove all the 0-byte files outside of the .glusterfs
>>>>> directory, including the sticky-bit files?
>>>>>
>>>>> The empty 0-byte sticky-bit files outside of .glusterfs on every
>>>>> brick can probably be safely removed like this:
>>>>> find /export/* -path '*/.glusterfs' -prune -o -type f -size 0 -perm 1000 -exec rm {} \;
>>>>> Right?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Cheers,
>>>>> Olav
>>>>> On 18/02/15 22:10, Olav Peeters wrote:
>>>>>> Thanks Tom and Joe,
>>>>>> for the fast response!
>>>>>>
>>>>>> Before I started my upgrade I stopped all clients using the
>>>>>> volume and stopped all VMs with VHDs on the volume, but I guess,
>>>>>> and this may be the missing piece needed to reproduce this in a
>>>>>> lab, I did not detach an NFS shared storage mount from a XenServer
>>>>>> pool to this volume, since this is an extremely risky business. I
>>>>>> also did not stop the volume. This I guess was a bit stupid, but
>>>>>> since I did upgrades this way in the past without any issues I
>>>>>> skipped this step (a really bad habit). I'll make amends and file a
>>>>>> proper bug report :-). I agree with you Joe, this should never
>>>>>> happen, even when someone ignores the advice to stop the
>>>>>> volume. If it were also necessary to detach shared storage
>>>>>> NFS connections to a volume, then frankly, glusterfs is unusable
>>>>>> in a private cloud. No one can afford downtime of the whole
>>>>>> infrastructure just for a glusterfs upgrade. Ideally a replicated
>>>>>> gluster volume should even be able to remain online and in use
>>>>>> during an upgrade (at least a minor-version one).
>>>>>>
>>>>>> I don't know whether a heal was maybe busy when I started the
>>>>>> upgrade. I forgot to check. I did check the CPU activity on the
>>>>>> gluster nodes, which was very low (in the 0.0X range via top), so
>>>>>> I doubt it. I will add this to the bug report as a suggestion
>>>>>> should they not be able to reproduce with an open NFS connection.
>>>>>>
>>>>>> By the way, is it sufficient to do:
>>>>>> service glusterd stop
>>>>>> service glusterfsd stop
>>>>>> and do a:
>>>>>> ps aux | grep gluster
>>>>>> to see if everything has stopped and kill any leftovers should 
>>>>>> this be necessary?
>>>>>>
>>>>>> For the fix, do you agree that if I run e.g.:
>>>>>> find /export/* -type f -size 0 -perm 1000 -exec /bin/rm {} \;
>>>>>> on every node, where /export is the location of all my bricks,
>>>>>> this will be safe, also in a replicated set-up?
>>>>>> And that no necessary 0-byte files will be deleted in e.g. the
>>>>>> .glusterfs of every brick?
>>>>>>
>>>>>> Thanks for your support!
>>>>>>
>>>>>> Cheers,
>>>>>> Olav
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 18/02/15 20:51, Joe Julian wrote:
>>>>>>>
>>>>>>> On 02/18/2015 11:43 AM, tbenzvi at 3vgeomatics.com wrote:
>>>>>>>> Hi Olav,
>>>>>>>>
>>>>>>>> I have a hunch that our problem was caused by improper 
>>>>>>>> unmounting of the gluster volume, and have since found that the 
>>>>>>>> proper order should be: kill all jobs using volume -> unmount 
>>>>>>>> volume on clients -> gluster volume stop -> stop gluster 
>>>>>>>> service (if necessary)
>>>>>>>> In my case, I wrote a Python script to find duplicate files on 
>>>>>>>> the mounted volume, then delete the corresponding link files on 
>>>>>>>> the bricks (making sure to also delete files in the .glusterfs 
>>>>>>>> directory)
>>>>>>>> However, your find command was also suggested to me and I think 
>>>>>>>> it's a simpler solution. I believe removing all link files 
>>>>>>>> (even ones that are not causing duplicates) is fine since on the
>>>>>>>> next file access gluster will do a lookup on all bricks and
>>>>>>>> recreate any link files if necessary. Hopefully a gluster 
>>>>>>>> expert can chime in on this point as I'm not completely sure.
>>>>>>>
>>>>>>> You are correct.
>>>>>>>
>>>>>>>> Keep in mind your setup is somewhat different than mine as I 
>>>>>>>> have only 5 bricks with no replication.
>>>>>>>> Regards,
>>>>>>>> Tom
>>>>>>>>
>>>>>>>>     --------- Original Message ---------
>>>>>>>>     Subject: Re: [Gluster-users] Hundreds of duplicate files
>>>>>>>>     From: "Olav Peeters" <opeeters at gmail.com>
>>>>>>>>     Date: 2/18/15 10:52 am
>>>>>>>>     To: gluster-users at gluster.org, tbenzvi at 3vgeomatics.com
>>>>>>>>
>>>>>>>>     Hi all,
>>>>>>>>     I have this problem after upgrading from 3.5.3 to 3.6.2.
>>>>>>>>     At the moment I am still waiting for a heal to finish (on a
>>>>>>>>     31TB volume with 42 bricks, replicated over three nodes).
>>>>>>>>
>>>>>>>>     Tom,
>>>>>>>>     how did you remove the duplicates?
>>>>>>>>     with 42 bricks I will not be able to do this manually..
>>>>>>>>     Did a:
>>>>>>>>     find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;
>>>>>>>>     work for you?
>>>>>>>>
>>>>>>>>     Should this type of thing ideally not be checked and mended
>>>>>>>>     by a heal?
>>>>>>>>
>>>>>>>>     Does anyone have an idea yet how this happens in the first
>>>>>>>>     place? Can it be connected to upgrading?
>>>>>>>>
>>>>>>>>     Cheers,
>>>>>>>>     Olav
>>>>>>>>
>>>>>>>>       
>>>>>>>>
>>>>>>>>     On 01/01/15 03:07, tbenzvi at 3vgeomatics.com wrote:
>>>>>>>>
>>>>>>>>         No, the files can be read on a newly mounted client! I
>>>>>>>>         went ahead and deleted all of the link files associated
>>>>>>>>         with these duplicates, and then remounted the volume.
>>>>>>>>         The problem is fixed!
>>>>>>>>         Thanks again for the help, Joe and Vijay.
>>>>>>>>         Tom
>>>>>>>>
>>>>>>>>             --------- Original Message ---------
>>>>>>>>             Subject: Re: [Gluster-users] Hundreds of duplicate
>>>>>>>>             files
>>>>>>>>             From: "Vijay Bellur" <vbellur at redhat.com>
>>>>>>>>             Date: 12/28/14 3:23 am
>>>>>>>>             To: tbenzvi at 3vgeomatics.com, gluster-users at gluster.org
>>>>>>>>
>>>>>>>>             On 12/28/2014 01:20 PM, tbenzvi at 3vgeomatics.com wrote:
>>>>>>>>             > Hi Vijay,
>>>>>>>>             > Yes the files are still readable from the
>>>>>>>>             .glusterfs path.
>>>>>>>>             > There is no explicit error. However, trying to
>>>>>>>>             read a text file in
>>>>>>>>             > python simply gives me null characters:
>>>>>>>>             >
>>>>>>>>             > >>> open('ott_mf_itab').readlines()
>>>>>>>>             >
>>>>>>>>             ['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
>>>>>>>>             >
>>>>>>>>             > And reading binary files does the same
>>>>>>>>             >
>>>>>>>>
>>>>>>>>             Is this behavior seen with a freshly mounted client
>>>>>>>>             too?
>>>>>>>>
>>>>>>>>             -Vijay
>>>>>>>>
>>>>>>>>             > --------- Original Message ---------
>>>>>>>>             > Subject: Re: [Gluster-users] Hundreds of
>>>>>>>>             duplicate files
>>>>>>>>             > From: "Vijay Bellur" <vbellur at redhat.com>
>>>>>>>>             > Date: 12/27/14 9:57 pm
>>>>>>>>             > To: tbenzvi at 3vgeomatics.com,
>>>>>>>>             gluster-users at gluster.org
>>>>>>>>             >
>>>>>>>>             > On 12/28/2014 10:13 AM, tbenzvi at 3vgeomatics.com
>>>>>>>>             wrote:
>>>>>>>>             > > Thanks Joe, I've read your blog post as well as
>>>>>>>>             your post
>>>>>>>>             > regarding the
>>>>>>>>             > > .glusterfs directory.
>>>>>>>>             > > I found some unneeded duplicate files which
>>>>>>>>             were not being read
>>>>>>>>             > > properly. I then deleted the link file from the
>>>>>>>>             brick. This always
>>>>>>>>             > > removes the duplicate file from the listing,
>>>>>>>>             but the file does not
>>>>>>>>             > > always become readable. If I also delete the
>>>>>>>>             associated file in the
>>>>>>>>             > > .glusterfs directory on that brick, then some
>>>>>>>>             more files become
>>>>>>>>             > > readable. However this solution still doesn't
>>>>>>>>             work for all files.
>>>>>>>>             > > I know the file on the brick is not corrupt as
>>>>>>>>             it can be read
>>>>>>>>             > directly
>>>>>>>>             > > from the brick directory.
>>>>>>>>             >
>>>>>>>>             > For files that are not readable from the client,
>>>>>>>>             can you check if the
>>>>>>>>             > file is readable from the .glusterfs/ path?
>>>>>>>>             >
>>>>>>>>             > What is the specific error that is seen while
>>>>>>>>             trying to read one such
>>>>>>>>             > file from the client?
>>>>>>>>             >
>>>>>>>>             > Thanks,
>>>>>>>>             > Vijay
>>>>>>>>             >
>>>>>>>>             >
>>>>>>>>             >
>>>>>>>>             > _______________________________________________
>>>>>>>>             > Gluster-users mailing list
>>>>>>>>             > Gluster-users at gluster.org
>>>>>>>>             > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>             >
>>>>>>>>