[Gluster-users] Hundreds of duplicate files
Olav Peeters
opeeters at gmail.com
Sun Feb 22 20:27:35 UTC 2015
Hi Joe,
I tried deleting both zero-byte versions of one of the duplicated files, like so:
[root at gluster01 ~]# getfattr -m . -d -e hex
/export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file:
export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-34=0x000000000000000000000000
trusted.afr.sr_vol01-client-35=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
[root at gluster03 ~]# getfattr -m . -d -e hex
/export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file:
export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-34=0x000000000000000000000000
trusted.afr.sr_vol01-client-35=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
[root at gluster01 ~]# rm -f
/export/brick14gfs01/.glusterfs/ae/fd/aefd1845-0841-4a8f-8408-f1ab8aa7a417
/export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
[root at gluster03 ~]# rm -f
/export/brick14gfs03/.glusterfs/ae/fd/aefd1845-0841-4a8f-8408-f1ab8aa7a417
/export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
An "ls" showed that this was successful..
10 minutes later these deleted files are back (presumably after a
self-heal had passed):
[root at gluster01 ~]# find /export/*/27* -size 0 -name
'3009f448-cf6e-413f-baec-c3b9f0cf9d72*' -exec ls -la {} \;
-rw-r--r--. 2 root root 0 Feb 18 00:51
/export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
Notice how the last modification date is the same as that of the deleted
files (the moment I rebooted the machines and this duplication misery was
triggered). Do you have any idea what this means?
About your blog (http://joejulian.name/blog/dht-misses-are-expensive/), I
don't quite understand how I can use the hash:
[root at gluster01 ~]# python gf_dm_hash.py
3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
0x2b3634edL
... to locate/identify the good replica pair of that file (see the sketch
after the listings below). I also still have these versions (with actual
data):
[root at gluster01 ~]# getfattr -m . -d -e hex
/export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file:
export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-32=0x000000000000000000000000
trusted.afr.sr_vol01-client-33=0x000000000000000000000000
trusted.afr.sr_vol01-client-34=0x000000000000000000000000
trusted.afr.sr_vol01-client-35=0x000000010000000100000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
[root at gluster02 ~]# getfattr -m . -d -e hex
/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file:
export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-32=0x000000000000000000000000
trusted.afr.sr_vol01-client-33=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
[root at gluster02 ~]# getfattr -m . -d -e hex
/export/brick15gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file:
export/brick15gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-40=0x000000000000000000000000
trusted.afr.sr_vol01-client-41=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
[root at gluster03 ~]# getfattr -m . -d -e hex
/export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file:
export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-40=0x000000000000000000000000
trusted.afr.sr_vol01-client-41=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
My bet would be that I can delete the first two of these files.
Apart from the xattrs they look identical:
[root at gluster01 ~]# ls -al
/export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
-rw-r--r--. 2 root root 44332659200 Feb 17 23:55
/export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
[root at gluster02 ~]# ls -al
/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
-rw-r--r--. 2 root root 44332659200 Feb 17 23:55
/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
[root at gluster02 ~]# ls -al
/export/brick15gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
-rw-r--r--. 2 root root 44332659200 Feb 17 23:55
/export/brick15gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
[root at gluster03 ~]# ls -al
/export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
-rw-r--r--. 2 root root 44332659200 Feb 17 23:55
/export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
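For what it's worth, here is how I now understand the lookup is supposed
to work, as a minimal sketch (Python 3, run on each node; FILE_HASH is the
value gf_dm_hash.py printed above, and I am assuming that the last 16 hex
digits of the trusted.glusterfs.dht xattr on the parent directory encode
the start and end of that brick's hash range):

import os, glob

FILE_HASH = 0x2b3634ed   # value printed by gf_dm_hash.py for the .vhd filename
PARENT = "272b2366-dfbf-ad47-2a0f-5d5cc40863e3"

# The brick whose DHT layout range contains the filename hash is the one
# that *should* hold the real file; copies elsewhere are suspect.
for brick_dir in glob.glob("/export/*/" + PARENT):
    try:
        raw = os.getxattr(brick_dir, "trusted.glusterfs.dht")
    except OSError:
        continue   # no layout xattr for this directory on this brick
    hexval = raw.hex()
    start, end = int(hexval[-16:-8], 16), int(hexval[-8:], 16)
    hit = "  <-- hash falls in this range" if start <= FILE_HASH <= end else ""
    print("%s  0x%08x - 0x%08x%s" % (brick_dir, start, end, hit))

If that reasoning is right, the brick directory whose range contains
0x2b3634ed is where the .vhd belongs, and the copies on the other bricks
are the ones to clean up. Please correct me if I have the xattr layout
wrong.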
Cheers,
Olav
On 21/02/15 01:37, Olav Peeters wrote:
> It looks even worse than I had feared... :-(
> This really is a crazy bug.
>
> If I understand you correctly, the only sane pairing of the xattrs is
> that of the two zero-byte files, since this is the full list of bricks:
>
> [root at gluster01 ~]# gluster volume info
>
> Volume Name: sr_vol01
> Type: Distributed-Replicate
> Volume ID: c6d6147e-2d91-4d98-b8d9-ba05ec7e4ad6
> Status: Started
> Number of Bricks: 21 x 2 = 42
> Transport-type: tcp
> Bricks:
> Brick1: gluster01:/export/brick1gfs01
> Brick2: gluster02:/export/brick1gfs02
> Brick3: gluster01:/export/brick4gfs01
> Brick4: gluster03:/export/brick4gfs03
> Brick5: gluster02:/export/brick4gfs02
> Brick6: gluster03:/export/brick1gfs03
> Brick7: gluster01:/export/brick2gfs01
> Brick8: gluster02:/export/brick2gfs02
> Brick9: gluster01:/export/brick5gfs01
> Brick10: gluster03:/export/brick5gfs03
> Brick11: gluster02:/export/brick5gfs02
> Brick12: gluster03:/export/brick2gfs03
> Brick13: gluster01:/export/brick3gfs01
> Brick14: gluster02:/export/brick3gfs02
> Brick15: gluster01:/export/brick6gfs01
> Brick16: gluster03:/export/brick6gfs03
> Brick17: gluster02:/export/brick6gfs02
> Brick18: gluster03:/export/brick3gfs03
> Brick19: gluster01:/export/brick8gfs01
> Brick20: gluster02:/export/brick8gfs02
> Brick21: gluster01:/export/brick9gfs01
> Brick22: gluster02:/export/brick9gfs02
> Brick23: gluster01:/export/brick10gfs01
> Brick24: gluster03:/export/brick10gfs03
> Brick25: gluster01:/export/brick11gfs01
> Brick26: gluster03:/export/brick11gfs03
> Brick27: gluster02:/export/brick10gfs02
> Brick28: gluster03:/export/brick8gfs03
> Brick29: gluster02:/export/brick11gfs02
> Brick30: gluster03:/export/brick9gfs03
> Brick31: gluster01:/export/brick12gfs01
> Brick32: gluster02:/export/brick12gfs02
> Brick33: gluster01:/export/brick13gfs01
> Brick34: gluster02:/export/brick13gfs02
> Brick35: gluster01:/export/brick14gfs01
> Brick36: gluster03:/export/brick14gfs03
> Brick37: gluster01:/export/brick15gfs01
> Brick38: gluster03:/export/brick15gfs03
> Brick39: gluster02:/export/brick14gfs02
> Brick40: gluster03:/export/brick12gfs03
> Brick41: gluster02:/export/brick15gfs02
> Brick42: gluster03:/export/brick13gfs03
>
>
> The two zero-byte files are on bricks 35 and 36, as the getfattr output
> correctly indicates.
>
> Another sane pairing could be this (if the first file did not also
> refer to client-34 and client-35):
>
> [root at gluster01 ~]# getfattr -m . -d -e hex
> /export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> getfattr: Removing leading '/' from absolute path names
> # file:
> export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
> trusted.afr.sr_vol01-client-35=0x000000010000000100000000
> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>
> [root at gluster02 ~]# getfattr -m . -d -e hex
> /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> getfattr: Removing leading '/' from absolute path names
> # file:
> export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>
> But why is the security.selinux value different?
>
>
> You mention hostname changes...
> I noticed that if I do a listing of the available shared storage on one
> of the XenServers I get:
> uuid ( RO) : 272b2366-dfbf-ad47-2a0f-5d5cc40863e3
> name-label ( RW): gluster_store
> name-description ( RW): NFS SR [gluster01.irceline.be:/sr_vol01]
> host ( RO): <shared>
> type ( RO): nfs
> content-type ( RO):
>
>
> Whereas if I check with the plain Linux mount command:
> [root at same_story_on_both_xenserver ~]# mount
> gluster02.irceline.be:/sr_vol01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3
> on /var/run/sr-mount/272b2366-dfbf-ad47-2a0f-5d5cc40863e3 type nfs
> (rw,soft,timeo=133,retrans=2147483647,tcp,noac,addr=192.168.0.72)
>
> Originally the mount was done on gluster01 (ip 192.168.0.71), as the
> name-description in the xe sr-list indicates.
> It is as though, when gluster01 was unavailable for a couple of
> minutes, the NFS mount was somehow automatically reconfigured to
> gluster02 internally, but NFS cannot do this as far as I know
> (unless there is some fail-over mechanism, which I never configured).
> There is also no load balancing between client and server.
> If gluster01 is not available, the gluster volume should simply not have
> been available, end of story. But from the perspective of a client, the
> NFS mount could be to any one of the three gluster nodes. The client
> should see exactly the same data.
>
> So a rebalance in the current state could do more harm than good?
> I launched a second rebalance in the hope that the system would mend
> itself after all...
>
> Thanks a million for your support in this darkest hour of my time as a
> glusterfs user :-)
>
> Cheers,
> Olav
>
> On 20/02/15 23:10, Joe Julian wrote:
>>
>> On 02/20/2015 01:47 PM, Olav Peeters wrote:
>>> Thanks Joe,
>>> for the answers!
>>>
>>> I was not clear enough about the set-up apparently.
>>> The Gluster cluster consists of 3 nodes with 14 bricks each. The
>>> bricks are formatted as xfs and mounted locally as xfs. There is one
>>> volume, type: Distributed-Replicate (replica 2). The configuration
>>> is such that bricks are mirrored on two different nodes.
>>>
>>> The NFS mounts which were alive but not in use during the reboot when
>>> the problem started are from clients (2 XenServer machines configured
>>> as a pool - a shared storage set-up). The comparisons I give below are
>>> between (other) clients mounting via either glusterfs or NFS.
>>> It is a similar problem, with the exception that the first listing
>>> (via ls) after a fresh NFS mount actually does find the files with
>>> data. A second listing only finds the zero-byte file with the same name.
>>>
>>> So all the zero-byte files in mode 0644 can be safely removed?
>> Probably? Is it likely that you have any empty files? I don't know.
>>>
>>> Why do I see three files with the same name (and modification
>>> timestamp etc.) via either a glusterfs or NFS mount from a client?
>>> Deleting one of the three will probably not solve the issue either;
>>> this seems to me an indexing issue in the gluster cluster.
>> Very good question. I don't know. The xattrs tell a strange story
>> that I haven't seen before. One legit file shows sr_vol01-client-32
>> and 33. This would be normal, assuming the filename hash would put it
>> on that replica pair (we can't tell since the rebalance has changed
>> the hash map). Another file shows sr_vol01-client-32, 33, 34, and 35
>> with pending updates scheduled for 35. I have no idea which brick
>> this is (see "gluster volume info" and map the digits (35) to the
>> bricks offset by 1; client-35 is brick 36). That last one is on 40,41.
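>>
>> In script form that mapping is just the following (a rough sketch,
>> assuming the stock "BrickN: host:/path" formatting of the CLI output):
>>
>> import re, subprocess
>>
>> def brick_for_client(volume, client_index):
>>     # client-N in the afr xattrs corresponds to "Brick N+1" in volume info
>>     out = subprocess.check_output(["gluster", "volume", "info", volume]).decode()
>>     bricks = re.findall(r"^Brick\d+:\s*(\S+)", out, re.MULTILINE)
>>     return bricks[client_index]   # zero-based list, so index 35 is Brick36
>>
>> print(brick_for_client("sr_vol01", 35))   # -> gluster03:/export/brick14gfs03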
>>
>> I don't know how these files all got on different replica sets. My
>> speculations include hostname changes, long-running net-split
>> conditions with different dht maps (failed rebalances), moved bricks,
>> load balancers between client and server, mercury in retrograde (lol)...
>>
>>> How do I get Gluster to replicate the files correctly, only 2
>>> versions of the same file, not three, and on two bricks on different
>>> machines?
>>>
>>
>> Identify which replica is correct by using the little python script
>> at http://joejulian.name/blog/dht-misses-are-expensive/ to get the
>> hash of the filename. Examine the dht map to see which replica pair
>> *should* have that hash and remove the others (and their hardlink in
>> .glusterfs). There is no 1-liner that's going to do this. I would
>> probably script the logic in python, have it print out what it was
>> going to do, check that for sanity and, if sane, execute it.
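>>
>> Something along these lines, for example (an untested sketch, dry-run by
>> default; the .glusterfs path is derived from the file's gfid xattr in the
>> usual aa/bb/<gfid> way):
>>
>> import os, uuid
>>
>> DRY_RUN = True   # flip to False only after reviewing the printed actions
>>
>> def glusterfs_hardlink(brick, path):
>>     # Derive the .glusterfs hardlink of a file from its trusted.gfid xattr.
>>     gfid = str(uuid.UUID(bytes=os.getxattr(path, "trusted.gfid")))
>>     return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)
>>
>> def remove_stale_copy(brick, path):
>>     # Print what would be removed; only unlink when DRY_RUN is off.
>>     for target in (path, glusterfs_hardlink(brick, path)):
>>         print(("would rm " if DRY_RUN else "rm ") + target)
>>         if not DRY_RUN:
>>             os.unlink(target)
>>
>> remove_stale_copy("/export/brick14gfs01",
>>                   "/export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/"
>>                   "3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd")
>>
>> Feed it only the copies that sit on the wrong replica pair, read the
>> output, and run it for real once it looks sane.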
>>
>> But mostly figure out how Bricks 32 and/or 33 can become 34 and/or 35
>> and/or 40 and/or 41. That's the root of the whole problem.
>>
>>> Cheers,
>>> Olav
>>>
>>>
>>>
>>>
>>> On 20/02/15 21:51, Joe Julian wrote:
>>>>
>>>> On 02/20/2015 12:21 PM, Olav Peeters wrote:
>>>>> Let's take one file (3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd) as
>>>>> an example...
>>>>> On the 3 nodes all bricks are formatted as XFS and mounted under
>>>>> /export; 272b2366-dfbf-ad47-2a0f-5d5cc40863e3 is the mount point of
>>>>> an NFS shared storage connection from the XenServer machines:
>>>> Did I just read this correctly? Your bricks are NFS mounts? ie,
>>>> GlusterFS Client <-> GlusterFS Server <-> NFS <-> XFS
>>>>>
>>>>> [root at gluster01 ~]# find
>>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec
>>>>> ls -la {} \;
>>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55
>>>>> /export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> Supposedly, this is the actual file.
>>>>> -rw-r--r--. 2 root root 0 Feb 18 00:51
>>>>> /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> This is not a linkfile. Note it's mode 0644. How it got there with
>>>> those permissions would be a matter of history and would require
>>>> information that's probably lost.
>>>>>
>>>>> root at gluster02 ~]# find
>>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec
>>>>> ls -la {} \;
>>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55
>>>>> /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>>
>>>>> [root at gluster03 ~]# find
>>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec
>>>>> ls -la {} \;
>>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55
>>>>> /export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 2 root root 0 Feb 18 00:51
>>>>> /export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> Same analysis as above.
>>>>>
>>>>> 3 files with data, plus 2 zero-byte files with the same name
>>>>>
>>>>> Checking the zero-byte files:
>>>>> [root at gluster01 ~]# getfattr -m . -d -e hex
>>>>> /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file:
>>>>> export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-35=0x000000000000000000000000
>>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>>
>>>>> [root at gluster03 ~]# getfattr -m . -d -e hex
>>>>> /export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file:
>>>>> export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-35=0x000000000000000000000000
>>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>>
>>>>> This is not a glusterfs link file since there is no
>>>>> "trusted.glusterfs.dht.linkto", am I correct?
>>>> You are correct.
>>>>>
>>>>> And checking the "good" files:
>>>>>
>>>>> # file:
>>>>> export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-35=0x000000010000000100000000
>>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>>
>>>>> [root at gluster02 ~]# getfattr -m . -d -e hex
>>>>> /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file:
>>>>> export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
>>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>>
>>>>> [root at gluster03 ~]# getfattr -m . -d -e hex
>>>>> /export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file:
>>>>> export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-40=0x000000000000000000000000
>>>>> trusted.afr.sr_vol01-client-41=0x000000000000000000000000
>>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>>
>>>>>
>>>>>
>>>>> Seen from a client via a glusterfs mount:
>>>>> [root at client ~]# ls -al
>>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>>
>>>>>
>>>>>
>>>>> Via NFS (just after performing a umount and mount the volume again):
>>>>> [root at client ~]# ls -al
>>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>>
>>>>> Doing the same list a couple of seconds later:
>>>>> [root at client ~]# ls -al
>>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> And again, and again, and again:
>>>>> [root at client ~]# ls -al
>>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>>
>>>>> This really seems odd. Why do we get to see the real data file only
>>>>> once?
>>>>>
>>>>> It seems more and more that this crazy file duplication (and
>>>>> writing of sticky-bit files) was actually triggered by rebooting
>>>>> one of the three nodes while there was still an active NFS
>>>>> connection (even though there was no data exchange at all), since
>>>>> all zero-byte files (of the non-sticky-bit type) were created at
>>>>> either 00:51 or 00:41, the exact moments the nodes in the cluster
>>>>> were rebooted. This would mean that replication with GlusterFS
>>>>> currently creates hardly any redundancy. Quite the opposite: if
>>>>> one of the machines goes down, all of your data gets seriously
>>>>> disorganised. I am busy configuring a test installation to see
>>>>> how this can best be reproduced for a bug report.
>>>>>
>>>>> Does anyone have a suggestion how to best get rid of the
>>>>> duplicates, or rather get this mess organised the way it should be?
>>>>> This is a cluster with millions of files. A rebalance does not fix
>>>>> the issue, nor does a rebalance fix-layout help. Since this is a
>>>>> replicated volume, all files should be there 2x, not 3x. Can I
>>>>> safely just remove all the zero-byte files outside of the .glusterfs
>>>>> directory, including the sticky-bit files?
>>>>>
>>>>> The empty zero-byte files outside of .glusterfs on every brick I can
>>>>> probably safely remove like this:
>>>>> find /export/* -path */.glusterfs -prune -o -type f -size 0 -perm 1000 -exec rm {} \;
>>>>> no?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Cheers,
>>>>> Olav
>>>>> On 18/02/15 22:10, Olav Peeters wrote:
>>>>>> Thanks Tom and Joe,
>>>>>> for the fast response!
>>>>>>
>>>>>> Before I started my upgrade I stopped all clients using the
>>>>>> volume and stopped all VM's with VHD on the volume, but I guess,
>>>>>> and this may be the missing thing to reproduce this in a lab, I
>>>>>> did not detach a NFS shared storage mount from a XenServer pool
>>>>>> to this volume, since this is an extremely risky business. I also
>>>>>> did not stop the volume. This I guess was a bit stupid, but since
>>>>>> I did upgrades in the past this way without any issues I skipped
>>>>>> this step (a really bad habit). I'll make amends and file a
>>>>>> proper bug report :-). I agree with you Joe, this should never
>>>>>> happen, even when someone ignores the advice of stopping the
>>>>>> volume. If it were also necessary to detach shared storage
>>>>>> NFS connections to a volume, then frankly, glusterfs is unusable
>>>>>> in a private cloud. No one can afford downtime of the whole
>>>>>> infrastructure just for a glusterfs upgrade. Ideally a replicated
>>>>>> gluster volume should even be able to remain online and used
>>>>>> during (at least a minor version) upgrade.
>>>>>>
>>>>>> I don't know whether a heal was perhaps still running when I started
>>>>>> the upgrade. I forgot to check. I did check the CPU activity on the
>>>>>> gluster nodes which were very low (in the 0.0X range via top), so
>>>>>> I doubt it. I will add this to the bug report as a suggestion
>>>>>> should they not be able to reproduce with an open NFS connection.
>>>>>>
>>>>>> By the way, is it sufficient to do:
>>>>>> service glusterd stop
>>>>>> service glusterfsd stop
>>>>>> and do a:
>>>>>> ps aux | grep gluster
>>>>>> to see if everything has stopped and kill any leftovers should
>>>>>> this be necessary?
>>>>>>
>>>>>> For the fix, do you agree that if I run e.g.:
>>>>>> find /export/* -type f -size 0 -perm 1000 -exec /bin/rm {} \;
>>>>>> on every node, with /export being the location of all my bricks, also
>>>>>> in a replicated set-up, this will be safe?
>>>>>> No necessary zero-byte files will be deleted, e.g. in the .glusterfs of
>>>>>> every brick?
>>>>>>
>>>>>> Thanks for your support!
>>>>>>
>>>>>> Cheers,
>>>>>> Olav
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 18/02/15 20:51, Joe Julian wrote:
>>>>>>>
>>>>>>> On 02/18/2015 11:43 AM, tbenzvi at 3vgeomatics.com wrote:
>>>>>>>> Hi Olav,
>>>>>>>>
>>>>>>>> I have a hunch that our problem was caused by improper
>>>>>>>> unmounting of the gluster volume, and have since found that the
>>>>>>>> proper order should be: kill all jobs using volume -> unmount
>>>>>>>> volume on clients -> gluster volume stop -> stop gluster
>>>>>>>> service (if necessary)
>>>>>>>> In my case, I wrote a Python script to find duplicate files on
>>>>>>>> the mounted volume, then delete the corresponding link files on
>>>>>>>> the bricks (making sure to also delete files in the .glusterfs
>>>>>>>> directory)
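>>>>>>>> Roughly, the detection half boiled down to something like this (a
>>>>>>>> simplified sketch; the mount path here is just illustrative):
>>>>>>>>
>>>>>>>> import collections, os
>>>>>>>> # duplicate names show up directly in the directory listing of the mount
>>>>>>>> names = os.listdir("/mnt/glusterfs/some_dir")
>>>>>>>> for name, count in collections.Counter(names).items():
>>>>>>>>     if count > 1:
>>>>>>>>         print(name, count)
>>>>>>>>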
>>>>>>>> However, your find command was also suggested to me and I think
>>>>>>>> it's a simpler solution. I believe removing all link files
>>>>>>>> (even ones that are not causing duplicates) is fine, since on the
>>>>>>>> next file access gluster will do a lookup on all bricks and
>>>>>>>> recreate any link files if necessary. Hopefully a gluster
>>>>>>>> expert can chime in on this point as I'm not completely sure.
>>>>>>>
>>>>>>> You are correct.
>>>>>>>
>>>>>>>> Keep in mind your setup is somewhat different than mine as I
>>>>>>>> have only 5 bricks with no replication.
>>>>>>>> Regards,
>>>>>>>> Tom
>>>>>>>>
>>>>>>>> --------- Original Message ---------
>>>>>>>> Subject: Re: [Gluster-users] Hundreds of duplicate files
>>>>>>>> From: "Olav Peeters" <opeeters at gmail.com>
>>>>>>>> Date: 2/18/15 10:52 am
>>>>>>>> To: gluster-users at gluster.org, tbenzvi at 3vgeomatics.com
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>> I'm having this problem after upgrading from 3.5.3 to 3.6.2.
>>>>>>>> At the moment I am still waiting for a heal to finish (on a
>>>>>>>> 31TB volume with 42 bricks, replicated over three nodes).
>>>>>>>>
>>>>>>>> Tom,
>>>>>>>> how did you remove the duplicates?
>>>>>>>> with 42 bricks I will not be able to do this manually..
>>>>>>>> Did a:
>>>>>>>> find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;
>>>>>>>> work for you?
>>>>>>>>
>>>>>>>> Should this type of thing ideally not be checked and mended
>>>>>>>> by a heal?
>>>>>>>>
>>>>>>>> Does anyone have an idea yet how this happens in the first
>>>>>>>> place? Can it be connected to upgrading?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Olav
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01/01/15 03:07, tbenzvi at 3vgeomatics.com wrote:
>>>>>>>>
>>>>>>>> No, the files can be read on a newly mounted client! I
>>>>>>>> went ahead and deleted all of the link files associated
>>>>>>>> with these duplicates, and then remounted the volume.
>>>>>>>> The problem is fixed!
>>>>>>>> Thanks again for the help, Joe and Vijay.
>>>>>>>> Tom
>>>>>>>>
>>>>>>>> --------- Original Message ---------
>>>>>>>> Subject: Re: [Gluster-users] Hundreds of duplicate
>>>>>>>> files
>>>>>>>> From: "Vijay Bellur" <vbellur at redhat.com>
>>>>>>>> Date: 12/28/14 3:23 am
>>>>>>>> To: tbenzvi at 3vgeomatics.com, gluster-users at gluster.org
>>>>>>>>
>>>>>>>> On 12/28/2014 01:20 PM, tbenzvi at 3vgeomatics.com wrote:
>>>>>>>> > Hi Vijay,
>>>>>>>> > Yes the files are still readable from the
>>>>>>>> .glusterfs path.
>>>>>>>> > There is no explicit error. However, trying to
>>>>>>>> read a text file in
>>>>>>>> > python simply gives me null characters:
>>>>>>>> >
>>>>>>>> > >>> open('ott_mf_itab').readlines()
>>>>>>>> >
>>>>>>>> ['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
>>>>>>>> >
>>>>>>>> > And reading binary files does the same
>>>>>>>> >
>>>>>>>>
>>>>>>>> Is this behavior seen with a freshly mounted client
>>>>>>>> too?
>>>>>>>>
>>>>>>>> -Vijay
>>>>>>>>
>>>>>>>> > --------- Original Message ---------
>>>>>>>> > Subject: Re: [Gluster-users] Hundreds of
>>>>>>>> duplicate files
>>>>>>>> > From: "Vijay Bellur" <vbellur at redhat.com>
>>>>>>>> > Date: 12/27/14 9:57 pm
>>>>>>>> > To: tbenzvi at 3vgeomatics.com,
>>>>>>>> gluster-users at gluster.org
>>>>>>>> >
>>>>>>>> > On 12/28/2014 10:13 AM, tbenzvi at 3vgeomatics.com
>>>>>>>> wrote:
>>>>>>>> > > Thanks Joe, I've read your blog post as well as
>>>>>>>> your post
>>>>>>>> > regarding the
>>>>>>>> > > .glusterfs directory.
>>>>>>>> > > I found some unneeded duplicate files which
>>>>>>>> were not being read
>>>>>>>> > > properly. I then deleted the link file from the
>>>>>>>> brick. This always
>>>>>>>> > > removes the duplicate file from the listing,
>>>>>>>> but the file does not
>>>>>>>> > > always become readable. If I also delete the
>>>>>>>> associated file in the
>>>>>>>> > > .glusterfs directory on that brick, then some
>>>>>>>> more files become
>>>>>>>> > > readable. However this solution still doesn't
>>>>>>>> work for all files.
>>>>>>>> > > I know the file on the brick is not corrupt as
>>>>>>>> it can be read
>>>>>>>> > directly
>>>>>>>> > > from the brick directory.
>>>>>>>> >
>>>>>>>> > For files that are not readable from the client,
>>>>>>>> can you check if the
>>>>>>>> > file is readable from the .glusterfs/ path?
>>>>>>>> >
>>>>>>>> > What is the specific error that is seen while
>>>>>>>> trying to read one such
>>>>>>>> > file from the client?
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Vijay
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>