[Gluster-users] Hundreds of duplicate files

Olav Peeters opeeters at gmail.com
Sat Feb 21 00:37:20 UTC 2015


It looks even worse than I had feared... :-(
This really is a crazy bug.

If I understand you correctly, the only sane pairing of the xattrs is 
that of the two 0-bit files, since this is the full list of bricks:

[root at gluster01 ~]# gluster volume info

Volume Name: sr_vol01
Type: Distributed-Replicate
Volume ID: c6d6147e-2d91-4d98-b8d9-ba05ec7e4ad6
Status: Started
Number of Bricks: 21 x 2 = 42
Transport-type: tcp
Bricks:
Brick1: gluster01:/export/brick1gfs01
Brick2: gluster02:/export/brick1gfs02
Brick3: gluster01:/export/brick4gfs01
Brick4: gluster03:/export/brick4gfs03
Brick5: gluster02:/export/brick4gfs02
Brick6: gluster03:/export/brick1gfs03
Brick7: gluster01:/export/brick2gfs01
Brick8: gluster02:/export/brick2gfs02
Brick9: gluster01:/export/brick5gfs01
Brick10: gluster03:/export/brick5gfs03
Brick11: gluster02:/export/brick5gfs02
Brick12: gluster03:/export/brick2gfs03
Brick13: gluster01:/export/brick3gfs01
Brick14: gluster02:/export/brick3gfs02
Brick15: gluster01:/export/brick6gfs01
Brick16: gluster03:/export/brick6gfs03
Brick17: gluster02:/export/brick6gfs02
Brick18: gluster03:/export/brick3gfs03
Brick19: gluster01:/export/brick8gfs01
Brick20: gluster02:/export/brick8gfs02
Brick21: gluster01:/export/brick9gfs01
Brick22: gluster02:/export/brick9gfs02
Brick23: gluster01:/export/brick10gfs01
Brick24: gluster03:/export/brick10gfs03
Brick25: gluster01:/export/brick11gfs01
Brick26: gluster03:/export/brick11gfs03
Brick27: gluster02:/export/brick10gfs02
Brick28: gluster03:/export/brick8gfs03
Brick29: gluster02:/export/brick11gfs02
Brick30: gluster03:/export/brick9gfs03
Brick31: gluster01:/export/brick12gfs01
Brick32: gluster02:/export/brick12gfs02
Brick33: gluster01:/export/brick13gfs01
Brick34: gluster02:/export/brick13gfs02
Brick35: gluster01:/export/brick14gfs01
Brick36: gluster03:/export/brick14gfs03
Brick37: gluster01:/export/brick15gfs01
Brick38: gluster03:/export/brick15gfs03
Brick39: gluster02:/export/brick14gfs02
Brick40: gluster03:/export/brick12gfs03
Brick41: gluster02:/export/brick15gfs02
Brick42: gluster03:/export/brick13gfs03


The two 0-bit files are on bricks 35 and 36, as the getfattr output correctly shows.
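
(For the mapping: a trusted.afr.sr_vol01-client-N xattr corresponds to 
Brick N+1 in the volume info above, so client-34 is Brick35 and 
client-35 is Brick36. A quick lookup, as a sketch using the volume name 
from above:

N=34; gluster volume info sr_vol01 | grep "^Brick$((N + 1)):"

prints Brick35: gluster01:/export/brick14gfs01, which is indeed where 
one of the 0-bit copies sits.)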

Another sane pairing could be this (if the first file did not also refer 
to client-34 and client-35):

[root at gluster01 ~]# getfattr -m . -d -e hex 
/export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file: 
export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-32=0x000000000000000000000000
trusted.afr.sr_vol01-client-33=0x000000000000000000000000
trusted.afr.sr_vol01-client-34=0x000000000000000000000000
trusted.afr.sr_vol01-client-35=0x000000010000000100000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417

[root at gluster02 ~]# getfattr -m . -d -e hex 
/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file: 
export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-32=0x000000000000000000000000
trusted.afr.sr_vol01-client-33=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417

But why are the security.selinux values different?
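
(Decoding those hex values shows they are just the SELinux context 
labels of the two copies, not content hashes; the trailing 00 is a NUL 
terminator. A quick check, assuming xxd is available:

echo 756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a7330 | xxd -r -p; echo
echo 73797374656d5f753a6f626a6563745f723a66696c655f743a7330 | xxd -r -p; echo

prints unconfined_u:object_r:file_t:s0 and system_u:object_r:file_t:s0 
respectively, so the two copies were presumably just created/labelled 
by different processes; the difference says nothing about the file 
contents.)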


You mention hostname changes...
I noticed that if I list the available shared storage on one of the 
XenServers I get:
uuid ( RO)                : 272b2366-dfbf-ad47-2a0f-5d5cc40863e3
           name-label ( RW): gluster_store
     name-description ( RW): NFS SR [gluster01.irceline.be:/sr_vol01]
                 host ( RO): <shared>
                 type ( RO): nfs
         content-type ( RO):


If I do a normal Linux mount listing:
[root at same_story_on_both_xenserver ~]# mount
gluster02.irceline.be:/sr_vol01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3 on 
/var/run/sr-mount/272b2366-dfbf-ad47-2a0f-5d5cc40863e3 type nfs 
(rw,soft,timeo=133,retrans=2147483647,tcp,noac,addr=192.168.0.72)

Originally the mount was done on gluster01 (IP 192.168.0.71), as the 
name-description in the xe sr-list output indicates...
It is as though, when gluster01 was unavailable for a couple of 
minutes, the NFS mount was somehow automatically reconfigured 
internally to point at gluster02, but NFS cannot do this as far as I 
know (unless there is some fail-over mechanism, which I never 
configured). There is also no load balancing between client and server.
If gluster01 is not available, the gluster volume should simply not 
have been available, end of story... But from the client's perspective 
the NFS mount could go to any one of the three gluster nodes; the 
client should see exactly the same data.
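
One quick sanity check I can do from the XenServer side (just a sketch): 
confirm what the kernel thinks the SR is mounted from, which address the 
live NFS TCP connection actually goes to (gluster's NFS server normally 
listens on port 2049), and whether the hostnames still resolve to the 
addresses I expect:

grep sr-mount /proc/mounts
netstat -tn | grep ':2049'
getent hosts gluster01.irceline.be gluster02.irceline.be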

So a rebalance in the current state could do more harm than good?
I launched a second rebalance in the hope that the system would mend 
itself after all...
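
In the meantime I am keeping an eye on it with

gluster volume rebalance sr_vol01 status

and will stop it (gluster volume rebalance sr_vol01 stop) if it looks 
like it is only making things worse.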

Thanks a million for your support in this darkest hour of my time as a 
glusterfs user :-)

Cheers,
Olav



On 20/02/15 23:10, Joe Julian wrote:
>
> On 02/20/2015 01:47 PM, Olav Peeters wrote:
>> Thanks Joe,
>> for the answers!
>>
>> I was not clear enough about the set up apparently.
>> The Gluster cluster consists of 3 nodes with 14 bricks each. The 
>> bricks are formatted as xfs and mounted locally as xfs. There is one 
>> volume, type: Distributed-Replicate (replica 2). The configuration is 
>> such that each brick is mirrored on two different nodes.
>>
>> The NFS mounts which were alive but not in use during the reboot when 
>> the problem started are from clients (2 XenServer machines configured 
>> as a pool - a shared storage set-up). The comparisons I give below are 
>> between (other) clients mounting via either glusterfs or NFS. The 
>> problem is similar, with the exception that the first listing (via ls) 
>> after a fresh NFS mount actually does find the files with data; a 
>> second listing only finds the 0-bit file with the same name.
>>
>> So all the 0-bit files in mode 0644 can be safely removed?
> Probably? Is it likely that you have any empty files? I don't know.
>>
>> Why do I see three files with the same name (and modification 
>> timestamp etc.) via either a glusterfs or NFS mount from a client? 
>> Deleting one of the three will probably not solve the issue either... 
>> This seems to me to be an indexing issue in the gluster cluster.
> Very good question. I don't know. The xattrs tell a strange story that 
> I haven't seen before. One legit file shows sr_vol01-client-32 and 33. 
> This would be normal, assuming the filename hash would put it on that 
> replica pair (we can't tell since the rebalance has changed the hash 
> map). Another file shows sr_vol01-client-32, 33, 34, and 35 with 
> pending updates scheduled for 35. I have no idea which brick this is 
> (see "gluster volume info" and map the digits (35) to the bricks, 
> offset by 1: client-35 is Brick36). That last one is on 40 and 41.
>
> I don't know how these files all got on different replica sets. My 
> speculations include hostname changes, long-running net-split 
> conditions with different dht maps (failed rebalances), moved bricks, 
> load balancers between client and server, mercury in retrograde (lol)...
>
>> How do I get Gluster to replicate the files correctly, only 2 
>> versions of the same file, not three, and on two bricks on different 
>> machines?
>>
>
> Identify which replica is correct by using the little python script at 
> http://joejulian.name/blog/dht-misses-are-expensive/ to get the hash 
> of the filename. Examine the dht map to see which replica pair 
> *should* have that hash and remove the others (and their hardlink in 
> .glusterfs). There is no 1-liner that's going to do this. I would 
> probably script the logic in python, have it print out what it was 
> going to do, check that for sanity and, if sane, execute it.
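>
> A very rough, untested shell sketch of the mechanical half of that - it 
> only prints the rm commands for each stray 0-byte, mode-0644 copy and 
> its .glusterfs hardlink, executes nothing, and still leaves the 
> dht-hash check (deciding which copies are the strays) to you:
>
> find /export/* -path '*/.glusterfs' -prune -o -type f -size 0 -perm 0644 -print |
> while read -r f; do
>     # gfid in hex, without the leading 0x
>     gfid=$(getfattr -n trusted.gfid -e hex "$f" 2>/dev/null |
>            awk -F= '/trusted.gfid/{print substr($2,3)}')
>     [ -n "$gfid" ] || continue
>     brick=$(echo "$f" | cut -d/ -f1-3)   # e.g. /export/brick14gfs01
>     echo rm -f "$f" \
>       "$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid:0:8}-${gfid:8:4}-${gfid:12:4}-${gfid:16:4}-${gfid:20:12}"
> done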
>
> But mostly figure out how Bricks 32 and/or 33 can become 34 and/or 35 
> and/or 40 and/or 41. That's the root of the whole problem.
>
>> Cheers,
>> Olav
>>
>>
>>
>>
>> On 20/02/15 21:51, Joe Julian wrote:
>>>
>>> On 02/20/2015 12:21 PM, Olav Peeters wrote:
>>>> Let's take one file (3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd) as 
>>>> an example...
>>>> On the 3 nodes, all bricks are formatted as XFS and mounted under 
>>>> /export, and 272b2366-dfbf-ad47-2a0f-5d5cc40863e3 is the mount 
>>>> point of an NFS shared storage connection from the XenServer machines:
>>> Did I just read this correctly? Your bricks are NFS mounts? ie, 
>>> GlusterFS Client <-> GlusterFS Server <-> NFS <-> XFS
>>>>
>>>> [root at gluster01 ~]# find 
>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec 
>>>> ls -la {} \;
>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55 
>>>> /export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>> Supposedly, this is the actual file.
>>>> -rw-r--r--. 2 root root 0 Feb 18 00:51 
>>>> /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>> This is not a linkfile. Note it's mode 0644. How it got there with 
>>> those permissions would be a matter of history and would require 
>>> information that's probably lost.
>>>>
>>>> [root at gluster02 ~]# find 
>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec 
>>>> ls -la {} \;
>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55 
>>>> /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>
>>>> [root at gluster03 ~]# find 
>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec 
>>>> ls -la {} \;
>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55 
>>>> /export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 2 root root 0 Feb 18 00:51 
>>>> /export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>> Same analysis as above.
>>>>
>>>> 3 files with information, 2 x a 0-bit file with the same name
>>>>
>>>> Checking the 0-bit files:
>>>> [root at gluster01 ~]# getfattr -m . -d -e hex 
>>>> /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: 
>>>> export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-35=0x000000000000000000000000
>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>
>>>> [root at gluster03 ~]# getfattr -m . -d -e hex 
>>>> /export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: 
>>>> export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-35=0x000000000000000000000000
>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>
>>>> This is not a glusterfs link file since there is no 
>>>> "trusted.glusterfs.dht.linkto", am I correct?
>>> You are correct.
>>>>
>>>> And checking the "good" files:
>>>>
>>>> # file: 
>>>> export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-35=0x000000010000000100000000
>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>
>>>> [root at gluster02 ~]# getfattr -m . -d -e hex 
>>>> /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: 
>>>> export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>
>>>> [root at gluster03 ~]# getfattr -m . -d -e hex 
>>>> /export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: 
>>>> export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-40=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-41=0x000000000000000000000000
>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>
>>>>
>>>>
>>>> Seen from a client via a glusterfs mount:
>>>> [root at client ~]# ls -al 
>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>
>>>>
>>>>
>>>> Via NFS (just after unmounting and remounting the volume):
>>>> [root at client ~]# ls -al 
>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55 
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55 
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55 
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>
>>>> Doing the same list a couple of seconds later:
>>>> [root at client ~]# ls -al 
>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> And again, and again, and again:
>>>> [root at client ~]# ls -al 
>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51 
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>
>>>> This really seems odd. Why do we get to see the real data file only once?
>>>>
>>>> It seems more and more that this crazy file duplication (and 
>>>> writing of sticky-bit files) was actually triggered by rebooting 
>>>> one of the three nodes while there was still an active NFS 
>>>> connection (even though there was no data exchange at all), since 
>>>> all 0-bit files (of the non-sticky-bit type) were created at either 
>>>> 00:51 or 00:41, the exact moments at which one of the three nodes 
>>>> in the cluster was rebooted. This would mean that replication with 
>>>> GlusterFS currently creates hardly any redundancy. Quite the 
>>>> opposite: if one of the machines goes down, all of your data gets 
>>>> seriously disorganised. I am busy configuring a test installation 
>>>> to see how this can best be reproduced for a bug report...
>>>>
>>>> Does anyone have a suggestion on how best to get rid of the 
>>>> duplicates, or rather get this mess organised the way it should be? 
>>>> This is a cluster with millions of files. A rebalance does not fix 
>>>> the issue, and neither does a rebalance fix-layout. Since this is a 
>>>> replicated volume, all files should be there 2x, not 3x. Can I 
>>>> safely just remove all the 0-bit files outside of the .glusterfs 
>>>> directory, including the sticky-bit files?
>>>>
>>>> The empty 0-bit files outside of .glusterfs on every brick can 
>>>> probably be safely removed like this:
>>>> find /export/* -path '*/.glusterfs' -prune -o -type f -size 0 -perm 
>>>> 1000 -exec rm {} \;
>>>> no?
>>>>
>>>> Thanks!
>>>>
>>>> Cheers,
>>>> Olav
>>>> On 18/02/15 22:10, Olav Peeters wrote:
>>>>> Thanks Tom and Joe,
>>>>> for the fast response!
>>>>>
>>>>> Before I started my upgrade I stopped all clients using the volume 
>>>>> and stopped all VMs with VHDs on the volume, but I guess, and this 
>>>>> may be the missing thing to reproduce this in a lab, I did not 
>>>>> detach an NFS shared storage mount from a XenServer pool to this 
>>>>> volume, since that is an extremely risky business. I also did not 
>>>>> stop the volume. This, I guess, was a bit stupid, but since I have 
>>>>> done upgrades this way in the past without any issues I skipped 
>>>>> this step (a really bad habit). I'll make amends and file a proper 
>>>>> bug report :-). I agree with you, Joe, this should never happen, 
>>>>> even when someone ignores the advice of stopping the volume. If it 
>>>>> were also necessary to detach shared storage NFS connections to a 
>>>>> volume, then frankly, glusterfs would be unusable in a private 
>>>>> cloud. No one can afford downtime of the whole infrastructure just 
>>>>> for a glusterfs upgrade. Ideally a replicated gluster volume should 
>>>>> even be able to remain online and in use during (at least a 
>>>>> minor-version) upgrade.
>>>>>
>>>>> I don't know whether a heal was perhaps busy when I started the 
>>>>> upgrade. I forgot to check. I did check the CPU activity on the 
>>>>> gluster nodes, which was very low (in the 0.0X range via top), so 
>>>>> I doubt it. I will add this to the bug report as a suggestion 
>>>>> should they not be able to reproduce with an open NFS connection.
>>>>>
>>>>> By the way, is it sufficient to do:
>>>>> service glusterd stop
>>>>> service glusterfsd stop
>>>>> and do a:
>>>>> ps aux | grep gluster
>>>>> to see if everything has stopped and kill any leftovers should 
>>>>> this be necessary?
>>>>>
>>>>> For the fix, do you agree that if I run e.g.:
>>>>> find /export/* -type f -size 0 -perm 1000 -exec /bin/rm {} \;
>>>>> on every node, given that /export is the location of all my bricks, 
>>>>> this will be safe, also in a replicated set-up?
>>>>> No necessary 0-bit files will be deleted, e.g. in the .glusterfs 
>>>>> directory of every brick?
>>>>>
>>>>> Thanks for your support!
>>>>>
>>>>> Cheers,
>>>>> Olav
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 18/02/15 20:51, Joe Julian wrote:
>>>>>>
>>>>>> On 02/18/2015 11:43 AM, tbenzvi at 3vgeomatics.com wrote:
>>>>>>> Hi Olav,
>>>>>>>
>>>>>>> I have a hunch that our problem was caused by improper 
>>>>>>> unmounting of the gluster volume, and have since found that the 
>>>>>>> proper order should be: kill all jobs using volume -> unmount 
>>>>>>> volume on clients -> gluster volume stop -> stop gluster service 
>>>>>>> (if necessary)
>>>>>>> In my case, I wrote a Python script to find duplicate files on 
>>>>>>> the mounted volume, then delete the corresponding link files on 
>>>>>>> the bricks (making sure to also delete files in the .glusterfs 
>>>>>>> directory).
>>>>>>> However, your find command was also suggested to me and I think 
>>>>>>> it's a simpler solution. I believe removing all link files (even 
>>>>>>> ones that are not causing duplicates) is fine, since on the next 
>>>>>>> file access gluster will do a lookup on all bricks and recreate 
>>>>>>> any link files if necessary. Hopefully a gluster expert can 
>>>>>>> chime in on this point as I'm not completely sure.
>>>>>>
>>>>>> You are correct.
>>>>>>
>>>>>>> Keep in mind your setup is somewhat different than mine as I 
>>>>>>> have only 5 bricks with no replication.
>>>>>>> Regards,
>>>>>>> Tom
>>>>>>>
>>>>>>>     --------- Original Message ---------
>>>>>>>     Subject: Re: [Gluster-users] Hundreds of duplicate files
>>>>>>>     From: "Olav Peeters" <opeeters at gmail.com>
>>>>>>>     Date: 2/18/15 10:52 am
>>>>>>>     To: gluster-users at gluster.org, tbenzvi at 3vgeomatics.com
>>>>>>>
>>>>>>>     Hi all,
>>>>>>>     I'm having this problem after upgrading from 3.5.3 to 3.6.2.
>>>>>>>     At the moment I am still waiting for a heal to finish (on a
>>>>>>>     31TB volume with 42 bricks, replicated over three nodes).
>>>>>>>
>>>>>>>     Tom,
>>>>>>>     how did you remove the duplicates?
>>>>>>>     with 42 bricks I will not be able to do this manually..
>>>>>>>     Did a:
>>>>>>>     find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;
>>>>>>>     work for you?
>>>>>>>
>>>>>>>     Should this type of thing ideally not be checked and mended
>>>>>>>     by a heal?
>>>>>>>
>>>>>>>     Does anyone have an idea yet how this happens in the first
>>>>>>>     place? Can it be connected to upgrading?
>>>>>>>
>>>>>>>     Cheers,
>>>>>>>     Olav
>>>>>>>
>>>>>>>       
>>>>>>>
>>>>>>>     On 01/01/15 03:07, tbenzvi at 3vgeomatics.com wrote:
>>>>>>>
>>>>>>>         No, the files can be read on a newly mounted client! I
>>>>>>>         went ahead and deleted all of the link files associated
>>>>>>>         with these duplicates, and then remounted the volume.
>>>>>>>         The problem is fixed!
>>>>>>>         Thanks again for the help, Joe and Vijay.
>>>>>>>         Tom
>>>>>>>
>>>>>>>             --------- Original Message ---------
>>>>>>>             Subject: Re: [Gluster-users] Hundreds of duplicate files
>>>>>>>             From: "Vijay Bellur" <vbellur at redhat.com>
>>>>>>>             Date: 12/28/14 3:23 am
>>>>>>>             To: tbenzvi at 3vgeomatics.com, gluster-users at gluster.org
>>>>>>>
>>>>>>>             On 12/28/2014 01:20 PM, tbenzvi at 3vgeomatics.com wrote:
>>>>>>>             > Hi Vijay,
>>>>>>>             > Yes the files are still readable from the
>>>>>>>             .glusterfs path.
>>>>>>>             > There is no explicit error. However, trying to
>>>>>>>             read a text file in
>>>>>>>             > python simply gives me null characters:
>>>>>>>             >
>>>>>>>             > >>> open('ott_mf_itab').readlines()
>>>>>>>             >
>>>>>>>             ['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
>>>>>>>             >
>>>>>>>             > And reading binary files does the same
>>>>>>>             >
>>>>>>>
>>>>>>>             Is this behavior seen with a freshly mounted client too?
>>>>>>>
>>>>>>>             -Vijay
>>>>>>>
>>>>>>>             > --------- Original Message ---------
>>>>>>>             > Subject: Re: [Gluster-users] Hundreds of duplicate
>>>>>>>             files
>>>>>>>             > From: "Vijay Bellur" <vbellur at redhat.com>
>>>>>>>             > Date: 12/27/14 9:57 pm
>>>>>>>             > To: tbenzvi at 3vgeomatics.com, gluster-users at gluster.org
>>>>>>>             >
>>>>>>>             > On 12/28/2014 10:13 AM, tbenzvi at 3vgeomatics.com wrote:
>>>>>>>             > > Thanks Joe, I've read your blog post as well as
>>>>>>>             your post
>>>>>>>             > regarding the
>>>>>>>             > > .glusterfs directory.
>>>>>>>             > > I found some unneeded duplicate files which were
>>>>>>>             not being read
>>>>>>>             > > properly. I then deleted the link file from the
>>>>>>>             brick. This always
>>>>>>>             > > removes the duplicate file from the listing, but
>>>>>>>             the file does not
>>>>>>>             > > always become readable. If I also delete the
>>>>>>>             associated file in the
>>>>>>>             > > .glusterfs directory on that brick, then some
>>>>>>>             more files become
>>>>>>>             > > readable. However this solution still doesn't
>>>>>>>             work for all files.
>>>>>>>             > > I know the file on the brick is not corrupt as
>>>>>>>             it can be read
>>>>>>>             > directly
>>>>>>>             > > from the brick directory.
>>>>>>>             >
>>>>>>>             > For files that are not readable from the client,
>>>>>>>             can you check if the
>>>>>>>             > file is readable from the .glusterfs/ path?
>>>>>>>             >
>>>>>>>             > What is the specific error that is seen while
>>>>>>>             trying to read one such
>>>>>>>             > file from the client?
>>>>>>>             >
>>>>>>>             > Thanks,
>>>>>>>             > Vijay
>>>>>>>             >
>>>>>>>             >
>>>>>>>             >
>>>>>>>             > _______________________________________________
>>>>>>>             > Gluster-users mailing list
>>>>>>>             > Gluster-users at gluster.org
>>>>>>>             > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>             >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         _______________________________________________
>>>>>>>         Gluster-users mailing list
>>>>>>>         Gluster-users at gluster.org
>>>>>>>         http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>
>>>
>>
>
