[Gluster-users] Hundreds of duplicate files
Olav Peeters
opeeters at gmail.com
Sat Feb 21 00:37:20 UTC 2015
It looks even worse than I had feared... :-(
This really is a crazy bug.
If I understand you correctly, the only sane xattr pairing is the one
on the two 0-bit files, since this is the full list of bricks:
[root at gluster01 ~]# gluster volume info
Volume Name: sr_vol01
Type: Distributed-Replicate
Volume ID: c6d6147e-2d91-4d98-b8d9-ba05ec7e4ad6
Status: Started
Number of Bricks: 21 x 2 = 42
Transport-type: tcp
Bricks:
Brick1: gluster01:/export/brick1gfs01
Brick2: gluster02:/export/brick1gfs02
Brick3: gluster01:/export/brick4gfs01
Brick4: gluster03:/export/brick4gfs03
Brick5: gluster02:/export/brick4gfs02
Brick6: gluster03:/export/brick1gfs03
Brick7: gluster01:/export/brick2gfs01
Brick8: gluster02:/export/brick2gfs02
Brick9: gluster01:/export/brick5gfs01
Brick10: gluster03:/export/brick5gfs03
Brick11: gluster02:/export/brick5gfs02
Brick12: gluster03:/export/brick2gfs03
Brick13: gluster01:/export/brick3gfs01
Brick14: gluster02:/export/brick3gfs02
Brick15: gluster01:/export/brick6gfs01
Brick16: gluster03:/export/brick6gfs03
Brick17: gluster02:/export/brick6gfs02
Brick18: gluster03:/export/brick3gfs03
Brick19: gluster01:/export/brick8gfs01
Brick20: gluster02:/export/brick8gfs02
Brick21: gluster01:/export/brick9gfs01
Brick22: gluster02:/export/brick9gfs02
Brick23: gluster01:/export/brick10gfs01
Brick24: gluster03:/export/brick10gfs03
Brick25: gluster01:/export/brick11gfs01
Brick26: gluster03:/export/brick11gfs03
Brick27: gluster02:/export/brick10gfs02
Brick28: gluster03:/export/brick8gfs03
Brick29: gluster02:/export/brick11gfs02
Brick30: gluster03:/export/brick9gfs03
Brick31: gluster01:/export/brick12gfs01
Brick32: gluster02:/export/brick12gfs02
Brick33: gluster01:/export/brick13gfs01
Brick34: gluster02:/export/brick13gfs02
Brick35: gluster01:/export/brick14gfs01
Brick36: gluster03:/export/brick14gfs03
Brick37: gluster01:/export/brick15gfs01
Brick38: gluster03:/export/brick15gfs03
Brick39: gluster02:/export/brick14gfs02
Brick40: gluster03:/export/brick12gfs03
Brick41: gluster02:/export/brick15gfs02
Brick42: gluster03:/export/brick13gfs03
The two 0-bit files are on bricks 35 and 36, as the getfattr output correctly shows.
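(Since the client indices are zero-based, client-34/client-35 map to
Brick35/Brick36; a quick check of that offset-by-one mapping, assuming
no volume name is needed since there is only one volume, could be:)

[root at gluster01 ~]# gluster volume info | grep -E '^Brick(35|36):'
Brick35: gluster01:/export/brick14gfs01
Brick36: gluster03:/export/brick14gfs03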
Another sane pairing could be this (if the first file did not also refer
to client-34 and client-35):
[root at gluster01 ~]# getfattr -m . -d -e hex
/export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file:
export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-32=0x000000000000000000000000
trusted.afr.sr_vol01-client-33=0x000000000000000000000000
trusted.afr.sr_vol01-client-34=0x000000000000000000000000
trusted.afr.sr_vol01-client-35=0x000000010000000100000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
[root at gluster02 ~]# getfattr -m . -d -e hex
/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
getfattr: Removing leading '/' from absolute path names
# file:
export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sr_vol01-client-32=0x000000000000000000000000
trusted.afr.sr_vol01-client-33=0x000000000000000000000000
trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
But why is the security.selinux value different?
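(Decoding those hex strings — a quick sketch, assuming xxd is
available and ignoring the trailing NUL byte — shows they are just the
two SELinux contexts, so the difference is only in the SELinux user
part:)

[root at gluster01 ~]# echo 756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a7330 | xxd -r -p; echo
unconfined_u:object_r:file_t:s0
[root at gluster01 ~]# echo 73797374656d5f753a6f626a6563745f723a66696c655f743a7330 | xxd -r -p; echo
system_u:object_r:file_t:s0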
You mention hostname changes..
I noticed that if I do a listing of the available shared storage on one
of the XenServers I get:
uuid ( RO) : 272b2366-dfbf-ad47-2a0f-5d5cc40863e3
name-label ( RW): gluster_store
name-description ( RW): NFS SR [gluster01.irceline.be:/sr_vol01]
host ( RO): <shared>
type ( RO): nfs
content-type ( RO):
If I just do a plain Linux mount listing:
[root at same_story_on_both_xenserver ~]# mount
gluster02.irceline.be:/sr_vol01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3 on
/var/run/sr-mount/272b2366-dfbf-ad47-2a0f-5d5cc40863e3 type nfs
(rw,soft,timeo=133,retrans=2147483647,tcp,noac,addr=192.168.0.72)
Originally the mount was made against gluster01 (IP 192.168.0.71), as
the name-description in the xe sr-list output indicates...
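(To double-check which server the pool itself was configured to mount,
the PBD device-config of the SR should still list it; something like
this, assuming the standard xe CLI fields, should show it:)

[root at xenserver ~]# xe pbd-list sr-uuid=272b2366-dfbf-ad47-2a0f-5d5cc40863e3 params=device-config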
It is as though, when gluster01 was unavailable for a couple of
minutes, the NFS mount was somehow automatically reconfigured to point
at gluster02, but NFS cannot do this as far as I know (unless there is
some fail-over mechanism, which I never configured). There is also no
load-balancing between client and server.
If gluster01 is not available, the gluster volume should simply not
have been available, end of story... But from the perspective of a
client, the NFS mount could be to any one of the three gluster nodes;
the client should see exactly the same data...
So a rebalance in the current state could do more harm than good?
I launched a second rebalance in the hope that the system would mend
itself after all...
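(I'm following its progress with the status command, e.g.:)

[root at gluster01 ~]# gluster volume rebalance sr_vol01 status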
Thanks a million for your support in this darkest hour of my time as a
glusterfs user :-)
Cheers,
Olav
On 20/02/15 23:10, Joe Julian wrote:
>
> On 02/20/2015 01:47 PM, Olav Peeters wrote:
>> Thanks Joe,
>> for the answers!
>>
>> I was not clear enough about the set up apparently.
>> The Gluster cluster consists of 3 nodes with 14 bricks each. The
>> bricks are formatted as XFS and mounted locally as XFS. There is one
>> volume, type Distributed-Replicate (replica 2). The configuration is
>> such that bricks are mirrored across two different nodes.
>>
>> The NFS mounts which were alive but not in use during the reboot when
>> the problem started are from clients (2 XenServer machines configured
>> as a pool - a shared storage set-up). The comparisons I give below are
>> between (other) clients mounting via either glusterfs or NFS. Similar
>> problem, with the exception that the first listing (via ls) after a
>> fresh mount via NFS actually does find the files with data. A second
>> listing only finds the 0-bit file with the same name.
>>
>> So all the 0bit files in mode 0644 can be safely removed?
> Probably? Is it likely that you have any empty files? I don't know.
>>
>> Why do I see three files with the same name (and modification
>> timestamp etc.) via either a glusterfs or NFS mount from a client?
>> Deleting one of the three will probably not solve the issue either..
>> this seems to me an indexing issue in the gluster cluster.
> Very good question. I don't know. The xattrs tell a strange story that
> I haven't seen before. One legit file shows sr_vol01-client-32 and 33.
> This would be normal, assuming the filename hash would put it on that
> replica pair (we can't tell since the rebalance has changed the hash
> map). Another file shows sr_vol01-client-32, 33, 34, and 35 with
> pending updates scheduled for 35. I have no idea which brick this is
> (see "gluster volume info" and map the digits (35) with the bricks
> offset by 1 (client-35 is brick 36). That last one is on 40,41.
>
> I don't know how these files all got on different replica sets. My
> speculations include hostname changes, long-running net-split
> conditions with different dht maps (failed rebalances), moved bricks,
> load balancers between client and server, mercury in retrograde (lol)...
>
>> How do I get Gluster to replicate the files correctly, only 2
>> versions of the same file, not three, and on two bricks on different
>> machines?
>>
>
> Identify which replica is correct by using the little python script at
> http://joejulian.name/blog/dht-misses-are-expensive/ to get the hash
> of the filename. Examine the dht map to see which replica pair
> *should* have that hash and remove the others (and their hardlink in
> .glusterfs). There is no 1-liner that's going to do this. I would
> probably script the logic in python, have it print out what it was
> going to do, check that for sanity and, if sane, execute it.
>
> But mostly figure out how Bricks 32 and/or 33 can become 34 and/or 35
> and/or 40 and/or 41. That's the root of the whole problem.
>
>> Cheers,
>> Olav
>>
>>
>>
>>
>> On 20/02/15 21:51, Joe Julian wrote:
>>>
>>> On 02/20/2015 12:21 PM, Olav Peeters wrote:
>>>> Let's take one file (3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd) as
>>>> an example...
>>>> On the 3 nodes where all bricks are formatted as XFS and mounted in
>>>> /export and 272b2366-dfbf-ad47-2a0f-5d5cc40863e3 is the mounting
>>>> point of a NFS shared storage connection from XenServer machines:
>>> Did I just read this correctly? Your bricks are NFS mounts? ie,
>>> GlusterFS Client <-> GlusterFS Server <-> NFS <-> XFS
>>>>
>>>> [root at gluster01 ~]# find
>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec
>>>> ls -la {} \;
>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55
>>>> /export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>> Supposedly, this is the actual file.
>>>> -rw-r--r--. 2 root root 0 Feb 18 00:51
>>>> /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>> This is not a linkfile. Note it's mode 0644. How it got there with
>>> those permissions would be a matter of history and would require
>>> information that's probably lost.
>>>>
[root at gluster02 ~]# find
>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec
>>>> ls -la {} \;
>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55
>>>> /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>
>>>> [root at gluster03 ~]# find
>>>> /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name '300*' -exec
>>>> ls -la {} \;
>>>> -rw-r--r--. 2 root root 44332659200 Feb 17 23:55
>>>> /export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 2 root root 0 Feb 18 00:51
>>>> /export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>> Same analysis as above.
>>>>
>>>> 3 files with information, 2 x a 0-bit file with the same name
>>>>
>>>> Checking the 0-bit files:
>>>> [root at gluster01 ~]# getfattr -m . -d -e hex
>>>> /export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file:
>>>> export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-35=0x000000000000000000000000
>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>
>>>> [root at gluster03 ~]# getfattr -m . -d -e hex
>>>> /export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file:
>>>> export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-35=0x000000000000000000000000
>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>
>>>> This is not a glusterfs link file since there is no
>>>> "trusted.glusterfs.dht.linkto", am I correct?
>>> You are correct.
>>>>
>>>> And checking the "good" files:
>>>>
>>>> # file:
>>>> export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-34=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-35=0x000000010000000100000000
>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>
>>>> [root at gluster02 ~]# getfattr -m . -d -e hex
>>>> /export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file:
>>>> export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-32=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-33=0x000000000000000000000000
>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>
>>>> [root at gluster03 ~]# getfattr -m . -d -e hex
>>>> /export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file:
>>>> export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-40=0x000000000000000000000000
>>>> trusted.afr.sr_vol01-client-41=0x000000000000000000000000
>>>> trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417
>>>>
>>>>
>>>>
>>>> Seen from a client via a glusterfs mount:
>>>> [root at client ~]# ls -al
>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>> /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>
>>>>
>>>>
>>>> Via NFS (just after unmounting and mounting the volume again):
>>>> [root at client ~]# ls -al
>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 44332659200 Feb 17 23:55
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>
>>>> Doing the same list a couple of seconds later:
>>>> [root at client ~]# ls -al
>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> And again, and again, and again:
>>>> [root at client ~]# ls -al
>>>> /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>> -rw-r--r--. 1 root root 0 Feb 18 00:51
>>>> /mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd
>>>>
>>>> This really seems odd. Why do we get to see the real data file only once?
>>>>
>>>> It seems more and more that this crazy file duplication (and the
>>>> writing of sticky-bit files) was actually triggered by rebooting
>>>> one of the three nodes while there was still an active NFS
>>>> connection (even though there was no data exchange at all), since
>>>> all 0-bit files (of the non-sticky-bit type) were created at either
>>>> 00:51 or 00:41, the exact moments at which one of the three nodes
>>>> in the cluster was rebooted. This would mean that replication with
>>>> GlusterFS currently creates hardly any redundancy. Quite the
>>>> opposite: if one of the machines goes down, all of your data gets
>>>> seriously disorganised. I am busy configuring a test installation
>>>> to see how this can best be reproduced for a bug report...
>>>>
>>>> Does anyone have a suggestion how best to get rid of the
>>>> duplicates, or rather how to get this mess organised the way it
>>>> should be? This is a cluster with millions of files. A rebalance
>>>> does not fix the issue, nor does a rebalance fix-layout. Since this
>>>> is a replicated volume, every file should be there 2x, not 3x. Can
>>>> I safely just remove all the 0-bit files outside of the .glusterfs
>>>> directory, including the sticky-bit files?
>>>>
>>>> The empty 0-bit files outside of .glusterfs on every brick I can
>>>> probably safely remove like this:
>>>> find /export/* -path '*/.glusterfs' -prune -o -type f -size 0
>>>> -perm 1000 -exec rm {} \;
>>>> no?
>>>>
>>>> Thanks!
>>>>
>>>> Cheers,
>>>> Olav
>>>> On 18/02/15 22:10, Olav Peeters wrote:
>>>>> Thanks Tom and Joe,
>>>>> for the fast response!
>>>>>
>>>>> Before I started my upgrade I stopped all clients using the volume
>>>>> and stopped all VMs with VHDs on the volume, but I guess, and this
>>>>> may be the missing thing to reproduce this in a lab, I did not
>>>>> detach the NFS shared storage mount from the XenServer pool to this
>>>>> volume, since this is an extremely risky business. I also did not
>>>>> stop the volume. This, I guess, was a bit stupid, but since I had
>>>>> done upgrades this way in the past without any issues I skipped
>>>>> this step (a really bad habit). I'll make amends and file a proper
>>>>> bug report :-). I agree with you Joe, this should never happen,
>>>>> even when someone ignores the advice of stopping the volume. If it
>>>>> were also necessary to detach shared storage NFS connections to a
>>>>> volume, then frankly glusterfs would be unusable in a private
>>>>> cloud. No one can afford downtime of the whole infrastructure just
>>>>> for a glusterfs upgrade. Ideally a replicated gluster volume should
>>>>> even be able to remain online and in use during (at least a minor
>>>>> version) upgrade.
>>>>>
>>>>> I don't know whether a heal was maybe busy when I started the
>>>>> upgrade. I forgot to check. I did check the CPU activity on the
>>>>> gluster nodes, which was very low (in the 0.0X range via top), so
>>>>> I doubt it. I will add this to the bug report as a suggestion
>>>>> should they not be able to reproduce it with an open NFS connection.
>>>>>
>>>>> By the way, is it sufficient to do:
>>>>> service glusterd stop
>>>>> service glusterfsd stop
>>>>> and then do a:
>>>>> ps aux | grep gluster
>>>>> to see if everything has stopped, and kill any leftovers should
>>>>> this be necessary?
>>>>>
>>>>> For the fix, do you agree that if I run e.g.:
>>>>> find /export/* -type f -size 0 -perm 1000 -exec /bin/rm {} \;
>>>>> on every node (if /export is the location of all my bricks), also
>>>>> in a replicated set-up, this will be safe?
>>>>> No necessary 0-bit files will be deleted, e.g. in the .glusterfs of
>>>>> every brick?
>>>>>
>>>>> Thanks for your support!
>>>>>
>>>>> Cheers,
>>>>> Olav
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 18/02/15 20:51, Joe Julian wrote:
>>>>>>
>>>>>> On 02/18/2015 11:43 AM, tbenzvi at 3vgeomatics.com wrote:
>>>>>>> Hi Olav,
>>>>>>>
>>>>>>> I have a hunch that our problem was caused by improper
>>>>>>> unmounting of the gluster volume, and have since found that the
>>>>>>> proper order should be: kill all jobs using volume -> unmount
>>>>>>> volume on clients -> gluster volume stop -> stop gluster service
>>>>>>> (if necessary)
>>>>>>> In my case, I wrote a Python script to find duplicate files on
>>>>>>> the mounted volume, then delete the corresponding link files on
>>>>>>> the bricks (making sure to also delete files in the .glusterfs
>>>>>>> directory)
>>>>>>> However, your find command was also suggested to me and I think
>>>>>>> it's a simpler solution. I believe removing all link files (even
>>>>>>> ones that are not causing duplicates) is fine, since on the next
>>>>>>> file access gluster will do a lookup on all bricks and recreate
>>>>>>> any link files if necessary. Hopefully a gluster expert can
>>>>>>> chime in on this point as I'm not completely sure.
>>>>>>
>>>>>> You are correct.
>>>>>>
>>>>>>> Keep in mind your setup is somewhat different than mine as I
>>>>>>> have only 5 bricks with no replication.
>>>>>>> Regards,
>>>>>>> Tom
>>>>>>>
>>>>>>> --------- Original Message ---------
>>>>>>> Subject: Re: [Gluster-users] Hundreds of duplicate files
>>>>>>> From: "Olav Peeters" <opeeters at gmail.com>
>>>>>>> Date: 2/18/15 10:52 am
>>>>>>> To: gluster-users at gluster.org, tbenzvi at 3vgeomatics.com
>>>>>>>
>>>>>>> Hi all,
>>>>>>> I'm having this problem after upgrading from 3.5.3 to 3.6.2.
>>>>>>> At the moment I am still waiting for a heal to finish (on a
>>>>>>> 31TB volume with 42 bricks, replicated over three nodes).
>>>>>>>
>>>>>>> Tom,
>>>>>>> how did you remove the duplicates?
>>>>>>> with 42 bricks I will not be able to do this manually..
>>>>>>> Did a:
>>>>>>> find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;
>>>>>>> work for you?
>>>>>>>
>>>>>>> Should this type of thing ideally not be checked and mended
>>>>>>> by a heal?
>>>>>>>
>>>>>>> Does anyone have an idea yet how this happens in the first
>>>>>>> place? Can it be connected to upgrading?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Olav
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 01/01/15 03:07, tbenzvi at 3vgeomatics.com wrote:
>>>>>>>
>>>>>>> No, the files can be read on a newly mounted client! I
>>>>>>> went ahead and deleted all of the link files associated
>>>>>>> with these duplicates, and then remounted the volume.
>>>>>>> The problem is fixed!
>>>>>>> Thanks again for the help, Joe and Vijay.
>>>>>>> Tom
>>>>>>>
>>>>>>> --------- Original Message ---------
>>>>>>> Subject: Re: [Gluster-users] Hundreds of duplicate files
>>>>>>> From: "Vijay Bellur" <vbellur at redhat.com>
>>>>>>> Date: 12/28/14 3:23 am
>>>>>>> To: tbenzvi at 3vgeomatics.com, gluster-users at gluster.org
>>>>>>>
>>>>>>> On 12/28/2014 01:20 PM, tbenzvi at 3vgeomatics.com wrote:
>>>>>>> > Hi Vijay,
>>>>>>> > Yes the files are still readable from the
>>>>>>> .glusterfs path.
>>>>>>> > There is no explicit error. However, trying to
>>>>>>> read a text file in
>>>>>>> > python simply gives me null characters:
>>>>>>> >
>>>>>>> > >>> open('ott_mf_itab').readlines()
>>>>>>> >
>>>>>>> ['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
>>>>>>> >
>>>>>>> > And reading binary files does the same
>>>>>>> >
>>>>>>>
>>>>>>> Is this behavior seen with a freshly mounted client too?
>>>>>>>
>>>>>>> -Vijay
>>>>>>>
>>>>>>> > --------- Original Message ---------
>>>>>>> > Subject: Re: [Gluster-users] Hundreds of duplicate
>>>>>>> files
>>>>>>> > From: "Vijay Bellur" <vbellur at redhat.com>
>>>>>>> > Date: 12/27/14 9:57 pm
>>>>>>> > To: tbenzvi at 3vgeomatics.com, gluster-users at gluster.org
>>>>>>> >
>>>>>>> > On 12/28/2014 10:13 AM, tbenzvi at 3vgeomatics.com wrote:
>>>>>>> > > Thanks Joe, I've read your blog post as well as
>>>>>>> your post
>>>>>>> > regarding the
>>>>>>> > > .glusterfs directory.
>>>>>>> > > I found some unneeded duplicate files which were
>>>>>>> not being read
>>>>>>> > > properly. I then deleted the link file from the
>>>>>>> brick. This always
>>>>>>> > > removes the duplicate file from the listing, but
>>>>>>> the file does not
>>>>>>> > > always become readable. If I also delete the
>>>>>>> associated file in the
>>>>>>> > > .glusterfs directory on that brick, then some
>>>>>>> more files become
>>>>>>> > > readable. However this solution still doesn't
>>>>>>> work for all files.
>>>>>>> > > I know the file on the brick is not corrupt as
>>>>>>> it can be read
>>>>>>> > directly
>>>>>>> > > from the brick directory.
>>>>>>> >
>>>>>>> > For files that are not readable from the client,
>>>>>>> can you check if the
>>>>>>> > file is readable from the .glusterfs/ path?
>>>>>>> >
>>>>>>> > What is the specific error that is seen while
>>>>>>> trying to read one such
>>>>>>> > file from the client?
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Vijay
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > _______________________________________________
>>>>>>> > Gluster-users mailing list
>>>>>>> > Gluster-users at gluster.org
>>>>>>> > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>
>>>
>>
>