[Gluster-users] cleaning up duplicate files

Dan Bretherton d.a.bretherton at reading.ac.uk
Mon Feb 27 11:52:36 UTC 2012

Hello Todd and Gluster-users,

The same thing happened to one of my volumes the last time I tried a 
rebalance...migrate-data operation.  I reported it to the list here: 

Fortunately it happened to a volume I was using mainly for backups, so I 
decided to start again from scratch rather than try to clean up the 
volume.  I would really like to have a working migrate-data feature 
because my volumes have all been expanded many times without 
migrate-data being performed.  I am worried that it might never be 
possible to do it successfully now that most of the files are on the 
wrong bricks.

I came across "multiple subvolumes" errors on another occasion when 
migrate-data had not been performed, and that time only a handful of 
files were affected, so I was able to clean up the errors manually.  One 
version of each duplicated file was zero bytes, so it was easy to decide 
which were the correct versions.  I have no idea what caused the 
zero-byte versions to be created, but I thought it might have been a 
legacy of GFID-related bugs in earlier versions of GlusterFS.  There were 
several occasions when I had problems running fix-layout after expanding 
a volume, and I thought this might have messed up the extended 
attributes enough to end up with files of the same name on different 
bricks.  I also wondered whether the zero-byte duplicates might have been 
created because glusterd crashed or stopped responding, but I couldn't 
find anything in the logs to support this theory.
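
For anyone facing more than a handful of duplicates, the manual check I 
described could be scripted roughly as follows.  This is only a sketch, 
not a tested GlusterFS tool: the function names are my own invention, it 
assumes all the brick directories are visible from the machine where it 
runs, and it only reports what it finds rather than deleting anything.  
It treats a file as "easy" only when exactly one copy is non-empty and 
the others are zero bytes.

```python
import os
import stat
import sys
from collections import defaultdict


def find_duplicates(bricks):
    """Return {relative_path: [(brick, size), ...]} for every regular file
    that exists under the same relative path on more than one brick."""
    seen = defaultdict(list)
    for brick in bricks:
        for dirpath, dirnames, filenames in os.walk(brick):
            # Assumption: skip GlusterFS's internal ".glusterfs" directory,
            # which newer releases create at the top of each brick.
            dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
            for name in filenames:
                full = os.path.join(dirpath, name)
                st = os.lstat(full)
                if not stat.S_ISREG(st.st_mode):
                    continue  # ignore symlinks, devices, etc.
                rel = os.path.relpath(full, brick)
                seen[rel].append((brick, st.st_size))
    return {rel: copies for rel, copies in seen.items() if len(copies) > 1}


def classify(copies):
    """Split the copies of one file into zero-byte and non-empty bricks."""
    empty = [brick for brick, size in copies if size == 0]
    nonempty = [brick for brick, size in copies if size > 0]
    return empty, nonempty


if __name__ == "__main__":
    # Brick paths are given on the command line.
    for rel, copies in sorted(find_duplicates(sys.argv[1:]).items()):
        empty, nonempty = classify(copies)
        if len(nonempty) == 1 and empty:
            # Same situation as my manual cleanup: one real copy,
            # the rest zero bytes.
            verdict = "keep copy on " + nonempty[0]
        else:
            verdict = "needs manual review"
        print(rel, "->", verdict, copies)
```

It would be run with the brick directories as arguments, for example 
"python find_dups.py /export/brick1 /export/brick2" (hypothetical script 
name and paths).  A zero-byte copy is not necessarily junk: the 
distribute translator itself creates zero-byte link files (sticky bit 
set, with a trusted.glusterfs.dht.linkto extended attribute), so I would 
inspect the attributes with getfattr before removing anything from a 
brick.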


On 02/26/2012 07:00 PM, gluster-users-request at gluster.org wrote:
> Date: Sun, 26 Feb 2012 11:17:53 -0500 (EST)
> From: Todd Pfaff <pfaff at rhpcs.mcmaster.ca>
> Subject: [Gluster-users] cleaning up duplicate files
> To: gluster-users at gluster.org
>
> I'm using gluster 3.2.5.  I have a situation where I've somehow gotten
> multiple copies of some files on back-end bricks that are members of the
> same distribute volume set.  Accessing these files from the front-end
> volume results in an Input/Output error.  I don't know how I got into
> this situation and I don't really care about that at the moment.  I'd
> just like to fix the problem now without having to go to the extreme
> of removing everything from the bricks.
>
> I'd do the fixing manually if it were a small number of files but there
> are thousands.
>
> Is there any gluster operation that can automatically fix such cases?
>
> Alternatively, short of removing everything from back-end bricks and
> starting from a clean slate, has anyone written code to find and fix such
> duplicate files?
>
> Fortunately these files are backups so if I do have to remove them
> completely the primary copy still exists elsewhere.
>
> Regards,
> Todd
