[Gluster-users] cleaning up duplicate files
d.a.bretherton at reading.ac.uk
Mon Feb 27 11:52:36 UTC 2012
Hello Todd and Gluster-users,
The same thing happened to one of my volumes the last time I tried a
rebalance...migrate-data operation. I reported it to the list here:
Fortunately it happened to a volume I was using mainly for backups, so I
decided to start again from scratch rather than try to clean up the
volume. I would really like to have a working migrate-data feature
because my volumes have all been expanded many times without
migrate-data being performed. I am worried that it might never be
possible to do it successfully now that most of the files are on the
I came across "multiple subvolumes" errors on another occasion when
migrate-data had not been performed, and that time only a handful of
files were affected so I was able to clean up the errors manually. One
version of each duplicated file was zero bytes, so it was easy to decide
which were the correct versions. I have no idea what caused the
zero-byte versions to be created, but I thought it might have been a
legacy of GFID-related bugs in earlier versions of GlusterFS. There were
several occasions when I had problems running fix-layout after expanding
a volume, and I thought this might have messed up the extended
attributes enough to end up with files of the same name on different
bricks. I also wondered whether the zero-byte duplicates might have been
created because glusterd crashed or stopped responding, but I couldn't
find anything in the logs to support this theory.
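As a rough sketch of how the manual check described above could be
automated, the Python below walks a set of brick directories, lists
every relative path that appears on more than one brick, and flags the
zero-byte copies. It assumes all the brick directories are reachable
from one host, and the names find_duplicates and zero_byte_candidates
and the brick paths in the usage comment are made up for illustration.
It only reports; it deletes nothing, and a zero-byte copy is flagged
only when exactly one non-empty copy exists.

```python
import os
import sys
from collections import defaultdict


def find_duplicates(brick_roots):
    """Return {relative_path: [(brick_root, size_in_bytes), ...]} for every
    regular file that exists on more than one of the given bricks."""
    copies = defaultdict(list)
    for root in brick_roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Assumption: newer GlusterFS releases keep internal metadata in
            # a .glusterfs directory on each brick; skip it if it is there.
            dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
            for name in filenames:
                full = os.path.join(dirpath, name)
                if os.path.islink(full):
                    continue  # only compare regular files
                rel = os.path.relpath(full, root)
                copies[rel].append((root, os.path.getsize(full)))
    return {rel: found for rel, found in copies.items() if len(found) > 1}


def zero_byte_candidates(found):
    """Given the copies of one file, return the bricks holding a zero-byte
    copy, but only when exactly one non-empty copy exists to keep."""
    non_empty = [root for root, size in found if size > 0]
    empty = [root for root, size in found if size == 0]
    return empty if len(non_empty) == 1 else []


if __name__ == "__main__":
    # Usage (hypothetical paths): python find_dups.py /export/brick1 /export/brick2
    for rel, found in sorted(find_duplicates(sys.argv[1:]).items()):
        print(rel)
        candidates = zero_byte_candidates(found)
        for root, size in found:
            note = "zero-byte, candidate for removal" if root in candidates else ""
            print(f"  {root}  {size} bytes  {note}")
        if not candidates:
            print("  no clear choice: needs manual review")
```

Anything reported as "needs manual review" (for example two non-empty
copies of the same file) would still have to be compared by hand, and
any flagged copy should be checked before it is removed.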
On 02/26/2012 07:00 PM, gluster-users-request at gluster.org wrote:
> Date: Sun, 26 Feb 2012 11:17:53 -0500 (EST)
> From: Todd Pfaff <pfaff at rhpcs.mcmaster.ca>
> Subject: [Gluster-users] cleaning up duplicate files
> To: gluster-users at gluster.org
> I'm using gluster 3.2.5. I have a situation where I've somehow gotten
> multiple copies of some files on back-end bricks that are members of the
> same distribute volume set. Accessing these files from the front-end
> volume results in an Input/Output error. I don't know how I got into
> this situation and I don't really care about that at the moment. I'd
> just like to fix the problem now without having to go to the extreme
> of removing everything from the bricks.
> I'd do the fixing manually if it were a small number of files but there
> are thousands.
> Is there any gluster operation that can automatically fix such cases?
> Alternatively, short of removing everything from back-end bricks and
> starting from a clean slate, has anyone written code to find and fix such
> duplicate files?
> Fortunately these files are backups so if I do have to remove them
> completely the primary copy still exists elsewhere.