[Gluster-users] "mismatching layouts" errors after expanding volume
Dan Bretherton
d.a.bretherton at reading.ac.uk
Wed Feb 22 12:22:08 UTC 2012
Hello All-
I would really appreciate a quick Yes/No answer to the most important
question - is it safe to create, modify and delete files in a volume
during a fix-layout operation after an expansion?
The users are champing at the bit waiting for me to let them have write
access, but fix-layout is likely to take several days based on previous
experience.
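For reference, I assume the right way to keep an eye on progress while
it runs is something like

    gluster volume rebalance atmos status

("atmos" being the volume in question), but nothing in that output tells
me whether client writes are safe in the meantime.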
-Dan
On 02/22/2012 02:52 AM, Dan Bretherton wrote:
> Dear All-
> There are a lot of errors of the following type in my client and NFS
> logs following a recent volume expansion.
>
> [2012-02-16 22:59:42.504907] I [dht-layout.c:682:dht_layout_dir_mismatch] 0-atmos-dht: subvol: atmos-replicate-0; inode layout - 0 - 0; disk layout - 920350134 - 1227133511
> [2012-02-16 22:59:42.534399] I [dht-common.c:524:dht_revalidate_cbk] 0-atmos-dht: mismatching layouts for /users/rle/TRACKTEMP/TRACKS
> [2012-02-16 22:59:42.534521] I [dht-layout.c:682:dht_layout_dir_mismatch] 0-atmos-dht: subvol: atmos-replicate-1; inode layout - 0 - 0; disk layout - 1227133512 - 1533916889
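>
> In case it is relevant, my understanding is that the "disk layout" ranges
> above can be compared with what is actually stored on the bricks, since
> the layout lives in the trusted.glusterfs.dht extended attribute of each
> directory, e.g. (as root, on a brick's local filesystem rather than on
> the mount point; the brick path below is just a placeholder):
>
>    getfattr -n trusted.glusterfs.dht -e hex \
>        /path/to/brick/users/rle/TRACKTEMP/TRACKS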
>
> I have expanded the volume successfully many times in the past. I can
> think of several possible reasons why this one might have gone wrong,
> but without expert advice I am just guessing.
>
> 1) I did precautionary ext4 filesystem checks on all the bricks and
> found errors on some of them, mostly things like this:
>
> Pass 1: Checking inodes, blocks, and sizes
> Inode 104386076, i_blocks is 3317792, should be 3317800. Fix? yes
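>
> For completeness, the checks were done roughly like this, with each
> brick filesystem unmounted first (the mount point and device below are
> just examples):
>
>    umount /mnt/brick1
>    e2fsck -f /dev/sdb1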
>
> 2) I always use hostname.domain for new GlusterFS servers when doing
> "gluster peer probe HOSTNAME" (e.g. gluster peer probe
> bdan14.nerc-essc.ac.uk). I normally use hostname.domain (e.g.
> bdan14.nerc-essc.ac.uk) when creating volumes or adding bricks as
> well, but for the last brick I added I just used the hostname
> (bdan14). I can do "ping bdan14" from all the servers and clients,
> and the only access to the volume from outside my subnetwork is via NFS.
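>
> The naming can be compared on any of the servers with, for example,
>
>    gluster peer status        # names recorded when each peer was probed
>    gluster volume info atmos  # names used in the volume's brick list
>
> and I am assuming the short "bdan14" only appears in the brick list.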
>
> 3) I found some old GlusterFS client processes still running, probably
> left over from previous occasions when the volume was auto-mounted. I
> have seen this before and I don't know why it happens, but normally I
> just kill unwanted glusterfs processes without affecting the mount.
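>
> Before killing anything I try to work out which glusterfs processes
> belong to the current mounts, roughly:
>
>    mount | grep glusterfs      # the fuse mounts that must keep working
>    ps ax | grep [g]lusterfs    # the glusterfs client processes running
>
> and I only kill PIDs whose command lines do not match a live mount point.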
>
> 4) I recently started using more than one server to export the volume
> via NFS in order to spread the load. In other words, two NFS clients
> may mount the same volume exported from two different servers. I don't
> remember reading anywhere that this is not allowed, but as this is a
> recent change I thought it would be worth checking.
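>
> To be concrete, one client mounts the volume with something like
>
>    mount -t nfs -o vers=3,tcp bdan14.nerc-essc.ac.uk:/atmos /mnt/atmos
>
> while another client mounts the same volume from a different server in
> the pool. NFSv3 over TCP is what I believe Gluster's built-in NFS server
> requires, and the mount point here is just an example.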
>
> 5) I normally let people carry on using a volume while a fix-layout
> process is going on in the background. I don't remember reading that
> this is not allowed but I thought it worth checking. I don't do
> migrate-data after fix-layout because it doesn't work on my cluster.
> Normally the fix-layout completes without error and no "mismatching
> layout" errors are observed. However, the volume is now so large that
> fix-layout usually takes several days to complete, which means that
> far more files are created and modified during fix-layout than
> before. Could the continued use of the volume during such a lengthy
> fix-layout be causing the layout errors?
>
> I have run fix-layout 3 times now and the second attempt crashed. All
> I can think of doing is to try again now that several back-end
> filesystems have been repaired. Could any of the above factors have
> caused the layout errors, and can anyone suggest a better way to
> remove them? All comments and suggestions would be much appreciated.
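>
> For the record, the command I plan to re-run is along the lines of
>
>    gluster volume rebalance atmos fix-layout start
>
> followed by periodic checks with "gluster volume rebalance atmos status".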
>
> Regards
> Dan.