[Gluster-users] Targeted fix-layout?

Dan Bretherton d.a.bretherton at reading.ac.uk
Mon Jan 28 14:02:03 UTC 2013


> On 01/16/2013 02:56 PM, Dan Bretherton wrote:
>> On 01/15/2013 08:17 PM, gluster-users-request at gluster.org wrote:
>>> Date: Tue, 15 Jan 2013 15:17:00 -0500
>>> From: Jeff Darcy <jdarcy at redhat.com>
>>> To: gluster-users at gluster.org
>>> Subject: Re: [Gluster-users] Targeted fix-layout?
>>> Message-ID: <50F5B93C.5040802 at redhat.com>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>> On 01/15/2013 01:10 PM, Dan Bretherton wrote:
>>>> I am running a fix-layout operation on a volume after seeing errors 
>>>> mentioning
>>>> "anomalies" and "holes" in the logs.  There is a particular 
>>>> directory that is
>>>> giving trouble and I would like to be able to run the layout fix on 
>>>> that
>>>> first.  Users are experiencing various I/O errors including 
>>>> "invalid argument"
>>>> and "Unknown error 526", but after running for a week the volume wide
>>>> fix-layout doesn't seem to have reached this particular directory yet.
>>>> Fix-layout takes a long time because there are millions of files in 
>>>> the volume
>>>> and the CPU load is consistently very high on all the servers while 
>>>> it is
>>>> running, sometimes over 20. Therefore I really need to find a way 
>>>> to target
>>>> particular directories or speed up the volume wide fix-layout.
>>> You should be able to do the following command on a client to fix 
>>> the layout
>>> for just one directory (it's the same xattr used by the rebalance 
>>> command).
>>>
>>>     setfattr -n distribute.fix.layout -v "anything" /bad/directory
>>>> I have no idea what caused these errors but it could be related to 
>>>> the previous
>>>> fix-layout operation, which I started following the addition of a 
>>>> new pair of
>>>> bricks, not having completed successfully.  The problem is that the 
>>>> rebalance
>>>> operation on one or more servers often fails before completing and 
>>>> there is no
>>>> way (that I know of) to restart or resume the process on one 
>>>> server.  Every
>>>> time this happens I stop the fix-layout and start it again, but it 
>>>> has never
>>>> completed successfully on every server despite sometimes running 
>>>> for several
>>>> weeks.
>>>>
>>>> One other possible cause I can think of is my recent policy of 
>>>> using XFS for
>>>> new bricks instead of ext4.  The reason I think this might be 
>>>> causing the
>>>> problem is that none of the other volumes have any XFS bricks yet 
>>>> and they
>>>> aren't experiencing any I/O errors.  Are there any special mount 
>>>> options
>>>> required for XFS, and is there any reason why a volume shouldn't 
>>>> contain a
>>>> mixture of ext4 and XFS bricks?
>>> It doesn't seem like that should be a problem, but maybe someone 
>>> else knows
>>> something about ext4/XFS differences that could shed some light.
>> Thanks Jeff, I'll give that a try.
>>
>> Should the xattr name be trusted.distribute.fix.layout by the way? 
>> When I try with distribute.fix.layout I get the error "Operation not 
>> supported".
>>
>> -Dan.
>
> On 01/16/2013 09:56 AM, Dan Bretherton wrote:
> > Should the xattr name be trusted.distribute.fix.layout by the way? When
> > I try with distribute.fix.layout I get the error "Operation not supported".
>
> I just re-examined and re-ran the code, and distribute.fix.layout does
> seem to be correct.  You're doing this on the client side, right?  The
> other thing to check would be versions, since I hardly ever run a
> version that's more than a day old and that often means I'm using
> features that haven't made it into a release yet.  I think that one has
> been there for a while, though.
>
Thanks for checking the code for me.  I wasn't doing it on the client - 
thanks for pointing out my mistake.  I have tested the targeted fix-layout 
on a test volume and verified that there weren't any detrimental 
effects, but I don't know how to reproduce the layout errors I am seeing 
in the production volume in order to find out whether the targeted layout 
fix actually works.  Unfortunately I haven't been able to try it on the 
production volume because the owner doesn't want to risk it.  He is also 
concerned about performance deteriorating any further, given that the 
volume-wide layout fix is still running and still slowing things down a lot.
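For anyone else wanting to try this, what I ran on the test volume was 
along these lines (the mount point, directory and brick paths below are 
just examples from my test setup):

    # On a client with the volume FUSE-mounted
    setfattr -n distribute.fix.layout -v "anything" /mnt/testvol/bad/directory

    # The resulting layout can then be inspected directly on each brick
    getfattr -n trusted.glusterfs.dht -e hex /export/brick1/bad/directory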

I had to extend another volume a couple of weeks ago and the layout fix 
for that one is now running at the same time.  One server now has a load 
of over 70 most of the time (mostly glusterfsd), but none of the others 
seem to be particularly busy.  I restarted the server in question but 
the CPU load quickly went up to 70 again.  I can't see any particular 
reason why this one server should be so badly affected by the layout 
fixing processes.  It isn't a particularly big server, with only five 
3TB bricks involved in the two volumes that were extended.  I can't help 
thinking that this is the reason why the volume layout fixes are taking 
such a long time, even though the rebalance processes run on all the 
servers simultaneously.  Can anyone suggest a way to troubleshoot this 
problem?  The rebalance logs don't show anything unusual, but 
glustershd.log has a lot of metadata split-brain warnings.  The brick 
logs are full of scary-looking warnings, but none are flagged 'E' or 'C'.  
The trouble is that I see messages like these on all the servers, and I 
can find nothing unusual about the server with a CPU load of 70.
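In case it helps, the sort of checks I have been running look roughly 
like this (VOLNAME is a placeholder, and the log file locations may 
differ between versions):

    # Per-server progress of the rebalance / fix-layout
    gluster volume rebalance VOLNAME status

    # Watch the rebalance log on the heavily loaded server
    tail -f /var/log/glusterfs/VOLNAME-rebalance.log

    # Self-heal daemon log, where the split-brain warnings appear
    less /var/log/glusterfs/glustershd.log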

-Dan.
