[Gluster-users] [SPAM?] Re: strange error hangs any access to gluster mount

Burnash, James jburnash at knight.com
Mon Apr 4 19:02:48 UTC 2011


Sadly, this did not fix things. <sigh>

My brick xattrs now look like this:

http://pastebin.com/2p4iaZq3
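
For reference, a dump like the one above can be gathered with getfattr run as root on each storage server; <brick-path>/<directory> below is just a placeholder for the actual export path:

bash# getfattr -d -m . -e hex <brick-path>/<directory>

(The -m . pattern is what makes getfattr include the trusted.* attributes in the dump.)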

And here is the debug output from a client, captured after I restarted the gluster client while diagnostics.client-log-level was set to DEBUG:

http://pastebin.com/5pjwxwsj
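
For anyone reproducing this, the DEBUG level is turned on through the volume option of the same name; a sketch, with <volname> standing in for the actual volume name:

bash# gluster volume set <volname> diagnostics.client-log-level DEBUG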

I'm at somewhat of a loss. Any help would be greatly appreciated.

Thanks in advance to all.

James Burnash, Unix Engineering

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
Sent: Thursday, March 31, 2011 6:03 AM
To: 'amar at gluster.com'
Cc: 'gluster-users at gluster.org'
Subject: [SPAM?] Re: [Gluster-users] strange error hangs any access to gluster mount
Importance: Low

Amar,

Thank you so much! I have a big meeting today with the customer, and having this solved will go a long way towards making them happier.

James

From: Amar Tumballi [mailto:amar at gluster.com]
Sent: Thursday, March 31, 2011 02:33 AM
To: Burnash, James
Cc: Jeff Darcy <jdarcy at redhat.com>; gluster-users at gluster.org <gluster-users at gluster.org>
Subject: Re: [Gluster-users] strange error hangs any access to gluster mount

Hi James,

To fix this, you can go to any one backend of each replica pair and run the command below on the directories where the layout has issues:

bash# setfattr -x trusted.glusterfs.dht <directory>

[ "backend of a replica pair" means one of the bricks in a replica set ]

and then, from the client machine (i.e. where you have the mount point), run the commands below:

 bash# echo 3 > /proc/sys/vm/drop_caches
 bash# stat <directory>    # through the mount point

In this step, the layout will get fixed again automatically, which should solve this issue.
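
One hedged way to verify the result afterwards is to re-read the layout xattr on the backend directories and check that a fresh trusted.glusterfs.dht value has been written on every brick (<brick-path>/<directory> is a placeholder for the actual export path):

 bash# getfattr -n trusted.glusterfs.dht -e hex <brick-path>/<directory>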

Regards,
Amar


On Tue, Mar 29, 2011 at 12:45 AM, Burnash, James <jburnash at knight.com> wrote:
Thanks Jeff. That at least gives me a shot at figuring out some similar problems.

It's possible that in the course of bringing up the mirrors initially I futzed something up. I'll have to check the read-write servers as well.

James Burnash, Unix Engineering

-----Original Message-----
From: Jeff Darcy [mailto:jdarcy at redhat.com]
Sent: Monday, March 28, 2011 3:09 PM
To: Burnash, James
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] strange error hangs any access to gluster mount

On 03/28/2011 02:29 PM, Burnash, James wrote:
> Sorry - paste went awry.
>
> Updated here:
>
> http://pastebin.com/M74LAYej

OK, that definitely shows a problem.  Here's the whole map of which nodes are claiming which ranges:

00000000 0ccccccb: g07 on gfs17/gfs18
0ccccccc 19999997: g08 on gfs17/gfs18
19999998 26666663: g09 on gfs17/gfs18
26666664 3333332f: g10 on gfs17/gfs18
33333330 3ffffffb: g01 on gfs17/gfs18
3ffffffc 4cccccc7: g02 on gfs17/gfs18
4cccccc8 59999993: g01 on gfs14/gfs14
59999994 6666665f: g02 on gfs14/gfs14
66666660 7333332b: g03 on gfs14/gfs14
7333332c 7ffffff7: g04 on gfs14/gfs14
7ffffff8 8cccccc3: g05 on gfs14/gfs14
8cccccc4 9999998f: g06 on gfs14/gfs14
99999990 a666665b: g07 on gfs14/gfs14
a666665c b3333327: g08 on gfs14/gfs14
b3333328 b333332e: g09 on gfs14/gfs14
b333332f bffffff3: g09 on gfs14/gfs14
                  *** AND g04 on gfs17/18
bffffff4 ccccccbf: g10 on gfs14/gfs14
                  *** AND g04 on gfs17/18
ccccccc0 ccccccc7: g03 on gfs17/gfs18
                  *** AND g04 on gfs17/18
ccccccc8 d999998b: g03 on gfs17/gfs18
d999998c e6666657: *** GAP ***
e6666658 f3333323: g05 on gfs17/gfs18
f3333324 ffffffff: g06 on gfs17/gfs18
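
For reference, each row above is read from the trusted.glusterfs.dht xattr on that directory on the given bricks. A hedged sketch of pulling the raw value, assuming the usual 16-byte layout (two header words followed by the range start and range end, big-endian); <brick-path>/<directory> is a placeholder:

bash# getfattr -n trusted.glusterfs.dht -e hex <brick-path>/<directory>

To my reading, the last 8 hex digits of the value are the end of the range and the 8 before them are the start, so the g07 row above corresponds to a value ending in 000000000ccccccb.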

I know this all seems like numerology, but bear with me.  Note that all of the problems seem to involve g04 on gfs17/gfs18 claiming the wrong range, and that the range it's claiming is almost exactly twice the size of all the other ranges.  In fact, it's the range it would have been assigned if there had been ten nodes instead of twenty.  For example, if that filesystem had been restored to an earlier state on gfs17/gfs18, and then self-healed in the wrong direction (self-mangled?) you would get exactly this set of symptoms.  I'm not saying that's what happened; it's just a way to illustrate what these values mean and the consequences of their being out of sync with each other.
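
A quick arithmetic check of that observation (a sketch in bash, nothing gluster-specific): splitting the 32-bit hash space across twenty bricks versus ten gives

bash# printf '0x%08x\n' $(( 0x100000000 / 20 ))   # prints 0x0ccccccc
bash# printf '0x%08x\n' $(( 0x100000000 / 10 ))   # prints 0x19999999

The first matches the width of the healthy ranges above (e.g. 00000000 through 0ccccccb), and the second matches the width of the span g04 on gfs17/gfs18 is claiming (b333332f through ccccccc7).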

So, why only one client?  Since you're reporting values on the servers, I'd guess it's because only that client has remounted.  The others are probably still operating from cached (and apparently correct) layout information.  This is a very precarious state, I'd have to say.  You
*might* be able to fix this by fixing the xattr values on that one filesystem, but I really can't recommend trying that without some input from Gluster themselves.




