[Gluster-users] [SPAM?] Re: strange error hangs hangs any access to gluster mount

Tue Apr 5 13:07:57 UTC 2011

On 04/04/2011 03:02 PM, Burnash, James wrote:
> Sadly, this did not fix things. <sigh>
> 
> My brick xattrs now look like this:
> 
> http://pastebin.com/2p4iaZq3
> 
> And here is the debug output from a client where I restarted the
> gluster client while diagnostics.client-log-level was set to DEBUG:
> 
> http://pastebin.com/5pjwxwsj
> 
> I'm at somewhat of a loss. Any help would be greatly appreciated.

Now it looks like g04 on gfs17/gfs18 has no DHT xattrs at all, leaving a
hole in the directory's hash-range layout from d999998c to e6666657. From
the log, the "background meta-data self-heal" messages are probably
related to that, though the failure messages about non-blocking inodelks
(line 713) and possible split brain (e.g. line 777) still seem a bit odd.
There are also some messages about timeouts (e.g. line 851) that are
probably unrelated but might be worth investigating. I can suggest a few
possible courses of action:

(1) Do a "getfattr -n trusted.distribute.fix.layout" on the root (from
the client side) to force the layouts to be recalculated. This is the
same hook that the first part of the rebalance code uses, but here it
runs only that one step, on one directory. OTOH, it's also the same
thing the self-heal should have done, so I kind of expect it will fail
(harmlessly) in the same way.

(2) Manually set the xattr on gfs{17,18}:/.../g04 to the "correct"
value, like so:

	setfattr -n trusted.glusterfs.dht -v \
	0x0000000100000000d999998ce6666657 g04

(3) Migrate the data off that volume to others, remove/nuke/rebuild it,
then add it back in a pristine state and rebalance.
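
For option (2), here is a rough sketch of how the value passed to
setfattr can be put together from the two ends of the hole. It assumes
the trusted.glusterfs.dht value is four big-endian 32-bit fields (a
count of 1, a type of 0, then the start and stop of the hash range),
which matches the value shown above but should be checked against the
xattrs on a healthy brick before relying on it. The function name
dht_xattr_value is made up for illustration.

```shell
# Compose a trusted.glusterfs.dht value for a single hash range.
# ASSUMPTION: the value is four big-endian 32-bit fields:
#   count (1), type (0), range start, range stop.
# dht_xattr_value is a hypothetical helper, not part of GlusterFS.
dht_xattr_value() {
    # $1 = range start, $2 = range stop, each as hex digits without "0x"
    printf '0x%08x%08x%08x%08x\n' 1 0 "0x$1" "0x$2"
}

# The hole reported on gfs17/gfs18 for g04:
dht_xattr_value d999998c e6666657
# prints 0x0000000100000000d999998ce6666657
```

After setting the xattr, "getfattr -n trusted.glusterfs.dht -e hex g04"
on each brick should show the same value back.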