[Gluster-users] Files present on the backend but have become invisible from clients
    Jeff Darcy 
    jdarcy at redhat.com
       
    Wed Jun 22 20:40:55 UTC 2011
    
    
  
On 06/22/2011 02:44 PM, Burnash, James wrote:
> g01/pfs-ro1-client-0=0x000000000000000000000000 jc1letgfs17 
> g01/pfs-ro1-client-0=0x000000000600000800000000 jc1letgfs18 
> g01/pfs-ro1-client-20=0x000000000000000000000000 jc1letgfs14 
> g01/pfs-ro1-client-20=0x000000000200000000000000 jc1letgfs15 
> g02/pfs-ro1-client-2=0x000000000000000000000000 jc1letgfs17 
> g02/pfs-ro1-client-2=0x000000004500000400000000 jc1letgfs18 
> g02/pfs-ro1-client-22=0x000000000000000000000000 jc1letgfs14 
> g02/pfs-ro1-client-22=0x000000000200000000000000 jc1letgfs15
> 
> Would anybody have any insights as to what is going on here? I'm 
> seeing attributes in my sleep these days ... that cannot be good!
When I look at this, a few things occur to me.  First, those are some
pretty big metadata-change numbers.  For g02 on fs18, 0x45000004 is
actually 0x04000045 - about 67M - after byte swapping.  The other
thing that seems strange is that it always seems to be the second
member of a replica pair "indicting" the first. Lastly, if you don't
see any non-zero xattrs besides those above then this is not a normal
split-brain situation.  It might be a more exotic kind, based on
*missing* xattrs.  Here's the sequence:
* Lookup for '/' on client-0 returns zero opcounts for both client-0
  and client-1
* Lookup for '/' on client-1 returns a non-zero opcount for client-0 and
  *no xattr at all* (i.e. missing) for client-1
* Client-1 is declared "ignorant" in afr_sh_build_pending_matrix
* Client-0's value for client-1 is incremented, again in
  afr_sh_build_pending_matrix
* Client-0 and client-1 are both marked "wise" in afr_sh_mark_sources
* Client-0 and client-1 both receive wisdom=0 in afr_sh_compute_wisdom
* No wisdom=1 nodes are found in afr_sh_wise_nodes_conflict
* The "Unable to self-heal" messages come from afr_sh_metadata_fix
So, what I'd do is check whether the following xattrs are missing:
	fs18/g01/pfs-ro1-client-1
	fs15/g01/pfs-ro1-client-21
	fs18/g02/pfs-ro1-client-3
	fs15/g02/pfs-ro1-client-23
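
A minimal sketch of that check in Python, to be run as root on each
backend server. It assumes the full attribute names carry a
"trusted.afr." prefix (the listing above shows only the suffix), and
the helper name missing_xattrs is purely illustrative, not anything in
GlusterFS. The backend directory path is not given in this thread, so
it is left as a parameter.

```python
import errno
import os


def missing_xattrs(path, names, getxattr=None):
    """Return the xattr names from `names` that are not set on `path`.

    `getxattr` is injectable so the logic can be exercised without root;
    by default it is os.getxattr (Linux only).
    """
    if getxattr is None:
        getxattr = os.getxattr
    missing = []
    for name in names:
        try:
            getxattr(path, name)
        except OSError as exc:
            # ENODATA means "no such attribute"; anything else
            # (permissions, bad path) is a real error and is re-raised.
            if exc.errno == errno.ENODATA:
                missing.append(name)
            else:
                raise
    return missing
```

On fs18, for example, this would be called with the backend directory
for g01 and ["trusted.afr.pfs-ro1-client-1"]; a non-empty result would
point to the missing-xattr case described above.
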
There might be other "false split-brain" scenarios lurking in this code.
I don't fully understand the self-heal logic, but it seems to go through
a lot of paths that might not have been fully tested.
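
For anyone who wants to reproduce the byte-swapping arithmetic from the
start of this message, here is a rough Python sketch. It assumes each
12-byte value is three 32-bit counters, with the middle one being the
metadata counter discussed above; labelling the first and third as data
and entry is my assumption. The function names are illustrative only.

```python
import struct


def decode_pending(hex_value):
    """Split a 12-byte xattr value into three 32-bit words, read as written."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Assumed layout: (data, metadata, entry)
    return struct.unpack(">III", raw)


def byte_swap32(n):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return int.from_bytes(n.to_bytes(4, "big"), "little")


# g02 on fs18, from the quoted listing
data, metadata, entry = decode_pending("0x000000004500000400000000")
print(hex(metadata))               # 0x45000004
print(hex(byte_swap32(metadata)))  # 0x4000045
print(byte_swap32(metadata))       # 67108933, i.e. about 67M
```
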
    
    