[Gluster-devel] self heal fails

Robert Hassing rhassing at service2media.com
Mon Jun 11 11:18:08 UTC 2012


On the brick that was online all the time:
[2012-06-08 16:42:14.192132] W [client3_1-fops.c:2457:client3_1_link_cbk] 0-replicated-data-client-0: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> <gfid:00000000-0000-0000-0000-000000000001>/config)
[2012-06-08 16:42:14.192591] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-replicated-data-replicate-0: background  entry self-heal failed on <gfid:00000000-0000-0000-0000-000000000001>
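For anyone scanning a larger log for the same pattern, here is a minimal sketch that splits lines of the shape shown above into fields and flags any all-zero gfid. The regular expressions and the names parse_line, LOG_RE, GFID_RE and SAMPLE are my own assumptions based only on these two lines, not an official GlusterFS log format. As far as I know, 00000000-0000-0000-0000-000000000001 is the gfid of the volume root, so only the all-zero value is treated as suspicious.

```python
import re

# Assumed layout, inferred from the two lines above:
# [timestamp] LEVEL [file:line:function] source: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<source>\S+):\s+(?P<msg>.*)$"
)
GFID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
NULL_GFID = "00000000-0000-0000-0000-000000000000"

# The two log lines quoted in this message.
SAMPLE = [
    "[2012-06-08 16:42:14.192132] W [client3_1-fops.c:2457:client3_1_link_cbk] "
    "0-replicated-data-client-0: remote operation failed: File exists "
    "(00000000-0000-0000-0000-000000000000 -> "
    "<gfid:00000000-0000-0000-0000-000000000001>/config)",
    "[2012-06-08 16:42:14.192591] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] "
    "0-replicated-data-replicate-0: background  entry self-heal failed on "
    "<gfid:00000000-0000-0000-0000-000000000001>",
]


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["gfids"] = GFID_RE.findall(rec["msg"])
    rec["has_null_gfid"] = NULL_GFID in rec["gfids"]
    return rec


if __name__ == "__main__":
    for raw in SAMPLE:
        rec = parse_line(raw)
        print(rec["level"], rec["func"], rec["gfids"], rec["has_null_gfid"])
```

Only the link failure carries the all-zero gfid here; the self-heal error refers to the parent directory.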

/config is a symbolic link.
I put some files in the same directory (/), and those were self-healed correctly.
/config itself was not changed at all.

Thanks



-----Original Message-----
From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com] 
Sent: maandag 11 juni 2012 12:27
To: Robert Hassing
Cc: gluster-devel at nongnu.org
Subject: Re: [Gluster-devel] self heal fails

>>After bringing the brick back online self heal fails with the error below. 
<where is the error?>
>>From what I can tell, self heal only fails on symlinks (that didn’t change) in the folder where the changes have been made.

Could you please provide the logs?

Pranith
----- Original Message -----
From: "Robert Hassing" <rhassing at service2media.com>
To: gluster-devel at nongnu.org
Sent: Monday, June 11, 2012 3:37:39 PM
Subject: Re: [Gluster-devel] self heal fails





Hi 



Did someone find the cause of this problem? 

I got into the same situation. 



Setup: 



Brick1 

Brick2 

Several clients 



After shutting down one of the bricks, we wrote some data to the shared filesystem. 

After bringing the brick back online self heal fails with the error below. 

From what I can tell, self heal only fails on symlinks (that didn’t change) in the folder where the changes have been made. 



Regards 

Robert Hassing 





From: Pranith Kumar Karampuri
Subject: Re: [Gluster-devel] self heal fails
Date: Tue, 05 Jun 2012 10:19:15 -0400 (EDT)



Emmanuel, 

For some reason /manu/netbsd/usr/src/lib/libkafs/libkafs.so.9 and its
parent dir have an all-zero trusted.gfid, which is worse. This is brand new to me.
Do let me know if you have a test case to get into this situation.



Pranith 

----- Original Message ----- 

From: "Emmanuel Dreyfus" <address at hidden> 

To: "Pranith Kumar Karampuri" <address at hidden> 

Cc: "Emmanuel Dreyfus" <address at hidden>, address at hidden 

Sent: Tuesday, June 5, 2012 7:01:24 PM 

Subject: Re: [Gluster-devel] self heal fails 



On Tue, Jun 05, 2012 at 07:57:16AM -0400, Pranith Kumar Karampuri wrote: 

> If lookup triggers self-heal and the self-heal fails, lookup
> won't fail unless it is a split-brain on the entry, i.e. a gfid mismatch.
> There seems to be a problem in the logs you have mentioned. For
> some reason the gfid is all zeros; I wonder how you hit this case.
> Do you have a test case that can re-create it?



It keeps going on for now, but I do not know how I got this situation. 



> Could you post the output of
> 'getfattr -d -m . -e hex' for /manu/netbsd/usr/src/lib/libkafs,
> /manu/netbsd/usr/src/lib/libkafs/libkafs.so.9 and
> /manu/netbsd/usr/src/lib/libkafs/libkafs.so on both the bricks?



The commands are a bit different, but here is the info: 

brick0

manu/netbsd/usr/src/lib/libkafs/
  trusted.afr.pfs-client-1  00 00 00 00 00 00 00 00 00 00 00 03 00
  trusted.afr.pfs-client-0  00 00 00 00 00 00 00 00 00 00 00 00 00
  trusted.gfid              00 00 00 00 00 00 00 00 00 00 00 00 00

manu/netbsd/usr/src/lib/libkafs/libkafs.so.9
  trusted.afr.pfs-client-1  00 00 00 00 00 00 00 00 00 00 00 00 00
  trusted.afr.pfs-client-0  00 00 00 00 00 00 00 00 00 00 00 00 00
  trusted.gfid              00 00 00 00 00 00 00 00 00 00 00 00 00

manu/netbsd/usr/src/lib/libkafs/libkafs.so
  trusted.afr.pfs-client-1  be 77 68 6e ba d2 45 d2 8c c2 1a 0e 37 9a 44 0a
  trusted.afr.pfs-client-0  a4 19 75 e7 f9 be 44 09 bb e8 70 76 6a 04 95 46
  trusted.gfid              a4 19 75 e7 f9 be 44 09 bb e8 70 76 6a 04 95 46


brick1

manu/netbsd/usr/src/lib/libkafs/
  trusted.afr.pfs-client-1  ENODATA
  trusted.afr.pfs-client-0  ENODATA
  trusted.gfid              be 77 68 6e ba d2 45 d2 8c c2 1a 0e 37 9a 44 0a

manu/netbsd/usr/src/lib/libkafs/libkafs.so.9
  trusted.afr.pfs-client-1  00 00 00 00 00 00 00 00 00 00 00 00 00
  trusted.afr.pfs-client-0  00 00 00 00 00 00 00 00 00 00 00 00 00
  trusted.gfid              a4 19 75 e7 f9 be 44 09 bb e8 70 76 6a 04 95 46

manu/netbsd/usr/src/lib/libkafs/libkafs.so
  trusted.afr.pfs-client-1  00 00 00 00 00 00 00 00 00 00 00 00 00
  trusted.afr.pfs-client-0  00 00 00 00 00 00 00 00 00 00 00 00 00
  trusted.gfid              a4 19 75 e7 f9 be 44 09 bb e8 70 76 6a 04 95 46
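To make the comparison between the two bricks easier to follow, here is a rough sketch that takes the trusted.gfid values exactly as posted above and reports, per path, whether a brick has an all-zero or missing gfid and whether the bricks disagree (the gfid mismatch Pranith describes as an entry split-brain). The helper names parse_hex and classify and the GFIDS table are hypothetical; this is not how AFR itself decides anything, only a way to read the dump.

```python
# trusted.gfid per path and brick, copied from the dump above.
ZERO = "00 00 00 00 00 00 00 00 00 00 00 00 00"
GFIDS = {
    "manu/netbsd/usr/src/lib/libkafs/": {
        "brick0": ZERO,
        "brick1": "be 77 68 6e ba d2 45 d2 8c c2 1a 0e 37 9a 44 0a",
    },
    "manu/netbsd/usr/src/lib/libkafs/libkafs.so.9": {
        "brick0": ZERO,
        "brick1": "a4 19 75 e7 f9 be 44 09 bb e8 70 76 6a 04 95 46",
    },
    "manu/netbsd/usr/src/lib/libkafs/libkafs.so": {
        "brick0": "a4 19 75 e7 f9 be 44 09 bb e8 70 76 6a 04 95 46",
        "brick1": "a4 19 75 e7 f9 be 44 09 bb e8 70 76 6a 04 95 46",
    },
}


def parse_hex(value):
    """Convert a space-separated hex dump to bytes; 'ENODATA' means no xattr."""
    if value.strip() == "ENODATA":
        return None
    return bytes.fromhex(value.replace(" ", ""))


def classify(gfid_by_brick):
    """Return a list of human-readable problems for one path."""
    parsed = {brick: parse_hex(v) for brick, v in gfid_by_brick.items()}
    problems = []
    for brick, gfid in parsed.items():
        if gfid is None:
            problems.append(f"{brick}: gfid missing")
        elif not any(gfid):  # every byte is zero
            problems.append(f"{brick}: gfid all zero")
    distinct = {g for g in parsed.values() if g is not None}
    if len(distinct) > 1:
        problems.append("gfid differs between bricks")
    return problems


if __name__ == "__main__":
    for path, by_brick in GFIDS.items():
        print(path, classify(by_brick) or "ok")
```

With the values above, the directory and libkafs.so.9 are both flagged on brick0, while libkafs.so agrees on both bricks.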



I am a bit surprised that libkafs.so and libkafs.so.9.0 have the
same gfid: They are just symlinks to the same node. Bug?
Here is ls -lid on brick1:



17407737 drwxr-xr-x 3 manu manu 1024 Jun 5 13:31 manu/netbsd/usr/src/lib/libkafs/
17434245 lrwxrwxrwx 2 manu manu 14 Jun 4 07:38 manu/netbsd/usr/src/lib/libkafs/libkafs.so -> libkafs.so.9.0
17433620 lrwxrwxrwx 2 manu manu 14 Jun 4 07:38 manu/netbsd/usr/src/lib/libkafs/libkafs.so.9 -> libkafs.so.9.0
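As a small illustration of what the listing shows, the sketch below creates two symlinks with the same target in a scratch directory and inspects them with lstat. On an ordinary POSIX filesystem they come out as two separate inodes that merely share a target string, which matches the two different inode numbers above. The function names symlink_info and demo are made up for this example, and it says nothing about how GlusterFS assigns gfids to them.

```python
import os
import tempfile


def symlink_info(path):
    """Describe the link itself (lstat), not whatever it points to."""
    st = os.lstat(path)
    return {"inode": st.st_ino, "nlink": st.st_nlink, "target": os.readlink(path)}


def demo():
    """Create two symlinks to the same (nonexistent) target and inspect both."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "libkafs.so")
        b = os.path.join(d, "libkafs.so.9")
        os.symlink("libkafs.so.9.0", a)
        os.symlink("libkafs.so.9.0", b)
        return symlink_info(a), symlink_info(b)


if __name__ == "__main__":
    first, second = demo()
    print(first)
    print(second)
```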



I wonder if my recent change with linkat could have introduced a bug. 



-- 

Emmanuel Dreyfus 

address at hidden 






_______________________________________________
Gluster-devel mailing list
Gluster-devel at nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel

