[Gluster-devel] question on self-heal

Emmanuel Dreyfus manu at netbsd.org
Mon Jul 30 12:33:35 UTC 2012


Hi

A question on self-heal: as I understand it, when a lookup occurs, the client
checks whether self-heal must be done, heals if required, then proceeds with
the lookup.
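
To make that expected ordering concrete, here is a minimal sketch of it as a
toy model in Python. It is not GlusterFS code: Replica, needs_heal, heal and
lookup are hypothetical names chosen for illustration, and healing is
assumed to mean copying the content from a replica that has the file.

```python
# Toy model only: every name here is hypothetical, none is a GlusterFS API.
from dataclasses import dataclass, field


@dataclass
class Replica:
    files: dict = field(default_factory=dict)  # path -> content


def needs_heal(replicas, path):
    """True when the replicas disagree about the path."""
    contents = [r.files.get(path) for r in replicas]
    return any(c != contents[0] for c in contents)


def heal(replicas, path):
    """Copy the content from the first replica that has the path to all."""
    source = next(r.files[path] for r in replicas if path in r.files)
    for r in replicas:
        r.files[path] = source


def lookup(replicas, path):
    """Heal first if required, then answer the lookup."""
    if needs_heal(replicas, path):
        heal(replicas, path)
    # Answer from the last replica on purpose: if healing happens before
    # the lookup proceeds, even the formerly stale replica is correct.
    return replicas[-1].files.get(path)


good = Replica({"Makefile": "content"})
stale = Replica()  # this replica is missing the file
result = lookup([good, stale], "Makefile")
print(result)
```

In this model the lookup can never return the unhealed state, which is the
behaviour I expected from the client.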

I encounter rare situations where self-heal is done but I still get the
unhealed result. For instance, I read a file and get no data, as if it
were empty, then I read it again and get the correct file content.
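
The symptom can be checked from a script by reading the same file twice in
a row. This is only a sketch: read_twice and looks_unhealed are hypothetical
helper names, and the demo runs on a local temporary file standing in for a
file on the volume, so it does not reproduce the problem by itself.

```python
import os
import tempfile


def read_twice(path):
    """Read the whole file two times in a row and return both results."""
    with open(path, "rb") as f:
        first = f.read()
    with open(path, "rb") as f:
        second = f.read()
    return first, second


def looks_unhealed(first, second):
    """True when the first read was empty but the second returned data."""
    return len(first) == 0 and len(second) > 0


# Demo on a local temporary file (a stand-in for a file on the volume).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"all: build\n")
    demo_path = tmp.name

first, second = read_twice(demo_path)
os.remove(demo_path)
print(looks_unhealed(first, second))
```

Pointing read_twice at a path on the affected mount instead of the
temporary file would show whether the first read comes back empty.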

Here is an example. I am building in a release-3.3 glusterfs volume,
and the build fails because of an empty Makefile. The client log 
shows that this is a replication problem:

includes ===> external/intel-fw-eula/ipw2100
nbmake: don't know how to make includes. Stop

client log:
[2012-07-30 10:09:54.756766] E 
  [afr-self-heal-common.c:1087:afr_sh_common_lookup_resp_handler] 
  0-pfs-replicate-0: path /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100
  on subvolume pfs-client-1 => -1 (No such file or directory)
[2012-07-30 10:09:55.056577] I [afr-common.c:1340:afr_launch_self_heal] 
  0-pfs-replicate-0:   entry self-heal triggered. 
  path: /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100, 
  reason: checksums of directory differ
[2012-07-30 10:09:55.062865] E 
  [afr-self-heal-common.c:1087:afr_sh_common_lookup_resp_handler] 
  0-pfs-replicate-0: path 
  /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100/CVS on 
  subvolume pfs-client-1 => -1 (No such file or directory)
[2012-07-30 10:09:55.063069] E 
  [afr-self-heal-common.c:1087:afr_sh_common_lookup_resp_handler] 
  0-pfs-replicate-0: path 
  /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100/Makefile on 
  subvolume pfs-client-1 => -1 (No such file or directory)
[2012-07-30 10:09:55.063268] E 
  [afr-self-heal-common.c:1087:afr_sh_common_lookup_resp_handler] 
  0-pfs-replicate-0: path 
  /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100/dist on 
  subvolume pfs-client-1 => -1 (No such file or directory)
[2012-07-30 10:09:55.480500] I 
  [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 
  0-pfs-replicate-0: background  entry self-heal completed on 
  /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100


And if I then run ls -l, the file is finally healed:
$ ls -l external/intel-fw-eula/ipw2100/Makefile
-rw-r--r-- 1 manu manu 224 Oct 30 2008 external/intel-fw-eula/ipw2100/Makefile

client log:
[2012-07-30 14:30:05.058560] I [afr-common.c:1340:afr_launch_self_heal]
  0-pfs-replicate-0: background  meta-data self-heal triggered. path:
  /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100, reason: lookup
  detected pending operations
[2012-07-30 14:30:05.086289] I
  [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk]
  0-pfs-replicate-0: background  meta-data self-heal completed on
  /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100
[2012-07-30 14:30:05.527602] I
  [afr-common.c:1189:afr_detect_self_heal_by_iatt] 0-pfs-replicate-0:
  size differs for
  /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100/Makefile
[2012-07-30 14:30:05.527655] I [afr-common.c:1340:afr_launch_self_heal]
  0-pfs-replicate-0: background  meta-data data self-heal triggered.
  path: /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100/Makefile,
  reason: lookup detected pending operations
[2012-07-30 14:30:05.580709] I
  [afr-self-heal-algorithm.c:116:sh_loop_driver_done] 0-pfs-replicate-0:
  full self-heal completed on
  /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100/Makefile
[2012-07-30 14:30:05.615283] I
  [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk]
  0-pfs-replicate-0: background  meta-data data self-heal completed
  on /manu/netbsd/usr/src/external/intel-fw-eula/ipw2100/Makefile

This is a bug, right?

--
Emmanuel Dreyfus
manu at netbsd.org



