[Gluster-devel] self heal problem
Stephan von Krawczynski
skraw at ithnet.com
Wed Mar 24 12:23:43 UTC 2010
When searching for possible causes of the wrong self healing I ran into this
code:
/xlators/cluster/afr/src/afr-self-heal-common.c:
        if (type == AFR_SELF_HEAL_DATA) {
                size_differs = afr_sh_mark_if_size_differs (sh, child_count);
        }

        if (afr_sh_all_nodes_innocent (characters, child_count)) {
                if (size_differs) {
                        nsources = afr_sh_mark_biggest_as_source (sh,
                                                                  child_count);
                }
        } else if (afr_sh_wise_nodes_exist (characters, child_count)) {
                afr_sh_compute_wisdom (pending_matrix, characters,
                                       child_count);

                if (afr_sh_wise_nodes_conflict (characters, child_count)) {
                        /* split-brain */
                        nsources = -1;
                        goto out;
                } else {
                        nsources = afr_sh_mark_wisest_as_sources (sources,
                                                                  characters,
                                                                  child_count);
                }
        } else {
                nsources = afr_sh_mark_biggest_fool_as_source (sh, characters,
                                                               child_count);
        }
afr_sh_mark_biggest_as_source seems to do exactly what its name says: it looks
at the file size. Can someone who knows this code better please explain what
kind of healing case can depend on the file size? I really cannot see how this
can work. The latest copy of a file can be either bigger or smaller than an
older one, so the only valid criterion for choosing is the modification time,
never the size. Or is there some general misunderstanding in my thinking and
my reading of the code?
--
Regards,
Stephan
On Tue, 23 Mar 2010 15:03:17 +0100
Stephan von Krawczynski <skraw at ithnet.com> wrote:
> Here is some further information for one file that was falsely self-healed:
>
> server1:
>
> # getfattr -d -m '.*' -e hex <filename>
> getfattr: Removing leading '/' from absolute path names
> # file: <filename>
> trusted.afr.remote1=0x000000000000000000000000
> trusted.afr.remote2=0x000000000000000000000000
> trusted.posix.gen=0x4b9bb33c00001be6
>
> # stat <filename>
> File: <filename>
> Size: 4509 Blocks: 16 IO Block: 4096 regular file
> Device: 804h/2052d Inode: 16560280 Links: 1
> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
> Access: 2010-03-23 11:10:36.000000000 +0100
> Modify: 2010-03-23 00:32:25.000000000 +0100
> Change: 2010-03-23 12:36:40.000000000 +0100
>
>
> server2:
>
> # getfattr -d -m '.*' -e hex <filename>
> getfattr: Removing leading '/' from absolute path names
> # file: <filename>
> trusted.afr.remote1=0x000000000000000000000000
> trusted.afr.remote2=0x000000000000000000000000
> trusted.posix.gen=0x4b9bb2f600001be6
>
> # stat <filename>
> File: <filename>
> Size: 4024 Blocks: 8 IO Block: 4096 regular file
> Device: 804h/2052d Inode: 42762291 Links: 1
> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
> Access: 2010-03-23 11:10:36.000000000 +0100
> Modify: 2010-03-23 14:32:23.000000000 +0100
> Change: 2010-03-23 14:32:23.000000000 +0100
>
>
> As you can see, the latest file version is on server2 (by modification date) and is _smaller_ in size.
>
> Now on client 2, ls shows interesting values:
>
> # ls -l <filename>
> -rw-r--r-- 1 root root 4509 Mar 23 14:37 <filename>
>
> As you can see here, the file date appears to have advanced, and the size clearly shows that self-heal went wrong.
>
> Consequently, the server2 copy now looks like this:
>
> # stat <filename>
> File: <filename>
> Size: 4509 Blocks: 16 IO Block: 4096 regular file
> Device: 804h/2052d Inode: 42762291 Links: 1
> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
> Access: 2010-03-23 11:10:36.000000000 +0100
> Modify: 2010-03-23 00:32:25.000000000 +0100
> Change: 2010-03-23 14:41:13.000000000 +0100
>
> The modification date went backwards and the file size increased, so the older file version was chosen to overwrite the newer one.
>
> --
> Regards,
> Stephan