[Gluster-devel] self heal problem
Tejas N. Bhise
tejas at gluster.com
Wed Mar 24 14:46:37 UTC 2010
Hi Stephan,
GlusterFS keeps track of whether an operation happened on one copy but not
on the replica, in case a replica was not accessible. The attributes
remote1 and remote2 show that there is no pending operation on the other
replica.
From the attributes you have shown, it seems that you have gone to
the backend directly, bypassed glusterfs, and hand-crafted such a
situation. The way the code is written, we do not think that we can
reach the state you have shown in your example.
The remote1 and remote2 attributes show all zeroes, which means
that there were no operations pending on any server.
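To make the point about all-zero attributes concrete, here is a minimal Python sketch of how such a value could be read. It assumes that each 12-byte trusted.afr value holds three 32-bit big-endian pending counters (data, metadata, entry). That layout and the helper names decode_afr_changelog and has_pending are illustrative assumptions, not something stated in this thread.

```python
import struct

def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value (as printed by getfattr -e hex) into counters.

    Assumption: 12 bytes = three unsigned 32-bit big-endian integers,
    in the order data, metadata, entry.
    """
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

def has_pending(hex_value):
    """True if any counter is non-zero, i.e. some operation is still pending."""
    return any(decode_afr_changelog(hex_value).values())

# Value reported for both remote1 and remote2 on both servers.
reported = "0x000000000000000000000000"
print(decode_afr_changelog(reported))
print(has_pending(reported))
```

Under that assumption, a value such as 0x000000010000000000000000 would indicate one pending data operation, which is what would be expected if one replica had missed a write.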
If it was not hand-crafted, then please provide a detailed test case that
leads to this situation based on file size alone.
If this situation was hand-crafted, then it would be akin to
overwriting the section of a disk which carries the metadata of a
filesystem and then claiming that the FS is getting corrupted.
Please look at the code surrounding the part you pointed to in the
other mail, and you will see the other, higher-order checks that are
made.
Regards,
Tejas.
----- Original Message -----
From: "Stephan von Krawczynski" <skraw at ithnet.com>
To: gluster-devel at nongnu.org
Sent: Tuesday, March 23, 2010 7:33:17 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: Re: [Gluster-devel] self heal problem
Let me show you some further information for one file that was falsely self-healed:
server1:
# getfattr -d -m '.*' -e hex <filename>
getfattr: Removing leading '/' from absolute path names
# file: <filename>
trusted.afr.remote1=0x000000000000000000000000
trusted.afr.remote2=0x000000000000000000000000
trusted.posix.gen=0x4b9bb33c00001be6
# stat <filename>
File: <filename>
Size: 4509 Blocks: 16 IO Block: 4096 regular file
Device: 804h/2052d Inode: 16560280 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2010-03-23 11:10:36.000000000 +0100
Modify: 2010-03-23 00:32:25.000000000 +0100
Change: 2010-03-23 12:36:40.000000000 +0100
server2:
# getfattr -d -m '.*' -e hex <filename>
getfattr: Removing leading '/' from absolute path names
# file: <filename>
trusted.afr.remote1=0x000000000000000000000000
trusted.afr.remote2=0x000000000000000000000000
trusted.posix.gen=0x4b9bb2f600001be6
# stat <filename>
File: <filename>
Size: 4024 Blocks: 8 IO Block: 4096 regular file
Device: 804h/2052d Inode: 42762291 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2010-03-23 11:10:36.000000000 +0100
Modify: 2010-03-23 14:32:23.000000000 +0100
Change: 2010-03-23 14:32:23.000000000 +0100
As you can see, the latest version of the file is on server2 (by modify date) and is _smaller_ in size.
Now, on client 2, an ls shows interesting values:
# ls -l <filename>
-rw-r--r-- 1 root root 4509 Mar 23 14:37 <filename>
As you can see here, the file date appears to have moved forward, and the size clearly shows that self-heal went wrong.
Consequently, the server2 copy now looks like this:
# stat <filename>
File: <filename>
Size: 4509 Blocks: 16 IO Block: 4096 regular file
Device: 804h/2052d Inode: 42762291 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2010-03-23 11:10:36.000000000 +0100
Modify: 2010-03-23 00:32:25.000000000 +0100
Change: 2010-03-23 14:41:13.000000000 +0100
The modification date went back and the file size increased, so the older version of the file was chosen to overwrite the newer one.
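The comparison can also be checked mechanically. The following Python sketch uses only the sizes and modify times from the stat output quoted in this message. It does not reproduce the GlusterFS self-heal logic; the names parse_stat_time, replicas and healed_server2 are hypothetical and exist only for this illustration.

```python
from datetime import datetime

def parse_stat_time(text):
    """Parse a stat timestamp such as '2010-03-23 14:32:23.000000000 +0100'."""
    date_part, time_part, zone = text.split()
    # stat prints nanoseconds (9 digits); strptime's %f takes at most 6,
    # so the fractional part is dropped here.
    time_part = time_part.split(".")[0]
    return datetime.strptime(f"{date_part} {time_part} {zone}",
                             "%Y-%m-%d %H:%M:%S %z")

# Backend state before the self-heal, copied from the stat output.
replicas = {
    "server1": {"size": 4509, "modify": "2010-03-23 00:32:25.000000000 +0100"},
    "server2": {"size": 4024, "modify": "2010-03-23 14:32:23.000000000 +0100"},
}
# State of the server2 copy after the self-heal.
healed_server2 = {"size": 4509, "modify": "2010-03-23 00:32:25.000000000 +0100"}

newest = max(replicas, key=lambda name: parse_stat_time(replicas[name]["modify"]))
largest = max(replicas, key=lambda name: replicas[name]["size"])
heal_source = [name for name, info in replicas.items() if info == healed_server2]

print("newest by modify time:", newest)
print("largest by size:", largest)
print("copy matching the healed file:", heal_source)
```

With these values the newest copy by modify time is server2, while the healed file matches the size and modify time of the server1 copy, which is also the larger one.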
--
Regards,
Stephan
_______________________________________________
Gluster-devel mailing list
Gluster-devel at nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel