[Gluster-users] Self-heal's behavior: problem on "replace" -- it leaves garbage.

Keith Freedman freedman at FreeFormIT.com
Tue Dec 16 05:44:56 UTC 2008


At 09:26 PM 12/15/2008, Keisuke TAKAHASHI wrote:
>Hi.
>I'm using GlusterFS v1.3.12 (glusterfs-1.3.12.tar.gz) via FUSE 
>(fuse-2.7.3glfs10.tar.gz) on CentOS 5.2 x86_64 (Linux kernel 
>2.6.18-92.el5) now.
>The nodes are an HP ProLiant DL360 G5 (as the GlusterFS Client) and DL180 
>G5s (as the GlusterFS Servers).
>All connections are TCP/IP over Gigabit Ethernet.
>
>Then I tested self-heal and found a problem with its "replace" 
>behavior: when a node goes down and the file shrinks on the surviving 
>nodes, self-heal on the recovered node leaves garbage at the end of the file.
>I would like your ideas on how to resolve or avoid this.
>
>First, my GlusterFS setup is as follows:
>   - 1 GlusterFS Client (client) and 3 GlusterFS Servers 
> (server1, server2, server3)
>   - using cluster/unify to add GlusterFS Servers
>   - using cluster/afr between the 3 GlusterFS Servers, underneath the 
> cluster/unify
>   - the namespace volume is on the GlusterFS Client
>
>So, self-heal will operate between server1, server2 and server3.
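
(Just so I'm reading that right: a client spec roughly like the following, with 
unify over a single afr of the three servers?  The hostnames, addresses and paths 
below are placeholders I made up, not your actual config:)

    volume server1
      type protocol/client
      option transport-type tcp/client
      option remote-host 192.168.0.1        # placeholder address
      option remote-subvolume brick
    end-volume

    # server2 and server3 would be defined the same way ...

    volume afr0
      type cluster/afr
      subvolumes server1 server2 server3
    end-volume

    volume ns
      type storage/posix                    # namespace volume kept locally on the client
      option directory /export/namespace
    end-volume

    volume unify0
      type cluster/unify
      option namespace ns
      option scheduler rr
      subvolumes afr0
    end-volume
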
>
>Now, my fault-scenario and self-heal test procedure is as follows:
>   (1) Each node is active, and the mount point on the client is 
> /mnt/glusterfs. The operating user on the client is root.
>   (2) Root creates fileA and fileBC in the client's local directory 
> (not under the FUSE mount point)
>       - fileA contains strings "aaa"
>       - fileBC contains strings "bbb\nccc" (\n is line break.)
>   (3) Root copies fileBC to /mnt/glusterfs.
>   (4) Make server2 down. (# ifdown eth0)
>   (5) Root overwrites fileBC on the mount point with fileA (# cat fileA > fileBC)
>   (6) Make server2 up. (# ifup eth0)
>   (7) Now, the status of fileBC on the servers is as follows:
>       - server1: fileBC contains "aaa", trusted.glusterfs.version is 3
>       - server2: fileBC contains "bbb\nccc", trusted.glusterfs.version is 2
>       - server3: fileBC contains "aaa", trusted.glusterfs.version is 3
>   (8) Execute self-heal. (# find /mnt/glusterfs -type f -print0 | 
> xargs -0 head -c1 >/dev/null)

On which server did you run this?  It seems to matter, for some reason, 
from what I can tell: if it's run from the server that has the new 
version, all is well, but otherwise AFR sometimes doesn't heal correctly 
(although this is likely fixed in the newer versions; I haven't specifically tested).
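
(Incidentally, you can read those version xattrs straight off the backend 
copies on each server to see what afr is deciding; the export path below is 
just a placeholder for wherever your bricks actually live:)

    # on each server, against the backend copy of the file (path is illustrative)
    getfattr -n trusted.glusterfs.version -e text /export/brick/fileBC
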

>   (9) Then, the status of fileBC on the servers is as follows:
>       - server1: fileBC contains "aaa", trusted.glusterfs.version is 3
>       - server2: fileBC contains "aaa\nccc", trusted.glusterfs.version is 3
>       - server3: fileBC contains "aaa", trusted.glusterfs.version is 3
>
>All right, fileBC on server2 was overwritten from the other copies, but 
>the "replace" appears to have been done in place, byte by byte (the 
>original "bbb" was replaced by "aaa", but the trailing "\nccc" was left).
>In this case, the "\nccc" left in fileBC on server2 is garbage.
>I would like self-heal to replace the old file with the new file completely.

You actually wouldn't want that.  Imagine if the file were a 30GB 
log file and all you really care about are the new bits.  What's 
better is an rsync-like update of the file, which it seems to be 
doing, but it then forgets to truncate the file to the new end-of-file position.
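
You can reproduce that symptom outside of gluster entirely; writing the new, 
shorter data over the old file without truncating leaves exactly that kind of tail:

    # plain-filesystem illustration of the symptom (nothing gluster-specific here)
    printf 'bbb\nccc' > fileBC                      # old contents, 7 bytes
    printf 'aaa' | dd of=fileBC conv=notrunc        # write the 3 new bytes, don't truncate
    cat fileBC                                      # shows "aaa" then "ccc", i.e. "aaa\nccc"
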

>Can self-heal do this, or is there a good way to resolve it?

I'd run your test with 1.4rc2 and see if you have the same problem.
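
If you're stuck on 1.3 for now, one thing you could try (I haven't tested this 
myself) is removing the stale copy from the out-of-date backend and then reading 
the file through the mount point, so afr has to recreate it from a good copy 
instead of patching it in place:

    # untested workaround idea; /data/export is a placeholder for server2's backend path
    rm /data/export/fileBC                          # on server2, drop the stale backend copy
    head -c1 /mnt/glusterfs/fileBC > /dev/null      # on the client, trigger self-heal to recreate it
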



>Thanks.
>
>Keisuke Takahashi
>
>
>_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
>Keisuke TAKAHASHI / NTTPC Communications,Inc.
>    E-Mail: keith at NOSPAM.nttpc.co.jp
>    http://www.nttpc.co.jp/english/index.html
>_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
>
>_______________________________________________
>Gluster-users mailing list
>Gluster-users at gluster.org
>http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users




