[Gluster-users] Self-heal's behavior: problem on "replace" -- it leaves garbage.

Krishna Srinivas krishna at zresearch.com
Tue Dec 16 06:36:02 UTC 2008


Keisuke,

As Keith suggested, can you try the 1.4rc2 release?

AFR should have worked as you expected, i.e., it should have replaced
the file on "server2" completely, because self-heal calls truncate() on
server2 before writing the data to it. It looks as though truncate is
reporting success without actually truncating the file on server2.
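
For illustration, here is a minimal standalone C sketch (not GlusterFS
code) of the effect: overwriting a 7-byte file with 3 bytes and no
truncate() leaves the old tail behind, which is exactly the "aaa\nccc"
garbage reported below.

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd;

        /* Stale copy on server2: "bbb\nccc" (7 bytes). */
        fd = open("fileBC", O_CREAT | O_WRONLY | O_TRUNC, 0644);
        write(fd, "bbb\nccc", 7);
        close(fd);

        /* Rewrite with the 3-byte new contents "aaa" WITHOUT truncating
           first, as the buggy self-heal appears to do: the old tail
           survives and the file becomes "aaa\nccc". */
        fd = open("fileBC", O_WRONLY);
        write(fd, "aaa", 3);
        close(fd);

        /* What AFR is expected to do: truncate to the source length
           as part of the rewrite, so only "aaa" remains. */
        truncate("fileBC", 3);
        return 0;
    }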

Thanks
Krishna

On Tue, Dec 16, 2008 at 11:42 AM, Keisuke TAKAHASHI <keith at nttpc.co.jp> wrote:
> Hi, Mr. Freedman.
> Thanks for replying.
>
>>At 09:26 PM 12/15/2008, Keisuke TAKAHASHI wrote:
>>>Hi.
>>>I'm using GlusterFS v1.3.12 (glusterfs-1.3.12.tar.gz) via FUSE
>>>(fuse-2.7.3glfs10.tar.gz) on CentOS 5.2 x86_64 (Linux kernel
>>>2.6.18-92.el5) now.
>>>The nodes are HP Proliant DL360 G5 (as GlusterFS Client) and DL180
>>>G5 (as GlusterFS Servers).
>>>And the connections are all TCP/IP on Gigabit ethernet.
>>>
>>>Then I tested self-heal and found a problem with its "replace"
>>>behavior: after one node fails and the file's contents shrink on the
>>>other nodes, self-heal leaves garbage at the end of the stale copy.
>>>I would like your ideas on how to resolve or avoid this.
>>>
>>>First, my GlusterFS setup is as follows:
>>>   - 1 GlusterFS Client (client) and 3 GlusterFS Servers
>>> (server1,server2,server3)
>>>   - using cluster/unify to add GlusterFS Servers
>>>   - using cluster/afr between the 3 GlusterFS Servers underneath the
>>> cluster/unify
>>>   - namespace volume is on the GlusterFS Client
>>>
>>>So, self-heal will operate between server1, server2 and server3.
>>>
>>>Now, my fault scenario and self-healing procedure is as follows:
>>>   (1) Each node is active and the mount point on the client is
>>> /mnt/glusterfs. The operating user on the client is root.
>>>   (2) Root creates fileA and fileBC in a local directory on the client
>>> (not on the FUSE mount point)
>>>       - fileA contains strings "aaa"
>>>       - fileBC contains strings "bbb\nccc" (\n is line break.)
>>>   (3) Root copies fileBC to /mnt/glusterfs.
>>>   (4) Make server2 down. (# ifdown eth0)
>>>   (5) Root redirects fileA into fileBC (# cat fileA > fileBC)
>>>   (6) Make server2 up. (# ifup eth0)
>>>   (7) Now, the status of fileBC on servers is below:
>>>       - server1: fileBC contains "aaa", trusted.glusterfs.version is 3
>>>       - server2: fileBC contains "bbb\nccc", trusted.glusterfs.version is 2
>>>       - server3: fileBC contains "aaa", trusted.glusterfs.version is 3
>>>   (8) Execute self-heal. (# find /mnt/glusterfs -type f -print0 |
>>> xargs -0 head -c1 >/dev/null)
>>
>>On which server did you run this?  It seems to matter for some reason,
>>from what I can tell.  If it's run from the server that has the new
>>version, all's well, but otherwise AFR sometimes doesn't work (although
>>this is likely fixed in the newer versions; I haven't specifically tested).
>>
>
> I ran it on the client.
> So, as in (9) below, fileBC on server2 was self-healed.
>
>>>   (9) Then, the status of fileBC on servers is below:
>>>       - server1: fileBC contains "aaa", trusted.glusterfs.version is 3
>>>       - server2: fileBC contains "aaa\nccc", trusted.glusterfs.version is 3
>>>       - server3: fileBC contains "aaa", trusted.glusterfs.version is 3
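
For debugging, the trusted.glusterfs.version values above can be read
directly off each server's backend export (not through the mount point),
either with getfattr(1) or with a few lines of C; a minimal sketch, with
"/data/export/fileBC" standing in for whatever your export path really is:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/xattr.h>

    int main(void)
    {
        char value[64];
        ssize_t n, i;

        n = getxattr("/data/export/fileBC", "trusted.glusterfs.version",
                     value, sizeof(value));
        if (n < 0) {
            perror("getxattr");
            return 1;
        }
        /* Print the raw xattr value as hex. */
        printf("trusted.glusterfs.version = 0x");
        for (i = 0; i < n; i++)
            printf("%02x", (unsigned char)value[i]);
        printf("\n");
        return 0;
    }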
>>>
>>>So fileBC on server2 was overwritten from the other copies, but the
>>>"replace" appears to have been done in place on the byte sequence: the
>>>original "bbb" was replaced by "aaa", while the trailing "\nccc" was
>>>left behind.
>>>In this case the leftover "\nccc" in fileBC on server2 is garbage.
>>>I would like self-heal to replace the old file with the new file completely.
>>
>>You actually wouldn't want that.  Imagine the file were a 30GB log
>>file and all you really care about are the new bits.  What would be
>>better is an rsync-like update of the file, which it seems to be doing,
>>except that it then forgets to truncate to the new end-of-file position.
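
In other words, an in-place overwrite is only safe if the destination is
cut back to the source's length afterwards.  A rough sketch of that
pattern in C (copy_and_trim and the block size are made up for this
example; this is not the actual AFR code):

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int copy_and_trim(const char *src_path, const char *dst_path)
    {
        char buf[65536];
        ssize_t n;
        struct stat st;
        int src, dst;

        src = open(src_path, O_RDONLY);
        dst = open(dst_path, O_WRONLY | O_CREAT, 0644);  /* no O_TRUNC */
        if (src < 0 || dst < 0)
            return -1;

        /* Copy the fresh contents block by block over the stale copy. */
        while ((n = read(src, buf, sizeof(buf))) > 0)
            if (write(dst, buf, n) != n)
                return -1;

        /* Without this step the old tail ("\nccc" in the example)
           survives: cut the destination down to the source's size. */
        if (fstat(src, &st) != 0 || ftruncate(dst, st.st_size) != 0)
            return -1;

        close(src);
        close(dst);
        return 0;
    }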
>>
>
> I understand that.
> But on my GlusterFS deployment the intended data types, sizes, and usage
> patterns are not yet settled, so I have to allow for cases like this one.
>
>>>Can self-heal do this? Or is there a good way to resolve it?
>>
>>I'd run your test with 1.4rc2 and see if you have the same problem.
>>
>
> Thanks a lot.
> I will also try it.
>
> Regards,
> Keisuke Takahashi
>
> _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
> Keisuke TAKAHASHI / NTTPC Communications,Inc.
>   E-Mail: keith at NOSPAM.nttpc.co.jp
>   http://www.nttpc.co.jp/english/index.html
> _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
>



