[Gluster-users] favorite-child and self-heal BUG ?

Thu Jan 8 12:41:34 UTC 2009

As Keith said, you should not modify the backend directly.

You are seeing the 2nd server's files because, as the logs indicate,
the 2nd server's directory had extended attributes (left over from
some previous operations) which marked it as the latest copy. Hence
the 1st server's files were deleted and the 2nd server's files were
copied to the 1st server.
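
For illustration, here is a toy Python model of the decision described
above and in Keith's mail below. It is only a sketch of the behaviour
discussed in this thread, not the actual AFR code: the function name
plan_entry_heal and its arguments are made up for the example, and
favorite-child is not modelled at all.

```python
def plan_entry_heal(listings, latest=None):
    """Toy model of directory (entry) self-heal between replicas.

    listings: {replica name: set of entry names on its backend}
    latest:   replica that the extended attributes mark as most recent,
              or None when the attributes are the same everywhere.
    Returns {replica: (entries to create, entries to remove)}.
    """
    if latest is None:
        # Attributes agree: merge the listings, never delete anything.
        target = set().union(*listings.values())
    else:
        # Attributes name a winner: every replica is made to match it.
        target = set(listings[latest])
    return {
        name: (sorted(target - have), sorted(have - target))
        for name, have in listings.items()
    }


# The situation from this thread: client2's directory is marked latest.
bricks = {"client1": {"10", "11"}, "client2": {"20", "21"}}
plan = plan_entry_heal(bricks, latest="client2")
print(plan["client1"])  # (['20', '21'], ['10', '11'])
```

With latest="client2" the plan for client1 is to create 20 and 21 and
to remove 10 and 11, which is what the afr log quoted further down
shows. With latest=None nothing is removed and the listings are merged.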

Krishna

On Thu, Jan 8, 2009 at 5:27 PM, Keith Freedman <freedman at freeformit.com> wrote:
> you're modifying the filesystem outside of gluster, so the results
> will not be what you expect.
>
> when you remove the files from the directory outside of gluster, the
> gluster extended attributes don't get updated.
>
> so when gluster is running again and the directory is accessed, it sees
> that the extended attributes are the same on both servers, but that
> one of them is missing files. to be safe, it merges the directory
> listings, which effectively 'returns' the files you deleted in the
> listing. when you then go to access the files, it will auto-heal each
> file from the other server since it's not on this one's underlying
> filesystem.
>
> in order to do a real test, you should turn off one server, then
> delete the file FROM THE GLUSTER MOUNT, then turn that server back
> on, and this will correctly auto-heal and delete the entries from
> the server that was down.
>
> you should treat any underlying filesystem as a raw untouchable volume.
> you can't expect a system to manage things in the face of unexpected
> external influences.
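
Looking at the backend without changing it is harmless. A minimal
read-only sketch in Python for dumping the extended attributes of a
backend directory follows. It assumes Linux (os.listxattr and
os.getxattr are Linux-only) and that the AFR attributes live in the
trusted.* namespace, which only root can see; the helper name
dump_xattrs is made up for the example.

```python
import os


def dump_xattrs(path):
    """Return {attribute name: hex value} for every xattr on path.

    Read-only: nothing on the backend is modified.
    """
    attrs = {}
    try:
        names = os.listxattr(path)
    except OSError:
        # Path missing, not readable, or filesystem without xattr support.
        return attrs
    for name in names:
        try:
            attrs[name] = os.getxattr(path, name).hex()
        except OSError:
            # Listed but not readable with the current privileges.
            attrs[name] = None
    return attrs


if __name__ == "__main__":
    # Backend directory taken from the server volfile quoted below.
    for name, value in sorted(dump_xattrs("/var/storage/glusterfs").items()):
        print(name, value)
```

Running it as root on both servers and comparing the output should show
whether one directory is marked as newer than the other.
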
>
>
>
> At 03:03 AM 1/8/2009, artur.k wrote:
>>self-heal doesn't work right
>>
>>on client:
>>
>>Command line : /usr/sbin/glusterfs --log-level=WARNING
>>--volfile=/etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
>>given volfile
>>+-----
>>volume client1
>>  type protocol/client
>>  option transport-type tcp/client
>>  option remote-host trac-xx-1.atm
>>  option remote-port 6996
>>  option remote-subvolume brick
>>end-volume
>>
>>volume client2
>>  type protocol/client
>>  option transport-type tcp/client
>>  option remote-host trac-xx-2.atm
>>  option remote-port 6996
>>  option remote-subvolume brick
>>end-volume
>>
>>volume afr
>>  type cluster/afr
>>  subvolumes client1 client2
>>  option entry-self-heal on
>>  option data-self-heal on
>>  option metadata-self-heal off
>>  option favorite-child client1
>>end-volume
>>
>>volume wh
>>  type performance/write-behind
>>  option flush-behind on
>>  subvolumes afr
>>end-volume
>>
>>volume io-cache
>>  type performance/io-cache
>>  option cache-size 64MB
>>  option page-size 1MB
>>#  option cache-timeout 2
>>  subvolumes wh
>>end-volume
>>
>>volume iot
>>  type performance/io-threads
>>  subvolumes io-cache
>>  option thread-count 4
>>end-volume
>>+-----
>>2009-01-08 11:29:22 W [afr.c:2007:init] afr: You have specified
>>subvolume 'client1' as the 'favorite child'. This means that if a
>>discrepancy in the content or attributes (ownership, permission,
>>etc.) of a file is detected among the subvolumes, the file on
>>'client1' will be considered the definitive version and its contents
>>will OVERWRITE the contents of the file on other subvolumes. All
>>versions of the file except that on 'client1' WILL BE LOST.
>>
>>
>>on server:
>>
>>
>>Command line : /usr/sbin/glusterfsd -p /var/run/glusterfsd.pid -f
>>/etc/glusterfs/server.vol -l /var/log/glusterfs/glusterfsd.log
>>-L WARNING -p /var/run/glusterfsd.pid
>>given volfile
>>+-----
>>volume posix
>>  type storage/posix
>>  option directory /var/storage/glusterfs
>>end-volume
>>
>>volume p-locks
>>  type features/posix-locks
>>  subvolumes posix
>>  option mandatory-locks on
>>end-volume
>>
>>volume wh
>>  type performance/write-behind
>>  option flush-behind on
>>  subvolumes p-locks
>>end-volume
>>
>>volume brick
>>  type performance/io-threads
>>  subvolumes wh
>>  option thread-count 2
>>end-volume
>>
>>volume server
>>  type protocol/server
>>  subvolumes brick
>>  option transport-type tcp/server
>>  option auth.addr.brick.allow 10.*.*.*
>>end-volume
>>
>>+-----
>>
>>
>>scenario:
>>
>>1.
>>
>>trac-xx-1:/var/storage/glusterfs# /etc/init.d/glusterfs-server stop
>>Stopping glusterfs server: glusterfsd.
>>trac-storage-1:/var/storage/glusterfs# rm -rf *
>>trac-storage-1:/var/storage/glusterfs# ls
>>trac-storage-1:/var/storage/glusterfs# touch 10
>>trac-storage-1:/var/storage/glusterfs# touch 11
>>trac-storage-1:/var/storage/glusterfs# ls
>>10  11
>>trac-storage-1:/var/storage/glusterfs# /etc/init.d/glusterfs-server start
>>Starting glusterfs server: glusterfsd.
>>
>>
>>2.
>>
>>
>>trac-xx-2:/var/storage/glusterfs# /etc/init.d/glusterfs-server stop
>>Stopping glusterfs server: glusterfsd.
>>
>>trac-xx-2:/var/storage/glusterfs# rm -rf *
>>trac-xx-2:/var/storage/glusterfs# ls
>>trac-xx-2:/var/storage/glusterfs# touch 20
>>trac-xx-2:/var/storage/glusterfs# touch 21
>>trac-xx-2:/var/storage/glusterfs# ls
>>20  21
>>trac-xx-2:/var/storage/glusterfs# /etc/init.d/glusterfs-server start
>>Starting glusterfs server: glusterfsd.
>>
>>
>>3.
>>
>>noc-xx-2:~# cat /etc/fstab
>>......
>>......
>>
>>/etc/glusterfs/glusterfs-client.vol  /mnt/glusterfs  glusterfs  defaults  0  0
>>
>>noc-xx-2:~# mount -a
>>noc-xx-2:~# cd /mnt/glusterfs/
>>noc-xx-2:/mnt/glusterfs# ls
>>20  21
>>
>>?!!!!!!
>>
>>trac-xx-1:/var/storage/glusterfs# ls
>>20  21
>>
>>?!!!!!!
>>
>>trac-xx-2:/var/storage/glusterfs# ls
>>20  21
>>
>>
>>Shouldn't it be the other way around? client1 is set to be the
>>"favorite child", not client2.
>>
>>
>>in log:
>>
>>2009-01-08 11:29:30 W
>>[afr-self-heal-entry.c:1100:afr_sh_entry_impunge_mknod] afr:
>>creating file /20 mode=0100644 dev=0x0 on client1
>>2009-01-08 11:29:30 W
>>[afr-self-heal-entry.c:1100:afr_sh_entry_impunge_mknod] afr:
>>creating file /21 mode=0100644 dev=0x0 on client1
>>2009-01-08 11:29:30 W
>>[afr-self-heal-entry.c:495:afr_sh_entry_expunge_unlink] afr:
>>unlinking file /10 on client1
>>2009-01-08 11:29:30 W
>>[afr-self-heal-entry.c:495:afr_sh_entry_expunge_unlink] afr:
>>unlinking file /11 on client1
>>
>>in documentation:
>>"Self-heal is triggered when a file or directory is first accessed,
>>that is, the first time any operation is attempted on it."
>>
>>The files were changed before I ran ls on the client mount
>>/mnt/glusterfs. According to the GlusterFS documentation,
>>synchronization starts when a file is first opened on the client,
>>not when the server is started. Maybe I've misunderstood something.
>>Please clarify.
>>
>>
>>glusterfs 1.4.0rc7 built on Jan  7 2009 15:00:10
>>Repository revision: glusterfs--mainline--3.0--patch-814
>>
>>Linux debian etch 4.0 2.6.18-5-xen-amd64
>>
>>
>>_______________________________________________
>>Gluster-users mailing list
>>Gluster-users at gluster.org
>>http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users