[Gluster-devel] self healing bug continued

nicolas prochazka prochazka.nicolas at gmail.com
Fri Apr 3 11:34:29 UTC 2009


Yes, for testing, I create files directly on the backend to simulate the
state of a system after a crash.

In my test case, I hit the bug with:
2 AFR servers
If I crash server 2 (the second in subvolumes order) and then restart it,
self-healing creates the file with size 0 on its backend disk, while on
server 1 the file is not 0 bytes.
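
Roughly, the reproduction looks like this (the backend path and the mount
point below are only examples; adapt them to your setup):

# 1. create the file directly on server 1's backend to simulate the
#    post-crash state, e.g.
dd if=/dev/zero of=/export/images/TEST243_DISK bs=1M count=10
# 2. kill glusterfsd on server 2 (the second subvolume), then restart it
# 3. trigger self-heal from a client mount point:
ls -lR /mnt/gluster
# result: /images/TEST243_DISK appears on server 2's backend with size 0,
# while the copy on server 1 keeps its real size
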
In the log I can see a split brain:
2009-04-03 12:56:15 D [fuse-bridge.c:468:fuse_lookup] glusterfs-fuse:
537: LOOKUP /images/TEST243_DISK(3221357272)
2009-04-03 12:56:15 D [inode.c:312:__inode_passivate] fuse/inode:
passivating inode(3221357266) lru=56/0 active=5 purge=0
2009-04-03 12:56:15 D [afr-self-heal-common.c:1254:afr_self_heal]
last: performing self heal on /images/TEST243_DISK (metadata=0 data=1
entry=0)
2009-04-03 12:56:15 D [afr-self-heal-common.c:1281:afr_self_heal]
last: proceeding to metadata check on /images/TEST243_DISK
2009-04-03 12:56:15 D
[afr-self-heal-common.c:652:afr_sh_missing_entries_done] last:
proceeding to metadata check on /images/TEST243_DISK
2009-04-03 12:56:15 D
[afr-self-heal-metadata.c:787:afr_self_heal_metadata] last: proceeding
to data check on /images/TEST243_DISK
2009-04-03 12:56:15 D
[afr-self-heal-metadata.c:83:afr_sh_metadata_done] last: proceeding to
data check on /images/TEST243_DISK
2009-04-03 12:56:15 D [afr-self-heal-data.c:993:afr_sh_data_lock]
last: locking /images/TEST243_DISK on subvolume brick_10.98.98.2
2009-04-03 12:56:15 D [afr-self-heal-data.c:993:afr_sh_data_lock]
last: locking /images/TEST243_DISK on subvolume brick_10.98.98.1
2009-04-03 12:56:15 D [afr-self-heal-data.c:945:afr_sh_data_lock_cbk]
last: inode of /images/TEST243_DISK on child 1 locked
2009-04-03 12:56:15 D [afr-self-heal-data.c:945:afr_sh_data_lock_cbk]
last: inode of /images/TEST243_DISK on child 0 locked
2009-04-03 12:56:15 D
[afr-self-heal-common.c:170:afr_sh_print_pending_matrix] last:
pending_matrix: [ 0 0 ]
2009-04-03 12:56:15 D
[afr-self-heal-common.c:170:afr_sh_print_pending_matrix] last:
pending_matrix: [ 0 0 ]
2009-04-03 12:56:15 E [afr-self-heal-data.c:813:afr_sh_data_fix] last:
Unable to resolve conflicting data of /images/TEST243_DISK. Please
resolve manually by deleting the file /images/TEST243_DISK from all
but the preferred subvolume. Please consider 'option favorite-child
<>'


But if I manually delete the file from backend 2, it is recreated with
size 0.
If I set a favorite child: when the favorite child is server 1, the size
is OK on server 2; but if I set favorite-child to server 2, then I lose
the file on both servers.
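
For reference, I set favorite-child in the AFR volume definition of the
volfile; something like this (the volume name "afr1" is just an example,
the subvolume names are the ones from my log):

volume afr1
  type cluster/afr
  # prefer server 1's copy when the pending matrix cannot decide
  option favorite-child brick_10.98.98.1
  subvolumes brick_10.98.98.1 brick_10.98.98.2
end-volume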


There seem to be a lot of problems with self-healing, and one of them is
that glusterfs uses one server as the reference (the first in subvolumes)
(afr_sh_select_source?), and likewise for load balancing
(afr_first_up_child).
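
To illustrate what I mean, here is a purely illustrative sketch of a
"first up child wins" selection -- this is NOT the actual GlusterFS code,
the names and types are made up -- just to show why the first subvolume
always ends up as the reference whenever it is running:

/* illustrative only -- not GlusterFS source */
static int
first_up_child (int child_count, const unsigned char *child_up)
{
        int i = 0;

        /* walk the subvolumes in volfile order and take the first one
           that is currently up */
        for (i = 0; i < child_count; i++) {
                if (child_up[i])
                        return i;
        }

        return -1;  /* no subvolume is up */
}

If an index chosen like this is also used as the self-heal source,
whichever copy lives on the first subvolume wins, even when it is the
stale copy.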

2009/4/3 Vikas Gorur <vikas at zresearch.com>:
> 2009/4/3 nicolas prochazka <prochazka.nicolas at gmail.com>:
>> hello
>> The latest git version does not correct the bug related to AFR self-healing.
>>
>> - Now, if I create a file on the backend of server 1 (defined first in
>> subvolumes), the size on the gluster mount point is OK (the file is
>> correct); the file is correct on the server 1 backend, but on all other
>> AFR backend servers the file exists with size 0.
>>
>> - If I create a file on the second backend (defined second in
>> subvolumes), the file never appears on the mount point, nor on the
>> backend of any AFR server except the second (which is expected).
>
> You are creating files directly on the backend? Are you saying that
> self-heal is not creating the file on other nodes too? Did you run ls
> -lR on the mountpoint to trigger self-heal?
>
> Vikas
> --
> Engineer - Z Research
> http://gluster.com/
>