[Gluster-users] AFR healing problem after returning one node.

Rash rash at konto.pl
Wed Dec 10 16:16:33 UTC 2008


I've got a configuration which, simplified, is a combination of AFRs and
unify - the servers export n[1-3]-brick[12] and n[1-3]-ns, and the client
has this cluster configuration:

volume afr-ns
    type cluster/afr
    subvolumes n1-ns n2-ns n3-ns
    option data-self-heal on
    option metadata-self-heal on
    option entry-self-heal on
end-volume

volume afr1
    type cluster/afr
    subvolumes n1-brick2 n2-brick1
    option data-self-heal on
    option metadata-self-heal on
    option entry-self-heal on
end-volume

volume afr2
    type cluster/afr
    subvolumes n2-brick2 n3-brick1
    option data-self-heal on
    option metadata-self-heal on
    option entry-self-heal on
end-volume

volume unify
    type cluster/unify
    subvolumes afr1 afr2
    option namespace afr-ns
    option scheduler rr
end-volume
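
The n*-brick* and n*-ns names above are protocol/client volumes pointing
at the corresponding server exports; I've trimmed them here for brevity.
One of them looks roughly like this (the remote-subvolume name is just an
example):

volume n2-brick1
    type protocol/client
    option transport-type tcp/client
    option remote-host n2
    option remote-subvolume brick1
end-volume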

Unify is mounted on n3 (which is also a client) under /storage, where
I've created a directory and a file:

n3:/storage# mkdir test
n3:/storage# cd test
n3:/storage/test# date > file

After I turned off n2, the file was removed and I checked some attrs:

n3:/storage/test# getfattr -R -d -m ".*" /export/storage?/*
getfattr: Removing leading '/' from absolute path names
# file: export/storage1/brick
trusted.glusterfs.afr.entry-pending=0sAAAAAAAAAAA=
trusted.glusterfs.test="working\000"

# file: export/storage1/ns
trusted.glusterfs.afr.entry-pending=0sAAAAAAAAAAAAAAAA
trusted.glusterfs.test="working\000"

# file: export/storage1/ns/test
trusted.glusterfs.afr.entry-pending=0sAAAAAAAAAAEAAAAA

# file: export/storage2/brick
trusted.glusterfs.test="working\000"
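
To make sense of those values: the "0s" prefix from getfattr means the
value is base64-encoded. If I decode the 12 bytes of entry-pending on the
namespace "test" directory and read them as three 32-bit counters in the
afr-ns subvolumes order (n1-ns, n2-ns, n3-ns - that mapping is my
assumption), the second counter is 1:

n3:/storage/test# echo AAAAAAAAAAEAAAAA | base64 -d | od -An -tx1
 00 00 00 00 00 00 00 01 00 00 00 00

So one entry operation appears to be pending for n2-ns, while the 8-byte
values on the plain bricks decode to all-zero counters.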

Then n2 was brought back, and after a while I was able to cat the file:

n3:/storage/test# cat file
Wed Dec 10 16:59:00 CET 2008
n3:/storage/test# getfattr -R -d -m ".*" /export/storage?/*
getfattr: Removing leading '/' from absolute path names
# file: export/storage1/brick
trusted.glusterfs.afr.entry-pending=0sAAAAAAAAAAA=
trusted.glusterfs.test="working\000"

# file: export/storage1/ns
trusted.glusterfs.afr.entry-pending=0sAAAAAAAAAAAAAAAA
trusted.glusterfs.test="working\000"

# file: export/storage1/ns/test
trusted.glusterfs.afr.entry-pending=0sAAAAAAAAAAEAAAAA

# file: export/storage1/ns/test/file
trusted.glusterfs.afr.data-pending=0sAAAAAAAAAAAAAAAA
trusted.glusterfs.afr.metadata-pending=0sAAAAAAAAAAAAAAAA

# file: export/storage2/brick
trusted.glusterfs.test="working\000"

I don't know why the file was recreated. I've tested without the
*-self-heal options and the result was the same. Maybe using io-cache,
which is configured on both sides - server and client (roughly as
sketched below), complicates this issue? Does anyone have a solution
for this case?
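
If it's relevant, the io-cache translator sits on top of unify on the
client (and similarly above the posix bricks on the servers), roughly
like this, with tuning options omitted:

volume iocache
    type performance/io-cache
    subvolumes unify
end-volume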

Regards.

-- 
rash at konto pl



