[Gluster-devel] AFR Heal Bug
Gareth Bult
gareth at encryptec.net
Sun Dec 30 19:09:56 UTC 2007
Ok, I'm going to call it a bug, tell me if I'm wrong .. :)
(two servers, both define a "homes" volume)
Client spec:
volume nodea-homes
  type protocol/client
  option transport-type tcp/client
  option remote-host nodea
  option remote-subvolume homes
end-volume

volume nodeb-homes
  type protocol/client
  option transport-type tcp/client
  option remote-host nodeb
  option remote-subvolume homes
end-volume

volume homes-afr
  type cluster/afr
  subvolumes nodea-homes nodeb-homes   ### ISSUE IS HERE! ###
  option scheduler rr
end-volume
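For completeness, the server side on each node is nothing exotic; roughly this (the backend directory and the auth line are illustrative rather than my exact spec, and nodeb is identical apart from the hostname):

volume homes
  type storage/posix
  option directory /data/homes          # illustrative backend path
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes homes
  option auth.ip.homes.allow *          # wide open, test setup only
end-volume

The client spec above is then mounted in the usual way, e.g. "glusterfs -f /etc/glusterfs/client-homes.vol /mnt/homes" (paths illustrative).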
Assume the system is completely up to date and working OK.
Mount the "homes" filesystem on the client.
Kill the "nodea" server.
The system carries on, effectively using nodeb.
Wipe nodea's physical volume.
Restart the nodea server.
All of a sudden, the client sees an empty "homes" filesystem, although the data is still in place on nodeb and nodea is blank.
i.e. the client is seeing only the blank "nodea" (!)
.. at this point you check nodeb to make sure your data really is there, then you can mop up the coffee you've just spat all over your screens ..
If you crash nodeb instead, there appears to be no problem, and a self-heal "find" from the client will repopulate the blank volume.
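(By a self-heal "find" I mean the usual trick of walking every file from the client mount so AFR opens each one; something like this, with the mount point obviously being whatever yours is:

find /mnt/homes -type f -exec head -c1 {} \; > /dev/null

Reading the first byte of every file is enough to trigger the per-file heal, and the traversal itself seems to take care of the directory entries.)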
Alternatively, if you reverse the order of the subvolumes listed above, you don't see the problem.
The issue appears to be triggered by blanking the first listed subvolume.
I'm thinking the order of the volumes should not matter; gluster should be able to tell that one volume is empty/new while the other contains real data, and act accordingly, rather than relying on the order in which the subvolumes are listed .. (???)
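(If I understand the AFR code right, it decides which copy is current from trusted.afr.* extended attributes kept on the backend files; wiping nodea's backend obviously wipes those too, so I'd have expected it to prefer the copy that still has them rather than whichever subvolume is listed first. You can inspect those attributes on the backend with something like this, path illustrative:

getfattr -d -m trusted.afr -e hex /data/homes/somefile
)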
I'm using fuse glfs7 and gluster 1.3.8 (tla).