[Gluster-devel] self healing bug continued

Gordan Bobic gordan at bobich.net
Fri Apr 3 12:23:22 UTC 2009


On Fri, 3 Apr 2009 13:34:29 +0200, nicolas prochazka
<prochazka.nicolas at gmail.com> wrote:

> It seems there are a lot of problems with self healing, and one of them
> is that glusterfs uses one server as the reference (the first in
> subvolumes)
> ( afr_sh_select_source  ?  )

This brings up an interesting point - what is the conflict resolution
supposed to be? The favorite-child option should be the resolution of last
resort (i.e. when the timestamp metadata is identical). The primary
resolution should be, IIRC, that the latest file wins. However, this poses
potential problems.
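To make the policy I mean concrete, here is a minimal sketch in Python
(purely illustrative, not the actual afr_sh_select_source() logic - the
function name, dict layout, and values are all made up):

def select_source(replicas, favorite_child):
    """Newest mtime wins; favorite-child only breaks exact ties."""
    newest = max(r["mtime"] for r in replicas)
    candidates = [r for r in replicas if r["mtime"] == newest]
    if len(candidates) == 1:
        return candidates[0]          # primary resolution: latest file wins
    # timestamp metadata is identical -> resolution of last resort
    return next((r for r in candidates if r["name"] == favorite_child),
                candidates[0])

So with replicas = [{"name": "server1", "mtime": 100}, {"name": "server2",
"mtime": 250}] and favorite_child="server1", select_source() should still
pick server2 - the favorite-child setting must never override a newer file.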

Consider this scenario:
Primary crashes. We only have the secondary. Files on the secondary change
while it is the only server. Primary comes back, but crashes again
mid-sync. Next time it comes back, it has a partially synced file, and it
is the favorite-child, so unless the metadata (specifically the timestamps)
gets synced _last_, the partially synced file would clobber the complete
file. Does the metadata get synced last? It's the only sane option as far
as I can tell, but I've seen situations before where the timestamps on the
new server get stuck at the epoch (1970-01-01) after a (successful) resync.
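The only ordering I can see that makes an interrupted heal safe is: copy
the data first, set the timestamps last. A crash mid-sync then leaves the
destination with a stale mtime, so the partial file can never win "latest
file wins". A minimal sketch of that ordering (an ordinary local file copy
standing in for the heal, names invented):

import os, shutil

def heal_file(src, dst):
    # data first; a crash anywhere in here leaves dst with a stale mtime,
    # so it loses the "latest file wins" comparison on the next heal
    shutil.copyfile(src, dst)
    st = os.stat(src)
    # timestamps last, only once the data is known to be complete
    os.utime(dst, (st.st_atime, st.st_mtime))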

Can somebody point at a definitive spec document for how the AFR healing is
_supposed_ to operate under various failure and resync scenarios? It
currently seems to be in quite a dangerous state, and nowhere near enough
warnings are given about it for something that can cause extensive data
corruption/loss. If such a specification exists, then it should be pretty
easy to create test cases for it. Speaking of which, is there a test
harness available for it? It would be really useful to be able to run
something like "make test" before "make install". It would also encourage
more technical users to add test cases for things they find broken, and it
would provide a baseline for regressions, to make sure that something that
worked is never broken in a later release. My perception is that the
stability/bug count has been getting progressively worse in every release
since rc1.
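As a seed for what such a test case could look like, here is a standalone
sketch that exercises the select_source() policy from above against the
crash-mid-sync scenario (plain asserts; the function is repeated to keep
this self-contained, and everything here is hypothetical):

def select_source(replicas, favorite_child):
    newest = max(r["mtime"] for r in replicas)
    candidates = [r for r in replicas if r["mtime"] == newest]
    if len(candidates) == 1:
        return candidates[0]
    return next((r for r in candidates if r["name"] == favorite_child),
                candidates[0])

def test_partial_sync_never_wins():
    # primary crashed mid-heal: partial data, but because timestamps are
    # synced last, its mtime is still the stale pre-heal value
    replicas = [
        {"name": "primary",   "mtime": 100},   # partial file, stale mtime
        {"name": "secondary", "mtime": 250},   # complete file
    ]
    assert select_source(replicas, "primary")["name"] == "secondary"

test_partial_sync_never_wins()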

Another thing - since keeping files in sync is such a problematic thing at
the moment, how about md5 and last-sync-timestamp fields in the metadata
for each file? This, coupled with an external cron job, run
[daily|weekly|monthly] depending on the amount of data, that
computes/verifies/updates them, would at least provide a secondary sanity
check to make sure file corruption/de-sync gets detected early and
reliably. Not having such a thing is really just sticking one's head in
the sand and ignoring the issue.
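A rough sketch of such a cron job, using user-namespace xattrs to stand in
for the metadata fields (the xattr names are invented; a real version would
presumably keep these in the AFR metadata rather than user.* attributes):

#!/usr/bin/env python3
import hashlib, os, sys

MD5_KEY   = "user.checksum.md5"     # hypothetical attribute names
MTIME_KEY = "user.checksum.mtime"

def md5sum(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest().encode()

def check(path):
    mtime = str(os.stat(path).st_mtime).encode()
    current = md5sum(path)
    try:
        stored_md5 = os.getxattr(path, MD5_KEY)
        stored_mtime = os.getxattr(path, MTIME_KEY)
        if stored_mtime == mtime and stored_md5 != current:
            # contents changed while the mtime did not: silent corruption
            print("CORRUPTION: %s" % path, file=sys.stderr)
            return
    except OSError:
        pass                # no checksum recorded yet: store a baseline
    os.setxattr(path, MD5_KEY, current)
    os.setxattr(path, MTIME_KEY, mtime)

if __name__ == "__main__":
    for root, _, files in os.walk(sys.argv[1]):
        for name in files:
            check(os.path.join(root, name))

Run it from cron against the backend export directory on each server;
diffing the reports between servers would catch de-sync as well as
corruption.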

Another thing - if a file is open for write, I think there should be a
metadata flag set, and it should be unset when the last write handle is
closed. When the server comes up, if any such flag is still set before any
write opens are received, then the file should be marked as crashed, and it
should explicitly be prevented from being the sync source. There are a lot
of error-resync use cases, and it might be a good time for them to be
enumerated and systematically tested against to minimize the risk of data
loss.
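For the flag itself, an xattr set on the first write-open and cleared on
the last write-close would do; anything still flagged at startup was open
when the server died. A sketch (the xattr name is made up):

import os

DIRTY_KEY = "user.afr.open-for-write"   # hypothetical flag name

def on_first_write_open(path):
    os.setxattr(path, DIRTY_KEY, b"1")

def on_last_write_close(path):
    os.removexattr(path, DIRTY_KEY)

def crashed_while_open(path):
    """At startup: True if the flag survived, i.e. the file was open for
    write when the server went down. Such a file must never be picked
    as the sync source."""
    try:
        os.getxattr(path, DIRTY_KEY)
        return True
    except OSError:
        return False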

Gordan




