[Gluster-devel] Another Data Corruption Report

Mon Feb 8 03:57:24 UTC 2010

Gordan, and others using a configuration like the one in Bug 542
(http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=542):

This corruption can show up (though not always) when both io-cache and
write-behind are loaded below replicate (syntactically before replicate
in the volfile) and a self-heal of a file larger than 131072 bytes
(128 KiB) happens. Gordan, we believe this is why your corruption
observations are strongly correlated with server reconnections.

Please load write-behind and io-cache on top of replicate (the "normal"
way, which is what glusterfs-volgen generates), and you will not face
this problem. I believe the reason for loading io-cache and write-behind
below replicate is to improve self-heal performance; for that, we
suggest using the 3.0.x release, which has background self-healing and
diff-based self-healing.
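For illustration, the two layouts can be compared with a small Python sketch. It assumes the plain volume / type / subvolumes / end-volume volfile syntax. The volume names, host names and the risky_replicates() helper are all hypothetical. The "above" layout is a simplified example of the recommended ordering, not literal glusterfs-volgen output. Because subvolumes are defined before the translators that use them, "below replicate" means "earlier in the file". The checker walks the subvolume graph under each replicate volume and flags it when both io-cache and write-behind appear there.

```python
# Hypothetical sketch; every volume and host name below is made up.

CLIENTS = """
volume client1
  type protocol/client
  option remote-host server1
end-volume
volume client2
  type protocol/client
  option remote-host server2
end-volume
"""

# Risky: the performance translators sit below replicate, so they are
# defined before the replicate volume that lists them as subvolumes.
BELOW = """
volume wb1
  type performance/write-behind
  subvolumes client1
end-volume
volume ioc1
  type performance/io-cache
  subvolumes wb1
end-volume
volume wb2
  type performance/write-behind
  subvolumes client2
end-volume
volume ioc2
  type performance/io-cache
  subvolumes wb2
end-volume
volume replicate
  type cluster/replicate
  subvolumes ioc1 ioc2
end-volume
"""

# Recommended: write-behind and io-cache stacked on top of replicate.
ABOVE = """
volume replicate
  type cluster/replicate
  subvolumes client1 client2
end-volume
volume writebehind
  type performance/write-behind
  subvolumes replicate
end-volume
volume iocache
  type performance/io-cache
  subvolumes writebehind
end-volume
"""


def parse_volfile(text):
    """Map each volume name to its translator type and subvolume list."""
    volumes, current = {}, None
    for raw in text.splitlines():
        words = raw.split("#", 1)[0].split()  # strip comments, skip blanks
        if not words:
            continue
        key, args = words[0], words[1:]
        if key == "volume":
            current = args[0]
            volumes[current] = {"type": None, "subvolumes": []}
        elif key == "end-volume":
            current = None
        elif current and key == "type":
            volumes[current]["type"] = args[0]
        elif current and key == "subvolumes":
            volumes[current]["subvolumes"] = args
    return volumes


def types_below(volumes, name, seen=None):
    """Collect the types of every translator stacked underneath `name`."""
    seen = set() if seen is None else seen
    found = set()
    for sub in volumes[name]["subvolumes"]:
        if sub in seen or sub not in volumes:
            continue
        seen.add(sub)
        found.add(volumes[sub]["type"])
        found |= types_below(volumes, sub, seen)
    return found


def risky_replicates(text):
    """Replicate volumes that have both io-cache and write-behind below them."""
    volumes = parse_volfile(text)
    perf = {"performance/io-cache", "performance/write-behind"}
    return [name for name, vol in volumes.items()
            if vol["type"] in ("cluster/replicate", "cluster/afr")  # afr: older name
            and perf <= types_below(volumes, name)]


print(risky_replicates(CLIENTS + BELOW))  # ['replicate']
print(risky_replicates(CLIENTS + ABOVE))  # []
```

The check only looks at translator placement. It cannot tell whether a self-heal of a file larger than 131072 bytes has actually happened, and the report says that is also needed to trigger the corruption.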

Please read http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=542#c4
for details about the internals of the corruption.

In summary, loading the performance translators (specifically io-cache)
in the normal location is a quick remedy for the bug.

Thanks,
Avati

On Sat, Jan 16, 2010 at 12:00 AM, Gordan Bobic <gordan at bobich.net> wrote:
> I've just observed another case of corruption similar to what I reported a
> while back with .viminfo files getting corrupted (in that case, by somehow
> being clobbered by a shared library fragment from what I could tell).
>
> This time, however, it was much more sinister, although probably the same
> failure mode. I have just seen a file in a CVS repository residing on
> GlusterFS be replaced - by another file in the same directory in the same
> CVS repository! One of my header files somehow got replaced by the Makefile
> in the same directory. Reviewing "cvs log" indicates that the entire file on
> the CVS side was clobbered by the Makefile - there is no indication (e.g.
> from cvs log) that it was accidentally copied over and committed in.
>
> I'm sure I don't have to stress just how mind-bogglingly dangerous (as in
> data corruption/loss dangerous) this is.
>
> Observed with 2.0.9, the volume is AFR.
>
> Not sure if it is in any way relevant to this particular bug report, but
> whenever I do cvs update on the glfs-backed repository, I get this sort of
> thing in the glfs log:
>
> [2010-01-15 18:05:11] E [posix.c:3156:do_xattrop] home-store: getxattr
> failed on /cvs/Project/C/#cvs.lock while doing xattrop: No such file or
> directory
>
> Gordan