<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 03/20/2017 06:31 PM, Bernhard Dübi
wrote:<br>
</div>
<blockquote
cite="mid:CACxnGeQ4zsFZhBD-HASq=uLKFqHGF+HRyyi1CCPfYTmyxqZ0DA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>Hi Ravi,<br>
<br>
</div>
            Thank you very much for looking into this.<br>
</div>
The gluster volumes are used by CommVault Simpana to store
backup data. Nothing/Nobody should access the underlying
infrastructure.<br>
<br>
</div>
        While looking at the xattrs of the files, I noticed that the
        only difference was the bit-rot.version. So I assume that
        something went wrong in the synchronization of the bit-rot data,
        that differing bit-rot.versions are treated like a split-brain
        situation, and that access is denied because there is no
        guarantee of correctness. This is just a wild guess.<br>
</div>
</div>
</blockquote>
Hi Bernhard,<br>
<br>
    The bit-rot version can differ between the bricks of a replica if
    I/O succeeded on only one brick while the other brick was down.
    (AFR self-heal will later heal the contents, but it does not modify
    the bitrot xattrs.) So that is not a problem.<br>
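    <br>
    As a rough illustration of that point, the sketch below compares
    the xattrs of the two brick copies while ignoring the bit-rot
    version. It assumes the input is text saved from
    <tt>getfattr -d -m . -e hex</tt> on each brick and that the xattr
    is named <tt>trusted.bit-rot.version</tt>; the function names are
    made up for this example.<br>
    <pre>
```python
def parse_getfattr(text):
    """Parse saved `getfattr -d -m . -e hex` output into {name: value}."""
    xattrs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# file: ..." header line.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            xattrs[name] = value
    return xattrs


def differing_xattrs(brick_a, brick_b, ignore=("trusted.bit-rot.version",)):
    """Return the sorted xattr names whose values differ between bricks.

    Names listed in `ignore` are skipped. The default skips the bit-rot
    version (assumed xattr name), which may legitimately differ.
    """
    names = set(brick_a) | set(brick_b)
    return sorted(
        name for name in names
        if name not in ignore and brick_a.get(name) != brick_b.get(name)
    )
```
    </pre>
    An empty result means the copies differ at most in the ignored
    xattr.<br>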
<br>
<blockquote
cite="mid:CACxnGeQ4zsFZhBD-HASq=uLKFqHGF+HRyyi1CCPfYTmyxqZ0DA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
        Over the weekend I identified hundreds of files with
        input/output errors. I compared the sha256sum of the copies on
        both bricks; they were always the same. I then deleted the
        affected files from gluster and recreated them. This should
        have fixed the issue. Verification is still running.<br>
<div><br>
</div>
<div>if you're interested in the root cause, I can send you more
log files and the xattrs of some files<br>
</div>
</div>
</blockquote>
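    <br>
    For anyone repeating the brick-to-brick checksum comparison
    described above, a minimal sketch could look like the following.
    The paths passed in are whatever brick-local paths apply on your
    servers (none are assumed here), the helper names are hypothetical,
    and the files are only opened for reading.<br>
    <pre>
```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large backup containers fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(path_a, path_b):
    """True if both copies have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```
    </pre>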
<br>
    If, as you said, you did not access the underlying bricks directly,
    then this could be a bitrot bug. If you don't mind, please raise a
    BZ under the bitrot component for the appropriate gluster version,
    with all client and brick logs attached.<br>
    Also, if you have some kind of reproducer, that would help a lot.<br>
-Ravi<br>
<br>
<blockquote
cite="mid:CACxnGeQ4zsFZhBD-HASq=uLKFqHGF+HRyyi1CCPfYTmyxqZ0DA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
<br>
</div>
<div>Best Regards<br>
</div>
<div>Bernhard<br>
</div>
<div><br>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2017-03-20 12:57
                            GMT+01:00 Ravishankar N <span dir="ltr">&lt;<a
                                moz-do-not-send="true"
                                href="mailto:ravishankar@redhat.com"
                                target="_blank">ravishankar@redhat.com</a>&gt;</span>:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div
                                  class="m_1080219594149766984moz-cite-prefix">SFILE_CONTAINER_080
                                  is the one which seems to be in
                                  split-brain. SFILE_CONTAINER_046, for
                                  which you have provided the getfattr
                                  output, hard links etc., does not seem
                                  to be in split-brain. We do see that
                                  the fops on SFILE_CONTAINER_046 are
                                  failing on the client translator
                                  itself with EIO:<br>
<tt><br>
[2017-03-17 19:49:56.088867] E
[MSGID: 114031]
[client-rpc-fops.c:444:<wbr>client3_3_open_cbk]
0-Server_Legal_01-client-0: remote
operation failed. Path:
/Server_Legal/CV_MAGNETIC/V_<wbr>944453/CHUNK_9291168/SFILE_<wbr>CONTAINER_046
(bfdfe21a-1af3-474b-a6a4-<wbr>bc0e17edb529)
[Input/output error]</tt><tt><br>
</tt><tt><br>
</tt><tt>[2017-03-17 19:49:56.089012]
E [MSGID: 114031]
[client-rpc-fops.c:444:<wbr>client3_3_open_cbk]
0-Server_Legal_01-client-1: remote
operation failed. Path:
/Server_Legal/CV_MAGNETIC/V_<wbr>944453/CHUNK_9291168/SFILE_<wbr>CONTAINER_046
(bfdfe21a-1af3-474b-a6a4-<wbr>bc0e17edb529)
[Input/output error]</tt><br>
<br>
                                  which is why the sha256sum on the
                                  mount gave EIO. That in turn is
                                  because the file appears to be marked
                                  corrupt on both bricks: the
                                  'trusted.bit-rot.bad-file' xattr is
                                  set.<br>
                                  <br>
                                  Did you write to the files directly on
                                  the backend? What is interesting is
                                  that the sha256sum is the same on both
                                  bricks even though both copies are
                                  marked as bad by bitrot.<br>
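                                  <br>
                                  To list every file affected in this
                                  way, the client log could be scanned
                                  for entries shaped like the two quoted
                                  above. This is only a sketch: it
                                  assumes each entry sits on a single
                                  line in the log file, and the regular
                                  expression is derived from those two
                                  entries alone, so other message types
                                  will not match. The function name is
                                  hypothetical.<br>
                                  <pre>
```python
import re

# Pattern modelled on the two client log entries quoted above.
LINE_RE = re.compile(
    r"^\[([^\]]+)\] E \[MSGID: (\d+)\] \[[^\]]+\] "
    r"(\S+): remote operation failed\. Path: (\S+) "
    r"\(([0-9a-f-]+)\) \[(.+)\]$"
)


def failed_operations(lines):
    """Yield (client, path, gfid, error) for each matching error line."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            _timestamp, _msgid, client, path, gfid, error = match.groups()
            yield client, path, gfid, error
```
                                  </pre>
                                  A path reported by both client-0 and
                                  client-1 failed on both bricks, as
                                  SFILE_CONTAINER_046 did here.<br>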
<br>
-Ravi<br>
<br>
<br>
On 03/18/2017 03:20 AM, Bernhard Dübi
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>Hi,<br>
<br>
</div>
I have a situation<br>
<br>
</div>
                                      The volume log file reports a
                                      possible split-brain, but when I
                                      try to heal the file, the heal
                                      fails because the file is not in
                                      split-brain. Any ideas?<br>
<br>
<br>
<p class="MsoNormal">Regards</p>
<p class="MsoNormal">Bernhard<br>
</p>
</div>
<br>
</blockquote>
<p>
</p>
</div>
</blockquote></div>
</div></div></div></div></div></div></div></div></div></div>
</blockquote><p>
</p></body></html>