<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Thanks for the suggestions. My question is whether the risk is
      limited to losing the file or directory, or whether we could also
      create inconsistencies that span the bricks and "break
      everything".<br>
      We have to act either way to keep this from spreading (a second
      entry has already developed an "unhealable" directory
      split-brain), so this is only a question of assessing the risk
      before acting.<br>

    </p>

    <div class="moz-cite-prefix">On 12.08.22 at 18:12, Thomas
      Bätzler wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:6d41586e-95f4-a10c-4355-a364ca317b69@bringe.com">

      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      <div class="moz-cite-prefix">On 12.08.2022 at 17:12, Ilias
        Chasapakis forumZFD wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:41bf4390-aa77-ee49-d1b6-61944abd84ea@forumZFD.de">

        <meta http-equiv="content-type" content="text/html;

          charset=UTF-8">

        <p>Dear fellow gluster users,</p>

        <p>We are facing a problem with our replica 3 setup. The
          GlusterFS version is 9.2.<br>

        </p>

        <p>One directory is in split-brain, and we cannot heal it
          with:</p>

        <p> </p>

        <blockquote type="cite">

          <p>gluster volume heal gfsVol split-brain latest-mtime /folder</p>

        </blockquote>

        <p>The command throws the following error: "failed:Transport

          endpoint is not connected." <br>

        </p>

        <p>So the split-brain directory entry remains, the healing
          process does not complete, and other entries get stuck.<br>

        </p>

        <p>I saw there is a python script available <a target="_blank"

            class="c-link moz-txt-link-freetext"

            data-stringify-link="https://github.com/joejulian/glusterfs-splitbrain"

            data-sk="tooltip_parent"

            href="https://github.com/joejulian/glusterfs-splitbrain"

            rel="noopener noreferrer" tabindex="-1"

            data-remove-tab-index="true" moz-do-not-send="true">https://github.com/joejulian/glusterfs-splitbrain</a>

          Would that be a good solution to try? To be honest, we are a
          bit concerned about manually deleting the gfid and the files
          from the brick, as it seems this can create inconsistencies
          and break things... I can of course give you more information
          about our setup and situation, but if you already have a
          tip, that would be fantastic.</p>

      </blockquote>

      <p>You could at least verify what's going on: go to your brick
        roots and list /folder on each. You have 3n bricks with n
        replica sets. Find the replica set where you can spot a
        difference. It's most likely a file or directory that's missing
        or different. If it's a file, run ls -ain on the file on each
        brick in the replica set. It will report an inode number. Then
        run find .glusterfs -inum with that inode number from the brick
        root. You'll likely see that you have different gfid-files.</p>
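      <p>The checks described above could be sketched in Python roughly
        as follows. This is only an illustration under stated
        assumptions: the function names compare_listings and
        find_by_inode are made up here, the directories passed to
        compare_listings are the per-brick paths of /folder within one
        replica set, and only regular files under .glusterfs are
        matched (directory entries are not covered). Nothing is
        modified on the bricks.</p>
      <pre>
```python
import os


def compare_listings(brick_dirs):
    """Map each entry name to the brick directories that lack it."""
    listings = {d: set(os.listdir(d)) for d in brick_dirs}
    all_names = set().union(*listings.values())
    missing = {}
    for name in sorted(all_names):
        absent = [d for d in brick_dirs if name not in listings[d]]
        if absent:
            missing[name] = absent
    return missing


def find_by_inode(brick_root, inode):
    """Rough equivalent of 'find .glusterfs -inum N' for regular files."""
    hits = []
    gfid_root = os.path.join(brick_root, ".glusterfs")
    for dirpath, _dirnames, filenames in os.walk(gfid_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are not followed
            if os.lstat(path).st_ino == inode:
                hits.append(path)
    return sorted(hits)
```
      </pre>
      <p>For a file that exists on every brick, comparing the paths
        returned by find_by_inode across the bricks shows whether the
        gfid-files differ. Walking all of .glusterfs can take a while on
        a large brick.</p>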

      <p>To fix the problem, you have to help gluster along by cleaning
        up the mess. This is completely "do it at your own risk, it
        worked for me, ymmv": cp (not mv!) the file you want to keep to
        a safe location. On each brick in the replica set, delete the
        gfid-file and the data file. Run a heal on the volume and verify
        that you can access the path in question through the glusterfs
        mount. Then copy your salvaged file back through the glusterfs
        mount.</p>
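      <p>Before deleting anything, it may help to write the plan down
        first. The following Python sketch only lists the steps from the
        procedure above and does not touch the bricks. The names
        gfid_links and plan_cleanup, the keep_brick and backup_dir
        parameters, and the tuple layout are all assumptions made for
        this illustration; the gfid-file is located by inode, as
        described earlier, and only regular files are handled.</p>
      <pre>
```python
import os


def gfid_links(brick_root, data_path):
    """Files under .glusterfs that share the data file's inode."""
    inode = os.lstat(data_path).st_ino
    links = []
    gfid_root = os.path.join(brick_root, ".glusterfs")
    for dirpath, _dirnames, filenames in os.walk(gfid_root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            if os.lstat(candidate).st_ino == inode:
                links.append(candidate)
    return sorted(links)


def plan_cleanup(brick_roots, rel_path, keep_brick, backup_dir):
    """Return the planned steps as tuples; nothing is copied or deleted."""
    rel = rel_path.lstrip("/")
    # Step 1: copy (not move) the version to keep out of the brick.
    steps = [("backup", os.path.join(keep_brick, rel), backup_dir)]
    # Step 2: on every brick, remove the gfid-file and the data file.
    for root in brick_roots:
        data_path = os.path.join(root, rel)
        if not os.path.lexists(data_path):
            continue
        for link in gfid_links(root, data_path):
            steps.append(("delete", link, None))
        steps.append(("delete", data_path, None))
    return steps
```
      </pre>
      <p>The heal, the access check, and copying the salvaged file back
        all happen afterwards through the glusterfs mount and are
        deliberately left out of this sketch.</p>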

      <p>This happened to us quite often on a heavily loaded glusterfs
        shared filesystem that held a mail spool. Parallel accesses
        would try to mv files, and sometimes we'd end up with
        mismatched data on the bricks of the replica set. I reported
        this on GitHub, but apparently it wasn't seen as a serious
        problem. We've moved on to CephFS now. That surely has bugs,
        too, but hopefully less aggravating ones.<br>

      </p>

      <p> </p>

      <pre class="moz-signature" cols="72">Kind regards,

i.A. Thomas Bätzler

-- 

BRINGE Informationstechnik GmbH

Zur Seeplatte 12

D-76228 Karlsruhe

Germany

Fon: +49 721 94246-0

Fon: +49 171 5438457

Fax: +49 721 94246-66

Web: <a class="moz-txt-link-freetext" href="http://www.bringe.de/" moz-do-not-send="true">http://www.bringe.de/</a>

Geschäftsführer: Dipl.-Ing. (FH) Martin Bringe

Ust.Id: DE812936645, HRB 108943 Mannheim</pre>

      <br>


    </blockquote>

    <pre class="moz-signature" cols="72">-- 

forumZFD

Entschieden für Frieden | Committed to Peace

Ilias Chasapakis

Referent IT | IT Consultant

Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service

Am Kölner Brett 8 | 50825 Köln | Germany

Tel 0221 91273243 | Fax 0221 91273299 | <a class="moz-txt-link-freetext" href="http://www.forumZFD.de">http://www.forumZFD.de</a>

Vorstand nach § 26 BGB, einzelvertretungsberechtigt | Executive Board:

Oliver Knabe (Vorsitz | Chair), Jens von Bargen, Alexander Mauz

VR 17651 Amtsgericht Köln

Spenden | Donations: IBAN DE37 3702 0500 0008 2401 01 BIC BFSWDE33XXX</pre>

  </body>

</html>