<div dir="ltr">Output of glfsheal-gv0.log:<div><br><div><div>[2018-07-04 16:11:05.435680] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1</div><div>[2018-07-04 16:11:05.436847] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 0-gv0-client-2: changing port to 49153 (from 0)</div><div>[2018-07-04 16:11:05.437722] W [MSGID: 114007] [client-handshake.c:1190:client_setvolume_cbk] 0-gv0-client-0: failed to find key &#39;child_up&#39; in the options</div><div>[2018-07-04 16:11:05.437744] I [MSGID: 114046] [client-handshake.c:1231:client_setvolume_cbk] 0-gv0-client-0: Connected to gv0-client-0, attached to remote volume &#39;/gluster/brick/brick0&#39;.</div><div>[2018-07-04 16:11:05.437755] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-gv0-client-0: Server and Client lk-version numbers are not same, reopening the fds</div><div>[2018-07-04 16:11:05.531514] I [MSGID: 108002] [afr-common.c:5312:afr_notify] 0-gv0-replicate-0: Client-quorum is met</div><div>[2018-07-04 16:11:05.531550] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-gv0-client-0: Server lk version = 1</div><div>[2018-07-04 16:11:05.532115] I [MSGID: 114057] [client-handshake.c:1478:select_server_supported_programs] 0-gv0-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)</div><div>[2018-07-04 16:11:05.537528] I [MSGID: 114046] [client-handshake.c:1231:client_setvolume_cbk] 0-gv0-client-2: Connected to gv0-client-2, attached to remote volume &#39;/gluster/brick/brick0&#39;.</div><div>[2018-07-04 16:11:05.537569] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-gv0-client-2: Server and Client lk-version numbers are not same, reopening the fds</div><div>[2018-07-04 16:11:05.544248] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-gv0-client-2: Server lk version = 1</div><div>[2018-07-04 16:11:05.547665] I [MSGID: 108031] [afr-common.c:2458:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting local read_child gv0-client-1</div><div>[2018-07-04 16:11:05.556948] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-gv0-replicate-0: no read subvols for /</div><div>[2018-07-04 16:11:05.577751] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-gv0-replicate-0: no read subvols for /</div><div>[2018-07-04 16:11:05.577839] I [MSGID: 104041] [glfs-resolve.c:971:__glfs_active_subvol] 0-gv0: switched to graph 6766732d-766d-3030-312d-37373932362d (0)</div><div>[2018-07-04 16:11:05.578355] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000000) [Invalid argument]</div><div>[2018-07-04 16:11:05.579562] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed. Path: / (00000000-0000-0000-0000-000000000000) [Invalid argument]</div><div>[2018-07-04 16:11:05.579776] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-gv0-client-2: remote operation failed. Path: / (00000000-0000-0000-0000-000000000000) [Invalid argument]</div></div><div><br></div></div><div>Removing the afr xattrs on node 3 did solve the split-brain issue on root. Thank you!</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jul 4, 2018 at 9:01 AM, Ravishankar N <span dir="ltr">&lt;<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF"><span class="">
    <p><br>
    </p>
    <br>
    <div class="m_-8322615140981260098moz-cite-prefix">On 07/04/2018 09:20 PM, Anh Vo wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">I forgot to mention we&#39;re using 3.12.10</div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Wed, Jul 4, 2018 at 8:45 AM, Anh Vo
          <span dir="ltr">&lt;<a href="mailto:vtqanh@gmail.com" target="_blank">vtqanh@gmail.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">If I run &quot;sudo gluster volume heal gv0
              split-brain latest-mtime /&quot; I get the following:
              <div>
                <div><br>
                </div>
                <div>Lookup failed on /:Invalid argument.</div>
                <div>Volume heal failed.</div>
              </div>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
    <br></span>
    Can you share the glfsheal-&lt;volname&gt;.log on the node where you
    ran this failed command?<span class=""><br>
    <blockquote type="cite">
      <div class="gmail_extra">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">
              <div>
                <div><br>
                </div>
              </div>
              <div>node2 was not connected at that time, because if we
                connect it to the system, after a few minutes gluster
                becomes almost unusable and many of our jobs fail. This
                morning I reconnected it and ran heal info, and we have
                about 30,000 entries to heal (15k from gfs-vm000 and 15k
                from gfs-vm001; 80% are bare gfids, 20% have file
                names). It&#39;s not feasible for us to check the
                individual gfids, so we rely on gluster&#39;s self-heal to
                handle them. The &quot;/&quot; entry is a concern because it
                prevents us from mounting over NFS, and we do need NFS
                mounts for some of our management tasks: the gluster
                FUSE mount is much slower than NFS for recursive
                operations like &#39;du&#39;.</div>
              <div><br>
              </div>
              <div>Do you have any suggestions for healing the metadata
                on &#39;/&#39;?</div>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote></span>
    You can manually delete the afr xattrs on node 3 as a workaround:<br>
    <tt>setfattr -x trusted.afr.gv0-client-0 /gluster/brick/brick0</tt><tt><br>
    </tt><tt>setfattr -x trusted.afr.gv0-client-1 /gluster/brick/brick0</tt><br>
    <br>
    This should remove the split-brain on root.<br>
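    For intuition, the condition being cleared here can be sketched as
    mutual blame: node0 and node1 each hold a non-zero pending-metadata
    count for client-2, while node2 (the &quot;node 3&quot; above, counting from
    1) holds counts for client-0 and client-1, so no brick is a clean
    heal source. The following is a simplified illustration only, not
    GlusterFS&#39;s actual heal logic, with counter values taken from the
    getfattr outputs quoted in this thread:<br>

```python
# Illustrative sketch (not GlusterFS's real code) of the mutual-blame
# rule behind the metadata split-brain on '/'.
# pending[i][j] = brick i's pending-metadata count for brick j, taken
# from the trusted.afr.* xattrs in this thread (bricks are 0-indexed,
# so "node 3" in the workaround above is brick 2 here).
pending = {
    0: {2: 1},        # node0: trusted.afr.gv0-client-2 -> metadata = 1
    1: {2: 1},        # node1: trusted.afr.gv0-client-2 -> metadata = 1
    2: {0: 2, 1: 2},  # node2: blames both client-0 and client-1
}

def split_brain(pending):
    """Split-brain: two bricks each hold non-zero pending counts for the other."""
    bricks = list(pending)
    return any(pending[i].get(j, 0) and pending[j].get(i, 0)
               for i in bricks for j in bricks if i < j)

print(split_brain(pending))   # True: bricks 0<->2 and 1<->2 blame each other

pending[2] = {}               # effect of the setfattr -x workaround on node 3
print(split_brain(pending))   # False: blame points one way, heal can proceed
```

    With node 3&#39;s counters gone, the remaining blame all points at
    brick 2, which self-heal treats as the sink; the policy-based
    alternatives (latest-mtime, source-brick, etc.) are described in
    the resolving-splitbrain documentation linked below.<br>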
    <br>
    HTH,<br>
    Ravi<div><div class="h5"><br>
    <blockquote type="cite">
      <div class="gmail_extra">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">
              <div><br>
              </div>
              <div>Thanks</div>
              <span class="m_-8322615140981260098HOEnZb"><font color="#888888">
                  <div>Anh</div>
                </font></span></div>
            <div class="m_-8322615140981260098HOEnZb">
              <div class="m_-8322615140981260098h5">
                <div class="gmail_extra"><br>
                  <div class="gmail_quote">On Tue, Jul 3, 2018 at 8:02
                    PM, Ravishankar N <span dir="ltr">&lt;<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>&gt;</span>
                    wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div text="#000000" bgcolor="#FFFFFF">
                        <p>Hi,</p>
                        <p>What version of gluster are you using?</p>
                        <p>1. The afr xattrs on &#39;/&#39; indicate a meta-data
                          split-brain. You can resolve it using one of
                          the policies listed in <a class="m_-8322615140981260098m_-2457352565565741707m_3342788517930548362moz-txt-link-freetext" href="https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/" target="_blank">https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/</a></p>
                        <p>For example: <code>gluster volume heal gv0
                            split-brain latest-mtime /</code></p>
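                        As described in the split-brain guide linked
                        above, each trusted.afr value packs three
                        big-endian 32-bit counters: pending data,
                        metadata, and entry operations. A short sketch
                        (the helper name is illustrative) decodes the
                        values quoted in this thread:<br>

```python
# Decode a trusted.afr.<volname>-client-<n> changelog xattr.
# Layout: three big-endian 32-bit counters for pending data,
# metadata, and entry operations, in that order.
def decode_afr(hexval: str) -> dict:
    raw = bytes.fromhex(hexval.removeprefix("0x"))
    data, meta, entry = (int.from_bytes(raw[i:i + 4], "big") for i in (0, 4, 8))
    return {"data": data, "metadata": meta, "entry": entry}

# Value reported for '/' on node0 and node1 (trusted.afr.gv0-client-2):
print(decode_afr("0x000000000000000100000000"))
# -> {'data': 0, 'metadata': 1, 'entry': 0}
```

                        Only the middle (metadata) counter is non-zero
                        on every brick, which is why this is a
                        meta-data split-brain rather than a data or
                        entry one; node2&#39;s
                        0x000000000000000200000000 values decode to
                        metadata&nbsp;=&nbsp;2 the same way.<br>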
                        2. Is the file corresponding to the other gfid
                        (81289110-867b-42ff-ba3b-1373a187032b)
                        present in all bricks? What do the getfattr
                        outputs for this file indicate?<br>
                        <br>
                        3. As for the discrepancy in output of heal
                        info, is node2 connected to the other nodes?
                        Does heal info still print the details of all 3
                        bricks when you run it on node2 ?<br>
                        -Ravi
                        <div>
                          <div class="m_-8322615140981260098m_-2457352565565741707h5"><br>
                            <br>
                            <div class="m_-8322615140981260098m_-2457352565565741707m_3342788517930548362moz-cite-prefix">On
                              07/04/2018 01:47 AM, Anh Vo wrote:<br>
                            </div>
                            <blockquote type="cite">Actually we just
                              discovered that the heal info command was
                              returning different things when executed
                              on the different nodes of our 3-replica
                              setup.
                              <div>When we execute it on node2 we did
                                not see the split-brain reported on
                                &quot;/&quot;, but on node0 and node1 I see:</div>
                              <div><br>
                                <div>x@gfs-vm001:~$ sudo gluster volume
                                  heal gv0 info | tee heal-info</div>
                                <div>Brick
                                  gfs-vm000:/gluster/brick/brick0</div>
                                <div>&lt;gfid:81289110-867b-42ff-ba3b-1373a187032b&gt;</div>
                                <div>/ - Is in split-brain</div>
                                <div><br>
                                </div>
                                <div>Status: Connected</div>
                                <div>Number of entries: 2</div>
                                <div><br>
                                </div>
                                <div>Brick
                                  gfs-vm001:/gluster/brick/brick0</div>
                                <div>/ - Is in split-brain</div>
                                <div><br>
                                </div>
                                <div>&lt;gfid:81289110-867b-42ff-ba3b-1373a187032b&gt;</div>
                                <div>Status: Connected</div>
                                <div>Number of entries: 2</div>
                                <div><br>
                                </div>
                                <div>Brick
                                  gfs-vm002:/gluster/brick/brick0</div>
                                <div>/ - Is in split-brain</div>
                                <div><br>
                                </div>
                                <div>Status: Connected</div>
                                <div>Number of entries: 1</div>
                              </div>
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                              <div>I ran getfattr -d -m . -e hex
                                /gluster/brick/brick0 on all three
                                nodes, and node2 has different attrs:</div>
                              <div>node0:</div>
                              <div>
                                <div>sudo getfattr -d -m . -e hex
                                  /gluster/brick/brick0</div>
                                <div>getfattr: Removing leading &#39;/&#39; from
                                  absolute path names</div>
                                <div># file: gluster/brick/brick0</div>
                                <div>trusted.afr.gv0-client-2=0x000000000000000100000000</div>
                                <div>trusted.gfid=0x00000000000000000000000000000001</div>
                                <div>trusted.glusterfs.dht=0x000000010000000000000000ffffffff</div>
                                <div>trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2</div>
                              </div>
                              <div><br>
                              </div>
                              <div>node1:</div>
                              <div>
                                <div>sudo getfattr -d -m . -e hex
                                  /gluster/brick/brick0</div>
                                <div>getfattr: Removing leading &#39;/&#39; from
                                  absolute path names</div>
                                <div># file: gluster/brick/brick0</div>
                                <div>trusted.afr.gv0-client-2=0x000000000000000100000000</div>
                                <div>trusted.gfid=0x00000000000000000000000000000001</div>
                                <div>trusted.glusterfs.dht=0x000000010000000000000000ffffffff</div>
                                <div>trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2</div>
                              </div>
                              <div><br>
                              </div>
                              <div>node2:</div>
                              <div>
                                <div>sudo getfattr -d -m . -e hex
                                  /gluster/brick/brick0</div>
                                <div>getfattr: Removing leading &#39;/&#39; from
                                  absolute path names</div>
                                <div># file: gluster/brick/brick0</div>
                                <div>trusted.afr.dirty=0x000000000000000000000000</div>
                                <div>trusted.afr.gv0-client-0=0x000000000000000200000000</div>
                                <div>trusted.afr.gv0-client-1=0x000000000000000200000000</div>
                                <div>trusted.afr.gv0-client-2=0x000000000000000000000000</div>
                                <div>trusted.gfid=0x00000000000000000000000000000001</div>
                                <div>trusted.glusterfs.dht=0x000000010000000000000000ffffffff</div>
                                <div>trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2</div>
                              </div>
                              <div><br>
                              </div>
                              <div>Where do I go from here? Thanks</div>
                            </blockquote>
                            <br>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br></div>