<div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">Just to let you know: I have reverted to glusterfs 3.4.2 and everything is working again. No more disconnects, no more errors in the kernel log. So there *has* to be some kind of regression in the newer versions. Sadly, I guess it will be hard to find.</div></div><div class="gmail_extra"><br><div class="gmail_quote">2016-12-20 13:31 GMT+01:00 Micha Ober <span dir="ltr">&lt;<a href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">Hi Rafi,</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">here are the log files:</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">NFS: <a href="http://paste.ubuntu.com/23658653/" target="_blank">http://paste.ubuntu.com/<wbr>23658653/</a></div><div class="gmail_default" style="font-family:monospace,monospace">Brick: <a href="http://paste.ubuntu.com/23658656/" target="_blank">http://paste.ubuntu.<wbr>com/23658656/</a><br><br>The brick log is from the brick which caused the last disconnect at 2016-12-20 06:46:36 (0-gv0-client-7).</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">For completeness, here is also the dmesg output: <a href="http://paste.ubuntu.com/23658691/" target="_blank">http://paste.ubuntu.<wbr>com/23658691/</a></div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">Regards,</div><div class="gmail_default" 
style="font-family:monospace,monospace">Micha</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">2016-12-19 7:28 GMT+01:00 Mohammed Rafi K C <span dir="ltr">&lt;<a href="mailto:rkavunga@redhat.com" target="_blank">rkavunga@redhat.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    <p>Hi Micha,</p>
    <p>Sorry for the late reply. I was busy with some other things.</p>
    <p>If you still have the setup available, can you enable TRACE log
      level [1],[2] and see if you can find any log entries from when
      the network starts disconnecting? Basically, I&#39;m trying to find
      out whether any disconnection occurred other than the ping timer
      expiry issue.</p>
    <p><br>
    </p>
    <p><br>
    </p>
    <p>[1] : gluster volume set &lt;volname&gt; diagnostics.brick-log-level
      TRACE</p>
    <p>[2] : gluster volume set &lt;volname&gt; diagnostics.client-log-level
      TRACE<br>
    </p>
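    <p>Spelled out for this setup (a sketch assuming the volume name gv0
      used elsewhere in this thread; TRACE is extremely verbose, so it
      should be reverted once a disconnect has been captured):</p>

```shell
# Sketch for this thread's volume "gv0" (assumed name): enable TRACE
# logging on bricks and clients, reproduce the disconnect, then restore
# the default level, since TRACE logs grow very quickly.
gluster volume set gv0 diagnostics.brick-log-level TRACE
gluster volume set gv0 diagnostics.client-log-level TRACE

# ... wait for a disconnect, collect the logs, then revert:
gluster volume set gv0 diagnostics.brick-log-level INFO
gluster volume set gv0 diagnostics.client-log-level INFO
```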
    <p><br>
    </p>
    <p>Regards</p>
    <p>Rafi KC<br>
    </p><div><div class="m_5591780224253036987h5">
    <br>
    <div class="m_5591780224253036987m_-6466580696529642375moz-cite-prefix">On 12/08/2016 07:59 PM, Atin Mukherjee
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Thu, Dec 8, 2016 at 4:37 PM, Micha
            Ober <span dir="ltr">&lt;<a href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix">Hi
                  Rafi,<br>
                  <br>
                  thank you for your support. It is greatly appreciated.<br>
                  <br>
                  Just some more thoughts from my side:<br>
                  <br>
                   There have been no reports from other users in *this*
                   thread until now, but I have found at least one user
                   with a very similar problem in an older thread:<br>
                  <br>
                  <a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html" target="_blank">https://www.gluster.org/piperm<wbr>ail/gluster-users/2014-Novembe<wbr>r/019637.html</a><br>
                  <br>
                   He is also reporting disconnects with no apparent
                   reason, although his setup is a bit more complicated,
                   also involving a firewall. In our setup, all
                   servers/clients are connected via 1 GbE with no
                   firewall or anything that might block/throttle
                   traffic. Also, we are using exactly the same software
                   versions on all nodes.<br>
                  <br>
                  <br>
                   I can also find some reports in the bug tracker when
                   searching for &quot;rpc_client_ping_timer_expired<wbr>&quot; and
                   &quot;rpc_clnt_ping_timer_expired&quot; (the spelling apparently
                   changed between versions).<br>
                  <br>
                  <a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1096729" target="_blank">https://bugzilla.redhat.com/sh<wbr>ow_bug.cgi?id=1096729</a></div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Just FYI, this is a different issue: here GlusterD
              fails to handle the volume of incoming requests in time,
              since MT-epoll is not enabled.<br>
               <br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix"><br>
                  <a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683" target="_blank">https://bugzilla.redhat.com/sh<wbr>ow_bug.cgi?id=1370683</a><br>
                  <br>
                   But both reports involve heavy traffic/load on the
                   bricks/disks, which is not the case for our setup.
                   To give a ballpark figure: over three days, 30 GiB
                   were written. The data was not written all at once,
                   but continuously over the whole time.<br>
                  <br>
                  <br>
                  Just to be sure, I have checked the logfiles of one of
                  the other clusters right now, which are sitting in the
                  same building, in the same rack, even on the same
                  switch, running the same jobs, but with glusterfs
                   3.4.2 and I can see no disconnects in the logfiles. So
                   I can definitely rule out our infrastructure as the
                   problem.<br>
                  <br>
                  Regards,<br>
                  Micha
                  <div>
                    <div class="m_5591780224253036987m_-6466580696529642375h5"><br>
                      <br>
                      <br>
                      Am 07.12.2016 um 18:08 schrieb Mohammed Rafi K C:<br>
                    </div>
                  </div>
                </div>
                <div>
                  <div class="m_5591780224253036987m_-6466580696529642375h5">
                    <blockquote type="cite">
                      <p>Hi Micha,</p>
                       <p>This is great. I will provide you a debug
                         build which has two fixes that I suspect may be
                         behind the frequent disconnect issue, though
                         I don&#39;t have much data to validate my theory, so
                         I will take one more day to dig into that.</p>
                      <p>Thanks for your support, and opensource++  </p>
                      <p>Regards</p>
                      <p>Rafi KC<br>
                      </p>
                      <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix">On
                        12/07/2016 05:02 AM, Micha Ober wrote:<br>
                      </div>
                      <blockquote type="cite">
                        <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix">Hi,<br>
                          <br>
                          thank you for your answer and even more for
                          the question!<br>
                          Until now, I was using FUSE. Today I changed
                          all mounts to NFS using the same 3.7.17
                          version.<br>
                          <br>
                          But: The problem is still the same. Now, the
                          NFS logfile contains lines like these:<br>
                          <br>
                          [2016-12-06 15:12:29.006325] C
                          [rpc-clnt-ping.c:165:rpc_clnt_<wbr>ping_timer_expired]
                          0-gv0-client-7: server X.X.18.62:49153 has not
                          responded in the last 42 seconds,
                          disconnecting.<br>
                          <br>
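                           One way to tally these CRITICAL lines per
                           client translator (a sketch; the sample log
                           is embedded so it runs stand-alone, but the
                           same pipeline works on the real nfs/fuse
                           client log):<br>

```shell
# Tally rpc_clnt_ping_timer_expired events per client translator.
# Sample lines (patterned on the quoted log) are embedded here; in
# practice, point the pipeline at the real client log instead.
log='[2016-12-06 15:12:29.006325] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gv0-client-7: server X.X.18.62:49153 has not responded in the last 42 seconds, disconnecting.
[2016-12-06 16:03:07.118804] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gv0-client-7: server X.X.18.62:49153 has not responded in the last 42 seconds, disconnecting.
[2016-12-06 17:11:42.004410] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gv0-client-3: server X.X.18.58:49152 has not responded in the last 42 seconds, disconnecting.'

# Field 5 is the translator name ("0-gv0-client-7:"); strip the colon,
# then count occurrences, most frequent first.
printf '%s\n' "$log" | grep rpc_clnt_ping_timer_expired \
  | awk '{sub(":", "", $5); print $5}' | sort | uniq -c | sort -rn
```

                           The 42 seconds corresponds to the default of
                           the network.ping-timeout volume option.<br>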
                           Interestingly enough, the IP address
                           X.X.18.62 is the same machine! As I wrote
                          earlier, each node serves both as a server and
                          a client, as each node contributes bricks to
                          the volume. Every server is connecting to
                          itself via its hostname. For example, the
                          fstab on the node &quot;giant2&quot; looks like:<br>
                          <br>
                          #giant2:/gv0    /shared_data   
                          glusterfs       defaults,noauto 0       0<br>
                          #giant2:/gv2    /shared_slurm  
                          glusterfs       defaults,noauto 0       0<br>
                          <br>
                          giant2:/gv0     /shared_data   
                          nfs             defaults,_netdev,vers=3
                          0       0<br>
                          giant2:/gv2     /shared_slurm  
                          nfs             defaults,_netdev,vers=3
                          0       0<br>
                          <br>
                          So I understand the disconnects even less. <br>
                          <br>
                          I don&#39;t know if it&#39;s possible to create a
                          dummy cluster which exposes the same
                          behaviour, because the disconnects only happen
                          when there are compute jobs running on those
                          nodes - and they are GPU compute jobs, so
                          that&#39;s something which cannot be easily
                          emulated in a VM.<br>
                          <br>
                          As we have more clusters (which are running
                          fine with an ancient 3.4 version :-)) and we
                          are currently not dependent on this particular
                           cluster (which may stay like this for this
                           month, I think), I should be able to deploy
                           the debug build on the &quot;real&quot; cluster, if
                           you can provide one.<br>
                          <br>
                          Regards and thanks,<br>
                          Micha<br>
                          <br>
                          <br>
                          <br>
                          Am 06.12.2016 um 08:15 schrieb Mohammed Rafi K
                          C:<br>
                        </div>
                        <blockquote type="cite">
                          <p><br>
                          </p>
                          <br>
                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix">On
                            12/03/2016 12:56 AM, Micha Ober wrote:<br>
                          </div>
                          <blockquote type="cite">
                            <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix"><tt>**
                                Update: ** I have downgraded from 3.8.6
                                to 3.7.17 now, but the problem still
                                exists.</tt><tt><br>
                              </tt></div>
                          </blockquote>
                          <blockquote type="cite">
                            <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix"><tt>
                              </tt><tt><br>
                              </tt><tt>Client log: <a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="http://paste.ubuntu.com/23569065/" target="_blank">http://paste.ubuntu.com/<wbr>23569065/</a></tt><tt><br>
                              </tt><tt>Brick log: <a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="http://paste.ubuntu.com/23569067/" target="_blank">http://paste.ubuntu.com/<wbr>23569067/</a></tt><tt><br>
                              </tt><tt><br>
                              </tt><tt>Please note that each server has
                                two bricks.</tt><tt><br>
                              </tt><tt>According to the logs, however,
                                only one brick loses the connection to
                                all other hosts:</tt><tt><br>
                              </tt>
                              <pre style="color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px">[2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)
[2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.206:49120 failed (Broken pipe)
[2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.58:49121 failed (Broken pipe)

The SECOND brick on the SAME host is NOT affected, i.e. no disconnects!
As I said, the network connection is fine and the disks are idle.
The CPU always has 2 free cores.

It looks like I have to downgrade to 3.4 now in order for the disconnects to stop.</pre>
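                              <tt>The five timestamps above lie within
                                about 100 microseconds of each other, so
                                all peer connections of that brick drop
                                at virtually the same instant; grouping
                                the failures by second makes this easy
                                to check (a sketch, with the sample
                                lines embedded so it runs
                                stand-alone):</tt><br>

```shell
# Group "Broken pipe" writev failures by timestamp (truncated to the
# second) to check whether all peers of one brick drop simultaneously.
# Sample lines from the quoted brick log are embedded; in practice,
# run the pipeline against the real brick log instead.
log='[2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)
[2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)'

# Field 2 is "HH:MM:SS.micros]"; drop the microseconds, then count how
# many failures share each second.
printf '%s\n' "$log" | grep 'Broken pipe' \
  | awk '{split($2, t, "."); print $1, t[1] "]"}' | sort | uniq -c
```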
                            </div>
                          </blockquote>
                          <br>
                          Hi Micha,<br>
                          <br>
                           Thanks for the update, and sorry for what
                           happened with the higher gluster versions. I
                           can understand the need to downgrade, as it
                           is a production setup.<br>
                           <br>
                           Can you tell me which clients are used here?
                           Are they fuse, nfs, nfs-ganesha, smb or
                           libgfapi?<br>
                           <br>
                           Since I&#39;m not able to reproduce the issue (I
                           have been trying for the last 3 days) and the
                           logs are not much help here (we don&#39;t have
                           much logging in the socket layer), could you
                           please create a dummy cluster and try to
                           reproduce the issue? Then we can play with
                           that volume, and I could provide a debug
                           build which we can use for further
                           debugging.<br>
                           <br>
                           If you don&#39;t have bandwidth for this, please
                           leave it ;).<br>
                          <br>
                          Regards<br>
                          Rafi KC<br>
                          <br>
                          <blockquote type="cite">
                            <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix">
                              <pre style="color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px">- Micha
</pre>
                              <br>
                              Am 30.11.2016 um 06:57 schrieb Mohammed
                              Rafi K C:<br>
                            </div>
                            <blockquote type="cite">
                              <p>Hi Micha,</p>
                               <p>I have changed the thread and subject
                                 so that your original thread remains
                                 dedicated to your query. To tackle the
                                 problem you observed with 3.8.4, I have
                                 started this new thread to discuss the
                                 frequent disconnect problem.</p>
                              <p><b>If any one else has experienced the
                                  same problem, please respond to the
                                  mail.</b><br>
                              </p>
                               <p>It would be very helpful if you could
                                 give us some more logs from clients and
                                 bricks. Also, any steps to reproduce it
                                 will surely help us chase the problem
                                 further.</p>
                              <p>Regards</p>
                              <p>Rafi KC<br>
                              </p>
                              <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix">On
                                11/30/2016 04:44 AM, Micha Ober wrote:<br>
                              </div>
                              <blockquote type="cite">
                                <div dir="ltr">
                                  <div>
                                    <div><font face="monospace,
                                        monospace">I had opened another
                                        thread on this mailing list
                                        (Subject: &quot;After upgrade from
                                        3.4.2 to 3.8.5 - High CPU usage
                                        resulting in disconnects and
                                        split-brain&quot;).</font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">The title may be a
                                        bit misleading now, as I am no
                                        longer observing high CPU usage
                                        after upgrading to 3.8.6, but
                                        the disconnects are still
                                        happening and the number of
                                        files in split-brain is growing.</font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">Setup: 6 compute
                                        nodes, each serving as a
                                        glusterfs server and client,
                                        Ubuntu 14.04, two bricks per
                                        node, distribute-replicate</font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">I have two gluster
                                        volumes set up (one for scratch
                                        data, one for the slurm
                                        scheduler). Only the scratch
                                        data volume shows critical
                                        errors &quot;[...] has not responded
                                        in the last 42 seconds,
                                        disconnecting.&quot;. So I can rule
                                        out network problems; the
                                        gigabit link between the nodes
                                        is not saturated at all. The
                                        disks are almost idle (&lt;10%).</font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">I have glusterfs
                                        3.4.2 on Ubuntu 12.04 on
                                        another compute cluster, running
                                        fine since it was deployed.</font></div>
                                    <div><font face="monospace,
                                        monospace">I had glusterfs 3.4.2
                                        on Ubuntu 14.04 on this cluster,
                                        running fine for almost a year.</font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">After upgrading to
                                        3.8.5, the problems (as
                                        described) started. I would like
                                        to use some of the new features
                                        of the newer versions (like
                                        bitrot), but the users can&#39;t run
                                        their compute jobs right now
                                        because the result files are
                                        garbled.</font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">There also seems to
                                        be a bug report with a similar
                                        problem (but no progress):</font></div>
                                    <div><font face="monospace,
                                        monospace"><a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683" target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1370683</a></font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">For me, ALL servers
                                        are affected (not isolated to
                                        one or two servers).</font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">I also see messages
                                        like &quot;INFO: task
                                        gpu_graphene_bv:4476 blocked
                                        for more than 120 seconds.&quot;
                                        in the syslog.</font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">For completeness (gv0
                                        is the scratch volume, gv2 the
                                        slurm volume):</font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">[root@giant2: ~]#
                                        gluster v info</font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">Volume Name: gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Type:
                                        Distributed-Replicate</font></div>
                                    <div><font face="monospace,
                                        monospace">Volume ID:
                                        993ec7c9-e4bc-44d0-b7c4-2d977e<wbr>622e86</font></div>
                                    <div><font face="monospace,
                                        monospace">Status: Started</font></div>
                                    <div><font face="monospace,
                                        monospace">Snapshot Count: 0</font></div>
                                    <div><font face="monospace,
                                        monospace">Number of Bricks: 6 x
                                        2 = 12</font></div>
                                    <div><font face="monospace,
                                        monospace">Transport-type: tcp</font></div>
                                    <div><font face="monospace,
                                        monospace">Bricks:</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick1:
                                        giant1:/gluster/sdc/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick2:
                                        giant2:/gluster/sdc/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick3:
                                        giant3:/gluster/sdc/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick4:
                                        giant4:/gluster/sdc/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick5:
                                        giant5:/gluster/sdc/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick6:
                                        giant6:/gluster/sdc/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick7:
                                        giant1:/gluster/sdd/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick8:
                                        giant2:/gluster/sdd/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick9:
                                        giant3:/gluster/sdd/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick10:
                                        giant4:/gluster/sdd/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick11:
                                        giant5:/gluster/sdd/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick12:
                                        giant6:/gluster/sdd/gv0</font></div>
                                    <div><font face="monospace,
                                        monospace">Options Reconfigured:</font></div>
                                    <div><font face="monospace,
                                        monospace">auth.allow:
                                        X.X.X.*,127.0.0.1</font></div>
                                    <div><font face="monospace,
                                        monospace">nfs.disable: on</font></div>
                                    <div><font face="monospace,
                                        monospace"><br>
                                      </font></div>
                                    <div><font face="monospace,
                                        monospace">Volume Name: gv2</font></div>
                                    <div><font face="monospace,
                                        monospace">Type: Replicate</font></div>
                                    <div><font face="monospace,
                                        monospace">Volume ID:
                                        30c78928-5f2c-4671-becc-8deaee<wbr>1a7a8d</font></div>
                                    <div><font face="monospace,
                                        monospace">Status: Started</font></div>
                                    <div><font face="monospace,
                                        monospace">Snapshot Count: 0</font></div>
                                    <div><font face="monospace,
                                        monospace">Number of Bricks: 1 x
                                        2 = 2</font></div>
                                    <div><font face="monospace,
                                        monospace">Transport-type: tcp</font></div>
                                    <div><font face="monospace,
                                        monospace">Bricks:</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick1:
                                        giant1:/gluster/sdd/gv2</font></div>
                                    <div><font face="monospace,
                                        monospace">Brick2:
                                        giant2:/gluster/sdd/gv2</font></div>
                                    <div><font face="monospace,
                                        monospace">Options Reconfigured:</font></div>
                                    <div><font face="monospace,
                                        monospace">auth.allow:
                                        X.X.X.*,127.0.0.1</font></div>
                                    <div><font face="monospace,
                                        monospace">cluster.granular-entry-heal:
                                        on</font></div>
                                    <div><font face="monospace,
                                        monospace">cluster.locking-scheme:
                                        granular</font></div>
                                    <div><font face="monospace,
                                        monospace">nfs.disable: on</font></div>
                                    <div style="font-family:monospace,monospace"><br>
                                    </div>
                                  </div>
                                </div>
                                <div class="gmail_extra"><br>
                                  <div class="gmail_quote">2016-11-30
                                    0:10 GMT+01:00 Micha Ober <span dir="ltr">&lt;<a href="mailto:micha2k@gmail.com" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>&gt;</span>:<br>
                                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                      <div dir="ltr">
<div style="font-family:monospace,monospace">There also seems to be a bug report describing a similar problem (but no progress):</div>
                                        <div><font face="monospace,
                                            monospace"><a href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1370683</a></font><br>
                                        </div>
                                        <div><font face="monospace,
                                            monospace"><br>
                                          </font></div>
                                        <div><font face="monospace,
                                            monospace">For me, ALL servers are affected (not isolated to one or two servers).</font></div>
                                        <div><font face="monospace,
                                            monospace"><br>
                                          </font></div>
                                        <div><font face="monospace,
                                            monospace">I also see messages like &quot;INFO: task gpu_graphene_bv:4476 blocked for more than 120 seconds.&quot; in the syslog.</font></div>
                                        <div><font face="monospace,
                                            monospace"><br>
                                          </font></div>
                                        <div><font face="monospace,
                                            monospace">For completeness
                                            (gv0 is the scratch volume,
                                            gv2 the slurm volume):</font></div>
                                        <div><font face="monospace,
                                            monospace"><br>
                                          </font></div>
                                        <div><font face="monospace,
                                            monospace">
                                            <div>[root@giant2: ~]#
                                              gluster v info</div>
                                            <div><br>
                                            </div>
                                            <div>Volume Name: gv0</div>
                                            <div>Type:
                                              Distributed-Replicate</div>
                                            <div>Volume ID:
                                              993ec7c9-e4bc-44d0-b7c4-2d977e<wbr>622e86</div>
                                            <div>Status: Started</div>
                                            <div>Snapshot Count: 0</div>
                                            <div>Number of Bricks: 6 x 2
                                              = 12</div>
                                            <div>Transport-type: tcp</div>
                                            <div>Bricks:</div>
                                            <div>Brick1:
                                              giant1:/gluster/sdc/gv0</div>
                                            <div>Brick2:
                                              giant2:/gluster/sdc/gv0</div>
                                            <div>Brick3:
                                              giant3:/gluster/sdc/gv0</div>
                                            <div>Brick4:
                                              giant4:/gluster/sdc/gv0</div>
                                            <div>Brick5:
                                              giant5:/gluster/sdc/gv0</div>
                                            <div>Brick6:
                                              giant6:/gluster/sdc/gv0</div>
                                            <div>Brick7:
                                              giant1:/gluster/sdd/gv0</div>
                                            <div>Brick8:
                                              giant2:/gluster/sdd/gv0</div>
                                            <div>Brick9:
                                              giant3:/gluster/sdd/gv0</div>
                                            <div>Brick10:
                                              giant4:/gluster/sdd/gv0</div>
                                            <div>Brick11:
                                              giant5:/gluster/sdd/gv0</div>
                                            <div>Brick12:
                                              giant6:/gluster/sdd/gv0</div>
                                            <div>Options Reconfigured:</div>
                                            <div>auth.allow:
                                              X.X.X.*,127.0.0.1</div>
                                            <div>nfs.disable: on</div>
                                            <div><br>
                                            </div>
                                            <div>Volume Name: gv2</div>
                                            <div>Type: Replicate</div>
                                            <div>Volume ID:
                                              30c78928-5f2c-4671-becc-8deaee<wbr>1a7a8d</div>
                                            <div>Status: Started</div>
                                            <div>Snapshot Count: 0</div>
                                            <div>Number of Bricks: 1 x 2
                                              = 2</div>
                                            <div>Transport-type: tcp</div>
                                            <div>Bricks:</div>
                                            <div>Brick1:
                                              giant1:/gluster/sdd/gv2</div>
                                            <div>Brick2:
                                              giant2:/gluster/sdd/gv2</div>
                                            <div>Options Reconfigured:</div>
                                            <div>auth.allow:
                                              X.X.X.*,127.0.0.1</div>
                                            <div>cluster.granular-entry-heal:
                                              on</div>
                                            <div>cluster.locking-scheme:
                                              granular</div>
                                            <div>nfs.disable: on</div>
                                            <div><br>
                                            </div>
                                          </font></div>
                                      </div>
                                      <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127HOEnZb">
                                        <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127h5">
                                          <div class="gmail_extra"><br>
                                            <div class="gmail_quote">2016-11-29
                                              19:21 GMT+01:00 Micha Ober
                                              <span dir="ltr">&lt;<a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>&gt;</span>:<br>
                                              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                <div dir="ltr">
                                                  <div style="font-family:monospace,monospace">I
                                                    had opened another
                                                    thread on this
                                                    mailing list
                                                    (Subject: &quot;After
                                                    upgrade from 3.4.2
                                                    to 3.8.5 - High CPU
                                                    usage resulting in
                                                    disconnects and
                                                    split-brain&quot;).</div>
                                                  <div style="font-family:monospace,monospace"><br>
                                                  </div>
                                                  <div style="font-family:monospace,monospace">The
                                                    title may be a bit
                                                    misleading now, as I
                                                    am no longer
                                                    observing high CPU
                                                    usage after
                                                    upgrading to 3.8.6,
                                                    but the disconnects
                                                    are still happening
                                                    and the number of
                                                    files in split-brain
                                                    is growing.<br>
                                                  </div>
                                                  <div style="font-family:monospace,monospace"><br>
                                                  </div>
                                                  <div style="font-family:monospace,monospace">Setup:
                                                    6 compute nodes,
                                                    each serving as a
                                                    glusterfs server and
                                                    client, Ubuntu
                                                    14.04, two bricks
                                                    per node,
                                                    distribute-replicate</div>
                                                  <div style="font-family:monospace,monospace"><br>
                                                  </div>
                                                  <div style="font-family:monospace,monospace">I have two gluster volumes set up (one for scratch data, one for the slurm scheduler). Only the scratch data volume shows critical errors: &quot;[...] has not responded in the last 42 seconds, disconnecting.&quot; I can rule out network problems; the gigabit link between the nodes is not saturated at all, and the disks are almost idle (&lt;10%).</div>
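<div style="font-family:monospace,monospace">(Side note: the &quot;42 seconds&quot; in the client log matches the default value of network.ping-timeout. A print-only sketch for inspecting or raising it per volume; the commands below only echo what they would run, the volume name gv0 is taken from this thread, and the &quot;volume get&quot; form assumes 3.7+:)</div>

```shell
# Print-only wrapper so this sketch is safe to run without a gluster install;
# drop 'show' in front of the commands to run them for real.
show() { echo "+ $*"; }

show gluster volume get gv0 network.ping-timeout    # inspect the current value (default: 42)
show gluster volume set gv0 network.ping-timeout 60 # raise it (a diagnostic aid, not a fix)
```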
                                                  <div style="font-family:monospace,monospace"><br>
                                                  </div>
                                                  <div style="font-family:monospace,monospace">I have glusterfs 3.4.2 on Ubuntu 12.04 on another compute cluster, running fine since it was deployed.</div>
                                                  <div style="font-family:monospace,monospace">I
                                                    had glusterfs 3.4.2
                                                    on Ubuntu 14.04 on
                                                    this cluster,
                                                    running fine for
                                                    almost a year.</div>
                                                  <div style="font-family:monospace,monospace"><br>
                                                  </div>
                                                  <div style="font-family:monospace,monospace">After
                                                    upgrading to 3.8.5,
                                                    the problems (as
                                                    described) started.
                                                    I would like to use
                                                    some of the new
                                                    features of the
                                                    newer versions (like
                                                    bitrot), but the
                                                    users can&#39;t run
                                                    their compute jobs
                                                    right now because
                                                    the result files are
                                                    garbled.</div>
                                                </div>
                                                <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071HOEnZb">
                                                  <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071h5">
                                                    <div class="gmail_extra"><br>
                                                      <div class="gmail_quote">2016-11-29
                                                        18:53 GMT+01:00
                                                        Atin Mukherjee <span dir="ltr">&lt;<a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-abbreviated" href="mailto:amukherj@redhat.com" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-abbreviated" href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt;</span>:<br>
                                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                           <div style="white-space:pre-wrap">Would you be able to share what is not working for you in 3.8.x (mention the exact version)? 3.4 is quite old, and falling back to an unsupported version doesn&#39;t look like a feasible option.</div>
                                                          <br>
                                                          <div class="gmail_quote">
                                                          <div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209h5">
                                                          <div dir="ltr">On
                                                          Tue, 29 Nov
                                                          2016 at 17:01,
                                                          Micha Ober
                                                          &lt;<a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>&gt; wrote:<br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                          <div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209h5">
                                                          <div dir="ltr" class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">Hi,</div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace"><br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          </div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">I was using gluster 3.4 and upgraded to 3.8, but that version turned out to be unusable for me. I now need to downgrade.</div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace"><br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          </div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">I&#39;m running Ubuntu 14.04. As
                                                          upgrades of
                                                          the op version
are irreversible, I guess I have to delete all gluster volumes and
                                                          re-create them
                                                          with the
                                                          downgraded
                                                          version. </div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace"><br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          </div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">0. Backup data</div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">1. Unmount all gluster volumes</div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">2. apt-get purge
                                                          glusterfs-server
glusterfs-client</div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">3. Remove PPA for 3.8</div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">4. Add PPA for older version</div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">5. apt-get install
                                                          glusterfs-server
glusterfs-client</div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">6. Create volumes</div>
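The numbered steps above could be sketched roughly as follows. This is a print-only sketch: each command is echoed rather than executed, so it is safe to run as-is, and the PPA names are assumptions that should be checked against the gluster team's Launchpad page first. (Ubuntu 14.04's own repositories shipped 3.4.2, so the extra PPA for the old version may not even be needed.)

```shell
# Print-only sketch of the downgrade procedure; 'run' echoes instead of executing.
run() { echo "+ $*"; }

run umount /mnt/gv0                                        # 1. unmount (example mount point)
run apt-get purge -y glusterfs-server glusterfs-client     # 2. remove packages and conffiles
run add-apt-repository --remove ppa:gluster/glusterfs-3.8  # 3. drop the 3.8 PPA (name assumed)
run add-apt-repository ppa:gluster/glusterfs-3.4           # 4. add the older PPA (name assumed)
run apt-get update
run apt-get install -y glusterfs-server glusterfs-client   # 5. install the older version
```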
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace"><br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          </div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">Is &quot;purge&quot; enough to delete all configuration files of the currently installed version, or do I need to manually clear some residues before installing an older version?</div>
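For reference: apt-get purge removes dpkg-registered conffiles, but state that glusterd creates at runtime (peer and volume definitions under /var/lib/glusterd, logs under /var/log/glusterfs) is not owned by dpkg and typically survives a purge. A small sketch for checking; the paths are the common defaults, not verified against any particular install:

```shell
# List leftover directories that a fresh install would pick up again.
check_residue() {
    for d in "$@"; do
        [ -d "$d" ] && echo "residue: $d"
    done
    return 0
}

# Typical GlusterFS state locations (defaults; verify on your system):
check_residue /var/lib/glusterd /var/log/glusterfs /etc/glusterfs
```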
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace"><br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          </div>
                                                          <div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">Thanks.</div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          <span>
                                                          ______________________________<wbr>_________________<br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          Gluster-users
                                                          mailing list<br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          <a href="mailto:Gluster-users@gluster.org" class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                           <a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-freetext" href="http://www.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a></span></blockquote>
                                                          </div>
                                                          <span class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209HOEnZb"><font color="#888888">
                                                          <div dir="ltr">--
                                                          <br>
                                                          </div>
                                                          <div data-smartmail="gmail_signature">-
                                                          Atin (atinm)</div>
                                                          </font></span></blockquote>
                                                      </div>
                                                      <br>
                                                    </div>
                                                  </div>
                                                </div>
                                              </blockquote>
                                            </div>
                                            <br>
                                          </div>
                                        </div>
                                      </div>
                                    </blockquote>
                                  </div>
                                  <br>
                                </div>
                                <br>
              </blockquote>
              

            </blockquote>
          </blockquote>
          

        </blockquote>
      </blockquote>
      

    </blockquote>
  </div></div></div>

</blockquote></div>


-- 
<div class="m_5591780224253036987m_-6466580696529642375gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">
</div><div>~ Atin (atinm)
</div></div></div></div>
</div></div>



</blockquote>
</div></div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>