<div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">Just to let you know: I have reverted back to glusterfs 3.4.2 and everything is working again. No more disconnects, no more errors in the kernel log. So there *has* to be some kind of regression in the newer versions. Sadly, I guess, it will be hard to find.</div></div><div class="gmail_extra"><br><div class="gmail_quote">2016-12-20 13:31 GMT+01:00 Micha Ober <span dir="ltr"><<a href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">Hi Rafi,</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">here are the log files:</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">NFS: <a href="http://paste.ubuntu.com/23658653/" target="_blank">http://paste.ubuntu.com/<wbr>23658653/</a></div><div class="gmail_default" style="font-family:monospace,monospace">Brick: <a href="http://paste.ubuntu.com/23658656/" target="_blank">http://paste.ubuntu.<wbr>com/23658656/</a><br><br>The brick log is of the brick which has caused the last disconnect at 2016-12-20 06:46:36 (0-gv0-client-7).</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">For completeness, here is also dmesg output: <a href="http://paste.ubuntu.com/23658691/" target="_blank">http://paste.ubuntu.<wbr>com/23658691/</a></div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">Regards,</div><div class="gmail_default" style="font-family:monospace,monospace">Micha</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">2016-12-19 7:28 GMT+01:00 Mohammed Rafi K C <span dir="ltr"><<a href="mailto:rkavunga@redhat.com" target="_blank">rkavunga@redhat.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hi Micha,

Sorry for the late reply; I was busy with some other things.

If the setup is still available, can you enable the TRACE log level [1][2]
and see whether you can find any log entries at the moment the network starts
disconnecting? Basically, I'm trying to find out whether any disconnect
occurred other than the ping-timer-expired issue.

[1] gluster volume set <volname> diagnostics.brick-log-level TRACE
[2] gluster volume set <volname> diagnostics.client-log-level TRACE
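A minimal sketch of what I mean, assuming your scratch volume is named gv0
and the default Ubuntu log locations (TRACE is very verbose, so remember to
reset it afterwards):

    # Enable TRACE logging for bricks and clients (assumes volume gv0).
    gluster volume set gv0 diagnostics.brick-log-level TRACE
    gluster volume set gv0 diagnostics.client-log-level TRACE

    # While a compute job runs, watch for disconnect-related entries:
    grep -E "disconnect|ping_timer|socket" /var/log/glusterfs/bricks/*.log
    grep -E "disconnect|ping_timer|socket" /var/log/glusterfs/*.log

    # Drop back to the default log level when done:
    gluster volume reset gv0 diagnostics.brick-log-level
    gluster volume reset gv0 diagnostics.client-log-level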
Regards
Rafi KC

On 12/08/2016 07:59 PM, Atin Mukherjee wrote:
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Dec 8, 2016 at 4:37 PM, Micha
Ober <span dir="ltr"><<a href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix">Hi
Rafi,<br>
<br>
thank you for your support. It is greatly appreciated.<br>
<br>
Just some more thoughts from my side:<br>
<br>
There have been no reports from other users in *this*
thread until now, but I have found at least one user
with a very simiar problem in an older thread:<br>
<br>
<a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html" target="_blank">https://www.gluster.org/piperm<wbr>ail/gluster-users/2014-Novembe<wbr>r/019637.html</a><br>
<br>
He is also reporting disconnects with no apparent
reasons, althogh his setup is a bit more complicated,
also involving a firewall. In our setup, all
servers/clients are connected via 1 GbE with no
firewall or anything that might block/throttle
traffic. Also, we are using exactly the same software
versions on all nodes.<br>
<br>
<br>
I can also find some reports in the bugtracker when
searching for "rpc_client_ping_timer_expired<wbr>" and
"rpc_clnt_ping_timer_expired" (looks like spelling
changed during versions).<br>
<br>
<a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1096729" target="_blank">https://bugzilla.redhat.com/sh<wbr>ow_bug.cgi?id=1096729</a></div>
Just FYI, this is a different issue: there, glusterd fails to handle the
volume of incoming requests in time because MT-epoll is not enabled.
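(For reference: on the brick/client side, the multi-threaded epoll added in
glusterfs 3.7 is tunable per volume; a sketch, assuming a volume named gv0,
with thread counts that are purely illustrative:)

    # Raise the number of epoll worker threads on clients and bricks
    # (options exist since glusterfs 3.7; the default is 2).
    gluster volume set gv0 client.event-threads 4
    gluster volume set gv0 server.event-threads 4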
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix"><br>
<a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683" target="_blank">https://bugzilla.redhat.com/sh<wbr>ow_bug.cgi?id=1370683</a><br>
But both reports involve heavy traffic/load on the bricks/disks, which is
not the case for our setup. To give a ballpark figure: over three days,
about 30 GiB were written, and not at once but continuously over the whole
time.

Just to be sure, I have checked the logfiles of one of the other clusters
right now, which sits in the same building, in the same rack, even on the
same switch, running the same jobs, but with glusterfs 3.4.2, and I can see
no disconnects in its logfiles. So I can definitely rule out our
infrastructure as the problem.

Regards,
Micha

On 07.12.2016 18:08, Mohammed Rafi K C wrote:
Hi Micha,

This is great. I will provide you a debug build containing two fixes that I
suspect may be behind the frequent disconnects, though I don't have much
data yet to validate that theory, so I will take one more day to dig into
it.

Thanks for your support, and opensource++

Regards
Rafi KC

On 12/07/2016 05:02 AM, Micha Ober wrote:
<blockquote type="cite">
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix">Hi,<br>
<br>
thank you for your answer and even more for
the question!<br>
Until now, I was using FUSE. Today I changed
all mounts to NFS using the same 3.7.17
version.<br>
<br>
But: The problem is still the same. Now, the
NFS logfile contains lines like these:<br>
<br>
[2016-12-06 15:12:29.006325] C
[rpc-clnt-ping.c:165:rpc_clnt_<wbr>ping_timer_expired]
0-gv0-client-7: server X.X.18.62:49153 has not
responded in the last 42 seconds,
disconnecting.<br>
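(The 42 seconds here are glusterfs's default network.ping-timeout. Raising
it only delays the symptom, but it can help tell a slow server from a dead
link; a sketch, assuming the scratch volume gv0:)

    # Show the current ping timeout (42 seconds unless changed):
    gluster volume get gv0 network.ping-timeout

    # Raise it as an experiment; this postpones, not fixes, the disconnect:
    gluster volume set gv0 network.ping-timeout 120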
Interestingly enough, the IP address X.X.18.62 is the same machine! As I
wrote earlier, each node serves both as a server and a client, as each node
contributes bricks to the volume. Every server connects to itself via its
own hostname. For example, the fstab on the node "giant2" looks like this:

    #giant2:/gv0   /shared_data    glusterfs  defaults,noauto          0 0
    #giant2:/gv2   /shared_slurm   glusterfs  defaults,noauto          0 0

    giant2:/gv0    /shared_data    nfs        defaults,_netdev,vers=3  0 0
    giant2:/gv2    /shared_slurm   nfs        defaults,_netdev,vers=3  0 0

So I understand the disconnects even less.
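(To double-check that these NFS mounts really terminate at the local Gluster
NFS server, standard tooling should do; a sketch using the giant2 node from
above:)

    # Show the transport and target server of each mount:
    mount | grep -E "shared_data|shared_slurm"

    # List the RPC services (NFS, mountd) registered on the node:
    rpcinfo -p giant2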
I don't know whether it's possible to create a dummy cluster that exhibits
the same behaviour, because the disconnects only happen when compute jobs
are running on the nodes, and those are GPU compute jobs, so that's not
something which can easily be emulated in a VM.

As we have more clusters (which are running fine with an ancient 3.4
version :-)) and we are currently not dependent on this particular cluster
(which may stay like this for this month, I think), I should be able to
deploy a debug build on the "real" cluster, if you can provide one.

Regards and thanks,
Micha

On 06.12.2016 08:15, Mohammed Rafi K C wrote:
<blockquote type="cite">
<p><br>
</p>
<br>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix">On
12/03/2016 12:56 AM, Micha Ober wrote:<br>
</div>
<blockquote type="cite">
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix"><tt>**
Update: ** I have downgraded from 3.8.6
to 3.7.17 now, but the problem still
exists.</tt><tt><br>
</tt></div>
</blockquote>
<blockquote type="cite">
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix"><tt>
</tt><tt><br>
</tt><tt>Client log: <a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="http://paste.ubuntu.com/23569065/" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-freetext" href="http://paste.ubuntu.com/" target="_blank">http://paste.ubuntu.com/</a>235690<wbr>65/</tt><tt><br>
</tt><tt>Brick log: <a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="http://paste.ubuntu.com/23569067/" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-freetext" href="http://paste.ubuntu.com/" target="_blank">http://paste.ubuntu.com/</a>235690<wbr>67/</tt><tt><br>
</tt><tt><br>
</tt><tt>Please note that each server has
two bricks.</tt><tt><br>
</tt><tt>Whereas, according to the logs,
one brick loses the connection to all
other hosts:</tt><tt><br>
</tt>
<pre style="color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px">[2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)
[2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.206:49120 failed (Broken pipe)
[2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.58:49121 failed (Broken pipe)
The SECOND brick on the SAME host is NOT affected, i.e. no disconnects!
As I said, the network connection is fine and the disks are idle.
The CPU always has 2 free cores.
It looks like I have to downgrade to 3.4 now in order for the disconnects to stop.</pre>
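(One way to watch the affected brick's TCP sessions around such a drop; a
sketch, where 49121 is one of the peer ports from the log above, and growing
Send-Q counters just before the disconnect would implicate the network
rather than the application:)

    # Sample the brick's TCP connections every 5 seconds; Send-Q shows
    # bytes the kernel has not yet been able to deliver to the peer.
    watch -n 5 'ss -tn | grep 49121'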
Hi Micha,

Thanks for the update, and sorry about your experience with the newer
Gluster versions. I can understand the need for a downgrade, as it is a
production setup.

Can you tell me which clients are used here: FUSE, NFS, NFS-Ganesha, SMB, or
libgfapi?
Since I'm not able to reproduce the issue (I have been trying for the last
three days) and the logs are not much help here (we don't have many log
messages in the socket layer), could you please create a dummy cluster and
try to reproduce the issue there? Then we could play with that volume, and I
could provide a debug build which we can use for further debugging.
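(A sketch of such a throw-away setup, mirroring the production layout on two
spare machines; all host and brick names here are made up:)

    # On node1, with node2 reachable over the same 1 GbE network:
    gluster peer probe node2
    gluster volume create gvtest replica 2 \
        node1:/bricks/test node2:/bricks/test
    gluster volume start gvtest
    mount -t glusterfs node1:/gvtest /mnt/gvtest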
If you don't have the bandwidth for this, please leave it ;).

Regards
Rafi KC

On 30.11.2016 06:57, Mohammed Rafi K C wrote:
<blockquote type="cite">
<p>Hi Micha,</p>
<p>I have changed the thread and subject
so that your original thread remain same
for your query. Let's try to fix the
problem what you observed with 3.8.4, So
I have started a new thread to discuss
the frequent disconnect problem.</p>
<p><b>If any one else has experienced the
same problem, please respond to the
mail.</b><br>
</p>
<p>It would be very helpful if you could
give us some more logs from clients and
bricks. Also any reproducible steps
will surely help to chase the problem
further.</p>
<p>Regards</p>
<p>Rafi KC<br>
</p>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-cite-prefix">On
11/30/2016 04:44 AM, Micha Ober wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div><font face="monospace,
monospace">I had opened another
thread on this mailing list
(Subject: "After upgrade from
3.4.2 to 3.8.5 - High CPU usage
resulting in disconnects and
split-brain").</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">The title may be a
bit misleading now, as I am no
longer observing high CPU usage
after upgrading to 3.8.6, but
the disconnects are still
happening and the number of
files in split-brain is growing.</font></div>
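(The growing split-brain backlog can be tracked with the heal commands; a
sketch, assuming the affected scratch volume gv0:)

    # Files currently in split-brain:
    gluster volume heal gv0 info split-brain

    # Pending heal counts per brick:
    gluster volume heal gv0 statistics heal-count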
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">Setup: 6 compute
nodes, each serving as a
glusterfs server and client,
Ubuntu 14.04, two bricks per
node, distribute-replicate</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I have two gluster
volumes set up (one for scratch
data, one for the slurm
scheduler). Only the scratch
data volume shows critical
errors "[...] has not responded
in the last 42 seconds,
disconnecting.". So I can rule
out network problems, the
gigabit link between the nodes
is not saturated at all. The
disks are almost idle (<10%).</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I have glusterfs
3.4.2 on Ubuntu 12.04 on a
another compute cluster, running
fine since it was deployed.</font></div>
<div><font face="monospace,
monospace">I had glusterfs 3.4.2
on Ubuntu 14.04 on this cluster,
running fine for almost a year.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">After upgrading to
3.8.5, the problems (as
described) started. I would like
to use some of the new features
of the newer versions (like
bitrot), but the users can't run
their compute jobs right now
because the result files are
garbled.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">There also seems to
be a bug report with a smiliar
problem: (but no progress)</font></div>
<div><font face="monospace,
monospace"><a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-freetext" href="https://bugzilla.redhat.com/" target="_blank">https://bugzilla.redhat.com/</a>sh<wbr>ow_bug.cgi?id=1370683</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For me, ALL servers
are affected (not isolated to
one or two servers)</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I also see messages
like <a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-rfc2396E">"INFO: task
gpu_graphene_bv:4476 blocked
for more than 120 seconds."</a>
in the syslog.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For completeness (gv0
is the scratch volume, gv2 the
slurm volume):</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">[root@giant2: ~]#
gluster v info</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">Volume Name: gv0</font></div>
<div><font face="monospace,
monospace">Type:
Distributed-Replicate</font></div>
<div><font face="monospace,
monospace">Volume ID:
993ec7c9-e4bc-44d0-b7c4-2d977e<wbr>622e86</font></div>
<div><font face="monospace,
monospace">Status: Started</font></div>
<div><font face="monospace,
monospace">Snapshot Count: 0</font></div>
<div><font face="monospace,
monospace">Number of Bricks: 6 x
2 = 12</font></div>
<div><font face="monospace,
monospace">Transport-type: tcp</font></div>
<div><font face="monospace,
monospace">Bricks:</font></div>
<div><font face="monospace,
monospace">Brick1:
giant1:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick2:
giant2:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick3:
giant3:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick4:
giant4:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick5:
giant5:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick6:
giant6:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick7:
giant1:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick8:
giant2:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick9:
giant3:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick10:
giant4:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick11:
giant5:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick12:
giant6:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Options Reconfigured:</font></div>
<div><font face="monospace,
monospace">auth.allow:
X.X.X.*,127.0.0.1</font></div>
<div><font face="monospace,
monospace">nfs.disable: on</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">Volume Name: gv2</font></div>
<div><font face="monospace,
monospace">Type: Replicate</font></div>
<div><font face="monospace,
monospace">Volume ID:
30c78928-5f2c-4671-becc-8deaee<wbr>1a7a8d</font></div>
<div><font face="monospace,
monospace">Status: Started</font></div>
<div><font face="monospace,
monospace">Snapshot Count: 0</font></div>
<div><font face="monospace,
monospace">Number of Bricks: 1 x
2 = 2</font></div>
<div><font face="monospace,
monospace">Transport-type: tcp</font></div>
<div><font face="monospace,
monospace">Bricks:</font></div>
<div><font face="monospace,
monospace">Brick1:
giant1:/gluster/sdd/gv2</font></div>
<div><font face="monospace,
monospace">Brick2:
giant2:/gluster/sdd/gv2</font></div>
<div><font face="monospace,
monospace">Options Reconfigured:</font></div>
<div><font face="monospace,
monospace">auth.allow:
X.X.X.*,127.0.0.1</font></div>
<div><font face="monospace,
monospace">cluster.granular-entry-heal:
on</font></div>
<div><font face="monospace,
monospace">cluster.locking-scheme:
granular</font></div>
<div><font face="monospace,
monospace">nfs.disable: on</font></div>
<div style="font-family:monospace,monospace"><br>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2016-11-30
0:10 GMT+01:00 Micha Ober <span dir="ltr"><<a href="mailto:micha2k@gmail.com" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div style="font-family:monospace,monospace">There
also seems to be a bug report
with a smiliar problem: (but
no progress)</div>
<div><font face="monospace,
monospace"><a href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-freetext" href="https://bugzilla.redhat.com/sh" target="_blank">https://bugzilla.redhat.com/sh</a><wbr>ow_bug.cgi?id=1370683</font><br>
</div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For me, ALL
servers are affected (not
isolated to one or two
servers)</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I also see
messages like <a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-rfc2396E"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-rfc2396E">"INFO:
task gpu_graphene_bv:4476
blocked for more than 120
seconds."</a> in the
syslog.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For completeness
(gv0 is the scratch volume,
gv2 the slurm volume):</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">
<div>[root@giant2: ~]#
gluster v info</div>
<div><br>
</div>
<div>Volume Name: gv0</div>
<div>Type:
Distributed-Replicate</div>
<div>Volume ID:
993ec7c9-e4bc-44d0-b7c4-2d977e<wbr>622e86</div>
<div>Status: Started</div>
<div>Snapshot Count: 0</div>
<div>Number of Bricks: 6 x 2
= 12</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1:
giant1:/gluster/sdc/gv0</div>
<div>Brick2:
giant2:/gluster/sdc/gv0</div>
<div>Brick3:
giant3:/gluster/sdc/gv0</div>
<div>Brick4:
giant4:/gluster/sdc/gv0</div>
<div>Brick5:
giant5:/gluster/sdc/gv0</div>
<div>Brick6:
giant6:/gluster/sdc/gv0</div>
<div>Brick7:
giant1:/gluster/sdd/gv0</div>
<div>Brick8:
giant2:/gluster/sdd/gv0</div>
<div>Brick9:
giant3:/gluster/sdd/gv0</div>
<div>Brick10:
giant4:/gluster/sdd/gv0</div>
<div>Brick11:
giant5:/gluster/sdd/gv0</div>
<div>Brick12:
giant6:/gluster/sdd/gv0</div>
<div>Options Reconfigured:</div>
<div>auth.allow:
X.X.X.*,127.0.0.1</div>
<div>nfs.disable: on</div>
<div><br>
</div>
<div>Volume Name: gv2</div>
<div>Type: Replicate</div>
<div>Volume ID:
30c78928-5f2c-4671-becc-8deaee<wbr>1a7a8d</div>
<div>Status: Started</div>
<div>Snapshot Count: 0</div>
<div>Number of Bricks: 1 x 2
= 2</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1:
giant1:/gluster/sdd/gv2</div>
<div>Brick2:
giant2:/gluster/sdd/gv2</div>
<div>Options Reconfigured:</div>
<div>auth.allow:
X.X.X.*,127.0.0.1</div>
<div>cluster.granular-entry-heal:
on</div>
<div>cluster.locking-scheme:
granular</div>
<div>nfs.disable: on</div>
<div><br>
</div>
</font></div>
</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127HOEnZb">
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">2016-11-29
19:21 GMT+01:00 Micha Ober
<span dir="ltr"><<a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div style="font-family:monospace,monospace">I
had opened another
thread on this
mailing list
(Subject: "After
upgrade from 3.4.2
to 3.8.5 - High CPU
usage resulting in
disconnects and
split-brain").</div>
<div style="font-family:monospace,monospace"><br>
</div>
<div style="font-family:monospace,monospace">The
title may be a bit
misleading now, as I
am no longer
observing high CPU
usage after
upgrading to 3.8.6,
but the disconnects
are still happening
and the number of
files in split-brain
is growing.<br>
</div>
<div style="font-family:monospace,monospace"><br>
</div>
<div style="font-family:monospace,monospace">Setup:
6 compute nodes,
each serving as a
glusterfs server and
client, Ubuntu
14.04, two bricks
per node,
distribute-replicate</div>
<div style="font-family:monospace,monospace"><br>
</div>
<div style="font-family:monospace,monospace">I
have two gluster
volumes set up (one
for scratch data,
one for the slurm
scheduler). Only the
scratch data volume
shows critical
errors "[...] has
not responded in the
last 42 seconds,
disconnecting.". So
I can rule out
network problems,
the gigabit link
between the nodes is
not saturated at
all. The disks are
almost idle
(<10%).</div>
<div style="font-family:monospace,monospace"><br>
</div>
<div style="font-family:monospace,monospace">I
have glusterfs 3.4.2
on Ubuntu 12.04 on a
another compute
cluster, running
fine since it was
deployed.</div>
<div style="font-family:monospace,monospace">I
had glusterfs 3.4.2
on Ubuntu 14.04 on
this cluster,
running fine for
almost a year.</div>
<div style="font-family:monospace,monospace"><br>
</div>
<div style="font-family:monospace,monospace">After
upgrading to 3.8.5,
the problems (as
described) started.
I would like to use
some of the new
features of the
newer versions (like
bitrot), but the
users can't run
their compute jobs
right now because
the result files are
garbled.</div>
</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071HOEnZb">
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">2016-11-29
18:53 GMT+01:00
Atin Mukherjee <span dir="ltr"><<a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-abbreviated" href="mailto:amukherj@redhat.com" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-abbreviated" href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="white-space:pre-wrap">Would you be able to share what is not working for you in 3.8.x (mention the exact version). 3.4 is quite old and falling back to an unsupported version doesn't look a feasible option.</div>
<br>
<div class="gmail_quote">
<div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209h5">
<div dir="ltr">On
Tue, 29 Nov
2016 at 17:01,
Micha Ober
<<a class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com" target="_blank"></a><a class="m_5591780224253036987m_-6466580696529642375moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>> wrote:<br>
</div>
</div>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209h5">
<div dir="ltr" class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">Hi,</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace"><br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">I was using gluster 3.4 and
upgraded to
3.8, but that
version showed
to be unusable
for me. I now
need to
downgrade.</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace"><br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">I'm running Ubuntu 14.04. As
upgrades of
the op version
are irreversible, I guess I have to delete all gluster volumes and
re-create them
with the
downgraded
version. </div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace"><br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">0. Backup data</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">1. Unmount all gluster volumes</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">2. apt-get purge
glusterfs-server
glusterfs-client</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">3. Remove PPA for 3.8</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">4. Add PPA for older version</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">5. apt-get install
glusterfs-server
glusterfs-client</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">6. Create volumes</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace"><br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">Is "purge" enough to delete all
configuration
files of the
currently
installed
version or do
I need to
manually
clear some
residues
before
installing an
older version?</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace"><br class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg" style="font-family:monospace,monospace">Thanks.</div>
</div>
</div>
</div>
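(For what it's worth, a sketch of that procedure on Ubuntu 14.04; the PPA
names are placeholders for whichever repositories are actually in use. Note
that the volume and peer state lives in /var/lib/glusterd, which is not a
dpkg conffile, so a purge may well leave it behind; clear it by hand before
re-creating volumes:)

    umount /shared_data /shared_slurm          # unmount all gluster volumes
    service glusterfs-server stop
    apt-get purge glusterfs-server glusterfs-client glusterfs-common
    rm -rf /var/lib/glusterd                   # old volume/peer state
    add-apt-repository --remove ppa:gluster/glusterfs-3.8   # placeholder
    add-apt-repository ppa:gluster/glusterfs-3.4            # placeholder
    apt-get update
    apt-get install glusterfs-server glusterfs-client
    # re-create the volumes and restore data afterwards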
<span class="m_5591780224253036987m_-6466580696529642375m_4766802258719003127m_-1578094958703753071m_-2811647508981727209HOEnZb"><font color="#888888">
<div dir="ltr">--
<br>
</div>
<div data-smartmail="gmail_signature">-
Atin (atinm)</div>
</font></span></blockquote>
</div>