<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>OK, on my Nagios instance I've disabled the gluster status check on
all nodes except one; I'll check whether this is enough.</p>
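<p>For reference, the idea is simply to let only one designated node issue
"gluster volume status"; below is a rough sketch of such a host-gated check
(the hostname, volume name and script are placeholders, not my actual
Nagios plugin):</p>
<pre>#!/usr/bin/env python
# Sketch only: run the gluster status check on a single designated node and
# report OK everywhere else, so that only one peer polls the volume.
import socket
import subprocess
import sys

MONITOR_NODE = "virtnode-0-0-gluster"   # placeholder: the one node allowed to poll
VOLUME = "vm-images-repo"

def main():
    # Any node other than the designated one reports OK without touching glusterd.
    if socket.gethostname().split(".")[0] != MONITOR_NODE:
        print("OK - gluster status is polled from %s only" % MONITOR_NODE)
        return 0
    try:
        out = subprocess.check_output(
            ["gluster", "volume", "status", VOLUME],
            stderr=subprocess.STDOUT, universal_newlines=True)
    except subprocess.CalledProcessError as err:
        print("CRITICAL - gluster volume status failed: %s" % err.output.strip())
        return 2
    print("OK - %s" % out.splitlines()[0].strip())
    return 0

if __name__ == "__main__":
    sys.exit(main())
</pre>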
<p>Thanks,</p>
<p> Paolo<br>
</p>
<br>
<div class="moz-cite-prefix">On 20/07/2017 13:50, Atin Mukherjee
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAGNCGH29Z-363jsVJj4dg8bRaQRxscFWwjNikXgO+CLTkQmu_Q@mail.gmail.com">
<div dir="ltr">So from the cmd_history.log files across all the nodes
it's evident that multiple commands on the same volume are run
simultaneously, which can result in transaction collisions, and
you can end up with one command succeeding and the others failing.
If you are running the volume status command for monitoring,
it is suggested to run it from only one node.<br>
</div>
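<p>To make the collision concrete: when two peers (or two monitoring
checks) issue a status command on the same volume at nearly the same
moment, one of them can be rejected with the "Another transaction is in
progress" error. A rough illustration of that race, to be tried only
against a test volume (volume name and timing are examples, not a
guaranteed reproducer):</p>
<pre># Illustration only: fire two "gluster volume status" calls at the same
# volume concurrently; depending on timing, one may be rejected with
# "Another transaction is in progress ...".
import subprocess
import threading

VOLUME = "vm-images-repo"

def status(tag):
    proc = subprocess.Popen(
        ["gluster", "volume", "status", VOLUME],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True)
    out, _ = proc.communicate()
    first = out.splitlines()[0].strip() if out.strip() else ""
    print("[%s] exit code %d: %s" % (tag, proc.returncode, first))

threads = [threading.Thread(target=status, args=("call-%d" % i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
</pre>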
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jul 20, 2017 at 3:54 PM, Paolo
Margara <span dir="ltr"><<a
href="mailto:paolo.margara@polito.it" target="_blank"
moz-do-not-send="true">paolo.margara@polito.it</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<p>Attached are the requested logs for all three
nodes.</p>
<p>thanks,</p>
<p> Paolo<br>
</p>
<div>
<div class="h5"> <br>
<div class="m_6615590896069534251moz-cite-prefix">On
20/07/2017 11:38, Atin Mukherjee wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Please share the cmd_history.log file
from all the storage nodes.<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jul 20, 2017 at
2:34 PM, Paolo Margara <span dir="ltr"><<a
href="mailto:paolo.margara@polito.it"
target="_blank" moz-do-not-send="true">paolo.margara@polito.it</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<p>Hi list,</p>
<p>recently I've noticed a strange behaviour
of my gluster storage: sometimes, while
executing a simple command like "gluster
volume status vm-images-repo", I get the
response "Another transaction is in
progress for vm-images-repo. Please try
again after sometime.". This situation is
not resolved simply by waiting; I have to
restart glusterd on the node that holds
(and does not release) the lock. The issue
occurs randomly after some days. In the
meantime, both before and after the issue
appears, everything works as
expected.</p>
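<p>As a side note, one way to see which peer UUID is sitting on the stuck
lock is to scan glusterd's log for the "held by" warnings; a rough sketch
(the log path is the usual one on my CentOS nodes, the volume name is just
this one):</p>
<pre># Sketch: print the last UUID reported as holding the lock on a volume,
# based on the "Lock for ... held by ..." warnings shown below.
import re

LOG = "/var/log/glusterfs/etc-glusterfs-glusterd.vol.log"
VOLUME = "vm-images-repo"

pattern = re.compile(r"Lock for (?:vol )?%s held by ([0-9a-f-]+)" % re.escape(VOLUME))

holder = None
with open(LOG) as logfile:
    for line in logfile:
        match = pattern.search(line)
        if match:
            holder = match.group(1)

print("last reported lock holder for %s: %s" % (VOLUME, holder))
</pre>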
<p>I'm using gluster 3.8.12 on CentOS 7.3.
The only relevant information that I found
in the log file
(etc-glusterfs-glusterd.vol.log) of
my three nodes is the following:</p>
<p>* node1, at the moment the issue begins:</p>
<p>[2017-07-19 15:07:43.130203] W [glusterd-locks.c:572:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x3a00f) [0x7f373f25f00f] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x2ba25) [0x7f373f250a25] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0xd048f) [0x7f373f2f548f] ) 0-management: Lock for vm-images-repo held by 2c6f154f-efe3-4479-addc-b2021aa9d5df<br>
[2017-07-19 15:07:43.128242] I [MSGID: 106499] [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vm-images-repo<br>
[2017-07-19 15:07:43.130244] E [MSGID: 106119] [glusterd-op-sm.c:3782:glusterd_op_ac_lock] 0-management: Unable to acquire lock for vm-images-repo<br>
[2017-07-19 15:07:43.130320] E [MSGID: 106376] [glusterd-op-sm.c:7775:glusterd_op_sm] 0-management: handler returned: -1<br>
[2017-07-19 15:07:43.130665] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Locking failed on virtnode-0-1-gluster. Please check log file for details.<br>
[2017-07-19 15:07:43.131293] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Locking failed on virtnode-0-2-gluster. Please check log file for details.<br>
[2017-07-19 15:07:43.131360] E [MSGID: 106151] [glusterd-syncop.c:1884:gd_sync_task_begin] 0-management: Locking Peers Failed.<br>
[2017-07-19 15:07:43.132005] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on virtnode-0-2-gluster. Please check log file for details.<br>
[2017-07-19 15:07:43.132182] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on virtnode-0-1-gluster. Please check log file for details.</p>
<p>* node2, at the moment the issue begins:</p>
<p>[2017-07-19 15:07:43.131975] W [glusterd-locks.c:572:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x3a00f) [0x7f17b5b9e00f] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x2ba25) [0x7f17b5b8fa25] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0xd048f) [0x7f17b5c3448f] ) 0-management: Lock for vm-images-repo held by d9047ecd-26b5-467b-8e91-50f76a0c4d16<br>
[2017-07-19 15:07:43.132019] E [MSGID: 106119] [glusterd-op-sm.c:3782:glusterd_op_ac_lock] 0-management: Unable to acquire lock for vm-images-repo<br>
[2017-07-19 15:07:43.133568] W [glusterd-locks.c:686:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x3a00f) [0x7f17b5b9e00f] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x2b712) [0x7f17b5b8f712] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0xd082a) [0x7f17b5c3482a] ) 0-management: Lock owner mismatch. Lock for vol vm-images-repo held by d9047ecd-26b5-467b-8e91-50f76a0c4d16<br>
[2017-07-19 15:07:43.133597] E [MSGID: 106118] [glusterd-op-sm.c:3845:glusterd_op_ac_unlock] 0-management: Unable to release lock for vm-images-repo<br>
The message "E [MSGID: 106376] [glusterd-op-sm.c:7775:glusterd_op_sm] 0-management: handler returned: -1" repeated 3 times between [2017-07-19 15:07:42.976193] and [2017-07-19 15:07:43.133646]<br>
</p>
<p>* node3, at the moment the issue begins:</p>
<p>[2017-07-19 15:07:42.976593] I [MSGID: 106499] [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vm-images-repo<br>
[2017-07-19 15:07:43.129941] W [glusterd-locks.c:572:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x3a00f) [0x7f6133f5b00f] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x2ba25) [0x7f6133f4ca25] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0xd048f) [0x7f6133ff148f] ) 0-management: Lock for vm-images-repo held by d9047ecd-26b5-467b-8e91-50f76a0c4d16<br>
[2017-07-19 15:07:43.129981] E [MSGID: 106119] [glusterd-op-sm.c:3782:glusterd_op_ac_lock] 0-management: Unable to acquire lock for vm-images-repo<br>
[2017-07-19 15:07:43.130034] E [MSGID: 106376] [glusterd-op-sm.c:7775:glusterd_op_sm] 0-management: handler returned: -1<br>
[2017-07-19 15:07:43.130131] E [MSGID: 106275] [glusterd-rpc-ops.c:876:glusterd_mgmt_v3_lock_peers_cbk_fn] 0-management: Received mgmt_v3 lock RJT from uuid: 2c6f154f-efe3-4479-addc-b2021aa9d5df<br>
[2017-07-19 15:07:43.130710] W [glusterd-locks.c:686:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x3a00f) [0x7f6133f5b00f] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x2b712) [0x7f6133f4c712] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0xd082a) [0x7f6133ff182a] ) 0-management: Lock owner mismatch. Lock for vol vm-images-repo held by d9047ecd-26b5-467b-8e91-50f76a0c4d16<br>
[2017-07-19 15:07:43.130733] E [MSGID: 106118] [glusterd-op-sm.c:3845:glusterd_op_ac_unlock] 0-management: Unable to release lock for vm-images-repo<br>
[2017-07-19 15:07:43.130771] E [MSGID: 106376] [glusterd-op-sm.c:7775:glusterd_op_sm] 0-management: handler returned: -1</p>
<p>The really strange thing is that in this
case the UUID of node3 is
d9047ecd-26b5-467b-8e91-50f76a0c4d16!</p>
<p>The node-name to UUID mapping is:</p>
<p>* (node1) virtnode-0-0-gluster: 2c6f154f-efe3-4479-addc-b2021aa9d5df</p>
<p>* (node2) virtnode-0-1-gluster: e93ebee7-5d95-4100-a9df-4a3e60134b73</p>
<p>* (node3) virtnode-0-2-gluster: d9047ecd-26b5-467b-8e91-50f76a0c4d16<br>
<br>
</p>
<p>In this case restarting glusterd on node3
usually solves the issue.</p>
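<p>For completeness, the UUIDs in those messages can be mapped back to a
hostname with "gluster pool list" (the local node shows up as localhost
there) or by reading /var/lib/glusterd/glusterd.info on each node; a rough
sketch of that lookup, using node3's UUID as the example:</p>
<pre># Sketch: resolve a glusterd UUID from the log messages to a peer hostname
# via "gluster pool list"; the local node appears as "localhost".
import subprocess
import sys

def uuid_to_host(uuid):
    out = subprocess.check_output(["gluster", "pool", "list"],
                                  universal_newlines=True)
    for line in out.splitlines()[1:]:          # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[0] == uuid:
            return fields[1]
    return None

if __name__ == "__main__":
    uuid = sys.argv[1] if len(sys.argv) > 1 else "d9047ecd-26b5-467b-8e91-50f76a0c4d16"
    host = uuid_to_host(uuid)
    print("%s maps to %s" % (uuid, host or "no known peer"))
    # Once the holder is known, the workaround described above is restarting
    # glusterd (e.g. "systemctl restart glusterd") on that node.
</pre>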
<p>What could be the root cause of this
behavior? How can I fix this once
and for all?</p>
<p>If needed I could provide the full log
file.<br>
</p>
<p><br>
</p>
<p>Greetings,</p>
<p> Paolo Margara<br>
</p>
</div>
<br>
_______________________________________________<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org"
target="_blank" moz-do-not-send="true">Gluster-users@gluster.org</a><br>
<a
href="http://lists.gluster.org/mailman/listinfo/gluster-users"
rel="noreferrer" target="_blank"
moz-do-not-send="true">http://lists.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</body>
</html>