<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri","sans-serif";
        mso-fareast-language:EN-US;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
Thanks for the reply. What would be the best course of action? The data on the volume isn't important right now, but I'm worried about ending up in the same situation once our setup goes to production, when we would really need to be able to recover our Gluster setup.

I'm assuming that to redo the setup I would delete everything in the /var/lib/glusterd directory on each of the nodes and recreate the volume again, essentially starting over. If I leave the mount points the same and keep the data and brick contents intact, will the files still be there and accessible afterwards? (I would not delete the data on the bricks.)
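Concretely, is something like the outline below the right idea? The brick paths and hostnames are taken from the volume info further down in this thread, and the service name assumes our Debian/Ubuntu install; I haven't tested any of this, so please treat it as a sketch to be corrected rather than a known-good procedure.

    # On every node: stop the management daemon and wipe its volume/peer state
    # (keeping /var/lib/glusterd/glusterd.info so each node retains its UUID)
    service glusterfs-server stop
    rm -rf /var/lib/glusterd/vols /var/lib/glusterd/peers
    service glusterfs-server start

    # On every node: clear the "already part of a volume" markers on each brick
    # root so that "gluster volume create" will accept the existing directories
    setfattr -x trusted.glusterfs.volume-id /media/brick1
    setfattr -x trusted.gfid /media/brick1
    setfattr -x trusted.glusterfs.volume-id /media/brick2
    setfattr -x trusted.gfid /media/brick2

    # From node1: rebuild the peer group and recreate the volume on the same
    # bricks, listed in the original order so the replica pairs stay the same
    gluster peer probe gfsnode2
    gluster peer probe gfsnode3
    gluster volume create teravolume replica 2 \
        gfsnode1:/media/brick1 gfsnode2:/media/brick1 gfsnode3:/media/brick1 \
        gfsnode1:/media/brick2 gfsnode2:/media/brick2 gfsnode3:/media/brick2 force
    gluster volume start teravolume

I'm also not sure whether the .glusterfs directory on each brick should be kept or removed in this scenario, so any guidance there would be appreciated.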
Regards,

Victor Nomura

From: Atin Mukherjee <amukherj@redhat.com>
Sent: June 27, 2017 12:29 AM
To: Victor Nomura
Cc: gluster-users
Subject: Re: [Gluster-users] Gluster failure due to "0-management: Lock not released for <volumename>"

I had a look at the logs Victor shared privately, and it appears there is a network glitch in the cluster that is causing glusterd to lose its connection with the other peers. As a side effect, a lot of RPC requests are getting bailed out, leaving glusterd with a stale lock, which is why some of the commands fail with "another transaction is in progress" or "locking failed".

Some examples of the symptom:

[2017-06-21 23:02:03.826858] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21 22:52:02.719068. timeout = 600 for 192.168.150.53:24007
[2017-06-21 23:02:03.826888] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21 22:52:02.716782. timeout = 600 for 192.168.150.52:24007
[2017-06-21 23:02:53.836936] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent = 2017-06-21 22:52:47.909169. timeout = 600 for 192.168.150.53:24007
[2017-06-21 23:02:53.836991] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Locking failed on gfsnode3. Please check log file for details.
[2017-06-21 23:02:53.837016] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent = 2017-06-21 22:52:47.909175. timeout = 600 for 192.168.150.52:24007

I'd like to request that you first look at the network layer and rectify the problems there.
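(For reference, a quick check of that layer might look like the following; the peer addresses are taken from the log excerpts above and the service name assumes a Debian/Ubuntu install, so adjust as needed. The stale mgmt_v3 lock is held in glusterd's memory, so restarting glusterd on all nodes once the network is stable should also release it.)

    # From each node, confirm the other peers' glusterd port (24007/tcp) is reachable
    nc -zv 192.168.150.52 24007
    nc -zv 192.168.150.53 24007

    # Once connectivity is stable, restart glusterd on every node and re-check
    service glusterfs-server restart
    gluster peer status
    gluster volume status teravolume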
On Thu, Jun 22, 2017 at 9:30 PM, Atin Mukherjee <amukherj@redhat.com> wrote:

Could you attach the glusterd.log and cmd_history.log files from all the nodes?
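(These normally live under /var/log/glusterfs/ on each node; depending on the install, the glusterd log may be named glusterd.log or etc-glusterfs-glusterd.vol.log. Assuming those defaults, something like this on each node should bundle them for attaching:)

    tar czf /tmp/gluster-logs-$(hostname).tar.gz \
        /var/log/glusterfs/*glusterd*.log \
        /var/log/glusterfs/cmd_history.log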
On Wed, Jun 21, 2017 at 11:40 PM, Victor Nomura <victor@mezine.com> wrote:

Hi All,

I'm fairly new to Gluster (3.10.3) and had it running for a couple of months, but after a sudden power failure in our building it all came crashing down. No client is able to connect after powering the 3 nodes back up.

Looking at the logs, it looks like there is some sort of "lock" on the volume which prevents all the clients from connecting to the Gluster endpoint.

I can't even run "gluster volume status all" if more than one node is powered up. I have to shut down node2 and node3, and only then am I able to issue the command on node1 to see the volume status. When all the nodes are powered up and I check the peer status, it says that all peers are connected. Trying to connect to the Gluster volume from any client reports that the Gluster endpoint is not available and times out. There are no network issues: each node can ping the others, and there are no firewalls or any other devices between the nodes and the clients.

Please help if you think you know how to fix this. I have a feeling it's this "lock" that was never released because the whole setup lost power all of a sudden. I've tried restarting all the nodes, restarting glusterfs-server, etc. I'm out of ideas.

Thanks in advance!

Victor

Volume Name: teravolume
Type: Distributed-Replicate
Volume ID: 85af74d0-f1bc-4b0d-8901-4dea6e4efae5
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: gfsnode1:/media/brick1
Brick2: gfsnode2:/media/brick1
Brick3: gfsnode3:/media/brick1
Brick4: gfsnode1:/media/brick2
Brick5: gfsnode2:/media/brick2
Brick6: gfsnode3:/media/brick2
Options Reconfigured:
nfs.disable: on

[2017-06-21 16:02:52.376709] W [MSGID: 106118] [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock not released for teravolume
[2017-06-21 16:03:03.429032] I [MSGID: 106163] [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31000
[2017-06-21 16:13:13.326478] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x105 sent = 2017-06-21 16:03:03.202284. timeout = 600 for 192.168.150.52:$
[2017-06-21 16:13:13.326519] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x105 sent = 2017-06-21 16:03:03.204555. timeout = 600 for 192.168.150.53:$
[2017-06-21 16:18:34.456522] I [MSGID: 106004] [glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer <gfsnode2> (<e1e1caa5-9842-40d8-8492-a82b079879a3>), in state <Peer in Cluste$
[2017-06-21 16:18:34.456619] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x1f879) [0x7fee6bc22879] -->/usr/lib/x86_64-l$
[2017-06-21 16:18:34.456638] W [MSGID: 106118] [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock not released for teravolume
[2017-06-21 16:18:34.456661] I [MSGID: 106004] [glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer <gfsnode3> (<59b9effa-2b88-4764-9130-4f31c14c362e>), in state <Peer in Cluste$
[2017-06-21 16:18:34.456692] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x1f879) [0x7fee6bc22879] -->/usr/lib/x86_64-l$
[2017-06-21 16:18:43.323944] I [MSGID: 106163] [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31000
[2017-06-21 16:18:34.456699] W [MSGID: 106118] [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock not released for teravolume
[2017-06-21 16:18:45.628552] I [MSGID: 106163] [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31000
[2017-06-21 16:23:40.607173] I [MSGID: 106499] [glusterd-handler.c:4363:__glusterd_handle_status_volume] 0-management: Received status volume req for volume teravolume

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users