<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hello, I have a replica 3 volume that has lost quorum twice this week, causing us much pain. What seems to happen is that one of the SANs thinks one of the other two peers has disconnected. Then, a few seconds later, another disconnects, causing quorum to be lost.
This is painful because we have 7 oVirt hosts connected to this Gluster volume and they never seem to reattach on their own. I was able to unmount the volume manually on the oVirt hosts and then run the commands to mount it again, and that seemed to get things
working again.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
We have 3 SANs running GlusterFS 3.12.14-1 and nothing else.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span># gluster volume info gv1<br>
</span>
<div><br>
</div>
<div>Volume Name: gv1<br>
</div>
<div>Type: Replicate<br>
</div>
<div>Volume ID: ea12f72d-a228-43ba-a360-4477cada292a<br>
</div>
<div>Status: Started<br>
</div>
<div>Snapshot Count: 0<br>
</div>
<div>Number of Bricks: 1 x 3 = 3<br>
</div>
<div>Transport-type: tcp<br>
</div>
<div>Bricks:<br>
</div>
<div>Brick1: 10.4.16.19:/glusterfs/data1/gv1<br>
</div>
<div>Brick2: 10.4.16.11:/glusterfs/data1/gv1<br>
</div>
<div>Brick3: 10.4.16.12:/glusterfs/data1/gv1<br>
</div>
<div>Options Reconfigured:<br>
</div>
<div>nfs.register-with-portmap: on<br>
</div>
<div>diagnostics.count-fop-hits: on<br>
</div>
<div>diagnostics.latency-measurement: on<br>
</div>
<div>cluster.self-heal-daemon: enable<br>
</div>
<div>cluster.server-quorum-type: server<br>
</div>
<div>cluster.quorum-type: auto<br>
</div>
<div>network.remote-dio: enable<br>
</div>
<div>cluster.eager-lock: enable<br>
</div>
<div>performance.stat-prefetch: off<br>
</div>
<div>performance.io-cache: off<br>
</div>
<div>performance.read-ahead: off<br>
</div>
<div>performance.quick-read: off<br>
</div>
<div>auth.allow: 10.4.16.*<br>
</div>
<div>nfs.rpc-auth-allow: 10.4.16.*<br>
</div>
<div>nfs.disable: off<br>
</div>
<div>server.allow-insecure: on<br>
</div>
<div>storage.owner-gid: 36<br>
</div>
<div>storage.owner-uid: 36<br>
</div>
<div>nfs.addr-namelookup: off<br>
</div>
<div>nfs.export-volumes: on<br>
</div>
<div>network.ping-timeout: 50<br>
</div>
<span>cluster.server-quorum-ratio: 51%</span><br>
</div>
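<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
For reference, the server-quorum arithmetic implied by cluster.server-quorum-ratio: 51% on three servers can be sketched as follows. This is only an illustration: the function names are hypothetical, and both the rounding rule (the ratio is rounded up to a whole number of servers) and the idea that a server counts itself as active are assumptions rather than something taken from the glusterd source.</div>
<pre>
```python
import math


def server_quorum_count(total_servers, ratio_percent):
    """Smallest number of reachable servers that satisfies the ratio.

    Assumption: the ratio is rounded up to a whole number of servers.
    """
    return math.ceil(total_servers * ratio_percent / 100)


def has_server_quorum(active_servers, total_servers, ratio_percent):
    """True if the servers this node can see (itself included) meet quorum."""
    return active_servers >= server_quorum_count(total_servers, ratio_percent)


if __name__ == "__main__":
    # Three servers in the pool, ratio of 51% as in the volume options.
    needed = server_quorum_count(3, 51)
    print("servers needed for quorum:", needed)
    for active in (3, 2, 1):
        print(active, "active, quorum held:", has_server_quorum(active, 3, 51))
```
</pre>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Under those assumptions a server needs to see 2 of the 3 glusterd instances, so a server that loses contact with both peers at once drops below quorum and stops its bricks.</div>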
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><br>
</span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span style="font-family: Calibri, Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255); display: inline !important">The SANs produced the following logs this morning. In each excerpt, the first entry shown is the first entry logged on 2019-06-07. </span><br>
</span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span style="font-family: Calibri, Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255); display: inline !important"><br>
</span></span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
san3 seems to have an issue first:</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span>[2019-06-07 14:23:20.670561] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer &lt;10.4.16.12&gt; (&lt;dfe01058-5bea-4b67-8859-382a2c8854f4&gt;), in state &lt;Peer in Cluster&gt;, has disconnected from glusterd.<br>
</span>
<div><br>
</div>
<div>[2019-06-07 14:23:20.774127] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer &lt;10.4.16.11&gt; (&lt;0f3090ee-080b-4a6b-9964-0ca86d801469&gt;), in state &lt;Peer in Cluster&gt;, has disconnected from glusterd.<br>
</div>
<div><br>
</div>
<span>[2019-06-07 14:23:20.774413] C [MSGID: 106002] [glusterd-server-quorum.c:360:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume gv1. Stopping local bricks.</span><br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><br>
</span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span>san1 follows:</span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span>[2019-06-07 14:23:22.137405] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer &lt;10.4.16.12&gt; (&lt;dfe01058-5bea-4b67-8859-382a2c8854f4&gt;), in state &lt;Peer in Cluster&gt;, has disconnected from glusterd.<br>
</span>
<div><br>
</div>
<div>[2019-06-07 14:23:22.229343] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer &lt;10.4.16.19&gt; (&lt;238af98a-d2f1-491d-a1f1-64ace4eb6d3d&gt;), in state &lt;Peer in Cluster&gt;, has disconnected from glusterd.<br>
</div>
<div><br>
</div>
<span>[2019-06-07 14:23:22.229618] C [MSGID: 106002] [glusterd-server-quorum.c:360:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume gv1. Stopping local bricks.</span><br>
</span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span><br>
</span></span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span>san2 seems to be the last one standing but quorum gets lost:</span></span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span><span>[2019-06-07 14:23:26.611435] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer &lt;10.4.16.11&gt; (&lt;0f3090ee-080b-4a6b-9964-0ca86d801469&gt;), in state &lt;Peer in Cluster&gt;, has disconnected from glusterd.<br>
</span>
<div><br>
</div>
<div>[2019-06-07 14:23:26.714137] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer &lt;10.4.16.19&gt; (&lt;238af98a-d2f1-491d-a1f1-64ace4eb6d3d&gt;), in state &lt;Peer in Cluster&gt;, has disconnected from glusterd.<br>
</div>
<div><br>
</div>
<span>[2019-06-07 14:23:26.714405] C [MSGID: 106002] [glusterd-server-quorum.c:360:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume gv1. Stopping local bricks.</span><br>
</span></span></div>
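<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
To line up the disconnects across the three SANs, the glusterd log lines above can be merged into a single timeline. A minimal sketch, assuming each SAN's glusterd log has already been read into a list of lines and that the servers' clocks are in sync; the function names and node labels are hypothetical, and the regular expression only targets the MSGID 106004 disconnect lines in the format quoted above.</div>
<pre>
```python
import re
from datetime import datetime

# Captures the timestamp and the peer address from a glusterd
# "peer has disconnected" line (MSGID 106004).
DISCONNECT = re.compile(
    r"^\[([0-9-]+ [0-9:.]+)\] I \[MSGID: 106004\].*Peer <([0-9.]+)>"
)


def disconnect_events(node, lines):
    """Return (timestamp, node, peer_address) for each disconnect line."""
    events = []
    for line in lines:
        match = DISCONNECT.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((when, node, match.group(2)))
    return events


def timeline(logs_by_node):
    """Merge the events of all nodes into one list ordered by timestamp."""
    merged = []
    for node, lines in logs_by_node.items():
        merged.extend(disconnect_events(node, lines))
    return sorted(merged)
```
</pre>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Calling timeline with a dict that maps a label such as san1 to that server's log lines returns the events sorted by time, which makes it easier to see which server noticed a disconnect first.</div>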
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span><span><br>
</span></span></span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span><span>On the oVirt hosts I see the following kind of entries in the log for the mounted Gluster volume, /var/log/glusterfs/rhev-data-center-mnt-glusterSD-10.4.16.11:gv1.log. The entries are pretty much the same on all 7 hosts.</span></span></span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span><span><br>
</span></span></span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span><span><span><span>hv6 seems to be the first host to complain:<br>
</span>
<div>[2019-06-07 14:23:22.190493] I [glusterfsd-mgmt.c:2424:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: 10.4.16.11<br>
</div>
<div>[2019-06-07 14:23:22.190540] I [glusterfsd-mgmt.c:2464:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.4.16.19<br>
</div>
<div>[2019-06-07 14:23:32.618071] I [glusterfsd-mgmt.c:2005:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing<br>
</div>
<div>[2019-06-07 14:23:33.651755] W [socket.c:719:__socket_rwv] 0-gv1-client-4: readv on 10.4.16.12:49152 failed (No data available)<br>
</div>
<div>[2019-06-07 14:23:33.651806] I [MSGID: 114018] [client.c:2288:client_rpc_notify] 0-gv1-client-4: disconnected from gv1-client-4. Client process will keep trying to connect to glusterd until brick's port is available<br>
</div>
<span></span><br>
</span></span></span></div>
<div style=""><font color="#000000" face="Calibri, Arial, Helvetica, sans-serif"><span style="font-size: 12pt;">One thing I should point out here, which is probably important: we are running GlusterFS 3.12.14-1 on the SANs, but the oVirt hosts have been upgraded to
5.6-1. We stopped updating the Gluster version on the SANs after the previous version had a memory leak that caused the SANs to go down randomly. Version 3.12.14-1 seems to have stopped this from happening. What I have not been able to find out is whether there is an
incompatibility between these versions that could cause this.</span></font></div>
<div style=""><font color="#000000" face="Calibri, Arial, Helvetica, sans-serif"><span style="font-size: 12pt;"><br>
</span></font></div>
<div style=""><font color="#000000" face="Calibri, Arial, Helvetica, sans-serif"><span style="font-size: 12pt;">Are there any other steps I can take or logs I can collect to better identify what's causing this to happen?</span></font></div>
<div></div>
<p><span style="font-size:10.5pt;font-family:'Arial','sans-serif';color:black">Edward Clay</span>
<br>
<span style="font-size:8.5pt;font-family:'Arial','sans-serif';color:black">Systems Administrator</span><br>
<span style="font-size:8.5pt;font-family:'Arial','sans-serif';color:#019EEB"><a href="http://www.thehutgroup.com/" target="_blank"><span style="color:#575a5d;
text-decoration:none;text-underline:none">The Hut Group</span></a></span>
<br>
<br>
<span style="font-size:8.5pt;font-family:'Arial','sans-serif';color:black">Email:
<a href="mailto:edward.clay@uk2group.com">edward.clay@uk2group.com</a></span></p>
<div></div>
</body>
</html>