Thank you very much indeed, I'll try and add an arbiter node.

--
Best Regards,

Seva Gluschenko
CTO @ http://webkontrol.ru
+7 916 172 6 170

August 1, 2017 12:29 AM, "WK" <wkmail@bneit.com> wrote:

> On 7/31/2017 1:12 AM, Seva Gluschenko wrote:
>
>> Hi folks,
>>
>> I'm running a simple gluster setup with a single volume replicated
>> across two servers, as follows:
>>
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: dd4996c0-04e6-4f9b-a04e-73279c4f112b
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>>
>> The problem is that when one of the replica servers hung, it caused
>> the whole glusterfs to hang.
>
> Yes, you lost quorum and the system doesn't want you to get a
> split-brain.
>
>> Could you please drop me a hint: is this expected behaviour, or are
>> there any server or volume settings that could be changed to avoid
>> it? Any help would be much appreciated.
>
> Add a third replica node (or just an arbiter node if you aren't that
> ambitious or want to save on the kit).
>
> That way, when you lose a node, the cluster will pause for 40 seconds
> or so while it figures things out and then continue on. When the
> missing node returns, the self-heal will kick in and you will be back
> to 100%.
>
> Your other alternative is to turn off quorum, but that risks
> split-brain. Depending upon your data, that may or may not be a
> serious issue.
>
> -wk
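For reference, a minimal sketch of the two options WK describes, assuming GlusterFS 3.8 or later; the host name "arbiter-host" and the brick path below are placeholders, not values from this thread:

    # Option 1: convert the replica-2 volume gv0 to replica 3 with an
    # arbiter brick. The arbiter holds only file metadata, so it needs
    # far less disk than a full replica. "arbiter-host:/data/brick/gv0"
    # is hypothetical; substitute your own host and path.
    gluster volume add-brick gv0 replica 3 arbiter 1 arbiter-host:/data/brick/gv0

    # Watch the self-heal populate the new arbiter brick:
    gluster volume heal gv0 info

    # Option 2 (risks split-brain, as noted above): disable client-side
    # quorum on the volume.
    gluster volume set gv0 cluster.quorum-type none

With the arbiter in place, quorum survives the loss of any single node, which is what keeps the volume writable through the kind of failure described above.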