[Gluster-users] Recovering from Arb/Quorum Write Locks

wk wkmail at bneit.com
Mon May 29 05:15:12 UTC 2017



On 5/28/2017 9:24 PM, Ravishankar N wrote:
> Just to elaborate further, if all nodes were up to begin with and 
> there were zero self-heals pending, and you only brought down only 
> gluster2, writes must still be allowed. I guess in your case, there 
> must be some pending heals from gluster2 to gluster1 before you 
> brought gluster2 down due to a network disconnect from the fuse mount 
> to gluster1.
>

OK, I was aggressively writing within and to those VMs while pulling 
cables (power and network) at the same time. My initial observation was 
that the shards healed quickly, but perhaps I was too aggressive and 
didn't wait long enough between tests for the healing to kick in and/or 
finish.

I will retest and pay attention to outstanding heals, both prior and 
during the tests.
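For reference, something like the following should show the heal state before and after each test (a sketch; "gv0" is a placeholder volume name, substitute your own):

```shell
# List files/shards still pending heal on each brick:
gluster volume heal gv0 info

# Quicker per-brick summary of outstanding heal counts:
gluster volume heal gv0 statistics heal-count
```

If either command reports non-zero entries for gluster2's brick, taking gluster2 down at that point would leave gluster1 plus the arbiter unable to safely take writes to those files, which would explain the pause.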

>> I suppose I could fiddle with the quorum settings as above, but I'd 
>> like to be able to PAUSE/FLUSH/FSYNC the Volume before taking down 
>> Gluster2, then unpause and let the volume continue with Gluster1 and 
>> the ARB providing some sort of protection and to help when Gluster2 
>> is returned to the cluster.
>>
>
> I think you should try to find if there were self-heals pending to 
> gluster1 before you brought gluster2 down or the VMs should not have 
> paused.

Yes, I'll start looking at heals PRIOR to yanking cables.

OK, can I assume SOME pause is expected when Gluster first sees gluster2 
go down, and that the volume would unpause after a timeout period? I 
have seen that behaviour as well.
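If I understand correctly, that initial stall is usually governed by the network.ping-timeout volume option (42 seconds by default), which is how long clients wait before declaring a brick dead. A sketch of inspecting and tuning it ("gv0" again being a placeholder volume name):

```shell
# Show the current ping timeout for the volume:
gluster volume get gv0 network.ping-timeout

# Lowering it shortens the stall when a brick drops, at the cost of
# more aggressive disconnect detection on flaky networks:
gluster volume set gv0 network.ping-timeout 10
```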

-bill
