[Gluster-users] Recovering from Arb/Quorum Write Locks
wk
wkmail at bneit.com
Mon May 29 05:15:12 UTC 2017
On 5/28/2017 9:24 PM, Ravishankar N wrote:
> Just to elaborate further, if all nodes were up to begin with and
> there were zero self-heals pending, and you only brought down only
> gluster2, writes must still be allowed. I guess in your case, there
> must be some pending heals from gluster2 to gluster1 before you
> brought gluster2 down due to a network disconnect from the fuse mount
> to gluster1.
>
OK, I was aggressively writing within and to those VMs while pulling
cables (power and network) at the same time. My initial observation was
that the shards healed quickly, but perhaps I got too aggressive and
didn't wait long enough between tests for the healing to kick in and/or
finish.
I will retest and pay attention to outstanding heals, both prior and
during the tests.
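For anyone following along, the quickest way I know to watch for outstanding heals before and during a test is the heal-info command. This assumes a volume named gv0 -- substitute your own volume name:

```shell
# List files/shards with pending heals on each brick
# (gv0 is a placeholder volume name)
gluster volume heal gv0 info

# Summary counts only -- handy for watching heals drain between tests
gluster volume heal gv0 statistics heal-count
```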
>> I suppose I could fiddle with the quorum settings as above, but I'd
>> like to be able to PAUSE/FLUSH/FSYNC the Volume before taking down
>> Gluster2, then unpause and let the volume continue with Gluster1 and
>> the ARB providing some sort of protection and to help when Gluster2
>> is returned to the cluster.
>>
>
> I think you should try to find if there were self-heals pending to
> gluster1 before you brought gluster2 down or the VMs should not have
> paused.
Yes, I'll start looking at heals PRIOR to yanking cables.
OK, can I assume SOME pause is expected when Gluster first sees gluster2
go down, and that it unpauses after a timeout period? I have seen that
behaviour as well.
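If the pause I'm seeing is the client waiting out the server timeout, I'm assuming it is governed by network.ping-timeout (default 42 seconds) -- someone correct me if that's the wrong knob. Again assuming a volume named gv0:

```shell
# Show the current ping timeout for the volume (gv0 is a placeholder name)
gluster volume get gv0 network.ping-timeout

# Lower it temporarily for testing; very short values are
# discouraged in production
gluster volume set gv0 network.ping-timeout 10
```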
-bill