[Gluster-users] Rebooting gluster nodes make VMs pause due to storage error
Sahina Bose
sabose at redhat.com
Wed Oct 28 08:05:57 UTC 2015
On 10/27/2015 10:54 PM, Nicolás wrote:
> Hi,
>
> We're using ovirt 3.5.3.1, and as storage backend we use GlusterFS. We
> added a Storage Domain with the path "gluster.fqdn1:/volume", and as
> options, we used "backup-volfile-servers=gluster.fqdn2". We now need
> to restart both gluster.fqdn1 and gluster.fqdn2 machines due to system
> update (not at the same time, obviously). We're worried because in
> previous attempts, when we restarted the main gluster node
> (gluster.fqdn1 in this case), all the VMs running against that
> storage backend were paused due to storage errors; we couldn't
> resume them and finally had to power them off the hard way and
> start them again.
>
> Gluster version on gluster.fqdn1 and gluster.fqdn2 is 3.6.3-1 (on
> CentOS7).
>
> Gluster configuration for that volume is:
>
> Volume Name: volume
> Type: Replicate
> Volume ID: a2d7e52c-2f63-4e72-9635-4e311baae6ff
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: gluster.fqdn1:/gluster/brick_01/brick
> Brick2: gluster.fqdn2:/gluster/brick_01/brick
> Options Reconfigured:
> storage.owner-gid: 36
> storage.owner-uid: 36
> cluster.server-quorum-type: server
> cluster.quorum-type: none
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
>
The supported configuration for a gluster storage domain in oVirt is
replica 3 (echoing what Nir mentioned on ovirt-users).
With "cluster.server-quorum-type: server" on a replica 2 setup,
bringing down one of the nodes causes the bricks on the remaining
server to be shut down as well, which is why your VMs pause.
We strongly advise you to use a replica 3 configuration or an arbiter
volume
(http://gluster.readthedocs.org/en/release-3.7.0/Features/afr-arbiter-volumes/)
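As a rough sketch of what converting to an arbiter setup could look
like once a third host is available (the host name gluster.fqdn3 and
its brick path are placeholders, and adding an arbiter brick to an
existing replica 2 volume requires a newer gluster release than the
3.6.3 you are running):

    # Hypothetical third node; needs a gluster release with arbiter
    # support (3.7 or later)
    gluster volume add-brick volume replica 3 arbiter 1 \
        gluster.fqdn3:/gluster/brick_01/brick

    # Confirm the layout and let self-heal populate the arbiter brick
    gluster volume info volume
    gluster volume heal volume info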
If adding an additional server is not an option, you could try the
following for the present scenario (see the command sketch after the
list):
1. Turn off server quorum.
2. Put the host into maintenance in oVirt and bring down the gluster
   processes on that host.
3. Perform the maintenance activity.
4. Trigger self-heal and wait for it to complete.
5. Put the second host into maintenance in oVirt and repeat the
   process.
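A minimal command sketch for the steps above, run as root on the
relevant host (the volume and option names are taken from your volume
info output; exact service/process handling may differ per release,
so treat this as an outline rather than a verified procedure):

    # 1. Disable server quorum so the surviving brick keeps serving I/O
    gluster volume set volume cluster.server-quorum-type none

    # 2. After moving the host to maintenance in oVirt, stop gluster
    #    on that host (brick processes run separately from glusterd)
    systemctl stop glusterd
    pkill glusterfsd

    # 3. ...perform the system update and reboot...

    # 4. Once the host is back, trigger self-heal and wait until
    #    "heal info" shows no pending entries
    gluster volume heal volume full
    gluster volume heal volume info

    # 5. Repeat on the second host, then restore server quorum
    gluster volume set volume cluster.server-quorum-type server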
> We would like to know if this configuration should work, or if
> there's something missing or some problem with the above specified
> version, as pausing the VMs avoids outright failure but is not
> acceptable for us. Also, we've noted that the self-healing process
> takes *a lot* of time: the volume is 6T and it might take hours to
> synchronize after being out of sync for half an hour.
The sharding feature available in gluster 3.7 will help with heal times.
Promising results have been reported by other users - minutes as opposed
to hours.
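If the volume is moved to gluster 3.7, enabling sharding is a volume
option; a sketch (the block size below is only an illustrative value,
and sharding applies only to files created after it is turned on):

    # Requires gluster 3.7 or later on servers and clients
    gluster volume set volume features.shard on
    # Optional: pick a shard size; 512MB here is just an example
    gluster volume set volume features.shard-block-size 512MB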
>
> Any hints are appreciated,
>
> Thanks.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users