[Gluster-users] Rebooting gluster nodes make VMs pause due to storage error
Nicolás
nicolas at devels.es
Tue Oct 27 17:24:21 UTC 2015
Hi,
We're using oVirt 3.5.3.1 with GlusterFS as the storage backend. We
added a Storage Domain with the path "gluster.fqdn1:/volume" and the
mount option "backup-volfile-servers=gluster.fqdn2". We now need to
restart both gluster.fqdn1 and gluster.fqdn2 for a system update (not
at the same time, obviously). We're worried because in previous
attempts, when we restarted the main gluster node (gluster.fqdn1 in
this case), all the VMs running against that storage backend got
paused due to storage errors. We couldn't resume them and finally had
to power them off the hard way and start them again.
The Gluster version on gluster.fqdn1 and gluster.fqdn2 is 3.6.3-1 (on CentOS 7).
The Gluster configuration for that volume is:
Volume Name: volume
Type: Replicate
Volume ID: a2d7e52c-2f63-4e72-9635-4e311baae6ff
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster.fqdn1:/gluster/brick_01/brick
Brick2: gluster.fqdn2:/gluster/brick_01/brick
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: none
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
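For anyone who wants to compare the quorum-related settings above against their own volume, here is a minimal Python sketch that parses text in the shape of the listing above. The helper names `parse_volume_info` and `quorum_summary` are hypothetical, and the sample string is simply the listing copied in; nothing here talks to a live cluster.

```python
# Sketch only: parses "key: value" lines as they appear in the listing above.
# parse_volume_info and quorum_summary are hypothetical helper names.

VOLUME_INFO = """\
Volume Name: volume
Type: Replicate
Volume ID: a2d7e52c-2f63-4e72-9635-4e311baae6ff
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster.fqdn1:/gluster/brick_01/brick
Brick2: gluster.fqdn2:/gluster/brick_01/brick
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: none
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
"""


def parse_volume_info(text):
    """Return a dict built from every line that contains 'key: value'.

    Section headers such as 'Bricks:' have no ': ' separator and are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            info[key.strip()] = value.strip()
    return info


def quorum_summary(info):
    """Collect the brick list and the two quorum options from the parsed dict."""
    bricks = [
        value
        for key, value in info.items()
        if key.startswith("Brick") and key[5:].isdigit()
    ]
    return {
        "bricks": bricks,
        "client_quorum": info.get("cluster.quorum-type"),
        "server_quorum": info.get("cluster.server-quorum-type"),
    }


if __name__ == "__main__":
    print(quorum_summary(parse_volume_info(VOLUME_INFO)))
```

The summary makes it easy to see at a glance that this is a two-brick replica with cluster.quorum-type set to none and cluster.server-quorum-type set to server.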
We would like to know whether this configuration should work, or whether
something is missing or there is a known problem with the version
specified above. Pausing the VMs does keep them from failing outright,
but it is not acceptable for us. Also, we've noticed that the
self-healing process takes *a lot* of time: the volume above is 6T, and
it can take hours to synchronize after the bricks have been out of sync
for half an hour.
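Since the second node should not be rebooted while the first is still healing, one way to gate the second reboot is to count pending heal entries. The sketch below assumes that the output of "gluster volume heal volume info" contains one "Number of entries: N" line per brick; the sample text, the file path in it, and the function names are made up for illustration, and a zero count should be treated as a necessary check rather than a guarantee.

```python
import re

# Hypothetical sample in the shape the heal-info output is assumed to have.
SAMPLE_HEAL_INFO = """\
Brick gluster.fqdn1:/gluster/brick_01/brick/
Number of entries: 0

Brick gluster.fqdn2:/gluster/brick_01/brick/
/images/example-disk
Number of entries: 1
"""

ENTRIES_RE = re.compile(r"^Number of entries:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def pending_heal_entries(text):
    """Sum the per-brick 'Number of entries' counts.

    Raises ValueError when no such line is found, so that an error message
    or empty output is never mistaken for 'nothing left to heal'.
    """
    counts = ENTRIES_RE.findall(text)
    if not counts:
        raise ValueError("no 'Number of entries' lines found")
    return sum(int(n) for n in counts)


def healed(text):
    """True only when every brick reports zero pending entries."""
    return pending_heal_entries(text) == 0


if __name__ == "__main__":
    print(pending_heal_entries(SAMPLE_HEAL_INFO), healed(SAMPLE_HEAL_INFO))
```

In practice the text would come from running the heal-info command on one of the nodes and feeding its output to `healed` in a polling loop before touching the second machine.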
Any hints are appreciated,
Thanks.