[Gluster-users] Failover problems with gluster 3.8.8-1 (latest Debian stable)
Dave Sherohman
dave at sherohman.org
Tue Feb 13 13:33:44 UTC 2018
I'm using gluster as a virt-store, with a 3x2 distributed/replicated
volume serving 16 qemu/kvm/libvirt virtual machines whose image files
are stored in gluster and accessed via libgfapi. Eight of these disk
images are standalone, while the other eight are qcow2 images which all
share a single backing file.
For the most part, this is all working very well. However, if one
particular gluster server (azathoth) goes down, three of the standalone
VMs and all eight of the shared-backing-image VMs fail. Any of the
other gluster servers can go down with no problems; only azathoth causes
issues.
In addition, the kvm hosts have the gluster volume FUSE-mounted, and one
of them (out of five) detects an error on the gluster volume and puts
the FUSE mount into read-only mode if azathoth goes down. Despite this,
libgfapi connections to the VM images continue to work normally from
this host, and the other four kvm hosts are unaffected.
It initially seemed relevant that I have the libgfapi URIs specified as
gluster://azathoth/..., but I've tried changing them to make the initial
connection via other gluster hosts and it had no effect on the problem.
Losing azathoth still took them out.
In addition to changing the mount URI, I've also manually run a heal and
rebalance on the volume, enabled the bitrot daemons (then turned them
back off a week later, since they reported no activity in that time),
and copied one of the standalone images to a new file in case it was a
problem with the file itself. As far as I can tell, none of these
attempts changed anything.
So I'm at a loss. Is this a known type of problem? If so, how do I fix
it? If not, what's the next step to troubleshoot it?
# gluster --version
glusterfs 3.8.8 built on Jan 11 2017 14:07:11
Repository revision: git://git.gluster.com/glusterfs.git
# gluster volume status
Status of volume: palantir
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick saruman:/var/local/brick0/data        49154     0          Y       10690
Brick gandalf:/var/local/brick0/data        49155     0          Y       18732
Brick azathoth:/var/local/brick0/data       49155     0          Y       9507
Brick yog-sothoth:/var/local/brick0/data    49153     0          Y       39559
Brick cthulhu:/var/local/brick0/data        49152     0          Y       2682
Brick mordiggian:/var/local/brick0/data     49152     0          Y       39479
Self-heal Daemon on localhost               N/A       N/A        Y       9614
Self-heal Daemon on saruman.lub.lu.se       N/A       N/A        Y       15016
Self-heal Daemon on cthulhu.lub.lu.se       N/A       N/A        Y       9756
Self-heal Daemon on gandalf.lub.lu.se       N/A       N/A        Y       5962
Self-heal Daemon on mordiggian.lub.lu.se    N/A       N/A        Y       8295
Self-heal Daemon on yog-sothoth.lub.lu.se   N/A       N/A        Y       7588

Task Status of Volume palantir
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : c38e11fe-fe1b-464d-b9f5-1398441cc229
Status               : completed
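
For reference when reading the brick list above: in a distributed/replicated
volume, consecutive bricks in the volume's brick order form one replica set.
The following is only a rough Python sketch of that grouping. It assumes
that the status output lists the bricks in the same order as the volume
definition (the authoritative order is whatever "gluster volume info
palantir" reports) and that the replica count is 2, as implied by the 3x2
layout. The function names are made up for illustration and are not part
of any gluster tooling.

```python
def replica_sets(bricks, replica_count):
    """Group an ordered brick list into consecutive replica sets."""
    if replica_count < 1 or len(bricks) % replica_count != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica_count]
            for i in range(0, len(bricks), replica_count)]


def partners_of(host, bricks, replica_count=2):
    """Return the other brick hosts in the same replica set as `host`."""
    for group in replica_sets(bricks, replica_count):
        if host in group:
            return [b for b in group if b != host]
    raise KeyError(host)


# Brick hosts in the order shown by the status output above.
# ASSUMPTION: this matches the brick order of the volume definition.
bricks = ["saruman", "gandalf", "azathoth",
          "yog-sothoth", "cthulhu", "mordiggian"]

for group in replica_sets(bricks, 2):
    print(" <-> ".join(group))

print("azathoth's replica partner(s):", partners_of("azathoth", bricks))
```

If that ordering assumption holds, this shows which server is expected to
hold the second copy of every file stored on azathoth, which is where I
would expect to look first when files become unavailable as soon as
azathoth goes down.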
--
Dave Sherohman