[Gluster-users] Failover problems with gluster 3.8.8-1 (latest Debian stable)

Dave Sherohman dave at sherohman.org
Tue Feb 13 13:33:44 UTC 2018

I'm using gluster for a virt-store with 3x2 distributed/replicated
servers for 16 qemu/kvm/libvirt virtual machines using image files
stored in gluster and accessed via libgfapi.  Eight of these disk images
are standalone, while the other eight are qcow2 images which all share a
single backing file.

For the most part, this is all working very well.  However, one of the
gluster servers (azathoth) causes three of the standalone VMs and all
eight of the shared-backing-image VMs to fail if it goes down.  Any of
the other gluster servers can go down with no problems; only azathoth
causes failures.

In addition, the kvm hosts have the gluster volume fuse-mounted, and one
of them (out of five) detects an error on the gluster volume and puts
the fuse mount into read-only mode if azathoth goes down.  libgfapi
connections to the VM images continue to work normally from this host
despite this, and the other four kvm hosts are unaffected.

It initially seemed relevant that I have the libgfapi URIs specified as
gluster://azathoth/..., but I've tried changing them to make the initial
connection via other gluster hosts and it had no effect on the problem.
Losing azathoth still took them out.

In addition to changing the mount URI, I've also manually run a heal and
rebalance on the volume, enabled the bitrot daemons (then turned them
back off a week later, since they reported no activity in that time),
and copied one of the standalone images to a new file in case it was a
problem with the file itself.  As far as I can tell, none of these
attempts changed anything.

So I'm at a loss.  Is this a known type of problem?  If so, how do I fix
it?  If not, what's the next step to troubleshoot it?

# gluster --version
glusterfs 3.8.8 built on Jan 11 2017 14:07:11
Repository revision: git://git.gluster.com/glusterfs.git

# gluster volume status
Status of volume: palantir
Gluster process                             TCP Port  RDMA Port  Online
Brick saruman:/var/local/brick0/data        49154     0          Y
Brick gandalf:/var/local/brick0/data        49155     0          Y
Brick azathoth:/var/local/brick0/data       49155     0          Y
Brick yog-sothoth:/var/local/brick0/data    49153     0          Y
Brick cthulhu:/var/local/brick0/data        49152     0          Y
Brick mordiggian:/var/local/brick0/data     49152     0          Y
Self-heal Daemon on localhost               N/A       N/A        Y
Self-heal Daemon on saruman.lub.lu.se       N/A       N/A        Y
Self-heal Daemon on cthulhu.lub.lu.se       N/A       N/A        Y
Self-heal Daemon on gandalf.lub.lu.se       N/A       N/A        Y
Self-heal Daemon on mordiggian.lub.lu.se    N/A       N/A        Y
Self-heal Daemon on yog-sothoth.lub.lu.se   N/A       N/A        Y
Task Status of Volume palantir
Task                 : Rebalance           
ID                   : c38e11fe-fe1b-464d-b9f5-1398441cc229
Status               : completed           
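
To make the 3x2 layout concrete, here is a minimal sketch that groups
the bricks from the status output above into replica sets.  It assumes
that consecutive bricks in the listing form a replica set, which is how
a "replica 2" volume is laid out when the bricks are given in that order
at creation time; nothing in this message confirms that the listing
order matches, so treat the pairing as a guess.  The replica_sets
function is a hypothetical helper, not part of gluster.

```shell
#!/bin/sh
# Group a brick list into replica sets of size $1, in listing order.
# ASSUMPTION: consecutive bricks form a replica set.  The authoritative
# order comes from "gluster volume info palantir", not from this script.
replica_sets() {
    size=$1
    shift
    n=0
    set_no=0
    line=""
    for brick in "$@"; do
        line="$line $brick"
        n=$((n + 1))
        # Emit one line each time a full replica set has been collected.
        if [ "$n" -eq "$size" ]; then
            echo "replica set $set_no:$line"
            set_no=$((set_no + 1))
            n=0
            line=""
        fi
    done
}

# Brick hosts copied from the status output, in the order shown there.
replica_sets 2 saruman gandalf azathoth yog-sothoth cthulhu mordiggian
```

Under that assumption azathoth would share a replica set with
yog-sothoth, so the affected images would be the ones stored on that
pair.  Comparing against "gluster volume info palantir" and checking
"gluster volume heal palantir info" for entries on those two bricks
would be one way to test the guess.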

Dave Sherohman
