[Gluster-users] hook script question related to ctdb, shared storage, and bind mounts

Sun Nov 3 20:46:26 UTC 2019

So, I have a solution I have written about in the past that is based on
gluster with CTDB for IP and a level of redundancy.

It's been working fine except for a few quirks I need to work out on
giant clusters when I get access.

I have a 3x9 gluster volume. Each server is also an NFS server, using
gluster NFS (ganesha isn't reliable for my workload yet). There are 9 IP
aliases spread across 9 servers.

I also have many bind mounts that point to the shared storage as a
source, and the /gluster/lock volume ("ctdb") of course.

glusterfs 4.1.6 (rhel8 today, but I use rhel7, rhel8, sles12, and
sles15)

Things work well when everything is up and running. IP failover works
well when one of the servers goes down. My issue is when that server
comes back up. Despite my best efforts with systemd fstab dependencies,
the shared storage areas including the gluster lock for CTDB do not
always get mounted before CTDB starts. This causes trouble for CTDB
correctly joining the collective. I also have problems where my
bind mounts can happen before the shared storage is mounted, despite my
attempts at preventing this with dependencies in fstab.

I decided a better approach would be to use a gluster hook and just
mount everything I need as I need it, and start up ctdb when I know and
verify that /gluster/lock is really gluster and not a local disk.

I started down a road of doing this with a start post hook and after
spending a while at it, I realized my logic error. This will only fire
when the volume is *started*, not when a server that was down re-joins.

I took a look at the code, glusterd-hooks.c, and found that support
for "brick start" is not in place for a hook script but it's nearly
there:

        [GD_OP_START_BRICK]             = EMPTY,
...

and no entry in glusterd_hooks_add_op_args() yet.

Before I make a patch for my own use, I wanted to do a sanity check and
find out if others have solved this better than the road I'm heading
down.

What I was thinking of doing is enabling a brick start hook, and
do my processing for volumes being mounted from there. However, I
suppose brick start is a bad choice for the case of simply stopping and
starting the volume, because my processing would try to complete before
the gluster volume was fully started. It would probably work for a brick
"coming back and joining" but not "stop volume/start volume".

Any suggestions?

My end goal is:
 - mount shared storage every boot
 - only attempt to mount when gluster is available (_netdev doesn't seem
   to be enough)
 - never start ctdb unless /gluster/lock is really mounted shared
   storage and not just a local directory.
 - only do my bind mounts from shared storage into the rest of the
   layout when we are sure the shared storage is mounted (don't
   bind-mount using an empty directory as a source by accident!)
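
To make the last two goals concrete, here is a rough sketch of the kind of
check I have in mind. It is only an illustration: the function names and the
optional mount-table argument are made up for this example, and it assumes
the shared storage is a FUSE mount that shows up as type "fuse.glusterfs" in
/proc/self/mounts.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical, not part of gluster or ctdb.

# Print the filesystem type of the last mount entry whose mount point is
# exactly $1. $2 is the mount table to read (defaults to /proc/self/mounts).
# Note: mount points containing spaces are escaped in that file and are
# not handled here.
fstype_of() {
    local target=$1 table=${2:-/proc/self/mounts}
    awk -v t="$target" '$2 == t { fs = $3 } END { if (fs != "") print fs }' "$table"
}

# Succeed only if $1 is currently a GlusterFS FUSE mount.
is_gluster_mount() {
    [ "$(fstype_of "$1" "$2")" = "fuse.glusterfs" ]
}

# Bind-mount $1 onto $2, but only if $3 (the shared storage mount point)
# is really gluster, so an empty local directory is never used as a source.
safe_bind_mount() {
    local src=$1 dst=$2 root=$3
    if ! is_gluster_mount "$root"; then
        echo "refusing bind mount: $root is not a gluster mount" >&2
        return 1
    fi
    mount --bind "$src" "$dst"
}

# Start ctdb only when /gluster/lock is on gluster and not a local directory.
start_ctdb_if_safe() {
    if ! is_gluster_mount /gluster/lock; then
        echo "not starting ctdb: /gluster/lock is not a gluster mount" >&2
        return 1
    fi
    systemctl start ctdb
}
```

Whatever ends up triggering this (a hook, a systemd unit, or a retry loop)
would call start_ctdb_if_safe and safe_bind_mount after attempting the
gluster mounts. The open question of when to trigger it is still the one
above.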

Thanks so much for reading my question,

Erik