[Gluster-users] Reliably mounting a gluster volume

Fri Oct 21 12:46:05 UTC 2016

Hi everyone,

For the past few days I've been experimenting with Gluster and systemd. 
The issue I'm trying to solve is that my gluster servers always fail to 
self-mount their gluster volume locally on boot. Apparently this is 
because the mount happens right after glusterd has been started, but 
before it is ready to serve the volume.

I'm doing a refresh of our internal Gluster-based KVM system, bringing
it to Ubuntu 16.04 LTS, which uses systemd. As the Ubuntu gluster
package as shipped still has this boot/mount issue, and to simplify
things a bit, I've removed all the SysV init and Upstart files that ship
with the current Ubuntu Gluster package, aiming for a systemd-only
solution.

The problem, in my opinion, stems from the fact that the Unit file for
glusterd declares it as a 'forking' kind of service. This means that as
soon as the double fork happens, systemd has no option but to consider
the service as available, and continues with the rest of its work. I
tried to delay the mounting of /gluster by adding
"x-systemd.requires=glusterd.service" to its mount options, but for the
reasons above, the mount still happens immediately after glusterd has
started, and then fails.

Is there a way for systemd to know when the gluster service is actually 
able to service a mount request, so one can delay this step of the boot 
process?
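
One workaround that might apply here, as a sketch rather than a tested
fix: systemd treats a unit as started only after its ExecStartPost=
commands have finished, so a small polling script run from there could
hold back dependent mounts until glusterd answers. The helper below,
wait_for, is a hypothetical name; it retries a command once per second
until it succeeds or a timeout passes. Whether a successful 'gluster
volume status gv0' also means the brick port is registered is an
assumption that would need checking.

```shell
#!/bin/sh
# wait_for TIMEOUT CMD [ARGS...]   (hypothetical helper, not part of Gluster)
# Re-run CMD about once per second until it exits 0 or roughly TIMEOUT
# seconds have passed. Returns 0 on success, 1 on timeout.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + 1))
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}
```

It could be called as 'wait_for 60 gluster volume status gv0' from a
script referenced by ExecStartPost= in a glusterd.service drop-in; the
60-second limit and the script location are arbitrary choices.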

In the Unit file, I have:
[Unit]
Requires=rpcbind.service
After=network.target rpcbind.service network-online.target

The curious thing is that, according to gluster.log, the gluster client
does find out on which hostnames the subvolumes are available. However,
talking to both the local (0-gv0-client-0) and the remote
(0-gv0-client-1) subvolume fails. For the service on localhost, the
error is 'failed to get the port number for remote subvolume'. For the
remote subvolume, it is 'No route to host'. But at this stage, local
networking (which is fully static and on the same network) should
already be up.
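
A possible factor in the 'No route to host' error: After=
network-online.target only orders glusterd after that target if
something else pulls the target into the boot transaction, and the
systemd documentation suggests pairing it with
Wants=network-online.target. The sketch below writes such a drop-in.
The function name write_network_dropin and the file name
10-network-online.conf are made up for illustration, and
network-online.target only waits for the network if a wait-online
service is enabled on the system.

```shell
#!/bin/sh
# write_network_dropin DIR   (hypothetical helper)
# Write a drop-in into DIR that both pulls in network-online.target
# (Wants=) and orders the unit after it (After=).
write_network_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/10-network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}
```

For glusterd this might be run as 'write_network_dropin
/etc/systemd/system/glusterd.service.d' followed by 'systemctl
daemon-reload'.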

Some error messages during the mount:

[12:15:50.749137] E [MSGID: 114058] 
[client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-0: 
failed to get the port number for remote subvolume. Please run 'gluster 
volume status' on server to see if brick process is running.
[12:15:50.749178] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 
0-gv0-client-0: disconnected from gv0-client-0. Client process will keep 
trying to connect to glusterd until brick's port is available
[12:15:53.679570] E [socket.c:2278:socket_connect_finish] 
0-gv0-client-1: connection to 10.0.0.3:24007 failed (No route to host)
[12:15:53.679611] E [MSGID: 108006] [afr-common.c:3880:afr_notify] 
0-gv0-replicate-0: All subvolumes are down. Going offline until atleast 
one of them comes back up.

Once the machine has fully booted and I log in, simply typing 'mount
/gluster' always succeeds. I would really appreciate your help in making
this happen on boot without intervention.

Regards, Paul Boven.
-- 
Paul Boven <boven at jive.eu> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.eu
VLBI - It's a fringe science