[Gluster-users] Error at boot and can't mount : failed to get the port number

Nicolas Repentin nicolas at shivaserv.fr
Wed Jul 8 07:50:33 UTC 2015


Hello

I'm trying to find a solution to an error; maybe someone can help.

I'm using CentOS 7 and glusterfs 3.6.3.
I've got 2 nodes on the same network and a replicated volume.
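
For reference, the volume was created more or less like this (volume name and brick paths reconstructed from the volfile and logs below):

# gluster volume create data-sync replica 2 host1:/gluster host2:/gluster
# gluster volume start data-sync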

If both nodes are up, the volume is OK and I can mount it over NFS on each node.
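
Something like this (Gluster's built-in NFS server speaks NFSv3, hence vers=3; the mount point /data-sync is the one that appears in the log below):

# mount -t nfs -o vers=3 localhost:/data-sync /data-sync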

If one node is down when I reboot the other, the volume can't be mounted.

I get the error "failed to get the port number for remote subvolume" in the log:

+------------------------------------------------------------------------------+
1: volume data-sync-client-0
2:     type protocol/client
3:     option ping-timeout 42
4:     option remote-host host1
5:     option remote-subvolume /gluster
6:     option transport-type socket
7:     option username fbd26745-afb8-4729-801e-e1a2db8ff38f
8:     option password d077f325-1d03-494d-bfe5-d662ce2d22fe
9:     option send-gids true
10: end-volume
11:
12: volume data-sync-client-1
13:     type protocol/client
14:     option ping-timeout 42
15:     option remote-host host2
16:     option remote-subvolume /gluster
17:     option transport-type socket
18:     option username fbd26745-afb8-4729-801e-e1a2db8ff38f
19:     option password d077f325-1d03-494d-bfe5-d662ce2d22fe
20:     option send-gids true
21: end-volume
22:
23: volume data-sync-replicate-0
24:     type cluster/replicate
25:     subvolumes data-sync-client-0 data-sync-client-1
26: end-volume
27:
28: volume data-sync-dht
29:     type cluster/distribute
30:     subvolumes data-sync-replicate-0
31: end-volume
32:
33: volume data-sync-write-behind
34:     type performance/write-behind
35:     subvolumes data-sync-dht
36: end-volume
37:
38: volume data-sync-read-ahead
39:     type performance/read-ahead
40:     subvolumes data-sync-write-behind
41: end-volume
42:
43: volume data-sync-io-cache
44:     type performance/io-cache
45:     subvolumes data-sync-read-ahead
46: end-volume
47:
48: volume data-sync-quick-read
49:     type performance/quick-read
50:     subvolumes data-sync-io-cache
51: end-volume
52:
53: volume data-sync-open-behind
54:     type performance/open-behind
55:     subvolumes data-sync-quick-read
56: end-volume
57:
58: volume data-sync-md-cache
59:     type performance/md-cache
60:     subvolumes data-sync-open-behind
61: end-volume
62:
63: volume data-sync
64:     type debug/io-stats
65:     option latency-measurement off
66:     option count-fop-hits off
67:     subvolumes data-sync-md-cache
68: end-volume
69:
70: volume meta-autoload
71:     type meta
72:     subvolumes data-sync
73: end-volume
74:
+------------------------------------------------------------------------------+
[2015-07-08 06:06:08.088983] E [client-handshake.c:1496:client_query_portmap_cbk]
0-data-sync-client-1: failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
[2015-07-08 06:06:08.089034] I [client.c:2215:client_rpc_notify] 0-data-sync-client-1: disconnected
from data-sync-client-1. Client process will keep trying to connect to glusterd until brick's port
is available
[2015-07-08 06:06:10.769962] E [socket.c:2276:socket_connect_finish] 0-data-sync-client-0:
connection to 192.168.1.12:24007 failed (No route to host)
[2015-07-08 06:06:10.769991] E [MSGID: 108006] [afr-common.c:3708:afr_notify]
0-data-sync-replicate-0: All subvolumes are down. Going offline until atleast one of them comes
back up.
[2015-07-08 06:06:10.772310] I [fuse-bridge.c:5080:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-07-08 06:06:10.772430] I [fuse-bridge.c:4009:fuse_init] 0-glusterfs-fuse: FUSE inited with
protocol versions: glusterfs 7.22 kernel 7.22
[2015-07-08 06:06:10.772503] I [afr-common.c:3839:afr_local_init] 0-data-sync-replicate-0: no
subvolumes up
[2015-07-08 06:06:10.772631] I [afr-common.c:3839:afr_local_init] 0-data-sync-replicate-0: no
subvolumes up
[2015-07-08 06:06:10.772653] W [fuse-bridge.c:779:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / =>
-1 (Transport endpoint is not connected)
[2015-07-08 06:06:10.776974] I [afr-common.c:3839:afr_local_init] 0-data-sync-replicate-0: no
subvolumes up
[2015-07-08 06:06:10.777810] I [fuse-bridge.c:4921:fuse_thread_proc] 0-fuse: unmounting /data-sync
[2015-07-08 06:06:10.778007] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15),
shutting down
[2015-07-08 06:06:10.778022] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/data-sync'.

The volume shows as started, but the brick is not online:

# gluster volume status
Status of volume: data-sync
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick host2:/gluster                                    N/A     N       N/A
NFS Server on localhost                                 N/A     N       N/A
Self-heal Daemon on localhost                           N/A     N       N/A

Task Status of Volume data-sync
------------------------------------------------------------------------------
There are no active volume tasks

To work around it, I have to stop the volume, start it again, and then mount it manually.
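
That is (same volume name and mount command as above):

# gluster volume stop data-sync
# gluster volume start data-sync
# mount -t nfs -o vers=3 localhost:/data-sync /data-sync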

I can't find a way to make it come up correctly at each boot.
I saw in a bug report that this is a protection: the volume stays offline while the other node is down, to avoid serving stale data.
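
If that protection is the server-quorum feature, I guess it could be relaxed (at the cost of split-brain safety) with something like the following, though I haven't tested it and I'm not sure this option is what kicks in here:

# gluster volume set data-sync cluster.server-quorum-type none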

Any idea how to force it online at boot?

Thanks

Nicolas

