[Gluster-users] Gluster volume not automounted when peer is down

Joe Julian joe at julianfamily.org
Tue Nov 25 18:03:34 UTC 2014


A much simpler answer is to assign a single hostname to multiple IP addresses 
(round-robin DNS). When gethostbyname() returns multiple entries, the 
client will try them all until one succeeds.
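
A minimal sketch of that setup, assuming a BIND-style zone file and the two
peer addresses from this thread (the hostname "gluster" is made up):

  gluster    IN  A  10.250.1.1
  gluster    IN  A  10.250.1.2

and then mount through that name instead of a single address, e.g. in fstab:

  gluster:/rel-vol  /home  glusterfs  defaults,_netdev  0 0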

On 11/24/2014 06:23 PM, Paul Robert Marino wrote:
> This is simple and can be handled in many ways.
>
> Some background first.
> The mount point is a single IP or host name. The only thing the client 
> uses it for is to download a volfile describing all the bricks in the 
> cluster. It then opens connections to all the nodes containing bricks 
> for that volume.
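>
> For illustration, this is roughly what mount.glusterfs runs under the 
> hood (the mount point /mnt/test is just an example): the named server is 
> only asked for the volfile, after which the client talks to every brick 
> directly:
>
>   /usr/sbin/glusterfs --volfile-server=10.250.1.1 --volfile-id=rel-vol /mnt/test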
>
> So the answer is to tell the client to connect to a virtual IP address.
>
> I personally use keepalived for this, but you can use any one of the 
> many IPVS or other tools that manage IPs. I assign the VIP to a primary 
> node, then have each node monitor the cluster processes; if they die on 
> a node, it goes into a faulted state and cannot own the VIP.
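>
> A minimal keepalived.conf sketch along those lines (the VIP 10.250.1.100, 
> the interface name, and the health check are assumptions, not taken from 
> this thread; adjust for your own network):
>
>   vrrp_script chk_glusterd {
>       script "pidof glusterd"    # fault the node if glusterd is not running
>       interval 2
>   }
>
>   vrrp_instance gluster_vip {
>       state BACKUP
>       interface eth0             # assumed interface name
>       virtual_router_id 51
>       priority 100
>       virtual_ipaddress {
>           10.250.1.100/24        # assumed VIP that clients mount from
>       }
>       track_script {
>           chk_glusterd
>       }
>   }
>
> Clients would then mount the volume from 10.250.1.100, which keepalived 
> moves to whichever node still has a healthy glusterd.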
>
> As long as the clients are connecting to a running host in the cluster, 
> you are fine, even if that host doesn't own any bricks in the volume but 
> is aware of them as part of the cluster.
> -- Sent from my HP Pre3
>
> ------------------------------------------------------------------------
> On Nov 24, 2014 8:07 PM, Eric Ewanco <Eric.Ewanco at genband.com> wrote:
>
> Hi all,
>
> We’re trying to use gluster as a replicated volume.  It works OK when 
> both peers are up, but when one peer is down and the other reboots, the 
> “surviving” peer does not automount glusterfs.  However, once the boot 
> sequence is complete, the volume can be mounted manually without issue.  
> It automounts fine when the peer is up during startup.  I tried to 
> google this and while I found some similar issues, I haven’t found any 
> solutions to my problem.  Any insight would be appreciated.  Thanks.
>
> gluster volume info output (after startup):
>
> Volume Name: rel-vol
> Type: Replicate
> Volume ID: 90cbe313-e9f9-42d9-a947-802315ab72b0
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.250.1.1:/export/brick1
> Brick2: 10.250.1.2:/export/brick1
>
> gluster peer status output (after startup):
>
> Number of Peers: 1
>
> Hostname: 10.250.1.2
> Uuid: 8d49b929-4660-4b1e-821b-bfcd6291f516
> State: Peer in Cluster (Disconnected)
>
> Original volume create command:
>
> gluster volume create rel-vol rep 2 transport tcp 10.250.1.1:/export/brick1 10.250.1.2:/export/brick1
>
> I am running Gluster 3.4.5 on OpenSuSE 12.2.
>
> gluster --version:
>
> glusterfs 3.4.5 built on Jul 25 2014 08:31:19
> Repository revision: git://git.gluster.com/glusterfs.git
> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
>
> The fstab line is:
>
> localhost:/rel-vol /home        glusterfs  defaults,_netdev      0 0
>
> lsof -i :24007-24100:
>
> COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
> glusterd  4073 root    6u  IPv4  82170 0t0  TCP s1:24007->s1:1023 (ESTABLISHED)
> glusterd  4073 root    9u  IPv4  13816 0t0  TCP *:24007 (LISTEN)
> glusterd  4073 root   10u  IPv4  88106 0t0  TCP s1:exp2->s2:24007 (SYN_SENT)
> glusterfs 4097 root    8u  IPv4  16751 0t0  TCP s1:1023->s1:24007 (ESTABLISHED)
>
> This is shorter than it is when it works, but maybe that’s because the 
> mount spawns some more processes.
>
> Some ports are down:
>
> root@q50-s1:/root> telnet localhost 24007
> Trying ::1...
> telnet: connect to address ::1: Connection refused
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> telnet> close
> Connection closed.
>
> root@q50-s1:/root> telnet localhost 24009
> Trying ::1...
> telnet: connect to address ::1: Connection refused
> Trying 127.0.0.1...
> telnet: connect to address 127.0.0.1: Connection refused
>
> ps axww | fgrep glu:
>
> 4073 ?        Ssl    0:10 /usr/sbin/glusterd -p /run/glusterd.pid
>
> 4097 ?        Ssl    0:00 /usr/sbin/glusterfsd -s 10.250.1.1 
> --volfile-id rel-vol.10.250.1.1.export-brick1 -p 
> /var/lib/glusterd/vols/rel-vol/run/10.250.1.1-export-brick1.pid -S 
> /var/run/89ba432ed09e07e107723b4b266e18f9.socket --brick-name 
> /export/brick1 -l /var/log/glusterfs/bricks/export-brick1.log 
> --xlator-option 
> *-posix.glusterd-uuid=3b02a581-8fb9-4c6a-8323-9463262f23bc 
> --brick-port 49152 --xlator-option rel-vol-server.listen-port=49152
>
> 5949 ttyS0    S+     0:00 fgrep glu
>
> These are the error messages I see in /var/log/gluster/home.log (/home 
> is the mountpoint):
>
> +------------------------------------------------------------------------------+
>
> [2014-11-24 13:51:27.932285] E 
> [client-handshake.c:1742:client_query_portmap_cbk] 0-rel-vol-client-0: 
> failed to get the port number for remote subvolume. Please run 
> 'gluster volume status' on server to see if brick process is running.
>
> [2014-11-24 13:51:27.932373] W [socket.c:514:__socket_rwv] 
> 0-rel-vol-client-0: readv failed (No data available)
>
> [2014-11-24 13:51:27.932405] I [client.c:2098:client_rpc_notify] 
> 0-rel-vol-client-0: disconnected
>
> [2014-11-24 13:51:30.818281] E [socket.c:2157:socket_connect_finish] 
> 0-rel-vol-client-1: connection to 10.250.1.2:24007 failed (No route to 
> host)
>
> [2014-11-24 13:51:30.818313] E [afr-common.c:3735:afr_notify] 
> 0-rel-vol-replicate-0: All subvolumes are down. Going offline until 
> atleast one of them comes back up.
>
> [2014-11-24 13:51:30.822189] I [fuse-bridge.c:4771:fuse_graph_setup] 
> 0-fuse: switched to graph 0
>
> [2014-11-24 13:51:30.822245] W [socket.c:514:__socket_rwv] 
> 0-rel-vol-client-1: readv failed (No data available)
>
> [2014-11-24 13:51:30.822312] I [fuse-bridge.c:3726:fuse_init] 
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 
> kernel 7.18
>
> [2014-11-24 13:51:30.822562] W [fuse-bridge.c:705:fuse_attr_cbk] 
> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not 
> connected)
>
> [2014-11-24 13:51:30.835120] I [fuse-bridge.c:4630:fuse_thread_proc] 
> 0-fuse: unmounting /home
>
> [2014-11-24 13:51:30.835397] W [glusterfsd.c:1002:cleanup_and_exit] 
> (-->/lib64/libc.so.6(clone+0x6d) [0x7f00f0f682bd] 
> (-->/lib64/libpthread.so.0(+0x7e0e) [0x7f00f1603e0e] 
> (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xc5) [0x4075f5]))) 0-: 
> received signum (15), shutting down
>
> [2014-11-24 13:51:30.835416] I [fuse-bridge.c:5262:fini] 0-fuse: 
> Unmounting '/home'.
>
> Relevant section from /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:
>
> [2014-11-24 13:51:27.552371] I [glusterfsd.c:1910:main] 
> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.5 
> (/usr/sbin/glusterd -p /run/glusterd.pid)
>
> [2014-11-24 13:51:27.574553] I [glusterd.c:961:init] 0-management: 
> Using /var/lib/glusterd as working directory
>
> [2014-11-24 13:51:27.577734] I [socket.c:3480:socket_init] 
> 0-socket.management: SSL support is NOT enabled
>
> [2014-11-24 13:51:27.577756] I [socket.c:3495:socket_init] 
> 0-socket.management: using system polling thread
>
> [2014-11-24 13:51:27.577834] E 
> [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport: 
> /usr/lib64/glusterfs/3.4.5/rpc-transport/rdma.so: cannot open shared 
> object file: No such file or directory
>
> [2014-11-24 13:51:27.577849] W 
> [rpc-transport.c:257:rpc_transport_load] 0-rpc-transport: volume 
> 'rdma.management': transport-type 'rdma' is not valid or not found on 
> this machine
>
> [2014-11-24 13:51:27.577858] W [rpcsvc.c:1389:rpcsvc_transport_create] 
> 0-rpc-service: cannot create listener, initing the transport failed
>
> [2014-11-24 13:51:27.578697] I 
> [glusterd.c:354:glusterd_check_gsync_present] 0-glusterd: 
> geo-replication module not installed in the system
>
> [2014-11-24 13:51:27.598907] I 
> [glusterd-store.c:1339:glusterd_restore_op_version] 0-glusterd: 
> retrieved op-version: 2
>
> [2014-11-24 13:51:27.607802] E 
> [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown 
> key: brick-0
>
> [2014-11-24 13:51:27.607837] E 
> [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown 
> key: brick-1
>
> [2014-11-24 13:51:27.809027] I 
> [glusterd-handler.c:2818:glusterd_friend_add] 0-management: connect 
> returned 0
>
> [2014-11-24 13:51:27.809098] I 
> [rpc-clnt.c:962:rpc_clnt_connection_init] 0-management: setting 
> frame-timeout to 600
>
> [2014-11-24 13:51:27.809150] I [socket.c:3480:socket_init] 
> 0-management: SSL support is NOT enabled
>
> [2014-11-24 13:51:27.809162] I [socket.c:3495:socket_init] 
> 0-management: using system polling thread
>
> [2014-11-24 13:51:27.813801] I [glusterd.c:125:glusterd_uuid_init] 
> 0-management: retrieved UUID: 3b02a581-8fb9-4c6a-8323-9463262f23bc
>
> Given volfile:
>
> +------------------------------------------------------------------------------+
>   1: volume management
>   2:     type mgmt/glusterd
>   3:     option working-directory /var/lib/glusterd
>   4:     option transport-type socket,rdma
>   5:     option transport.socket.keepalive-time 10
>   6:     option transport.socket.keepalive-interval 2
>   7:     option transport.socket.read-fail-log off
>   8: #   option base-port 49152
>   9: end-volume
> +------------------------------------------------------------------------------+
>
> [2014-11-24 13:51:30.818283] E [socket.c:2157:socket_connect_finish] 
> 0-management: connection to 10.250.1.2:24007 failed (No route to host)
>
> [2014-11-24 13:51:30.820254] I 
> [rpc-clnt.c:962:rpc_clnt_connection_init] 0-management: setting 
> frame-timeout to 600
>
> [2014-11-24 13:51:30.820316] I [socket.c:3480:socket_init] 
> 0-management: SSL support is NOT enabled
>
> [2014-11-24 13:51:30.820327] I [socket.c:3495:socket_init] 
> 0-management: using system polling thread
>
> [2014-11-24 13:51:30.820378] W [socket.c:514:__socket_rwv] 
> 0-management: readv failed (No data available)
>
> [2014-11-24 13:51:30.821243] I 
> [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: 
> Found brick
>
> [2014-11-24 13:51:30.821268] I [socket.c:2236:socket_event_handler] 
> 0-transport: disconnecting now
>
> [2014-11-24 13:51:30.822036] I 
> [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: 
> Found brick
>
> [2014-11-24 13:51:30.863454] I 
> [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick 
> /export/brick1 on port 49152
>
> [2014-11-24 13:51:33.824274] W [socket.c:514:__socket_rwv] 
> 0-management: readv failed (No data available)
>
> [2014-11-24 13:51:34.817560] I 
> [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: 
> Found brick
>
> [2014-11-24 13:51:39.824281] W [socket.c:514:__socket_rwv] 
> 0-management: readv failed (No data available)
>
> [2014-11-24 13:51:42.830260] W [socket.c:514:__socket_rwv] 
> 0-management: readv failed (No data available)
>
> [2014-11-24 13:51:48.832276] W [socket.c:514:__socket_rwv] 
> 0-management: readv failed (No data available)
>
> [ad nauseam...]
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
