[Gluster-users] Gluster volume not automounted when peer is down
Joe Julian
joe at julianfamily.org
Tue Nov 25 18:03:34 UTC 2014
A much simpler answer is to assign multiple IP addresses to a single
hostname (round-robin DNS). When gethostbyname() returns multiple entries,
the client will try them all until it is successful.
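For example (the hostname and zone here are placeholders, not from the
original setup), publish one name with both brick addresses and mount by
that name instead of localhost:

    ; zone file snippet: one name, two A records (round-robin DNS)
    gluster    IN  A  10.250.1.1
    gluster    IN  A  10.250.1.2

    # /etc/fstab on the clients
    gluster.example.com:/rel-vol  /home  glusterfs  defaults,_netdev  0 0

The mount server is only used to fetch the volfile, so any address that
answers is good enough.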
On 11/24/2014 06:23 PM, Paul Robert Marino wrote:
> This is simple and can be handled in many ways.
>
> Some background first.
> The mount point is a single IP or host name. The only thing the client
> uses it for is to download a volfile describing all the bricks in the
> volume. The client then opens connections to all the nodes containing
> bricks for that volume.
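>
> Roughly, the volfile it fetches for a 1x2 replica like this one looks
> something like the following (trimmed for illustration):
>
>     volume rel-vol-client-0
>         type protocol/client
>         option remote-host 10.250.1.1
>         option remote-subvolume /export/brick1
>     end-volume
>
>     volume rel-vol-client-1
>         type protocol/client
>         option remote-host 10.250.1.2
>         option remote-subvolume /export/brick1
>     end-volume
>
>     volume rel-vol-replicate-0
>         type cluster/replicate
>         subvolumes rel-vol-client-0 rel-vol-client-1
>     end-volume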
>
> So the answer is to tell the client to connect to a virtual IP address (VIP).
>
> I personally use keepalived for this, but you can use any of the many
> IPVS or other tools that manage IPs. I assign the VIP to a primary
> node, then have each node monitor the cluster processes; if they die on
> a node, that node goes into a faulted state and cannot own the VIP.
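>
> A bare-bones keepalived sketch of that idea (the interface, VIP, and
> health check below are placeholders; adjust for your network):
>
>     vrrp_script chk_glusterd {
>         script "pidof glusterd"   # node is faulted if glusterd is gone
>         interval 2
>     }
>
>     vrrp_instance gluster_vip {
>         state BACKUP
>         interface eth0            # placeholder interface
>         virtual_router_id 51
>         priority 100
>         virtual_ipaddress {
>             10.250.1.10/24        # placeholder VIP the clients mount from
>         }
>         track_script {
>             chk_glusterd
>         }
>     }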
>
> As long as the clients are connecting to a running host in the cluster
> you are fine, even if that host doesn't own any bricks in the volume but
> is aware of them as part of the cluster.
> -- Sent from my HP Pre3
>
> ------------------------------------------------------------------------
> On Nov 24, 2014 8:07 PM, Eric Ewanco <Eric.Ewanco at genband.com> wrote:
>
> Hi all,
>
> We’re trying to use gluster as a replicated volume. It works OK when
> both peers are up but when one peer is down and the other reboots, the
> “surviving” peer does not automount glusterfs. Furthermore, after the
> boot sequence is complete, it can be mounted without issue. It
> automounts fine when the peer is up during startup. I tried to google
> this and while I found some similar issues, I haven’t found any
> solutions to my problem. Any insight would be appreciated. Thanks.
>
> gluster volume info output (after startup):
>
> Volume Name: rel-vol
> Type: Replicate
> Volume ID: 90cbe313-e9f9-42d9-a947-802315ab72b0
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.250.1.1:/export/brick1
> Brick2: 10.250.1.2:/export/brick1
>
> gluster peer status output (after startup):
>
> Number of Peers: 1
>
> Hostname: 10.250.1.2
> Uuid: 8d49b929-4660-4b1e-821b-bfcd6291f516
> State: Peer in Cluster (Disconnected)
>
> Original volume create command:
>
> gluster volume create rel-vol rep 2 transport tcp 10.250.1.1:/export/brick1 10.250.1.2:/export/brick1
>
> I am running Gluster 3.4.5 on OpenSuSE 12.2.
>
> gluster --version:
>
> glusterfs 3.4.5 built on Jul 25 2014 08:31:19
> Repository revision: git://git.gluster.com/glusterfs.git
> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU
> General Public License.
>
> The fstab line is:
>
> localhost:/rel-vol /home glusterfs defaults,_netdev 0 0
>
> lsof -i :24007-24100:
>
> COMMAND    PID USER   FD  TYPE DEVICE SIZE/OFF NODE NAME
> glusterd  4073 root    6u IPv4  82170      0t0  TCP s1:24007->s1:1023 (ESTABLISHED)
> glusterd  4073 root    9u IPv4  13816      0t0  TCP *:24007 (LISTEN)
> glusterd  4073 root   10u IPv4  88106      0t0  TCP s1:exp2->s2:24007 (SYN_SENT)
> glusterfs 4097 root    8u IPv4  16751      0t0  TCP s1:1023->s1:24007 (ESTABLISHED)
>
> This is shorter than it is when it works, but maybe that’s because the
> mount spawns some more processes.
>
> Some ports are down:
>
> root at q50-s1:/root> telnet localhost 24007
> Trying ::1...
> telnet: connect to address ::1: Connection refused
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> telnet> close
> Connection closed.
>
> root at q50-s1:/root> telnet localhost 24009
> Trying ::1...
> telnet: connect to address ::1: Connection refused
> Trying 127.0.0.1...
> telnet: connect to address 127.0.0.1: Connection refused
>
> ps axww | fgrep glu:
>
> 4073 ? Ssl 0:10 /usr/sbin/glusterd -p /run/glusterd.pid
>
> 4097 ? Ssl 0:00 /usr/sbin/glusterfsd -s 10.250.1.1
> --volfile-id rel-vol.10.250.1.1.export-brick1 -p
> /var/lib/glusterd/vols/rel-vol/run/10.250.1.1-export-brick1.pid -S
> /var/run/89ba432ed09e07e107723b4b266e18f9.socket --brick-name
> /export/brick1 -l /var/log/glusterfs/bricks/export-brick1.log
> --xlator-option
> *-posix.glusterd-uuid=3b02a581-8fb9-4c6a-8323-9463262f23bc
> --brick-port 49152 --xlator-option rel-vol-server.listen-port=49152
>
> 5949 ttyS0 S+ 0:00 fgrep glu
>
> These are the error messages I see in /var/log/glusterfs/home.log (/home
> is the mountpoint):
>
> +------------------------------------------------------------------------------+
>
> [2014-11-24 13:51:27.932285] E
> [client-handshake.c:1742:client_query_portmap_cbk] 0-rel-vol-client-0:
> failed to get the port number for remote subvolume. Please run
> 'gluster volume status' on server to see if brick process is running.
>
> [2014-11-24 13:51:27.932373] W [socket.c:514:__socket_rwv]
> 0-rel-vol-client-0: readv failed (No data available)
>
> [2014-11-24 13:51:27.932405] I [client.c:2098:client_rpc_notify]
> 0-rel-vol-client-0: disconnected
>
> [2014-11-24 13:51:30.818281] E [socket.c:2157:socket_connect_finish]
> 0-rel-vol-client-1: connection to 10.250.1.2:24007 failed (No route to
> host)
>
> [2014-11-24 13:51:30.818313] E [afr-common.c:3735:afr_notify]
> 0-rel-vol-replicate-0: All subvolumes are down. Going offline until
> atleast one of them comes back up.
>
> [2014-11-24 13:51:30.822189] I [fuse-bridge.c:4771:fuse_graph_setup]
> 0-fuse: switched to graph 0
>
> [2014-11-24 13:51:30.822245] W [socket.c:514:__socket_rwv]
> 0-rel-vol-client-1: readv failed (No data available)
>
> [2014-11-24 13:51:30.822312] I [fuse-bridge.c:3726:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13
> kernel 7.18
>
> [2014-11-24 13:51:30.822562] W [fuse-bridge.c:705:fuse_attr_cbk]
> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not
> connected)
>
> [2014-11-24 13:51:30.835120] I [fuse-bridge.c:4630:fuse_thread_proc]
> 0-fuse: unmounting /home
>
> [2014-11-24 13:51:30.835397] W [glusterfsd.c:1002:cleanup_and_exit]
> (-->/lib64/libc.so.6(clone+0x6d) [0x7f00f0f682bd]
> (-->/lib64/libpthread.so.0(+0x7e0e) [0x7f00f1603e0e]
> (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xc5) [0x4075f5])))
> 0-: received signum (15), shutting down
>
> [2014-11-24 13:51:30.835416] I [fuse-bridge.c:5262:fini] 0-fuse:
> Unmounting '/home'.
>
> Relevant section from /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:
>
> [2014-11-24 13:51:27.552371] I [glusterfsd.c:1910:main]
> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.5
> (/usr/sbin/glusterd -p /run/glusterd.pid)
>
> [2014-11-24 13:51:27.574553] I [glusterd.c:961:init] 0-management:
> Using /var/lib/glusterd as working directory
>
> [2014-11-24 13:51:27.577734] I [socket.c:3480:socket_init]
> 0-socket.management: SSL support is NOT enabled
>
> [2014-11-24 13:51:27.577756] I [socket.c:3495:socket_init]
> 0-socket.management: using system polling thread
>
> [2014-11-24 13:51:27.577834] E
> [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport:
> /usr/lib64/glusterfs/3.4.5/rpc-transport/rdma.so: cannot open shared
> object file: No such file or directory
>
> [2014-11-24 13:51:27.577849] W
> [rpc-transport.c:257:rpc_transport_load] 0-rpc-transport: volume
> 'rdma.management': transport-type 'rdma' is not valid or not found on
> this machine
>
> [2014-11-24 13:51:27.577858] W [rpcsvc.c:1389:rpcsvc_transport_create]
> 0-rpc-service: cannot create listener, initing the transport failed
>
> [2014-11-24 13:51:27.578697] I
> [glusterd.c:354:glusterd_check_gsync_present] 0-glusterd:
> geo-replication module not installed in the system
>
> [2014-11-24 13:51:27.598907] I
> [glusterd-store.c:1339:glusterd_restore_op_version] 0-glusterd:
> retrieved op-version: 2
>
> [2014-11-24 13:51:27.607802] E
> [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown
> key: brick-0
>
> [2014-11-24 13:51:27.607837] E
> [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown
> key: brick-1
>
> [2014-11-24 13:51:27.809027] I
> [glusterd-handler.c:2818:glusterd_friend_add] 0-management: connect
> returned 0
>
> [2014-11-24 13:51:27.809098] I
> [rpc-clnt.c:962:rpc_clnt_connection_init] 0-management: setting
> frame-timeout to 600
>
> [2014-11-24 13:51:27.809150] I [socket.c:3480:socket_init]
> 0-management: SSL support is NOT enabled
>
> [2014-11-24 13:51:27.809162] I [socket.c:3495:socket_init]
> 0-management: using system polling thread
>
> [2014-11-24 13:51:27.813801] I [glusterd.c:125:glusterd_uuid_init]
> 0-management: retrieved UUID: 3b02a581-8fb9-4c6a-8323-9463262f23bc
>
> Given volfile:
>
> +------------------------------------------------------------------------------+
>
> 1: volume management
>
> 2: type mgmt/glusterd
>
> 3: option working-directory /var/lib/glusterd
>
> 4: option transport-type socket,rdma
>
> 5: option transport.socket.keepalive-time 10
>
> 6: option transport.socket.keepalive-interval 2
>
> 7: option transport.socket.read-fail-log off
>
> 8: # option base-port 49152
>
> 9: end-volume
>
> +------------------------------------------------------------------------------+
>
> [2014-11-24 13:51:30.818283] E [socket.c:2157:socket_connect_finish]
> 0-management: connection to 10.250.1.2:24007 failed (No route to host)
>
> [2014-11-24 13:51:30.820254] I
> [rpc-clnt.c:962:rpc_clnt_connection_init] 0-management: setting
> frame-timeout to 600
>
> [2014-11-24 13:51:30.820316] I [socket.c:3480:socket_init]
> 0-management: SSL support is NOT enabled
>
> [2014-11-24 13:51:30.820327] I [socket.c:3495:socket_init]
> 0-management: using system polling thread
>
> [2014-11-24 13:51:30.820378] W [socket.c:514:__socket_rwv]
> 0-management: readv failed (No data available)
>
> [2014-11-24 13:51:30.821243] I
> [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management:
> Found brick
>
> [2014-11-24 13:51:30.821268] I [socket.c:2236:socket_event_handler]
> 0-transport: disconnecting now
>
> [2014-11-24 13:51:30.822036] I
> [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management:
> Found brick
>
> [2014-11-24 13:51:30.863454] I
> [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick
> /export/brick1 on port 49152
>
> [2014-11-24 13:51:33.824274] W [socket.c:514:__socket_rwv]
> 0-management: readv failed (No data available)
>
> [2014-11-24 13:51:34.817560] I
> [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management:
> Found brick
>
> [2014-11-24 13:51:39.824281] W [socket.c:514:__socket_rwv]
> 0-management: readv failed (No data available)
>
> [2014-11-24 13:51:42.830260] W [socket.c:514:__socket_rwv]
> 0-management: readv failed (No data available)
>
> [2014-11-24 13:51:48.832276] W [socket.c:514:__socket_rwv]
> 0-management: readv failed (No data available)
>
> [ad nauseam...]
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users