[Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

Greg Scott GregScott at infrasupport.com
Thu Jul 11 16:46:43 UTC 2013


> When you first mount your volume, look in the client log and see if it's connecting to both bricks. 
>  I suspect it's not and that the failure is related to firewall settings.

Logs from both nodes are below.  For this test, I first ran "umount /firewall-scripts" on both nodes, then ran "mount -av" using the default parameters in my fstab file.  I did **not** turn on backupvolfile-server=<secondary server> for this test.  In another window I ran "tail /var/log/glusterfs/firewall-scripts.log -f", so you can see the spot where I mounted my file system again.

Note that everything works as expected when both nodes are online, so this suggests everyone can see everyone else when things are steady-state.   Also note that backupvolfile-server=<secondary server> changed the behavior - I documented this in an earlier post.  

> ...the failure is related to firewall settings.

No way.   I’m wide open on the interface I’m using for heartbeat and glusterfs.  In my application, I take node fw1 offline by inserting a firewall rule and then getting rid of it a few seconds later.   For testing right now, I just insert the rule by hand, look at a bunch of stuff, then get rid of it later.    But since you brought it up, I cleaned out all firewall rules before doing and logging the mounts below.  Near as I can tell, it looks like everyone can see everyone else.  And the logs look the same to my eye as they did before I dropped all (not relevant) firewall rules.  
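
For anyone who wants to rule out the firewall independently of the gluster logs, here is a minimal sketch in Python that tries a plain TCP connection to each brick. The addresses and port 49152 are the ones reported in the client logs in this post; the names can_connect and check_targets and the 2-second timeout are illustrative choices, not anything gluster provides. It only shows whether a TCP connection can be opened from the node where it runs, and says nothing about the gluster handshake itself.

```python
import socket

# Brick addresses and port as reported in the client logs in this post.
TARGETS = [("192.168.253.1", 49152), ("192.168.253.2", 49152)]


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_targets(targets):
    """Return {(host, port): bool} for every target in the list."""
    return {(host, port): can_connect(host, port) for host, port in targets}
```

Running check_targets(TARGETS) on each node, once with the test rule inserted and once without, should show whether the rule blocks the brick port in either direction.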

Log from fw1:

[root@chicago-fw1 ~]#
[root@chicago-fw1 ~]# tail /var/log/glusterfs/firewall-scripts.log -f
[2013-07-11 15:51:54.423508] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to remote volume '/gluster-fw2'.
[2013-07-11 15:51:54.423576] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 15:51:54.440124] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 15:51:54.440660] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 15:51:54.440886] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 15:51:54.442235] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 15:51:54.443451] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-0
[2013-07-11 16:21:22.729423] I [fuse-bridge.c:4583:fuse_thread_proc] 0-fuse: unmounting /firewall-scripts
[2013-07-11 16:21:22.730976] W [glusterfsd.c:970:cleanup_and_exit] (-->/usr/lib64/libc.so.6(clone+0x6d) [0x7f7a69fee13d] (-->/usr/lib64/libpthread.so.0(+0x33c1607c53) [0x7f7a6a684c53] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7f7a6b372e35]))) 0-: received signum (15), shutting down
[2013-07-11 16:21:22.731040] I [fuse-bridge.c:5212:fini] 0-fuse: Unmounting '/firewall-scripts'.


Blank space - mount -av below.

[2013-07-11 16:39:36.625696] I [glusterfsd.c:1878:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0beta3 (/usr/sbin/glusterfs --volfile-id=/firewall-scripts --volfile-server=192.168.253.1 /firewall-scripts)
[2013-07-11 16:39:36.640661] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-07-11 16:39:36.640800] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-07-11 16:39:36.672416] I [socket.c:3480:socket_init] 0-firewall-scripts-client-1: SSL support is NOT enabled
[2013-07-11 16:39:36.672539] I [socket.c:3495:socket_init] 0-firewall-scripts-client-1: using system polling thread
[2013-07-11 16:39:36.674545] I [socket.c:3480:socket_init] 0-firewall-scripts-client-0: SSL support is NOT enabled
[2013-07-11 16:39:36.674667] I [socket.c:3495:socket_init] 0-firewall-scripts-client-0: using system polling thread
[2013-07-11 16:39:36.675015] I [client.c:2154:notify] 0-firewall-scripts-client-0: parent translators are ready, attempting connect on transport
[2013-07-11 16:39:36.686253] I [client.c:2154:notify] 0-firewall-scripts-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume firewall-scripts-client-0
  2:     type protocol/client
  3:     option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
  4:     option username de6eacd1-31bc-4bdb-a049-776cd840059e
  5:     option transport-type tcp
  6:     option remote-subvolume /gluster-fw1
  7:     option remote-host 192.168.253.1
  8: end-volume
  9:
 10: volume firewall-scripts-client-1
 11:     type protocol/client
 12:     option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
 13:     option username de6eacd1-31bc-4bdb-a049-776cd840059e
 14:     option transport-type tcp
 15:     option remote-subvolume /gluster-fw2
 16:     option remote-host 192.168.253.2
 17: end-volume
 18:
 19: volume firewall-scripts-replicate-0
 20:     type cluster/replicate
 21:     subvolumes firewall-scripts-client-0 firewall-scripts-client-1
 22: end-volume
 23:
 24: volume firewall-scripts-dht
 25:     type cluster/distribute
 26:     subvolumes firewall-scripts-replicate-0
 27: end-volume
 28:
 29: volume firewall-scripts-write-behind
 30:     type performance/write-behind
 31:     subvolumes firewall-scripts-dht
 32: end-volume
 33:
 34: volume firewall-scripts-read-ahead
 35:     type performance/read-ahead
 36:     subvolumes firewall-scripts-write-behind
 37: end-volume
 38:
 39: volume firewall-scripts-io-cache
 40:     type performance/io-cache
 41:     subvolumes firewall-scripts-read-ahead
 42: end-volume
 43:
 44: volume firewall-scripts-quick-read
 45:     type performance/quick-read
 46:     subvolumes firewall-scripts-io-cache
 47: end-volume
 48:
 49: volume firewall-scripts-open-behind
 50:     type performance/open-behind
 51:     subvolumes firewall-scripts-quick-read
 52: end-volume
 53:
 54: volume firewall-scripts-md-cache
 55:     type performance/md-cache
 56:     subvolumes firewall-scripts-open-behind
 57: end-volume
 58:
 59: volume firewall-scripts
 60:     type debug/io-stats
 61:     option count-fop-hits off
 62:     option latency-measurement off
 63:     subvolumes firewall-scripts-md-cache
 64: end-volume

+------------------------------------------------------------------------------+
[2013-07-11 16:39:36.698740] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-0: changing port to 49152 (from 0)
[2013-07-11 16:39:36.698974] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-11 16:39:36.711537] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-1: changing port to 49152 (from 0)
[2013-07-11 16:39:36.711717] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-1: readv failed (No data available)
[2013-07-11 16:39:36.723116] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:36.723521] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:36.723913] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-0: Connected to 192.168.253.1:49152, attached to remote volume '/gluster-fw1'.
[2013-07-11 16:39:36.723995] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:36.724390] I [afr-common.c:3698:afr_notify] 0-firewall-scripts-replicate-0: Subvolume 'firewall-scripts-client-0' came back up; going online.
[2013-07-11 16:39:36.724601] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-0: Server lk version = 1
[2013-07-11 16:39:36.724730] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to remote volume '/gluster-fw2'.
[2013-07-11 16:39:36.724788] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:36.737359] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 16:39:36.739297] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 16:39:36.739486] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 16:39:36.740672] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 16:39:36.741820] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-0

And from fw2:

[root@chicago-fw2 ~]# tail /var/log/glusterfs/firewall-scripts.log -f
[2013-07-11 15:51:45.499012] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 15:51:45.512667] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 15:51:45.513211] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-0: Server lk version = 1
[2013-07-11 15:51:45.513416] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 15:51:45.513538] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 15:51:45.515208] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 15:51:45.516512] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-1
[2013-07-11 16:21:28.150710] I [fuse-bridge.c:4583:fuse_thread_proc] 0-fuse: unmounting /firewall-scripts
[2013-07-11 16:21:28.154455] W [glusterfsd.c:970:cleanup_and_exit] (-->/usr/lib64/libc.so.6(clone+0x6d) [0x7fa599ad613d] (-->/usr/lib64/libpthread.so.0(+0x3c1b407c53) [0x7fa59a16cc53] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7fa59ae5ae35]))) 0-: received signum (15), shutting down
[2013-07-11 16:21:28.154503] I [fuse-bridge.c:5212:fini] 0-fuse: Unmounting '/firewall-scripts'.


Blank space - this is where I did mount -av

[2013-07-11 16:39:35.100584] I [glusterfsd.c:1878:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0beta3 (/usr/sbin/glusterfs --volfile-id=/firewall-scripts --volfile-server=192.168.253.2 /firewall-scripts)
[2013-07-11 16:39:35.113481] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-07-11 16:39:35.113614] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-07-11 16:39:35.147118] I [socket.c:3480:socket_init] 0-firewall-scripts-client-1: SSL support is NOT enabled
[2013-07-11 16:39:35.147313] I [socket.c:3495:socket_init] 0-firewall-scripts-client-1: using system polling thread
[2013-07-11 16:39:35.149112] I [socket.c:3480:socket_init] 0-firewall-scripts-client-0: SSL support is NOT enabled
[2013-07-11 16:39:35.149268] I [socket.c:3495:socket_init] 0-firewall-scripts-client-0: using system polling thread
[2013-07-11 16:39:35.149390] I [client.c:2154:notify] 0-firewall-scripts-client-0: parent translators are ready, attempting connect on transport
[2013-07-11 16:39:35.160491] I [client.c:2154:notify] 0-firewall-scripts-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume firewall-scripts-client-0
  2:     type protocol/client
  3:     option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
  4:     option username de6eacd1-31bc-4bdb-a049-776cd840059e
  5:     option transport-type tcp
  6:     option remote-subvolume /gluster-fw1
  7:     option remote-host 192.168.253.1
  8: end-volume
  9:
 10: volume firewall-scripts-client-1
 11:     type protocol/client
 12:     option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
 13:     option username de6eacd1-31bc-4bdb-a049-776cd840059e
 14:     option transport-type tcp
 15:     option remote-subvolume /gluster-fw2
 16:     option remote-host 192.168.253.2
 17: end-volume
 18:
 19: volume firewall-scripts-replicate-0
 20:     type cluster/replicate
 21:     subvolumes firewall-scripts-client-0 firewall-scripts-client-1
 22: end-volume
 23:
 24: volume firewall-scripts-dht
 25:     type cluster/distribute
 26:     subvolumes firewall-scripts-replicate-0
 27: end-volume
 28:
 29: volume firewall-scripts-write-behind
 30:     type performance/write-behind
 31:     subvolumes firewall-scripts-dht
 32: end-volume
 33:
 34: volume firewall-scripts-read-ahead
 35:     type performance/read-ahead
 36:     subvolumes firewall-scripts-write-behind
 37: end-volume
 38:
 39: volume firewall-scripts-io-cache
 40:     type performance/io-cache
 41:     subvolumes firewall-scripts-read-ahead
 42: end-volume
 43:
 44: volume firewall-scripts-quick-read
 45:     type performance/quick-read
 46:     subvolumes firewall-scripts-io-cache
 47: end-volume
 48:
 49: volume firewall-scripts-open-behind
 50:     type performance/open-behind
 51:     subvolumes firewall-scripts-quick-read
 52: end-volume
 53:
 54: volume firewall-scripts-md-cache
 55:     type performance/md-cache
 56:     subvolumes firewall-scripts-open-behind
 57: end-volume
 58:
 59: volume firewall-scripts
 60:     type debug/io-stats
 61:     option count-fop-hits off
 62:     option latency-measurement off
 63:     subvolumes firewall-scripts-md-cache
 64: end-volume

+------------------------------------------------------------------------------+
[2013-07-11 16:39:35.173867] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-0: changing port to 49152 (from 0)
[2013-07-11 16:39:35.174065] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-1: changing port to 49152 (from 0)
[2013-07-11 16:39:35.174377] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-11 16:39:35.185807] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-1: readv failed (No data available)
[2013-07-11 16:39:35.197485] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:35.197740] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:35.198257] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-0: Connected to 192.168.253.1:49152, attached to remote volume '/gluster-fw1'.
[2013-07-11 16:39:35.198346] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:35.198546] I [afr-common.c:3698:afr_notify] 0-firewall-scripts-replicate-0: Subvolume 'firewall-scripts-client-0' came back up; going online.
[2013-07-11 16:39:35.198759] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to remote volume '/gluster-fw2'.
[2013-07-11 16:39:35.198810] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:35.211534] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 16:39:35.211921] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 16:39:35.212098] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-0: Server lk version = 1
[2013-07-11 16:39:35.212234] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 16:39:35.213421] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 16:39:35.214372] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-1
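
The check suggested in the quoted reply (does the client connect to both bricks?) can also be done mechanically. The sketch below, in Python, scans client log lines for the client_setvolume_cbk "Connected to" messages seen above and reports which bricks the most recent mount attached to. The regular expression is written against the 3.4.0beta3 log format shown in this post and is an assumption for any other version; the function name connected_bricks is hypothetical.

```python
import re

# Matches the client_setvolume_cbk "Connected to" lines as they appear
# in the logs in this post (format assumed from those lines only).
CONNECTED = re.compile(
    r"^\[[^\]]+\] I \[client-handshake\.c:\d+:client_setvolume_cbk\] "
    r"\d+-(?P<client>\S+): Connected to (?P<host>[\d.]+):(?P<port>\d+), "
    r"attached to remote volume '(?P<brick>[^']+)'"
)

# Every fresh mount logs a line containing this text first.
STARTED = "Started running"


def connected_bricks(lines):
    """Return {client: (host, port, brick)} for the most recent mount."""
    bricks = {}
    for line in lines:
        if STARTED in line:
            bricks = {}  # a new mount begins; forget earlier connections
        match = CONNECTED.match(line)
        if match:
            bricks[match["client"]] = (
                match["host"],
                int(match["port"]),
                match["brick"],
            )
    return bricks


# Three lines copied from the fw1 log above.
sample = [
    "[2013-07-11 16:39:36.625696] I [glusterfsd.c:1878:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0beta3 (/usr/sbin/glusterfs --volfile-id=/firewall-scripts --volfile-server=192.168.253.1 /firewall-scripts)",
    "[2013-07-11 16:39:36.723913] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-0: Connected to 192.168.253.1:49152, attached to remote volume '/gluster-fw1'.",
    "[2013-07-11 16:39:36.724730] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to remote volume '/gluster-fw2'.",
]

for client, (host, port, brick) in sorted(connected_bricks(sample).items()):
    print(client, host, port, brick)
```

To use it on a real log, pass the open file instead of the sample list, for example connected_bricks(open("/var/log/glusterfs/firewall-scripts.log")). A replica-2 mount that lists only one client here would be the case the quoted reply was asking about.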


