[Gluster-users] One node goes offline, the other node can't see the replicated volume anymore
Greg Scott
GregScott at infrasupport.com
Thu Jul 11 16:46:43 UTC 2013
> When you first mount your volume, look in the client log and see if it's connecting to both bricks.
> I suspect it's not and that the failure is related to firewall settings.
Logs from both nodes below. For this test, I first ran "umount /firewall-scripts" on both nodes. Then I ran "mount -av" using the default parameters in my fstab file. I did **not** turn on backupvolfile-server=<secondary server> for this test. In another window, I ran "tail /var/log/glusterfs/firewall-scripts.log -f", and you can see the spot where I mounted my file system back up again.
Note that everything works as expected when both nodes are online, so this suggests everyone can see everyone else when things are steady-state. Also note that backupvolfile-server=<secondary server> changed the behavior - I documented this in an earlier post.
> ...the failure is related to firewall settings.
No way. I’m wide open on the interface I’m using for heartbeat and glusterfs. In my application, I take node fw1 offline by inserting a firewall rule and then getting rid of it a few seconds later. For testing right now, I just insert the rule by hand, look at a bunch of stuff, then get rid of it later. But since you brought it up, I cleaned out all firewall rules before doing and logging the mounts below. Near as I can tell, it looks like everyone can see everyone else. And the logs look the same to my eye as they did before I dropped all (not relevant) firewall rules.
Log from fw1:
[root@chicago-fw1 ~]#
[root@chicago-fw1 ~]# tail /var/log/glusterfs/firewall-scripts.log -f
[2013-07-11 15:51:54.423508] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to remote volume '/gluster-fw2'.
[2013-07-11 15:51:54.423576] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 15:51:54.440124] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 15:51:54.440660] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 15:51:54.440886] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 15:51:54.442235] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 15:51:54.443451] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-0
[2013-07-11 16:21:22.729423] I [fuse-bridge.c:4583:fuse_thread_proc] 0-fuse: unmounting /firewall-scripts
[2013-07-11 16:21:22.730976] W [glusterfsd.c:970:cleanup_and_exit] (-->/usr/lib64/libc.so.6(clone+0x6d) [0x7f7a69fee13d] (-->/usr/lib64/libpthread.so.0(+0x33c1607c53) [0x7f7a6a684c53] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7f7a6b372e35]))) 0-: received signum (15), shutting down
[2013-07-11 16:21:22.731040] I [fuse-bridge.c:5212:fini] 0-fuse: Unmounting '/firewall-scripts'.
Blank space - mount -av below.
[2013-07-11 16:39:36.625696] I [glusterfsd.c:1878:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0beta3 (/usr/sbin/glusterfs --volfile-id=/firewall-scripts --volfile-server=192.168.253.1 /firewall-scripts)
[2013-07-11 16:39:36.640661] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-07-11 16:39:36.640800] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-07-11 16:39:36.672416] I [socket.c:3480:socket_init] 0-firewall-scripts-client-1: SSL support is NOT enabled
[2013-07-11 16:39:36.672539] I [socket.c:3495:socket_init] 0-firewall-scripts-client-1: using system polling thread
[2013-07-11 16:39:36.674545] I [socket.c:3480:socket_init] 0-firewall-scripts-client-0: SSL support is NOT enabled
[2013-07-11 16:39:36.674667] I [socket.c:3495:socket_init] 0-firewall-scripts-client-0: using system polling thread
[2013-07-11 16:39:36.675015] I [client.c:2154:notify] 0-firewall-scripts-client-0: parent translators are ready, attempting connect on transport
[2013-07-11 16:39:36.686253] I [client.c:2154:notify] 0-firewall-scripts-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
1: volume firewall-scripts-client-0
2: type protocol/client
3: option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
4: option username de6eacd1-31bc-4bdb-a049-776cd840059e
5: option transport-type tcp
6: option remote-subvolume /gluster-fw1
7: option remote-host 192.168.253.1
8: end-volume
9:
10: volume firewall-scripts-client-1
11: type protocol/client
12: option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
13: option username de6eacd1-31bc-4bdb-a049-776cd840059e
14: option transport-type tcp
15: option remote-subvolume /gluster-fw2
16: option remote-host 192.168.253.2
17: end-volume
18:
19: volume firewall-scripts-replicate-0
20: type cluster/replicate
21: subvolumes firewall-scripts-client-0 firewall-scripts-client-1
22: end-volume
23:
24: volume firewall-scripts-dht
25: type cluster/distribute
26: subvolumes firewall-scripts-replicate-0
27: end-volume
28:
29: volume firewall-scripts-write-behind
30: type performance/write-behind
31: subvolumes firewall-scripts-dht
32: end-volume
33:
34: volume firewall-scripts-read-ahead
35: type performance/read-ahead
36: subvolumes firewall-scripts-write-behind
37: end-volume
38:
39: volume firewall-scripts-io-cache
40: type performance/io-cache
41: subvolumes firewall-scripts-read-ahead
42: end-volume
43:
44: volume firewall-scripts-quick-read
45: type performance/quick-read
46: subvolumes firewall-scripts-io-cache
47: end-volume
48:
49: volume firewall-scripts-open-behind
50: type performance/open-behind
51: subvolumes firewall-scripts-quick-read
52: end-volume
53:
54: volume firewall-scripts-md-cache
55: type performance/md-cache
56: subvolumes firewall-scripts-open-behind
57: end-volume
58:
59: volume firewall-scripts
60: type debug/io-stats
61: option count-fop-hits off
62: option latency-measurement off
63: subvolumes firewall-scripts-md-cache
64: end-volume
+------------------------------------------------------------------------------+
[2013-07-11 16:39:36.698740] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-0: changing port to 49152 (from 0)
[2013-07-11 16:39:36.698974] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-11 16:39:36.711537] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-1: changing port to 49152 (from 0)
[2013-07-11 16:39:36.711717] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-1: readv failed (No data available)
[2013-07-11 16:39:36.723116] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:36.723521] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:36.723913] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-0: Connected to 192.168.253.1:49152, attached to remote volume '/gluster-fw1'.
[2013-07-11 16:39:36.723995] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:36.724390] I [afr-common.c:3698:afr_notify] 0-firewall-scripts-replicate-0: Subvolume 'firewall-scripts-client-0' came back up; going online.
[2013-07-11 16:39:36.724601] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-0: Server lk version = 1
[2013-07-11 16:39:36.724730] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to remote volume '/gluster-fw2'.
[2013-07-11 16:39:36.724788] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:36.737359] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 16:39:36.739297] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 16:39:36.739486] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 16:39:36.740672] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 16:39:36.741820] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-0
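The volfile dump in the fw1 log names the bricks this client is supposed to reach. As a rough sketch (not part of the original test), the snippet below pulls the remote-host and remote-subvolume pairs out of such a dump. The function name `expected_bricks` and the sample constant are hypothetical, and the sample is an abbreviated copy of the volfile lines shown above.

```python
import re

# Abbreviated copy of the volfile dump from the fw1 log (assumption: the
# "N: " prefixes are line numbers added by the log, not volfile syntax).
VOLFILE_DUMP = """\
1: volume firewall-scripts-client-0
2: type protocol/client
6: option remote-subvolume /gluster-fw1
7: option remote-host 192.168.253.1
8: end-volume
9:
10: volume firewall-scripts-client-1
11: type protocol/client
15: option remote-subvolume /gluster-fw2
16: option remote-host 192.168.253.2
17: end-volume
18:
19: volume firewall-scripts-replicate-0
20: type cluster/replicate
21: subvolumes firewall-scripts-client-0 firewall-scripts-client-1
22: end-volume
"""


def expected_bricks(dump):
    """Map each protocol/client volume to (remote_host, remote_subvolume)."""
    bricks = {}
    name, vol_type, opts = None, None, {}
    for raw in dump.splitlines():
        # Drop the "N: " line-number prefix before looking at the content.
        line = re.sub(r"^\s*\d+:\s*", "", raw).strip()
        if not line:
            continue
        words = line.split()
        if words[0] == "volume" and len(words) == 2:
            name, vol_type, opts = words[1], None, {}
        elif words[0] == "type" and len(words) == 2:
            vol_type = words[1]
        elif words[0] == "option" and len(words) >= 3:
            opts[words[1]] = " ".join(words[2:])
        elif words[0] == "end-volume":
            # Only protocol/client volumes describe a connection to a brick.
            if vol_type == "protocol/client":
                bricks[name] = (opts.get("remote-host"),
                                opts.get("remote-subvolume"))
            name, vol_type, opts = None, None, {}
    return bricks


if __name__ == "__main__":
    for client, (host, brick) in sorted(expected_bricks(VOLFILE_DUMP).items()):
        print(client, host, brick)
```

The result can be compared by eye against the "Connected to ..." lines that follow the volfile in the same log.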
And from fw2:
[root@chicago-fw2 ~]# tail /var/log/glusterfs/firewall-scripts.log -f
[2013-07-11 15:51:45.499012] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 15:51:45.512667] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 15:51:45.513211] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-0: Server lk version = 1
[2013-07-11 15:51:45.513416] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 15:51:45.513538] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 15:51:45.515208] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 15:51:45.516512] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-1
[2013-07-11 16:21:28.150710] I [fuse-bridge.c:4583:fuse_thread_proc] 0-fuse: unmounting /firewall-scripts
[2013-07-11 16:21:28.154455] W [glusterfsd.c:970:cleanup_and_exit] (-->/usr/lib64/libc.so.6(clone+0x6d) [0x7fa599ad613d] (-->/usr/lib64/libpthread.so.0(+0x3c1b407c53) [0x7fa59a16cc53] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7fa59ae5ae35]))) 0-: received signum (15), shutting down
[2013-07-11 16:21:28.154503] I [fuse-bridge.c:5212:fini] 0-fuse: Unmounting '/firewall-scripts'.
Blank space - this is where I did mount -av
[2013-07-11 16:39:35.100584] I [glusterfsd.c:1878:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0beta3 (/usr/sbin/glusterfs --volfile-id=/firewall-scripts --volfile-server=192.168.253.2 /firewall-scripts)
[2013-07-11 16:39:35.113481] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-07-11 16:39:35.113614] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-07-11 16:39:35.147118] I [socket.c:3480:socket_init] 0-firewall-scripts-client-1: SSL support is NOT enabled
[2013-07-11 16:39:35.147313] I [socket.c:3495:socket_init] 0-firewall-scripts-client-1: using system polling thread
[2013-07-11 16:39:35.149112] I [socket.c:3480:socket_init] 0-firewall-scripts-client-0: SSL support is NOT enabled
[2013-07-11 16:39:35.149268] I [socket.c:3495:socket_init] 0-firewall-scripts-client-0: using system polling thread
[2013-07-11 16:39:35.149390] I [client.c:2154:notify] 0-firewall-scripts-client-0: parent translators are ready, attempting connect on transport
[2013-07-11 16:39:35.160491] I [client.c:2154:notify] 0-firewall-scripts-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
1: volume firewall-scripts-client-0
2: type protocol/client
3: option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
4: option username de6eacd1-31bc-4bdb-a049-776cd840059e
5: option transport-type tcp
6: option remote-subvolume /gluster-fw1
7: option remote-host 192.168.253.1
8: end-volume
9:
10: volume firewall-scripts-client-1
11: type protocol/client
12: option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
13: option username de6eacd1-31bc-4bdb-a049-776cd840059e
14: option transport-type tcp
15: option remote-subvolume /gluster-fw2
16: option remote-host 192.168.253.2
17: end-volume
18:
19: volume firewall-scripts-replicate-0
20: type cluster/replicate
21: subvolumes firewall-scripts-client-0 firewall-scripts-client-1
22: end-volume
23:
24: volume firewall-scripts-dht
25: type cluster/distribute
26: subvolumes firewall-scripts-replicate-0
27: end-volume
28:
29: volume firewall-scripts-write-behind
30: type performance/write-behind
31: subvolumes firewall-scripts-dht
32: end-volume
33:
34: volume firewall-scripts-read-ahead
35: type performance/read-ahead
36: subvolumes firewall-scripts-write-behind
37: end-volume
38:
39: volume firewall-scripts-io-cache
40: type performance/io-cache
41: subvolumes firewall-scripts-read-ahead
42: end-volume
43:
44: volume firewall-scripts-quick-read
45: type performance/quick-read
46: subvolumes firewall-scripts-io-cache
47: end-volume
48:
49: volume firewall-scripts-open-behind
50: type performance/open-behind
51: subvolumes firewall-scripts-quick-read
52: end-volume
53:
54: volume firewall-scripts-md-cache
55: type performance/md-cache
56: subvolumes firewall-scripts-open-behind
57: end-volume
58:
59: volume firewall-scripts
60: type debug/io-stats
61: option count-fop-hits off
62: option latency-measurement off
63: subvolumes firewall-scripts-md-cache
64: end-volume
+------------------------------------------------------------------------------+
[2013-07-11 16:39:35.173867] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-0: changing port to 49152 (from 0)
[2013-07-11 16:39:35.174065] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-1: changing port to 49152 (from 0)
[2013-07-11 16:39:35.174377] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-11 16:39:35.185807] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-1: readv failed (No data available)
[2013-07-11 16:39:35.197485] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:35.197740] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:35.198257] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-0: Connected to 192.168.253.1:49152, attached to remote volume '/gluster-fw1'.
[2013-07-11 16:39:35.198346] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:35.198546] I [afr-common.c:3698:afr_notify] 0-firewall-scripts-replicate-0: Subvolume 'firewall-scripts-client-0' came back up; going online.
[2013-07-11 16:39:35.198759] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to remote volume '/gluster-fw2'.
[2013-07-11 16:39:35.198810] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:35.211534] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 16:39:35.211921] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 16:39:35.212098] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-0: Server lk version = 1
[2013-07-11 16:39:35.212234] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 16:39:35.213421] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 16:39:35.214372] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-1
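The check suggested at the top (does the client connect to both bricks at mount time?) can also be done mechanically. This is a minimal sketch, assuming the "Connected to ..., attached to remote volume" message format seen in the logs above; the function name `connected_bricks` and the `SAMPLE` list are hypothetical, and the sample lines are copied from the fw2 log. It only reports connections logged at mount time and does not track later disconnects.

```python
import re

# Matches the client_setvolume_cbk "Connected to" message as it appears in
# the logs above; other GlusterFS versions may word this differently.
CONNECTED = re.compile(
    r"\d+-(?P<client>\S+): Connected to (?P<addr>[\d.]+):(?P<port>\d+), "
    r"attached to remote volume '(?P<brick>[^']+)'"
)


def connected_bricks(log_lines):
    """Return {client_translator: (address, port, brick_path)} for each
    successful brick connection found in the given log lines."""
    found = {}
    for line in log_lines:
        m = CONNECTED.search(line)
        if m:
            found[m.group("client")] = (
                m.group("addr"),
                int(m.group("port")),
                m.group("brick"),
            )
    return found


# Two lines copied from the fw2 log above.
SAMPLE = [
    "[2013-07-11 16:39:35.198257] I [client-handshake.c:1456:client_setvolume_cbk] "
    "0-firewall-scripts-client-0: Connected to 192.168.253.1:49152, "
    "attached to remote volume '/gluster-fw1'.",
    "[2013-07-11 16:39:35.198759] I [client-handshake.c:1456:client_setvolume_cbk] "
    "0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, "
    "attached to remote volume '/gluster-fw2'.",
]

if __name__ == "__main__":
    for client, (addr, port, brick) in sorted(connected_bricks(SAMPLE).items()):
        print(f"{client}: {addr}:{port} {brick}")
```

To run it against a real log, pass an open file object instead of `SAMPLE`, for example `connected_bricks(open("/var/log/glusterfs/firewall-scripts.log"))`.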