[Gluster-users] Data on gluster volume gone
Johan Karlsson
Johan.Karlsson at dgc.se
Thu Sep 20 08:30:10 UTC 2018
I understand that a 2-way replica can require some fiddling with heal, but how is it possible that all the data just vanished, even from the bricks?
---
gluster> volume info
Volume Name: gvol0
Type: Replicate
Volume ID: 17ed4d1c-2120-4fe8-abd6-dd77d7ddac59
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs01:/glusterdata/brick1/gvol0
Brick2: gfs02:/glusterdata/brick2/gvol0
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
---
---
gfs01 - Standard upgrade:
Start-Date: 2018-09-12 12:51:51
Commandline: apt-get dist-upgrade
---
---
gfs02 - standard upgrade:
Start-Date: 2018-09-12 13:28:32
Commandline: apt-get dist-upgrade
---
---
gfs01 glustershd.log
[2018-09-12 12:52:56.211130] W [socket.c:592:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2018-09-12 12:52:56.211155] I [glusterfsd-mgmt.c:2341:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2018-09-12 12:53:06.844040] E [socket.c:2517:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused); disconnecting socket
[2018-09-12 12:53:06.844066] I [glusterfsd-mgmt.c:2362:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2018-09-12 12:54:04.224545] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fee21cfa6ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xed) [0x55872a03a70d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55872a03a524] ) 0-: received signum (15), shutting down
[2018-09-12 12:54:05.221508] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/c7535c5e8ebaab32.socket --xlator-option *replicate*.node-uuid=5865e739-3c64-4039-8f96-5fc7a75d00fe --process-name glustershd)
[2018-09-12 12:54:05.225264] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 12:54:06.246818] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 12:54:06.247109] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 12:54:06.247236] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
[2018-09-12 12:54:06.247269] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
Final graph:
+------------------------------------------------------------------------------+
1: volume gvol0-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host gfs01
5: option remote-subvolume /glusterdata/brick1/gvol0
6: option transport-type socket
7: option transport.address-family inet
8: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
9: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
10: option transport.tcp-user-timeout 0
11: option transport.socket.keepalive-time 20
12: option transport.socket.keepalive-interval 2
13: option transport.socket.keepalive-count 9
14: end-volume
15:
16: volume gvol0-client-1
17: type protocol/client
18: option ping-timeout 42
19: option remote-host gfs02
20: option remote-subvolume /glusterdata/brick2/gvol0
21: option transport-type socket
22: option transport.address-family inet
23: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
24: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
25: option transport.tcp-user-timeout 0
26: option transport.socket.keepalive-time 20
27: option transport.socket.keepalive-interval 2
28: option transport.socket.keepalive-count 9
29: end-volume
30:
31: volume gvol0-replicate-0
32: type cluster/replicate
33: option node-uuid 5865e739-3c64-4039-8f96-5fc7a75d00fe
34: option afr-pending-xattr gvol0-client-0,gvol0-client-1
35: option background-self-heal-count 0
36: option metadata-self-heal on
37: option data-self-heal on
38: option entry-self-heal on
39: option self-heal-daemon enable
40: option use-compound-fops off
41: option iam-self-heal-daemon yes
42: subvolumes gvol0-client-0 gvol0-client-1
43: end-volume
44:
45: volume glustershd
[2018-09-12 12:54:06.247484] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
46: type debug/io-stats
47: option log-level INFO
48: subvolumes gvol0-replicate-0
49: end-volume
50:
+------------------------------------------------------------------------------+
[2018-09-12 12:54:06.249099] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.249561] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 12:54:06.249790] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.250309] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.250889] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
[2018-09-12 12:54:06.250904] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-1' came back up; going online.
[2018-09-12 12:54:06.260091] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.269981] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 12:54:06.270175] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.270309] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.270698] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:57:40.616257] W [socket.c:592:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2018-09-12 13:57:40.616312] I [glusterfsd-mgmt.c:2348:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2018-09-12 13:57:50.942555] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fb690a156ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xed) [0x561b24e0d70d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x561b24e0d524] ) 0-: received signum (15), shutting down
[2018-09-12 13:58:06.192019] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/c7535c5e8ebaab32.socket --xlator-option *replicate*.node-uuid=5865e739-3c64-4039-8f96-5fc7a75d00fe --process-name glustershd)
[2018-09-12 13:58:06.196996] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 13:58:07.322458] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 13:58:07.322772] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 13:58:07.323166] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
[2018-09-12 13:58:07.323196] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.323327] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.323420] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-09-12 13:58:07.323459] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-0: disconnected from gvol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:58:07.323486] E [MSGID: 108006] [afr-common.c:5317:__afr_handle_child_down_event] 0-gvol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Final graph:
+------------------------------------------------------------------------------+
1: volume gvol0-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host gfs01
5: option remote-subvolume /glusterdata/brick1/gvol0
6: option transport-type socket
7: option transport.address-family inet
8: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
9: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
10: option transport.tcp-user-timeout 0
11: option transport.socket.keepalive-time 20
12: option transport.socket.keepalive-interval 2
13: option transport.socket.keepalive-count 9
14: end-volume
15:
16: volume gvol0-client-1
17: type protocol/client
18: option ping-timeout 42
19: option remote-host gfs02
20: option remote-subvolume /glusterdata/brick2/gvol0
21: option transport-type socket
22: option transport.address-family inet
23: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
24: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
25: option transport.tcp-user-timeout 0
26: option transport.socket.keepalive-time 20
27: option transport.socket.keepalive-interval 2
28: option transport.socket.keepalive-count 9
29: end-volume
30:
31: volume gvol0-replicate-0
32: type cluster/replicate
33: option node-uuid 5865e739-3c64-4039-8f96-5fc7a75d00fe
34: option afr-pending-xattr gvol0-client-0,gvol0-client-1
35: option background-self-heal-count 0
36: option metadata-self-heal on
37: option data-self-heal on
38: option entry-self-heal on
39: option self-heal-daemon enable
40: option use-compound-fops off
41: option iam-self-heal-daemon yes
42: subvolumes gvol0-client-0 gvol0-client-1
43: end-volume
44:
45: volume glustershd
46: type debug/io-stats
47: option log-level INFO
48: subvolumes gvol0-replicate-0
49: end-volume
50:
+------------------------------------------------------------------------------+
[2018-09-12 13:58:07.323808] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.324101] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.324288] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:58:07.324737] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.325066] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.337185] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
[2018-09-12 13:58:07.337202] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-1' came back up; going online.
[2018-09-12 13:58:11.193402] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:11.193575] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:11.193661] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:58:11.193975] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:11.194217] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:11.194773] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:59:05.215057] W [socket.c:592:__socket_rwv] 0-gvol0-client-1: readv on 192.168.4.85:49152 failed (No data available)
[2018-09-12 13:59:05.215112] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:18.521991] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:19.504398] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:19.505038] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-09-12 13:59:19.505088] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:21.519674] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.519929] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.520103] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:59:21.520531] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.520754] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.521890] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
---
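To line up the restarts and brick disconnects across these logs, the MSGID lines can be pulled into a timeline. The following is only a rough sketch in Python: the regular expression is inferred from the log lines quoted here, the event labels are my own shorthand rather than official GlusterFS names, and `timeline` is a hypothetical helper. It assumes each log record sits on a single line, so wrapped records need to be rejoined first.

```python
import re
from datetime import datetime

# Header layout inferred from the excerpts in this post (an assumption):
# [timestamp] LEVEL [MSGID: n] [file.c:line:function] domain: message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)

# MSGIDs as they appear in the logs above; the labels are my own shorthand.
EVENTS = {
    "100030": "process started",
    "114046": "connected to brick",
    "114018": "disconnected from brick",
    "114058": "brick port lookup failed",
    "108005": "replica online",
    "108006": "all subvolumes down",
}


def timeline(lines):
    """Yield (timestamp, domain, label) for lines whose MSGID is in EVENTS."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("msgid") in EVENTS:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            yield ts, m.group("domain"), EVENTS[m.group("msgid")]
```

Running it over glustershd.log from both nodes and sorting the combined result by timestamp should show which brick each client lost, and when, relative to the two upgrades. Lines without a MSGID and the volfile graph dump are skipped.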
---
gfs01 mountpoint log:
[2018-09-12 13:58:06.497145] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=gfs01 --volfile-id=/gvol0 /tss/filestore)
[2018-09-12 13:58:06.534575] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 13:58:07.381591] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 13:58:07.386730] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 13:58:07.387087] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
[2018-09-12 13:58:07.387129] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.387268] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
Final graph:
+------------------------------------------------------------------------------+
1: volume gvol0-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host gfs01
5: option remote-subvolume /glusterdata/brick1/gvol0
6: option transport-type socket
7: option transport.address-family inet
8: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
9: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
10: option transport.tcp-user-timeout 0
11: option transport.socket.keepalive-time 20
12: option transport.socket.keepalive-interval 2
13: option transport.socket.keepalive-count 9
14: option send-gids true
15: end-volume
16:
17: volume gvol0-client-1
18: type protocol/client
19: option ping-timeout 42
20: option remote-host gfs02
21: option remote-subvolume /glusterdata/brick2/gvol0
22: option transport-type socket
23: option transport.address-family inet
24: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
25: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
26: option transport.tcp-user-timeout 0
27: option transport.socket.keepalive-time 20
28: option transport.socket.keepalive-interval 2
29: option transport.socket.keepalive-count 9
[2018-09-12 13:58:07.387367] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
30: option send-gids true
31: end-volume
32:
33: volume gvol0-replicate-0
34: type cluster/replicate
35: option afr-pending-xattr gvol0-client-0,gvol0-client-1
36: option use-compound-fops off
37: subvolumes gvol0-client-0 gvol0-client-1
38: end-volume
39:
40: volume gvol0-dht
41: type cluster/distribute
42: option lock-migration off
43: option force-migration off
[2018-09-12 13:58:07.387461] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-0: disconnected from gvol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:58:07.387490] E [MSGID: 108006] [afr-common.c:5317:__afr_handle_child_down_event] 0-gvol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
44: subvolumes gvol0-replicate-0
45: end-volume
46:
47: volume gvol0-write-behind
48: type performance/write-behind
49: subvolumes gvol0-dht
50: end-volume
51:
52: volume gvol0-read-ahead
53: type performance/read-ahead
54: subvolumes gvol0-write-behind
55: end-volume
56:
57: volume gvol0-readdir-ahead
58: type performance/readdir-ahead
59: option parallel-readdir off
60: option rda-request-size 131072
61: option rda-cache-limit 10MB
62: subvolumes gvol0-read-ahead
63: end-volume
64:
65: volume gvol0-io-cache
66: type performance/io-cache
67: subvolumes gvol0-readdir-ahead
68: end-volume
69:
70: volume gvol0-quick-read
71: type performance/quick-read
72: subvolumes gvol0-io-cache
73: end-volume
74:
75: volume gvol0-open-behind
76: type performance/open-behind
77: subvolumes gvol0-quick-read
78: end-volume
79:
80: volume gvol0-md-cache
[2018-09-12 13:58:07.387621] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
81: type performance/md-cache
82: subvolumes gvol0-open-behind
83: end-volume
84:
85: volume gvol0
86: type debug/io-stats
87: option log-level INFO
88: option latency-measurement off
89: option count-fop-hits off
90: subvolumes gvol0-md-cache
91: end-volume
92:
93: volume meta-autoload
94: type meta
95: subvolumes gvol0
96: end-volume
97:
+------------------------------------------------------------------------------+
[2018-09-12 13:58:07.387891] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.388118] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:58:07.388701] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.389814] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.390371] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
[2018-09-12 13:58:07.390390] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-1' came back up; going online.
[2018-09-12 13:58:07.391330] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.23
[2018-09-12 13:58:07.391346] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-12 13:58:07.393037] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-gvol0-dht: Directory selfheal failed: Unable to form layout for directory /
[2018-09-12 13:58:10.534498] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:10.534637] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:10.534727] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:58:10.535015] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:10.535155] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:10.536297] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:59:05.215073] W [socket.c:592:__socket_rwv] 0-gvol0-client-1: readv on 192.168.4.85:49152 failed (No data available)
[2018-09-12 13:59:05.215112] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:18.861826] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:19.505060] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:19.517843] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-09-12 13:59:19.517934] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:21.860457] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.860727] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.860903] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:59:21.861333] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.861588] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.862134] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
---
---
gfs02 glustershd.log
[2018-09-12 13:29:24.440044] W [socket.c:592:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2018-09-12 13:29:24.440066] I [glusterfsd-mgmt.c:2341:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2018-09-12 13:29:35.300684] E [socket.c:2517:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused); disconnecting socket
[2018-09-12 13:29:35.300719] I [glusterfsd-mgmt.c:2362:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2018-09-12 13:30:28.718734] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f671aa8f6ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xed) [0x55d18aa3670d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55d18aa36524] ) 0-: received signum (15), shutting down
[2018-09-12 13:30:29.721210] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/3c69308176cfc594.socket --xlator-option *replicate*.node-uuid=44192eee-3f26-4e14-84d5-be847d66df7b --process-name glustershd)
[2018-09-12 13:30:29.724100] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 13:30:30.748354] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 13:30:30.752656] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 13:30:30.752794] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
[2018-09-12 13:30:30.753009] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
Final graph:
+------------------------------------------------------------------------------+
1: volume gvol0-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host gfs01
5: option remote-subvolume /glusterdata/brick1/gvol0
6: option transport-type socket
7: option transport.address-family inet
8: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
[2018-09-12 13:30:30.754060] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
9: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
10: option transport.tcp-user-timeout 0
11: option transport.socket.keepalive-time 20
12: option transport.socket.keepalive-interval 2
13: option transport.socket.keepalive-count 9
14: end-volume
15:
16: volume gvol0-client-1
17: type protocol/client
18: option ping-timeout 42
19: option remote-host gfs02
20: option remote-subvolume /glusterdata/brick2/gvol0
21: option transport-type socket
22: option transport.address-family inet
23: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
24: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
25: option transport.tcp-user-timeout 0
26: option transport.socket.keepalive-time 20
27: option transport.socket.keepalive-interval 2
28: option transport.socket.keepalive-count 9
29: end-volume
30:
31: volume gvol0-replicate-0
32: type cluster/replicate
33: option node-uuid 44192eee-3f26-4e14-84d5-be847d66df7b
34: option afr-pending-xattr gvol0-client-0,gvol0-client-1
35: option background-self-heal-count 0
36: option metadata-self-heal on
37: option data-self-heal on
38: option entry-self-heal on
39: option self-heal-daemon enable
40: option use-compound-fops off
41: option iam-self-heal-daemon yes
42: subvolumes gvol0-client-0 gvol0-client-1
43: end-volume
44:
45: volume glustershd
46: type debug/io-stats
47: option log-level INFO
48: subvolumes gvol0-replicate-0
49: end-volume
50:
+------------------------------------------------------------------------------+
[2018-09-12 13:30:30.763395] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.765518] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.765727] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:30:30.766021] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.766308] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.767339] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:30:30.767362] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-0' came back up; going online.
[2018-09-12 13:30:30.772846] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:30:30.773011] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.773125] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.773472] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
[2018-09-12 13:58:05.409172] W [socket.c:592:__socket_rwv] 0-gvol0-client-0: readv on 192.168.4.84:49152 failed (Connection reset by peer)
[2018-09-12 13:58:05.409219] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-0: disconnected from gvol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:58:15.871815] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:15.872066] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:15.872229] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:58:15.872457] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:15.872704] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:15.873272] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:58:54.575838] W [socket.c:592:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2018-09-12 13:58:54.575873] I [glusterfsd-mgmt.c:2348:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2018-09-12 13:59:04.876731] E [socket.c:2517:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused); disconnecting socket
[2018-09-12 13:59:04.876764] I [glusterfsd-mgmt.c:2369:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2018-09-12 13:59:05.213422] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f995004b6ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xed) [0x55c76d21470d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55c76d214524] ) 0-: received signum (15), shutting down
[2018-09-12 13:59:25.843013] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/3c69308176cfc594.socket --xlator-option *replicate*.node-uuid=44192eee-3f26-4e14-84d5-be847d66df7b --process-name glustershd)
[2018-09-12 13:59:25.847197] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 13:59:26.945403] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 13:59:26.945824] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 13:59:26.946110] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
[2018-09-12 13:59:26.946384] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
Final graph:
+------------------------------------------------------------------------------+
1: volume gvol0-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host gfs01
5: option remote-subvolume /glusterdata/brick1/gvol0
6: option transport-type socket
7: option transport.address-family inet
8: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
9: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
10: option transport.tcp-user-timeout 0
11: option transport.socket.keepalive-time 20
12: option transport.socket.keepalive-interval 2
13: option transport.socket.keepalive-count 9
14: end-volume
15:
16: volume gvol0-client-1
17: type protocol/client
18: option ping-timeout 42
19: option remote-host gfs02
20: option remote-subvolume /glusterdata/brick2/gvol0
21: option transport-type socket
22: option transport.address-family inet
23: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
24: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
25: option transport.tcp-user-timeout 0
26: option transport.socket.keepalive-time 20
27: option transport.socket.keepalive-interval 2
28: option transport.socket.keepalive-count 9
29: end-volume
30:
31: volume gvol0-replicate-0
32: type cluster/replicate
33: option node-uuid 44192eee-3f26-4e14-84d5-be847d66df7b
34: option afr-pending-xattr gvol0-client-0,gvol0-client-1
35: option background-self-heal-count 0
36: option metadata-self-heal on
37: option data-self-heal on
38: option entry-self-heal on
39: option self-heal-daemon enable
40: option use-compound-fops off
41: option iam-self-heal-daemon yes
42: subvolumes gvol0-client-0 gvol0-client-1
43: end-volume
44:
45: volume glustershd
46: type debug/io-stats
47: option log-level INFO
48: subvolumes gvol0-replicate-0
49: end-volume
50:
+------------------------------------------------------------------------------+
[2018-09-12 13:59:26.946860] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:26.946961] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:26.946966] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:26.947054] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-09-12 13:59:26.947165] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:59:26.947213] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:26.947233] E [MSGID: 108006] [afr-common.c:5317:__afr_handle_child_down_event] 0-gvol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-09-12 13:59:26.947557] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:26.947796] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:26.948355] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:59:26.948368] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-0' came back up; going online.
[2018-09-12 13:59:30.845313] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.845467] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.845537] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:59:30.845785] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.845953] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.846293] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
---
---
gfs02 mountpoint log:
[2018-09-12 13:59:26.116762] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=gfs02 --volfile-id=/gvol0 /tss/filestore)
[2018-09-12 13:59:26.142136] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 13:59:27.029834] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 13:59:27.034636] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 13:59:27.034977] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
1: volume gvol0-client-0
[2018-09-12 13:59:27.035277] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:27.035328] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host gfs01
5: option remote-subvolume /glusterdata/brick1/gvol0
6: option transport-type socket
7: option transport.address-family inet
8: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
9: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
10: option transport.tcp-user-timeout 0
11: option transport.socket.keepalive-time 20
12: option transport.socket.keepalive-interval 2
13: option transport.socket.keepalive-count 9
14: option send-gids true
15: end-volume
16:
17: volume gvol0-client-1
18: type protocol/client
19: option ping-timeout 42
20: option remote-host gfs02
21: option remote-subvolume /glusterdata/brick2/gvol0
22: option transport-type socket
23: option transport.address-family inet
24: option username d5e3e173-156f-46c1-9eb7-a35b201fc311
25: option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
26: option transport.tcp-user-timeout 0
27: option transport.socket.keepalive-time 20
28: option transport.socket.keepalive-interval 2
29: option transport.socket.keepalive-count 9
30: option send-gids true
31: end-volume
32:
33: volume gvol0-replicate-0
34: type cluster/replicate
35: option afr-pending-xattr gvol0-client-0,gvol0-client-1
36: option use-compound-fops off
37: subvolumes gvol0-client-0 gvol0-client-1
38: end-volume
39:
40: volume gvol0-dht
41: type cluster/distribute
42: option lock-migration off
43: option force-migration off
44: subvolumes gvol0-replicate-0
45: end-volume
46:
47: volume gvol0-write-behind
48: type performance/write-behind
49: subvolumes gvol0-dht
50: end-volume
51:
52: volume gvol0-read-ahead
53: type performance/read-ahead
54: subvolumes gvol0-write-behind
55: end-volume
56:
57: volume gvol0-readdir-ahead
58: type performance/readdir-ahead
59: option parallel-readdir off
60: option rda-request-size 131072
61: option rda-cache-limit 10MB
62: subvolumes gvol0-read-ahead
63: end-volume
64:
65: volume gvol0-io-cache
66: type performance/io-cache
67: subvolumes gvol0-readdir-ahead
68: end-volume
69:
70: volume gvol0-quick-read
71: type performance/quick-read
72: subvolumes gvol0-io-cache
73: end-volume
74:
75: volume gvol0-open-behind
76: type performance/open-behind
77: subvolumes gvol0-quick-read
78: end-volume
[2018-09-12 13:59:27.035568] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
79:
[2018-09-12 13:59:27.035672] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
80: volume gvol0-md-cache
81: type performance/md-cache
82: subvolumes gvol0-open-behind
83: end-volume
84:
85: volume gvol0
86: type debug/io-stats
87: option log-level INFO
88: option latency-measurement off
89: option count-fop-hits off
90: subvolumes gvol0-md-cache
91: end-volume
92:
93: volume meta-autoload
94: type meta
95: subvolumes gvol0
96: end-volume
97:
+------------------------------------------------------------------------------+
[2018-09-12 13:59:27.035769] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:59:27.036156] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:27.036187] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-09-12 13:59:27.036230] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:27.036240] E [MSGID: 108006] [afr-common.c:5317:__afr_handle_child_down_event] 0-gvol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-09-12 13:59:27.036411] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:27.036967] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:59:27.036979] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-0' came back up; going online.
[2018-09-12 13:59:27.037684] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.23
[2018-09-12 13:59:27.037696] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-12 13:59:27.038866] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-gvol0-dht: Directory selfheal failed: Unable to form layout for directory /
[2018-09-12 13:59:30.139072] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.139208] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.139282] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:59:30.139537] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.139650] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.139981] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
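To pull only the entries of one severity out of logs like the ones above, something along these lines could be used. This is a minimal sketch: the function name gluster_log_level is made up for illustration, and it assumes the log layout shown here, where the third whitespace-separated field is the level letter (E, W, I).

```shell
# Hypothetical helper: print only the log lines whose level field matches.
# Assumes lines look like "[date time] LEVEL [...]", as in the logs above.
# Reads the named files, or standard input if no file is given.
gluster_log_level() {
    level=$1
    shift
    awk -v lvl="$level" '$3 == lvl' "$@"
}

# Example (log path taken from the glustershd arguments above):
#   gluster_log_level E /var/log/glusterfs/glustershd.log
```

Wrapped continuation lines and the "Final graph" dump do not carry a level in the third field, so they are skipped.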
Regards,
Johan Karlsson
________________________________
From: Pranith Kumar Karampuri <pkarampu at redhat.com>
Sent: Thursday, September 20, 2018 8:13:47 AM
To: Gowdappa, Raghavendra
Cc: Johan Karlsson; gluster-users; Ravishankar Narayanankutty
Subject: Re: [Gluster-users] Data on gluster volume gone
Please also attach the logs for the mount points and the glustershd.logs
On Thu, Sep 20, 2018 at 11:41 AM Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
How did you do the upgrade?
On Thu, Sep 20, 2018 at 11:01 AM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
On Thu, Sep 20, 2018 at 1:29 AM, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
Can you give volume info? Looks like you are using 2 way replica.
Yes indeed.
gluster volume create gvol0 replica 2 gfs01:/glusterdata/brick1/gvol0 gfs02:/glusterdata/brick2/gvol0
+Pranith. +Ravi.
Not sure whether 2 way replication has caused this. From what I understand we need either 3 way replication or arbiter for correct resolution of heals.
On Wed, Sep 19, 2018 at 9:39 AM, Johan Karlsson <Johan.Karlsson at dgc.se> wrote:
I have two servers set up with GlusterFS in replica mode, with a single volume exposed via a mountpoint. The servers are running Ubuntu 16.04 LTS.
After a package upgrade + reboot of both servers, it was discovered that the data was completely gone. New data written to the volume via the mountpoint is replicated correctly, and the gluster status/info commands state that everything is OK (no split-brain scenario, no healing needed, etc.). But the previous data is completely gone; it is not even present on any of the bricks.
The following upgrade was done:
glusterfs-server:amd64 (4.1.0-ubuntu1~xenial3 -> 4.1.4-ubuntu1~xenial1)
glusterfs-client:amd64 (4.1.0-ubuntu1~xenial3 -> 4.1.4-ubuntu1~xenial1)
glusterfs-common:amd64 (4.1.0-ubuntu1~xenial3 -> 4.1.4-ubuntu1~xenial1)
The logs only show that connection between the servers was lost, which is expected.
I can't even determine whether it was the package upgrade or the reboot that caused this issue. I've tried to recreate the issue, without success.
Any idea what could have gone wrong, or whether I did something wrong during the setup? For reference, this is how I did the setup:
---
Add a separate disk with a single partition on both servers (/dev/sdb1)
Add gfs hostnames for direct communication without DNS, on both servers:
/etc/hosts
192.168.4.45 gfs01
192.168.4.46 gfs02
On gfs01, create a new LVM Volume Group:
vgcreate gfs01-vg /dev/sdb1
And on gfs02:
vgcreate gfs02-vg /dev/sdb1
Create logical volumes named "brick1" and "brick2" on the respective servers:
gfs01:
lvcreate -l 100%VG -n brick1 gfs01-vg
gfs02:
lvcreate -l 100%VG -n brick2 gfs02-vg
Format the volumes with ext4 filesystem:
gfs01:
mkfs.ext4 /dev/gfs01-vg/brick1
gfs02:
mkfs.ext4 /dev/gfs02-vg/brick2
Create a mountpoint for the bricks on the servers:
gfs01:
mkdir -p /glusterdata/brick1
gfs02:
mkdir -p /glusterdata/brick2
Make a permanent mount on the servers:
gfs01:
/dev/gfs01-vg/brick1 /glusterdata/brick1 ext4 defaults 0 0
gfs02:
/dev/gfs02-vg/brick2 /glusterdata/brick2 ext4 defaults 0 0
Mount it:
mount -a
Create a gluster volume mount point on the bricks on the servers:
gfs01:
mkdir -p /glusterdata/brick1/gvol0
gfs02:
mkdir -p /glusterdata/brick2/gvol0
From each server, peer probe the other one:
gluster peer probe gfs01
peer probe: success
gluster peer probe gfs02
peer probe: success
From any single server, create the gluster volume as a "replica" with two nodes, gfs01 and gfs02:
gluster volume create gvol0 replica 2 gfs01:/glusterdata/brick1/gvol0 gfs02:/glusterdata/brick2/gvol0
Start the volume:
gluster volume start gvol0
On each server, mount the gluster filesystem on the /filestore mount point:
gfs01:
mount -t glusterfs gfs01:/gvol0 /filestore
gfs02:
mount -t glusterfs gfs02:/gvol0 /filestore
Make the mount permanent on the servers:
/etc/fstab
gfs01:
gfs01:/gvol0 /filestore glusterfs defaults,_netdev 0 0
gfs02:
gfs02:/gvol0 /filestore glusterfs defaults,_netdev 0 0
---
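With the bricks on their own filesystems as above, one thing that can be checked after a reboot is whether each brick directory really sits on its mounted filesystem rather than on the root filesystem. The following is only a sketch under that assumption, not a diagnosis of this case; brick_fs_mounted is a made-up name, and mountpoint is the util-linux tool.

```shell
# Hypothetical helper: succeeds only if the given directory is itself a
# mount point, i.e. a filesystem is mounted there.
brick_fs_mounted() {
    mountpoint -q "$1"
}

# Example (paths from the setup above), run on gfs01:
#   brick_fs_mounted /glusterdata/brick1 || echo "brick filesystem is not mounted" >&2
```

If the check fails, an empty /glusterdata/brick1/gvol0 would simply be a directory on the root filesystem rather than the brick itself.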
Regards,
Johan Karlsson
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
--
Pranith