[Gluster-users] Data on gluster volume gone

Johan Karlsson Johan.Karlsson at dgc.se
Thu Sep 20 08:30:10 UTC 2018


I understand that a 2-way replica can require some fiddling with heal, but how is it possible that all the data just vanished, even from the bricks themselves?
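
For reference, the brick and heal state can be inspected with the usual commands (a sketch; the brick paths are the ones from the volume info below):

```shell
# Confirm the brick processes are actually online and listening
gluster volume status gvol0

# List entries pending heal, and any split-brain entries, on the replica
gluster volume heal gvol0 info
gluster volume heal gvol0 info split-brain

# Compare what is physically on each brick, bypassing the FUSE mount
ls -la /glusterdata/brick1/gvol0   # on gfs01
ls -la /glusterdata/brick2/gvol0   # on gfs02
```

In our case the brick directories themselves are empty, which is why this looks like more than a heal problem.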

---
gluster> volume info

Volume Name: gvol0
Type: Replicate
Volume ID: 17ed4d1c-2120-4fe8-abd6-dd77d7ddac59
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs01:/glusterdata/brick1/gvol0
Brick2: gfs02:/glusterdata/brick2/gvol0
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
---

---
gfs01 - Standard upgrade:

Start-Date: 2018-09-12  12:51:51
Commandline: apt-get dist-upgrade
---

---
gfs02 - Standard upgrade:

Start-Date: 2018-09-12  13:28:32
Commandline: apt-get dist-upgrade
---

---
gfs01 glustershd.log

[2018-09-12 12:52:56.211130] W [socket.c:592:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2018-09-12 12:52:56.211155] I [glusterfsd-mgmt.c:2341:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2018-09-12 12:53:06.844040] E [socket.c:2517:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused); disconnecting socket
[2018-09-12 12:53:06.844066] I [glusterfsd-mgmt.c:2362:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2018-09-12 12:54:04.224545] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fee21cfa6ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xed) [0x55872a03a70d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55872a03a524] ) 0-: received signum (15), shutting down
[2018-09-12 12:54:05.221508] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/c7535c5e8ebaab32.socket --xlator-option *replicate*.node-uuid=5865e739-3c64-4039-8f96-5fc7a75d00fe --process-name glustershd)
[2018-09-12 12:54:05.225264] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 12:54:06.246818] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 12:54:06.247109] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 12:54:06.247236] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
[2018-09-12 12:54:06.247269] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
Final graph:
+------------------------------------------------------------------------------+
  1: volume gvol0-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gfs01
  5:     option remote-subvolume /glusterdata/brick1/gvol0
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
  9:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14: end-volume
 15:
 16: volume gvol0-client-1
 17:     type protocol/client
 18:     option ping-timeout 42
 19:     option remote-host gfs02
 20:     option remote-subvolume /glusterdata/brick2/gvol0
 21:     option transport-type socket
 22:     option transport.address-family inet
 23:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
 24:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 25:     option transport.tcp-user-timeout 0
 26:     option transport.socket.keepalive-time 20
 27:     option transport.socket.keepalive-interval 2
 28:     option transport.socket.keepalive-count 9
 29: end-volume
 30:
 31: volume gvol0-replicate-0
 32:     type cluster/replicate
 33:     option node-uuid 5865e739-3c64-4039-8f96-5fc7a75d00fe
 34:     option afr-pending-xattr gvol0-client-0,gvol0-client-1
 35:     option background-self-heal-count 0
 36:     option metadata-self-heal on
 37:     option data-self-heal on
 38:     option entry-self-heal on
 39:     option self-heal-daemon enable
 40:     option use-compound-fops off
 41:     option iam-self-heal-daemon yes
 42:     subvolumes gvol0-client-0 gvol0-client-1
 43: end-volume
 44:
 45: volume glustershd
[2018-09-12 12:54:06.247484] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
 46:     type debug/io-stats
 47:     option log-level INFO
 48:     subvolumes gvol0-replicate-0
 49: end-volume
 50:
+------------------------------------------------------------------------------+
[2018-09-12 12:54:06.249099] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.249561] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 12:54:06.249790] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.250309] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.250889] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
[2018-09-12 12:54:06.250904] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-1' came back up; going online.
[2018-09-12 12:54:06.260091] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.269981] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 12:54:06.270175] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.270309] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 12:54:06.270698] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:57:40.616257] W [socket.c:592:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2018-09-12 13:57:40.616312] I [glusterfsd-mgmt.c:2348:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2018-09-12 13:57:50.942555] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fb690a156ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xed) [0x561b24e0d70d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x561b24e0d524] ) 0-: received signum (15), shutting down
[2018-09-12 13:58:06.192019] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/c7535c5e8ebaab32.socket --xlator-option *replicate*.node-uuid=5865e739-3c64-4039-8f96-5fc7a75d00fe --process-name glustershd)
[2018-09-12 13:58:06.196996] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 13:58:07.322458] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 13:58:07.322772] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 13:58:07.323166] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
[2018-09-12 13:58:07.323196] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.323327] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.323420] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-09-12 13:58:07.323459] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-0: disconnected from gvol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:58:07.323486] E [MSGID: 108006] [afr-common.c:5317:__afr_handle_child_down_event] 0-gvol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Final graph:
+------------------------------------------------------------------------------+
  1: volume gvol0-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gfs01
  5:     option remote-subvolume /glusterdata/brick1/gvol0
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
  9:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14: end-volume
 15:
 16: volume gvol0-client-1
 17:     type protocol/client
 18:     option ping-timeout 42
 19:     option remote-host gfs02
 20:     option remote-subvolume /glusterdata/brick2/gvol0
 21:     option transport-type socket
 22:     option transport.address-family inet
 23:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
 24:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 25:     option transport.tcp-user-timeout 0
 26:     option transport.socket.keepalive-time 20
 27:     option transport.socket.keepalive-interval 2
 28:     option transport.socket.keepalive-count 9
 29: end-volume
 30:
 31: volume gvol0-replicate-0
 32:     type cluster/replicate
 33:     option node-uuid 5865e739-3c64-4039-8f96-5fc7a75d00fe
 34:     option afr-pending-xattr gvol0-client-0,gvol0-client-1
 35:     option background-self-heal-count 0
 36:     option metadata-self-heal on
 37:     option data-self-heal on
 38:     option entry-self-heal on
 39:     option self-heal-daemon enable
 40:     option use-compound-fops off
 41:     option iam-self-heal-daemon yes
 42:     subvolumes gvol0-client-0 gvol0-client-1
 43: end-volume
 44:
 45: volume glustershd
 46:     type debug/io-stats
 47:     option log-level INFO
 48:     subvolumes gvol0-replicate-0
 49: end-volume
 50:
+------------------------------------------------------------------------------+
[2018-09-12 13:58:07.323808] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.324101] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.324288] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:58:07.324737] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.325066] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.337185] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
[2018-09-12 13:58:07.337202] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-1' came back up; going online.
[2018-09-12 13:58:11.193402] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:11.193575] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:11.193661] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:58:11.193975] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:11.194217] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:11.194773] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:59:05.215057] W [socket.c:592:__socket_rwv] 0-gvol0-client-1: readv on 192.168.4.85:49152 failed (No data available)
[2018-09-12 13:59:05.215112] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:18.521991] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:19.504398] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:19.505038] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-09-12 13:59:19.505088] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:21.519674] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.519929] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.520103] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:59:21.520531] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.520754] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.521890] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
---

---
gfs01 mountpoint log:

[2018-09-12 13:58:06.497145] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=gfs01 --volfile-id=/gvol0 /tss/filestore)
[2018-09-12 13:58:06.534575] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 13:58:07.381591] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 13:58:07.386730] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 13:58:07.387087] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
[2018-09-12 13:58:07.387129] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.387268] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
Final graph:
+------------------------------------------------------------------------------+
  1: volume gvol0-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gfs01
  5:     option remote-subvolume /glusterdata/brick1/gvol0
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
  9:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14:     option send-gids true
 15: end-volume
 16:
 17: volume gvol0-client-1
 18:     type protocol/client
 19:     option ping-timeout 42
 20:     option remote-host gfs02
 21:     option remote-subvolume /glusterdata/brick2/gvol0
 22:     option transport-type socket
 23:     option transport.address-family inet
 24:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
 25:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 26:     option transport.tcp-user-timeout 0
 27:     option transport.socket.keepalive-time 20
 28:     option transport.socket.keepalive-interval 2
 29:     option transport.socket.keepalive-count 9
[2018-09-12 13:58:07.387367] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
 30:     option send-gids true
 31: end-volume
 32:
 33: volume gvol0-replicate-0
 34:     type cluster/replicate
 35:     option afr-pending-xattr gvol0-client-0,gvol0-client-1
 36:     option use-compound-fops off
 37:     subvolumes gvol0-client-0 gvol0-client-1
 38: end-volume
 39:
 40: volume gvol0-dht
 41:     type cluster/distribute
 42:     option lock-migration off
 43:     option force-migration off
[2018-09-12 13:58:07.387461] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-0: disconnected from gvol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:58:07.387490] E [MSGID: 108006] [afr-common.c:5317:__afr_handle_child_down_event] 0-gvol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
 44:     subvolumes gvol0-replicate-0
 45: end-volume
 46:
 47: volume gvol0-write-behind
 48:     type performance/write-behind
 49:     subvolumes gvol0-dht
 50: end-volume
 51:
 52: volume gvol0-read-ahead
 53:     type performance/read-ahead
 54:     subvolumes gvol0-write-behind
 55: end-volume
 56:
 57: volume gvol0-readdir-ahead
 58:     type performance/readdir-ahead
 59:     option parallel-readdir off
 60:     option rda-request-size 131072
 61:     option rda-cache-limit 10MB
 62:     subvolumes gvol0-read-ahead
 63: end-volume
 64:
 65: volume gvol0-io-cache
 66:     type performance/io-cache
 67:     subvolumes gvol0-readdir-ahead
 68: end-volume
 69:
 70: volume gvol0-quick-read
 71:     type performance/quick-read
 72:     subvolumes gvol0-io-cache
 73: end-volume
 74:
 75: volume gvol0-open-behind
 76:     type performance/open-behind
 77:     subvolumes gvol0-quick-read
 78: end-volume
 79:
 80: volume gvol0-md-cache
[2018-09-12 13:58:07.387621] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
 81:     type performance/md-cache
 82:     subvolumes gvol0-open-behind
 83: end-volume
 84:
 85: volume gvol0
 86:     type debug/io-stats
 87:     option log-level INFO
 88:     option latency-measurement off
 89:     option count-fop-hits off
 90:     subvolumes gvol0-md-cache
 91: end-volume
 92:
 93: volume meta-autoload
 94:     type meta
 95:     subvolumes gvol0
 96: end-volume
 97:
+------------------------------------------------------------------------------+
[2018-09-12 13:58:07.387891] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.388118] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:58:07.388701] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.389814] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:07.390371] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
[2018-09-12 13:58:07.390390] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-1' came back up; going online.
[2018-09-12 13:58:07.391330] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.23
[2018-09-12 13:58:07.391346] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-12 13:58:07.393037] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-gvol0-dht: Directory selfheal failed: Unable to form layout for directory /
[2018-09-12 13:58:10.534498] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:10.534637] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:10.534727] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:58:10.535015] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:10.535155] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:10.536297] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:59:05.215073] W [socket.c:592:__socket_rwv] 0-gvol0-client-1: readv on 192.168.4.85:49152 failed (No data available)
[2018-09-12 13:59:05.215112] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:18.861826] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:19.505060] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:19.517843] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-09-12 13:59:19.517934] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:21.860457] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.860727] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.860903] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:59:21.861333] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.861588] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:21.862134] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
---

---
gfs02 glustershd.log

[2018-09-12 13:29:24.440044] W [socket.c:592:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2018-09-12 13:29:24.440066] I [glusterfsd-mgmt.c:2341:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2018-09-12 13:29:35.300684] E [socket.c:2517:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused); disconnecting socket
[2018-09-12 13:29:35.300719] I [glusterfsd-mgmt.c:2362:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2018-09-12 13:30:28.718734] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f671aa8f6ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xed) [0x55d18aa3670d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55d18aa36524] ) 0-: received signum (15), shutting down
[2018-09-12 13:30:29.721210] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/3c69308176cfc594.socket --xlator-option *replicate*.node-uuid=44192eee-3f26-4e14-84d5-be847d66df7b --process-name glustershd)
[2018-09-12 13:30:29.724100] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 13:30:30.748354] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 13:30:30.752656] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 13:30:30.752794] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
[2018-09-12 13:30:30.753009] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
Final graph:
+------------------------------------------------------------------------------+
  1: volume gvol0-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gfs01
  5:     option remote-subvolume /glusterdata/brick1/gvol0
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
[2018-09-12 13:30:30.754060] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
  9:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14: end-volume
 15:
 16: volume gvol0-client-1
 17:     type protocol/client
 18:     option ping-timeout 42
 19:     option remote-host gfs02
 20:     option remote-subvolume /glusterdata/brick2/gvol0
 21:     option transport-type socket
 22:     option transport.address-family inet
 23:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
 24:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 25:     option transport.tcp-user-timeout 0
 26:     option transport.socket.keepalive-time 20
 27:     option transport.socket.keepalive-interval 2
 28:     option transport.socket.keepalive-count 9
 29: end-volume
 30:
 31: volume gvol0-replicate-0
 32:     type cluster/replicate
 33:     option node-uuid 44192eee-3f26-4e14-84d5-be847d66df7b
 34:     option afr-pending-xattr gvol0-client-0,gvol0-client-1
 35:     option background-self-heal-count 0
 36:     option metadata-self-heal on
 37:     option data-self-heal on
 38:     option entry-self-heal on
 39:     option self-heal-daemon enable
 40:     option use-compound-fops off
 41:     option iam-self-heal-daemon yes
 42:     subvolumes gvol0-client-0 gvol0-client-1
 43: end-volume
 44:
 45: volume glustershd
 46:     type debug/io-stats
 47:     option log-level INFO
 48:     subvolumes gvol0-replicate-0
 49: end-volume
 50:
+------------------------------------------------------------------------------+
[2018-09-12 13:30:30.763395] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.765518] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.765727] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:30:30.766021] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.766308] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.767339] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:30:30.767362] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-0' came back up; going online.
[2018-09-12 13:30:30.772846] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:30:30.773011] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.773125] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:30:30.773472] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
[2018-09-12 13:58:05.409172] W [socket.c:592:__socket_rwv] 0-gvol0-client-0: readv on 192.168.4.84:49152 failed (Connection reset by peer)
[2018-09-12 13:58:05.409219] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-0: disconnected from gvol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:58:15.871815] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:15.872066] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:15.872229] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:58:15.872457] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:15.872704] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:58:15.873272] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:58:54.575838] W [socket.c:592:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2018-09-12 13:58:54.575873] I [glusterfsd-mgmt.c:2348:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2018-09-12 13:59:04.876731] E [socket.c:2517:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused); disconnecting socket
[2018-09-12 13:59:04.876764] I [glusterfsd-mgmt.c:2369:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2018-09-12 13:59:05.213422] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f995004b6ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xed) [0x55c76d21470d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55c76d214524] ) 0-: received signum (15), shutting down
[2018-09-12 13:59:25.843013] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/3c69308176cfc594.socket --xlator-option *replicate*.node-uuid=44192eee-3f26-4e14-84d5-be847d66df7b --process-name glustershd)
[2018-09-12 13:59:25.847197] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 13:59:26.945403] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 13:59:26.945824] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 13:59:26.946110] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
[2018-09-12 13:59:26.946384] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
Final graph:
+------------------------------------------------------------------------------+
  1: volume gvol0-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gfs01
  5:     option remote-subvolume /glusterdata/brick1/gvol0
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
  9:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14: end-volume
 15:
 16: volume gvol0-client-1
 17:     type protocol/client
 18:     option ping-timeout 42
 19:     option remote-host gfs02
 20:     option remote-subvolume /glusterdata/brick2/gvol0
 21:     option transport-type socket
 22:     option transport.address-family inet
 23:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
 24:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 25:     option transport.tcp-user-timeout 0
 26:     option transport.socket.keepalive-time 20
 27:     option transport.socket.keepalive-interval 2
 28:     option transport.socket.keepalive-count 9
 29: end-volume
 30:
 31: volume gvol0-replicate-0
 32:     type cluster/replicate
 33:     option node-uuid 44192eee-3f26-4e14-84d5-be847d66df7b
 34:     option afr-pending-xattr gvol0-client-0,gvol0-client-1
 35:     option background-self-heal-count 0
 36:     option metadata-self-heal on
 37:     option data-self-heal on
 38:     option entry-self-heal on
 39:     option self-heal-daemon enable
 40:     option use-compound-fops off
 41:     option iam-self-heal-daemon yes
 42:     subvolumes gvol0-client-0 gvol0-client-1
 43: end-volume
 44:
 45: volume glustershd
 46:     type debug/io-stats
 47:     option log-level INFO
 48:     subvolumes gvol0-replicate-0
 49: end-volume
 50:
+------------------------------------------------------------------------------+
[2018-09-12 13:59:26.946860] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:26.946961] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:26.946966] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:26.947054] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-09-12 13:59:26.947165] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:59:26.947213] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:26.947233] E [MSGID: 108006] [afr-common.c:5317:__afr_handle_child_down_event] 0-gvol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-09-12 13:59:26.947557] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:26.947796] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:26.948355] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:59:26.948368] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-0' came back up; going online.
[2018-09-12 13:59:30.845313] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.845467] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.845537] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:59:30.845785] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.845953] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.846293] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.
---

---
gfs02 mountpoint log:

[2018-09-12 13:59:26.116762] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 4.1.4 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=gfs02 --volfile-id=/gvol0 /tss/filestore)
[2018-09-12 13:59:26.142136] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-09-12 13:59:27.029834] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-09-12 13:59:27.034636] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2018-09-12 13:59:27.034977] I [MSGID: 114020] [client.c:2328:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume gvol0-client-0
[2018-09-12 13:59:27.035277] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:27.035328] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gfs01
  5:     option remote-subvolume /glusterdata/brick1/gvol0
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
  9:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14:     option send-gids true
 15: end-volume
 16:
 17: volume gvol0-client-1
 18:     type protocol/client
 19:     option ping-timeout 42
 20:     option remote-host gfs02
 21:     option remote-subvolume /glusterdata/brick2/gvol0
 22:     option transport-type socket
 23:     option transport.address-family inet
 24:     option username d5e3e173-156f-46c1-9eb7-a35b201fc311
 25:     option password 8d3c3564-cef3-4261-90bd-c64e85c6d267
 26:     option transport.tcp-user-timeout 0
 27:     option transport.socket.keepalive-time 20
 28:     option transport.socket.keepalive-interval 2
 29:     option transport.socket.keepalive-count 9
 30:     option send-gids true
 31: end-volume
 32:
 33: volume gvol0-replicate-0
 34:     type cluster/replicate
 35:     option afr-pending-xattr gvol0-client-0,gvol0-client-1
 36:     option use-compound-fops off
 37:     subvolumes gvol0-client-0 gvol0-client-1
 38: end-volume
 39:
 40: volume gvol0-dht
 41:     type cluster/distribute
 42:     option lock-migration off
 43:     option force-migration off
 44:     subvolumes gvol0-replicate-0
 45: end-volume
 46:
 47: volume gvol0-write-behind
 48:     type performance/write-behind
 49:     subvolumes gvol0-dht
 50: end-volume
 51:
 52: volume gvol0-read-ahead
 53:     type performance/read-ahead
 54:     subvolumes gvol0-write-behind
 55: end-volume
 56:
 57: volume gvol0-readdir-ahead
 58:     type performance/readdir-ahead
 59:     option parallel-readdir off
 60:     option rda-request-size 131072
 61:     option rda-cache-limit 10MB
 62:     subvolumes gvol0-read-ahead
 63: end-volume
 64:
 65: volume gvol0-io-cache
 66:     type performance/io-cache
 67:     subvolumes gvol0-readdir-ahead
 68: end-volume
 69:
 70: volume gvol0-quick-read
 71:     type performance/quick-read
 72:     subvolumes gvol0-io-cache
 73: end-volume
 74:
 75: volume gvol0-open-behind
 76:     type performance/open-behind
 77:     subvolumes gvol0-quick-read
 78: end-volume
[2018-09-12 13:59:27.035568] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
 79:
[2018-09-12 13:59:27.035672] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
 80: volume gvol0-md-cache
 81:     type performance/md-cache
 82:     subvolumes gvol0-open-behind
 83: end-volume
 84:
 85: volume gvol0
 86:     type debug/io-stats
 87:     option log-level INFO
 88:     option latency-measurement off
 89:     option count-fop-hits off
 90:     subvolumes gvol0-md-cache
 91: end-volume
 92:
 93: volume meta-autoload
 94:     type meta
 95:     subvolumes gvol0
 96: end-volume
 97:
+------------------------------------------------------------------------------+
[2018-09-12 13:59:27.035769] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2018-09-12 13:59:27.036156] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:27.036187] E [MSGID: 114058] [client-handshake.c:1523:client_query_portmap_cbk] 0-gvol0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-09-12 13:59:27.036230] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-gvol0-client-1: disconnected from gvol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-09-12 13:59:27.036240] E [MSGID: 108006] [afr-common.c:5317:__afr_handle_child_down_event] 0-gvol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-09-12 13:59:27.036411] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:27.036967] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/glusterdata/brick1/gvol0'.
[2018-09-12 13:59:27.036979] I [MSGID: 108005] [afr-common.c:5240:__afr_handle_child_up_event] 0-gvol0-replicate-0: Subvolume 'gvol0-client-0' came back up; going online.
[2018-09-12 13:59:27.037684] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.23
[2018-09-12 13:59:27.037696] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-12 13:59:27.038866] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-gvol0-dht: Directory selfheal failed: Unable to form layout for directory /
[2018-09-12 13:59:30.139072] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.139208] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.139282] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-gvol0-client-1: changing port to 49152 (from 0)
[2018-09-12 13:59:30.139537] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.139650] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-gvol0-client-1: error returned while attempting to connect to host:(null), port:0
[2018-09-12 13:59:30.139981] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/glusterdata/brick2/gvol0'.


Regards,


Johan Karlsson


________________________________
From: Pranith Kumar Karampuri <pkarampu at redhat.com>
Sent: Thursday, September 20, 2018 8:13:47 AM
To: Gowdappa, Raghavendra
Cc: Johan Karlsson; gluster-users; Ravishankar Narayanankutty
Subject: Re: [Gluster-users] Data on gluster volume gone

Please also attach the logs for the mount points and the glustershd.logs

On Thu, Sep 20, 2018 at 11:41 AM Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
How did you do the upgrade?

On Thu, Sep 20, 2018 at 11:01 AM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:


On Thu, Sep 20, 2018 at 1:29 AM, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
Can you give volume info? Looks like you are using 2 way replica.

Yes indeed.
    gluster volume create gvol0 replica 2 gfs01:/glusterdata/brick1/gvol0 gfs02:/glusterdata/brick2/gvol0

+Pranith. +Ravi.

Not sure whether 2-way replication caused this. From what I understand, we need either 3-way replication or an arbiter for correct resolution of heals.
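For reference, a sketch of how the existing 2-way replica could be converted to replica 3 with an arbiter, as suggested above. This is an operations fragment against a live cluster, not something runnable here; "gfs03" and its brick path are assumptions (any third node with a small amount of storage would do):

```shell
# Hypothetical third node "gfs03"; the arbiter brick stores only file
# names and metadata, so it needs little space.
gluster peer probe gfs03
gluster volume add-brick gvol0 replica 3 arbiter 1 gfs03:/glusterdata/brick3/gvol0
gluster volume heal gvol0 full   # populate the new arbiter brick
```

With an arbiter in place, AFR has a third vote and can resolve heals without the split-brain risk inherent to plain replica 2.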


On Wed, Sep 19, 2018 at 9:39 AM, Johan Karlsson <Johan.Karlsson at dgc.se> wrote:
I have two servers set up with GlusterFS in replica mode, with a single volume exposed via a mountpoint. The servers are running Ubuntu 16.04 LTS.

After a package upgrade + reboot of both servers, it was discovered that the data was completely gone. New data written on the volume via the mountpoint is replicated correctly, and the gluster status/info commands state that everything is OK (no split-brain scenario, no healing needed, etc.). But the previous data is completely gone, not even present on any of the bricks.
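(A minimal sketch of how one might verify this directly on a brick, bypassing the gluster mount. The demo path below is a stand-in so the commands can be tried anywhere; on the real servers it would be /glusterdata/brick1/gvol0 on gfs01 or /glusterdata/brick2/gvol0 on gfs02.)

```shell
# Stand-in brick path for demonstration purposes only.
BRICK=/tmp/demo-brick/gvol0
mkdir -p "$BRICK/.glusterfs"   # a real brick always carries this metadata tree
touch "$BRICK/somefile"        # stand-in for user data

# User files live directly on the brick, alongside the .glusterfs directory:
ls -la "$BRICK"

# Count user-visible entries (everything except .glusterfs). On an affected
# brick after the incident, this would print 0:
find "$BRICK" -mindepth 1 -maxdepth 1 ! -name '.glusterfs' | wc -l
```

If even the brick directories are empty of user files, the loss happened below the replication layer, which is what makes this case so unusual.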

The following upgrade was done:

glusterfs-server:amd64 (4.1.0-ubuntu1~xenial3 -> 4.1.4-ubuntu1~xenial1)
glusterfs-client:amd64 (4.1.0-ubuntu1~xenial3 -> 4.1.4-ubuntu1~xenial1)
glusterfs-common:amd64 (4.1.0-ubuntu1~xenial3 -> 4.1.4-ubuntu1~xenial1)

The logs only show that connection between the servers was lost, which is expected.

I can't even determine whether it was the package upgrade or the reboot that caused the issue, and I've been unable to reproduce it.

Any idea what could have gone wrong, or whether I did something wrong during the setup? For reference, this is how I've done the setup:

---
Add a separate disk with a single partition on both servers (/dev/sdb1)

Add gfs hostnames for direct communication without DNS, on both servers:

/etc/hosts

192.168.4.45    gfs01
192.168.4.46    gfs02

On gfs01, create a new LVM Volume Group:
  vgcreate gfs01-vg /dev/sdb1

And on the gfs02:
  vgcreate gfs02-vg /dev/sdb1

Create logical volumes named "brick" on the servers:

gfs01:
  lvcreate -l 100%VG -n brick1 gfs01-vg
gfs02:
  lvcreate -l 100%VG -n brick2 gfs02-vg

Format the volumes with ext4 filesystem:

gfs01:
  mkfs.ext4 /dev/gfs01-vg/brick1
gfs02:
  mkfs.ext4 /dev/gfs02-vg/brick2

Create a mountpoint for the bricks on the servers:

gfs01:
  mkdir -p /glusterdata/brick1
gfs02:
  mkdir -p /glusterdata/brick2

Make a permanent mount on the servers:

gfs01:
/dev/gfs01-vg/brick1    /glusterdata/brick1     ext4    defaults        0     0
gfs02:
/dev/gfs02-vg/brick2    /glusterdata/brick2     ext4    defaults        0     0

Mount it:
  mount -a

Create a gluster volume mount point on the bricks on the servers:

gfs01:
  mkdir -p /glusterdata/brick1/gvol0
gfs02:
  mkdir -p /glusterdata/brick2/gvol0

From each server, peer probe the other one:

  gluster peer probe gfs01
peer probe: success

  gluster peer probe gfs02
peer probe: success

From any single server, create the gluster volume as a "replica" with two nodes, gfs01 and gfs02:

  gluster volume create gvol0 replica 2 gfs01:/glusterdata/brick1/gvol0 gfs02:/glusterdata/brick2/gvol0

Start the volume:

  gluster volume start gvol0

On each server, mount the gluster filesystem on the /filestore mount point:

gfs01:
  mount -t glusterfs gfs01:/gvol0 /filestore
gfs02:
  mount -t glusterfs gfs02:/gvol0 /filestore

Make the mount permanent on the servers:

/etc/fstab

gfs01:
  gfs01:/gvol0 /filestore glusterfs defaults,_netdev 0 0
gfs02:
  gfs02:/gvol0 /filestore glusterfs defaults,_netdev 0 0
---

Regards,

Johan Karlsson
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users




--
Pranith


--
Pranith