<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style>
</head>
<body lang="en-AT" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi Gluster Community,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="en-AT"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="en-AT">We have a </span><span lang="EN-US">Proxmox VE (PVE)
</span><span lang="en-AT">cluster with two nodes. Each node has 4 HDDs backing a GlusterFS volume, which we use to live-migrate VMs.<br>
<br>
A few days ago, some disk files on the GlusterFS volume went into a split-brain condition. We were able to save the corresponding log files and resolve the split-brain, but we don't know how it happened. You can find the
GlusterFS log files in the attachment.<br>
<br>
Maybe one of you can tell us what caused the problem:<br>
<br>
Here is the network setup of the PVE cluster:<br>
<br>
192.168.231.0/24 --> Server LAN (PVE GUI reachable on port 8006)<br>
10.10.11.0/24 --> Cluster HA LAN<br>
10.10.12.0/24 --> GlusterFS storage LAN<br>
<br>
GlusterFS LAN:<br>
.) PVEServer1 - 10.10.12.31<br>
.) PVEServer2 - 10.10.12.32<br>
<br>
This is what we see in the mnt-pve-GlusterVol01.log file:<br>
Server 1:<br>
[2019-05-13 04:25:01.509716] I [MSGID: 100011] [glusterfsd.c:1396:reincarnate] 0-glusterfsd: Fetching the volume file from server...<br>
<br>
[2019-05-13 09:47:48.277650] W [socket.c:590:__socket_rwv] 0-glusterfs: readv on 10.10.12.31:24007 failed (No data available)<br>
<br>
[2019-05-13 09:47:48.277696] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.10.12.31 (No data available)<br>
<br>
[2019-05-13 09:47:48.277704] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers<br>
<br>
[2019-05-13 09:47:50.926948] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7fe58a1eb494] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x55a8728115e5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55a872811444]
) 0-: received signum (15), shutting down<br>
<br>
[2019-05-13 09:47:50.926977] I [fuse-bridge.c:5794:fini] 0-fuse: Unmounting '/mnt/pve/GlusterVol01'.<br>
<br>
[2019-05-13 09:47:50.950381] I [fuse-bridge.c:5086:fuse_thread_proc] 0-fuse: unmounting /mnt/pve/GlusterVol01<br>
<br>
[2019-05-13 09:49:43.823117] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.8 (args: /usr/sbin/glusterfs --volfile-server=10.10.12.31 --volfile-id=vol0 /mnt/pve/GlusterVol01)<br>
<br>
[2019-05-13 09:49:43.828117] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1<br>
<br>
[2019-05-13 09:49:43.869885] W [MSGID: 108003] [afr.c:102:fix_quorum_options] 0-vol0-replicate-0: quorum-type none overriding quorum-count 1<br>
<br>
[2019-05-13 09:49:43.871644] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2<br>
<br>
[2019-05-13 09:49:43.880208] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-0: parent translators are ready, attempting connect on transport<br>
<br>
[2019-05-13 09:49:43.880609] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-1: parent translators are ready, attempting connect on transport<br>
<br>
[2019-05-13 09:49:43.880816] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-0: changing port to 49155 (from 0)<br>
<br>
Final graph:<br>
<br>
+------------------------------------------------------------------------------+<br>
<br>
1: volume vol0-client-0<br>
<br>
2: type protocol/client<br>
<br>
3: option ping-timeout 5<br>
<br>
4: option remote-host pvetau01-storage<br>
<br>
5: option remote-subvolume /var/lib/glusterfs/data01/brick1/vol0<br>
<br>
6: option transport-type socket<br>
<br>
7: option transport.address-family inet<br>
<br>
8: option username 4ccc2234-fba7-40f9-b97b-26d3fa8ab401<br>
<br>
9: option password cef1b5f5-b16c-4a3c-b49f-f814901a3252<br>
<br>
10: option filter-O_DIRECT enable<br>
<br>
11: option send-gids true<br>
<br>
12: end-volume<br>
<br>
13:<br>
<br>
14: volume vol0-client-1<br>
<br>
15: type protocol/client<br>
<br>
16: option ping-timeout 5<br>
<br>
17: option remote-host pvetau02-storage<br>
<br>
18: option remote-subvolume /var/lib/glusterfs/data01/brick1/vol0<br>
<br>
19: option transport-type socket<br>
<br>
20: option transport.address-family inet<br>
<br>
21: option username 4ccc2234-fba7-40f9-b97b-26d3fa8ab401<br>
<br>
22: option password cef1b5f5-b16c-4a3c-b49f-f814901a3252<br>
<br>
23: option filter-O_DIRECT enable<br>
<br>
24: option send-gids true<br>
<br>
25: end-volume<br>
<br>
26:<br>
<br>
27: volume vol0-replicate-0<br>
<br>
28: type cluster/replicate<br>
<br>
29: option eager-lock enable<br>
<br>
30: option quorum-count 1<br>
<br>
31: subvolumes vol0-client-0 vol0-client-1<br>
<br>
32: end-volume<br>
<br>
33:<br>
<br>
34: volume vol0-dht<br>
<br>
35: type cluster/distribute<br>
<br>
36: option lock-migration off<br>
<br>
37: subvolumes vol0-replicate-0<br>
<br>
38: end-volume<br>
<br>
39:<br>
<br>
40: volume vol0-write-behind<br>
<br>
41: type performance/write-behind<br>
<br>
42: subvolumes vol0-dht<br>
<br>
43: end-volume<br>
<br>
44:<br>
<br>
45: volume vol0-readdir-ahead<br>
<br>
46: type performance/readdir-ahead<br>
<br>
47: subvolumes vol0-write-behind<br>
<br>
48: end-volume<br>
<br>
49:<br>
<br>
50: volume vol0-open-behind<br>
<br>
51: type performance/open-behind<br>
<br>
52: subvolumes vol0-readdir-ahead<br>
<br>
53: end-volume<br>
<br>
54:<br>
<br>
55: volume vol0<br>
<br>
56: type debug/io-stats<br>
<br>
57: option log-level INFO<br>
<br>
58: option latency-measurement off<br>
<br>
59: option count-fop-hits off<br>
<br>
60: subvolumes vol0-open-behind<br>
<br>
61: end-volume<br>
<br>
62:<br>
<br>
63: volume meta-autoload<br>
<br>
64: type meta<br>
<br>
65: subvolumes vol0<br>
<br>
66: end-volume<br>
<br>
67:<br>
<br>
+------------------------------------------------------------------------------+<br>
<br>
[2019-05-13 09:49:43.881243] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>
<br>
[2019-05-13 09:49:43.881434] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-1: changing port to 49154 (from 0)<br>
<br>
[2019-05-13 09:49:43.881906] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>
<br>
[2019-05-13 09:49:43.882213] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to vol0-client-1, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'.<br>
<br>
[2019-05-13 09:49:43.882222] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and Client lk-version numbers are not same, reopening the fds<br>
<br>
[2019-05-13 09:49:43.882249] I [MSGID: 108005] [afr-common.c:4382:afr_notify] 0-vol0-replicate-0: Subvolume 'vol0-client-1' came back up; going online.<br>
<br>
[2019-05-13 09:49:43.882360] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk version = 1<br>
<br>
[2019-05-13 09:49:43.886625] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to vol0-client-0, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'.<br>
<br>
[2019-05-13 09:49:43.886633] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and Client lk-version numbers are not same, reopening the fds<br>
<br>
[2019-05-13 09:49:43.890995] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk version = 1<br>
<br>
[2019-05-13 09:49:43.891049] I [fuse-bridge.c:4153:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26<br>
<br>
[2019-05-13 09:49:43.891067] I [fuse-bridge.c:4838:fuse_graph_sync] 0-fuse: switched to graph 0<br>
<br>
[2019-05-13 09:49:43.891625] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-0<br>
<br>
[2019-05-13 10:20:38.998246] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-vol0-client-1: server 10.10.12.32:49154 has not responded in the last 5 seconds, disconnecting.<br>
<br>
[2019-05-13 10:20:38.998657] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f69df41fe83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f69df1e7b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69df1e7c7e]
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7f69df1e92e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f69df1e9bb4] ))))) 0-vol0-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27))
called at 2019-05-13 10:20:33.237111 (xid=0x492)<br>
<br>
[2019-05-13 10:20:38.998681] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-vol0-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]<br>
<br>
[2019-05-13 10:20:38.998829] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f69df41fe83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f69df1e7b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69df1e7c7e]
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7f69df1e92e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f69df1e9bb4] ))))) 0-vol0-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called
at 2019-05-13 10:20:33.237115 (xid=0x493)<br>
<br>
[2019-05-13 10:20:38.998843] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-vol0-client-1: socket disconnected<br>
<br>
[2019-05-13 10:20:38.998854] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-1: disconnected from vol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available<br>
<br>
[2019-05-13 10:20:43.355917] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-0<br>
<br>
[2019-05-13 10:21:20.850030] E [socket.c:2309:socket_connect_finish] 0-vol0-client-1: connection to 10.10.12.32:24007 failed (No route to host)<br>
<br>
[2019-05-13 10:22:07.026615] E [MSGID: 114058] [client-handshake.c:1534:client_query_portmap_cbk] 0-vol0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.<br>
<br>
[2019-05-13 10:22:07.026663] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-1: disconnected from vol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available<br>
<br>
[2019-05-13 10:22:10.010421] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-1: changing port to 49154 (from 0)<br>
<br>
[2019-05-13 10:22:10.011105] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>
<br>
[2019-05-13 10:22:10.011558] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to vol0-client-1, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'.<br>
<br>
[2019-05-13 10:22:10.011609] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and Client lk-version numbers are not same, reopening the fds<br>
<br>
[2019-05-13 10:22:10.011622] I [MSGID: 114042] [client-handshake.c:1054:client_post_handshake] 0-vol0-client-1: 2 fds open - Delaying child_up until they are re-opened<br>
<br>
[2019-05-13 10:22:10.032258] I [MSGID: 114041] [client-handshake.c:676:client_child_up_reopen_done] 0-vol0-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP<br>
<br>
[2019-05-13 10:22:10.032492] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk version = 1<br>
<br>
[2019-05-13 10:22:13.790586] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-0<br>
<br>
[2019-05-13 11:12:57.300347] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 11:12:57.305284] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 4 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain)<br>
<br>
[2019-05-13 11:12:57.305712] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 11:12:57.306277] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)<br>
<br>
[2019-05-13 11:12:57.306938] I [MSGID: 114024] [client-helpers.c:99:this_fd_set_ctx] 0-vol0-client-0: /images/103/vm-103-disk-0.qcow2 (5f9490a8-ec56-410e-9c70-653e0da77174): trying duplicate remote fd set.<br>
<br>
[2019-05-13 11:12:57.306973] I [MSGID: 114024] [client-helpers.c:99:this_fd_set_ctx] 0-vol0-client-1: /images/103/vm-103-disk-0.qcow2 (5f9490a8-ec56-410e-9c70-653e0da77174): trying duplicate remote fd set.<br>
<br>
[2019-05-13 11:12:57.310052] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 2698: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f69d1cba184 (Input/output error)<br>
<br>
[2019-05-13 11:12:57.310137] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 2697: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f69d1cba184 (Input/output error)<br>
<br>
[2019-05-13 11:12:57.311543] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 2699: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f69d1cba184 (Input/output error)<br>
<br>
The message "E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]" repeated 2 times between [2019-05-13 11:12:57.305712]
and [2019-05-13 11:12:57.310816]<br>
<br>
The message "W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)" repeated 2 times between [2019-05-13 11:12:57.306277] and [2019-05-13 11:12:57.311184]<br>
<br>
The message "W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 4 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain)" repeated 6 times between [2019-05-13 11:12:57.305284]
and [2019-05-13 11:12:57.311274]<br>
<br>
The message "E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]" repeated 5 times between [2019-05-13 11:12:57.300347] and
[2019-05-13 11:12:57.311531]<br>
<br>
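As a reading aid for the Server 1 log above, here is a minimal Python sketch that collects the gfids reported as split-brain and the first time each one appears. The function name split_brain_gfids and the SAMPLE list are illustrative assumptions, not part of GlusterFS; the pattern only targets the MSGID 108008 error lines in the format shown above, and it skips the "repeated N times" summary lines.<br>
<br>
```python
import re

# Matches AFR error lines of the form seen in the log above:
# [timestamp] E [MSGID: 108008] [...] ...: Failing FOP on gfid GFID: split-brain observed.
SPLIT_BRAIN_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] E \[MSGID: 108008\] .*?"
    r"Failing (\w+) on gfid ([0-9a-f-]{36}): split-brain observed"
)


def split_brain_gfids(lines):
    """Return a dict mapping each split-brain gfid to the first timestamp seen."""
    first_seen = {}
    for line in lines:
        match = SPLIT_BRAIN_RE.search(line)
        if match is None:
            continue  # not a timestamped split-brain error line
        timestamp, _fop, gfid = match.groups()
        # Keep only the earliest occurrence (log lines are assumed to be in order).
        first_seen.setdefault(gfid, timestamp)
    return first_seen


# A few lines copied from the Server 1 log above (hypothetical test input).
SAMPLE = [
    "[2019-05-13 11:12:57.300347] E [MSGID: 108008] "
    "[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: "
    "Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: "
    "split-brain observed. [Input/output error]",
    "[2019-05-13 11:12:57.305284] W [MSGID: 108008] "
    "[afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable "
    "subvolume -1 found with event generation 4 for gfid "
    "5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain)",
    "[2019-05-13 11:12:57.305712] E [MSGID: 108008] "
    "[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: "
    "Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: "
    "split-brain observed. [Input/output error]",
    'The message "E [MSGID: 108008] '
    "[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: "
    "Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: "
    'split-brain observed. [Input/output error]" repeated 2 times between '
    "[2019-05-13 11:12:57.305712] and [2019-05-13 11:12:57.310816]",
]

if __name__ == "__main__":
    for gfid, timestamp in split_brain_gfids(SAMPLE).items():
        print(gfid, "first reported at", timestamp)
```
<br>
Running the same function over the complete mnt-pve-GlusterVol01.log from each node would give one list of affected gfids per server, which can then be compared side by side.<br>
<br>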
<br>
<br>
Server 2: <br>
[2019-05-13 04:25:01.338790] I [MSGID: 100011] [glusterfsd.c:1396:reincarnate] 0-glusterfsd: Fetching the volume file from server...<br>
<br>
[2019-05-13 09:47:59.443328] E [socket.c:2309:socket_connect_finish] 0-glusterfs: connection to 10.10.12.31:24007 failed (Connection refused)<br>
<br>
[2019-05-13 09:48:17.426580] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-vol0-client-0: server 10.10.12.31:49155 has not responded in the last 5 seconds, disconnecting.<br>
<br>
[2019-05-13 09:48:17.426872] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7efebd3f9e83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7efebd1c1b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7efebd1c1c7e]
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7efebd1c32e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7efebd1c3bb4] ))))) 0-vol0-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27))
called at 2019-05-13 09:48:12.180579 (xid=0x5663a4)<br>
<br>
[2019-05-13 09:48:17.426899] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-vol0-client-0: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]<br>
<br>
[2019-05-13 09:48:17.427056] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7efebd3f9e83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7efebd1c1b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7efebd1c1c7e]
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7efebd1c32e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7efebd1c3bb4] ))))) 0-vol0-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called
at 2019-05-13 09:48:12.180591 (xid=0x5663a5)<br>
<br>
[2019-05-13 09:48:17.427067] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-vol0-client-0: socket disconnected<br>
<br>
[2019-05-13 09:48:17.427077] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-0: disconnected from vol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available<br>
<br>
[2019-05-13 09:48:21.479100] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-1<br>
<br>
[2019-05-13 09:48:59.219302] E [socket.c:2309:socket_connect_finish] 0-vol0-client-0: connection to 10.10.12.31:24007 failed (No route to host)<br>
<br>
[2019-05-13 09:49:41.468469] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing<br>
<br>
[2019-05-13 09:49:42.505174] E [MSGID: 114058] [client-handshake.c:1534:client_query_portmap_cbk] 0-vol0-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.<br>
<br>
[2019-05-13 09:49:42.505225] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-0: disconnected from vol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available<br>
<br>
[2019-05-13 09:49:45.442003] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-0: changing port to 49155 (from 0)<br>
<br>
[2019-05-13 09:49:45.442523] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>
<br>
[2019-05-13 09:49:45.442802] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to vol0-client-0, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'.<br>
<br>
[2019-05-13 09:49:45.442812] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and Client lk-version numbers are not same, reopening the fds<br>
<br>
[2019-05-13 09:49:45.442820] I [MSGID: 114042] [client-handshake.c:1054:client_post_handshake] 0-vol0-client-0: 2 fds open - Delaying child_up until they are re-opened<br>
<br>
[2019-05-13 09:49:45.443244] I [MSGID: 114041] [client-handshake.c:676:client_child_up_reopen_done] 0-vol0-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP<br>
<br>
[2019-05-13 09:49:45.443353] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk version = 1<br>
<br>
[2019-05-13 09:49:49.622255] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-1<br>
<br>
[2019-05-13 10:20:06.060045] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7efebc254494] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x55dba7a3b5e5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55dba7a3b444]
) 0-: received signum (15), shutting down<br>
<br>
[2019-05-13 10:20:06.068969] I [fuse-bridge.c:5794:fini] 0-fuse: Unmounting '/mnt/pve/GlusterVol01'.<br>
<br>
[2019-05-13 10:20:06.103235] I [fuse-bridge.c:5086:fuse_thread_proc] 0-fuse: unmounting /mnt/pve/GlusterVol01<br>
<br>
[2019-05-13 10:22:08.842734] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.8 (args: /usr/sbin/glusterfs --volfile-server=10.10.12.31 --volfile-id=vol0 /mnt/pve/GlusterVol01)<br>
<br>
[2019-05-13 10:22:08.853935] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1<br>
<br>
[2019-05-13 10:22:08.944855] W [MSGID: 108003] [afr.c:102:fix_quorum_options] 0-vol0-replicate-0: quorum-type none overriding quorum-count 1<br>
<br>
[2019-05-13 10:22:08.946502] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2<br>
<br>
[2019-05-13 10:22:08.972020] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-0: parent translators are ready, attempting connect on transport<br>
<br>
[2019-05-13 10:22:08.972395] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-1: parent translators are ready, attempting connect on transport<br>
<br>
<br>
[2019-05-13 10:22:08.972832] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-0: changing port to 49155 (from 0)<br>
<br>
[2019-05-13 10:22:08.973142] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>
<br>
[2019-05-13 10:22:08.973231] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>
<br>
[2019-05-13 10:22:08.973544] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to vol0-client-1, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'.<br>
<br>
[2019-05-13 10:22:08.973544] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to vol0-client-0, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'.<br>
<br>
[2019-05-13 10:22:08.973566] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and Client lk-version numbers are not same, reopening the fds<br>
<br>
[2019-05-13 10:22:08.973567] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and Client lk-version numbers are not same, reopening the fds<br>
<br>
[2019-05-13 10:22:08.973616] I [MSGID: 108005] [afr-common.c:4382:afr_notify] 0-vol0-replicate-0: Subvolume 'vol0-client-1' came back up; going online.<br>
<br>
[2019-05-13 10:22:08.973639] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk version = 1<br>
<br>
[2019-05-13 10:22:08.977940] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk version = 1<br>
<br>
[2019-05-13 10:22:08.978055] I [fuse-bridge.c:4153:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26<br>
<br>
[2019-05-13 10:22:08.978075] I [fuse-bridge.c:4838:fuse_graph_sync] 0-fuse: switched to graph 0<br>
<br>
[2019-05-13 10:22:08.978603] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-1<br>
<br>
[2019-05-13 10:53:46.573894] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:53:46.573992] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain)<br>
<br>
[2019-05-13 10:53:46.574253] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:53:46.574949] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)<br>
<br>
[2019-05-13 10:53:46.575526] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1380: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error)<br>
<br>
[2019-05-13 10:53:46.577820] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1381: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error)<br>
<br>
[2019-05-13 10:53:46.596838] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:53:46.597759] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain)<br>
<br>
[2019-05-13 10:53:46.598916] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error]<br>
<br>
The message "W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)" repeated 2 times between [2019-05-13 10:53:46.574949] and [2019-05-13 10:53:46.599257]<br>
<br>
[2019-05-13 10:53:46.599525] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain)<br>
<br>
[2019-05-13 10:53:46.599797] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:53:46.599825] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1389: READ => -1 gfid=609bb8be-3ae8-470d-9f88-2b65095fbed4 fd=0x7f649c00e06c (Input/output error)<br>
<br>
[2019-05-13 10:53:46.599876] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain)<br>
<br>
[2019-05-13 10:53:46.600149] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:53:46.600193] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain)<br>
<br>
[2019-05-13 10:53:46.600417] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:53:46.600775] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)<br>
<br>
[2019-05-13 10:53:46.601071] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain)<br>
<br>
[2019-05-13 10:53:46.601537] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:53:46.601577] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1390: READ => -1 gfid=609bb8be-3ae8-470d-9f88-2b65095fbed4 fd=0x7f649c00e06c (Input/output error)<br>
<br>
[2019-05-13 10:53:46.619830] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:53:46.620701] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. (Possible split-brain)<br>
<br>
[2019-05-13 10:53:46.621098] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:53:46.621455] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)<br>
<br>
[2019-05-13 10:53:46.621732] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. (Possible split-brain)<br>
<br>
<br>
[2019-05-13 10:53:46.623509] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error]<br>
<br>
<br>
[2019-05-13 10:53:46.624891] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:53:46.625212] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)<br>
<br>
[2019-05-13 10:53:46.625314] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. (Possible split-brain)<br>
<br>
[2019-05-13 10:53:46.625721] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:53:46.625754] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1399: READ => -1 gfid=79423c92-0338-4dc9-bafc-091172e8d845 fd=0x7f649c00e06c (Input/output error)<br>
<br>
[2019-05-13 10:53:46.576286] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]<br>
<br>
<br>
[2019-05-13 10:56:28.176786] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:56:28.177684] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain)<br>
<br>
[2019-05-13 10:56:28.178782] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:56:28.179128] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)<br>
<br>
[2019-05-13 10:56:28.180634] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1533: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error)<br>
<br>
[2019-05-13 10:56:28.179439] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain)<br>
<br>
[2019-05-13 10:56:28.180620] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:59:25.278595] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]<br>
<br>
[2019-05-13 10:59:25.279517] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain)<br>
<br>
[2019-05-13 10:59:25.280605] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]<br>
<br>
<br>
[2019-05-13 10:59:25.281649] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1685: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error)<br>
<br>
[2019-05-13 10:59:25.281250] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain)<br>
-------------------------------------------------<br>
<br>
<br>
What we can't explain is why server 1 does the following:<br>
<br>
[2019-05-13 09:47:48.277650] W [socket.c:590:__socket_rwv] 0-glusterfs: readv on 10.10.12.31:24007 failed (No data available)<br>
<br>
[2019-05-13 09:47:48.277696] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.10.12.31 (No data available)<br>
<br>
[2019-05-13 09:47:48.277704] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers<br>
<br>
<br>
After that, the volume is unmounted and then re-mounted on a different port.<br>
Subsequently, server 2 behaves in exactly the same way, which results in a split-brain condition for the VMs' disk files.<br>
<br>
We would be glad if someone could explain this behavior to us.<br>
<br>
BR<br>
René<o:p></o:p></span></p>
</div>
</body>
</html>