<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;" dir="ltr">
<p>Okay, so this was fixed by killing Gluster and rebooting the node again.</p>
<p><br>
</p>
<div id="Signature"><br>
<div class="ecxmoz-signature">-- <br>
<br>
<font color="#3366ff"><font color="#000000">Respectfully<b><br>
</b><b>Mahdi A. Mahdi</b></font></font><font color="#3366ff"><br>
<br>
</font><font color="#3366ff"></font></div>
</div>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> gluster-users-bounces@gluster.org &lt;gluster-users-bounces@gluster.org&gt; on behalf of Mahdi Adnan &lt;mahdi.adnan@outlook.com&gt;<br>
<b>Sent:</b> Wednesday, May 3, 2017 10:15:45 AM<br>
<b>To:</b> gluster-users@gluster.org<br>
<b>Subject:</b> [Gluster-users] Gluster long healing process</font>
<div>&nbsp;</div>
</div>
<div>
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;" dir="ltr">
<p>Hi,</p>
<p><br>
</p>
<p>I have a 4-node Gluster volume; each node has 24 SSD bricks and runs Gluster 3.8.10 (two volumes). I updated one of the nodes to 3.8.11 and rebooted it. After it came back online, the healing process started and it has not finished.</p>
<p>It has been 24 hours and the healing is still going. gluster vol heal $VOL info returns a number of entries that need healing, and that number decreases and increases randomly.</p>
<p>The node is writing many gigabytes of data, and I don't know whether this is normal or whether I'm missing something.</p>
<p>Volume details:</p>
<p><br>
</p>
<p></p>
<div>Volume Name: ovirt_imgs</div>
<div>Type: Distributed-Replicate</div>
<div>Volume ID: 40d1354b-8e85-4464-8c71-9e2efbe10a63</div>
<div>Status: Started</div>
<div>Snapshot Count: 0</div>
<div>Number of Bricks: 26 x 2 = 52</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: gluster01:/mnt/ovirt_disk1/ovirt_imgs</div>
<div>Brick2: gluster03:/mnt/ovirt_disk1/ovirt_imgs</div>
<div>Brick3: gluster02:/mnt/ovirt_disk1/ovirt_imgs</div>
<div>Brick4: gluster04:/mnt/ovirt_disk1/ovirt_imgs</div>
<div>Brick5: gluster01:/mnt/ovirt_disk2/ovirt_imgs</div>
<div>Brick6: gluster03:/mnt/ovirt_disk2/ovirt_imgs</div>
<div>Brick7: gluster02:/mnt/ovirt_disk2/ovirt_imgs</div>
<div>Brick8: gluster04:/mnt/ovirt_disk2/ovirt_imgs</div>
<div>Brick9: gluster01:/mnt/ovirt_disk3/ovirt_imgs</div>
<div>Brick10: gluster03:/mnt/ovirt_disk3/ovirt_imgs</div>
<div>Brick11: gluster02:/mnt/ovirt_disk3/ovirt_imgs</div>
<div>Brick12: gluster04:/mnt/ovirt_disk3/ovirt_imgs</div>
<div>Brick13: gluster01:/mnt/ovirt_disk4/ovirt_imgs</div>
<div>Brick14: gluster03:/mnt/ovirt_disk4/ovirt_imgs</div>
<div>Brick15: gluster02:/mnt/ovirt_disk4/ovirt_imgs</div>
<div>Brick16: gluster04:/mnt/ovirt_disk4/ovirt_imgs</div>
<div>Brick17: gluster01:/mnt/ovirt_disk5/ovirt_imgs</div>
<div>Brick18: gluster03:/mnt/ovirt_disk5/ovirt_imgs</div>
<div>Brick19: gluster02:/mnt/ovirt_disk5/ovirt_imgs</div>
<div>Brick20: gluster04:/mnt/ovirt_disk5/ovirt_imgs</div>
<div>Brick21: gluster01:/mnt/ovirt_disk6/ovirt_imgs</div>
<div>Brick22: gluster03:/mnt/ovirt_disk6/ovirt_imgs</div>
<div>Brick23: gluster02:/mnt/ovirt_disk6/ovirt_imgs</div>
<div>Brick24: gluster04:/mnt/ovirt_disk6/ovirt_imgs</div>
<div>Brick25: gluster01:/mnt/ovirt_disk7/ovirt_imgs</div>
<div>Brick26: gluster03:/mnt/ovirt_disk7/ovirt_imgs</div>
<div>Brick27: gluster02:/mnt/ovirt_disk7/ovirt_imgs</div>
<div>Brick28: gluster04:/mnt/ovirt_disk7/ovirt_imgs</div>
<div>Brick29: gluster01:/mnt/ovirt_disk8/ovirt_imgs</div>
<div>Brick30: gluster03:/mnt/ovirt_disk8/ovirt_imgs</div>
<div>Brick31: gluster02:/mnt/ovirt_disk8/ovirt_imgs</div>
<div>Brick32: gluster04:/mnt/ovirt_disk8/ovirt_imgs</div>
<div>Brick33: gluster01:/mnt/ovirt_disk9/ovirt_imgs</div>
<div>Brick34: gluster03:/mnt/ovirt_disk9/ovirt_imgs</div>
<div>Brick35: gluster02:/mnt/ovirt_disk9/ovirt_imgs</div>
<div>Brick36: gluster04:/mnt/ovirt_disk9/ovirt_imgs</div>
<div>Brick37: gluster01:/mnt/ovirt_disk10/ovirt_imgs</div>
<div>Brick38: gluster03:/mnt/ovirt_disk10/ovirt_imgs</div>
<div>Brick39: gluster02:/mnt/ovirt_disk10/ovirt_imgs</div>
<div>Brick40: gluster04:/mnt/ovirt_disk10/ovirt_imgs</div>
<div>Brick41: gluster01:/mnt/ovirt_disk11/ovirt_imgs</div>
<div>Brick42: gluster03:/mnt/ovirt_disk11/ovirt_imgs</div>
<div>Brick43: gluster02:/mnt/ovirt_disk11/ovirt_imgs</div>
<div>Brick44: gluster04:/mnt/ovirt_disk11/ovirt_imgs</div>
<div>Brick45: gluster01:/mnt/ovirt_disk12/ovirt_imgs</div>
<div>Brick46: gluster03:/mnt/ovirt_disk12/ovirt_imgs</div>
<div>Brick47: gluster02:/mnt/ovirt_disk12/ovirt_imgs</div>
<div>Brick48: gluster04:/mnt/ovirt_disk12/ovirt_imgs</div>
<div>Brick49: gluster01:/mnt/ovirt_disk13/ovirt_imgs</div>
<div>Brick50: gluster03:/mnt/ovirt_disk13/ovirt_imgs</div>
<div>Brick51: gluster02:/mnt/ovirt_disk13/ovirt_imgs</div>
<div>Brick52: gluster04:/mnt/ovirt_disk13/ovirt_imgs</div>
<div>Options Reconfigured:</div>
<div>ganesha.enable: off</div>
<div>features.cache-invalidation: off</div>
<div>features.shard-block-size: 256MB</div>
<div>storage.owner-gid: 36</div>
<div>storage.owner-uid: 36</div>
<div>user.cifs: off</div>
<div>features.shard: on</div>
<div>cluster.shd-wait-qlength: 10000</div>
<div>cluster.shd-max-threads: 8</div>
<div>cluster.locking-scheme: granular</div>
<div>cluster.data-self-heal-algorithm: full</div>
<div>cluster.server-quorum-type: server</div>
<div>cluster.quorum-type: none</div>
<div>cluster.eager-lock: enable</div>
<div>network.remote-dio: enable</div>
<div>performance.low-prio-threads: 32</div>
<div>performance.stat-prefetch: off</div>
<div>performance.io-cache: off</div>
<div>performance.read-ahead: off</div>
<div>performance.quick-read: off</div>
<div>transport.address-family: inet</div>
<div>performance.readdir-ahead: on</div>
<div>nfs.disable: on</div>
<div>cluster.server-quorum-ratio: 51%</div>
<div>nfs-ganesha: enable</div>
<div>cluster.enable-shared-storage: enable</div>
<br>
<p></p>
<p><br>
</p>
<p>OS: CentOS 7.3 (latest).</p>
<p><br>
</p>
<p><br>
</p>
<p>Gluster heal log sample:</p>
<p><br>
</p>
<p></p>
<div>[2017-05-03 07:01:29.487108] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-45: changing port to 49571 (from 0)</div>
<div>[2017-05-03 07:01:29.489004] I [MSGID: 114020] [client.c:2356:notify] 0-ovirt_imgs-client-47: parent translators are ready, attempting connect on transport</div>
<div>[2017-05-03 07:01:29.491077] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-44: Connected to ovirt_imgs-client-44, attached to remote volume '/mnt/ovirt_disk12/ovirt_imgs'.</div>
<div>[2017-05-03 07:01:29.491092] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-44: Server and Client lk-version numbers are not same, reopening the fds</div>
<div>[2017-05-03 07:01:29.491123] I [MSGID: 108005] [afr-common.c:4387:afr_notify] 0-ovirt_imgs-replicate-22: Subvolume 'ovirt_imgs-client-44' came back up; going online.</div>
<div>[2017-05-03 07:01:29.491173] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-44: Server lk version = 1</div>
<div>[2017-05-03 07:01:29.491280] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-45: Using Program GlusterFS 3.3, Num (1298437), Version (330)</div>
<div>[2017-05-03 07:01:29.491331] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-46: changing port to 49521 (from 0)</div>
<div>[2017-05-03 07:01:29.493119] I [MSGID: 114020] [client.c:2356:notify] 0-ovirt_imgs-client-48: parent translators are ready, attempting connect on transport</div>
<div>[2017-05-03 07:01:29.495480] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-45: Connected to ovirt_imgs-client-45, attached to remote volume '/mnt/ovirt_disk12/ovirt_imgs'.</div>
<div>[2017-05-03 07:01:29.495496] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-45: Server and Client lk-version numbers are not same, reopening the fds</div>
<div>[2017-05-03 07:01:29.495670] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-46: Using Program GlusterFS 3.3, Num (1298437), Version (330)</div>
<div>[2017-05-03 07:01:29.495729] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-45: Server lk version = 1</div>
<div>[2017-05-03 07:01:29.495798] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-47: changing port to 49465 (from 0)</div>
<div>[2017-05-03 07:01:29.497438] I [MSGID: 114020] [client.c:2356:notify] 0-ovirt_imgs-client-49: parent translators are ready, attempting connect on transport</div>
<div>[2017-05-03 07:01:29.499871] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-46: Connected to ovirt_imgs-client-46, attached to remote volume '/mnt/ovirt_disk12/ovirt_imgs'.</div>
<div>[2017-05-03 07:01:29.499887] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-46: Server and Client lk-version numbers are not same, reopening the fds</div>
<div>[2017-05-03 07:01:29.499915] I [MSGID: 108005] [afr-common.c:4387:afr_notify] 0-ovirt_imgs-replicate-23: Subvolume 'ovirt_imgs-client-46' came back up; going online.</div>
<div>[2017-05-03 07:01:29.500015] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-46: Server lk version = 1</div>
<div>[2017-05-03 07:01:29.500032] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-48: changing port to 49645 (from 0)</div>
<div>[2017-05-03 07:01:29.500052] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-47: Using Program GlusterFS 3.3, Num (1298437), Version (330)</div>
<div>[2017-05-03 07:01:29.501776] I [MSGID: 114020] [client.c:2356:notify] 0-ovirt_imgs-client-50: parent translators are ready, attempting connect on transport</div>
<div>[2017-05-03 07:01:29.504191] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-47: Connected to ovirt_imgs-client-47, attached to remote volume '/mnt/ovirt_disk12/ovirt_imgs'.</div>
<div>[2017-05-03 07:01:29.504208] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-47: Server and Client lk-version numbers are not same, reopening the fds</div>
<div>[2017-05-03 07:01:29.504313] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-47: Server lk version = 1</div>
<div>[2017-05-03 07:01:29.504330] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-48: Using Program GlusterFS 3.3, Num (1298437), Version (330)</div>
<div>[2017-05-03 07:01:29.504462] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-49: changing port to 49572 (from 0)</div>
<div>[2017-05-03 07:01:29.506374] I [MSGID: 114020] [client.c:2356:notify] 0-ovirt_imgs-client-51: parent translators are ready, attempting connect on transport</div>
<div>[2017-05-03 07:01:29.508431] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-48: Connected to ovirt_imgs-client-48, attached to remote volume '/mnt/ovirt_disk13/ovirt_imgs'.</div>
<div>[2017-05-03 07:01:29.508456] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-48: Server and Client lk-version numbers are not same, reopening the fds</div>
<div>[2017-05-03 07:01:29.508498] I [MSGID: 108005] [afr-common.c:4387:afr_notify] 0-ovirt_imgs-replicate-24: Subvolume 'ovirt_imgs-client-48' came back up; going online.</div>
<div>[2017-05-03 07:01:29.508556] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-48: Server lk version = 1</div>
<div>[2017-05-03 07:01:29.508603] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-49: Using Program GlusterFS 3.3, Num (1298437), Version (330)</div>
<div>[2017-05-03 07:01:29.508725] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-50: changing port to 49522 (from 0)</div>
<div>[2017-05-03 07:01:29.510779] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-49: Connected to ovirt_imgs-client-49, attached to remote volume '/mnt/ovirt_disk13/ovirt_imgs'.</div>
<div>[2017-05-03 07:01:29.510796] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-49: Server and Client lk-version numbers are not same, reopening the fds</div>
<div>[2017-05-03 07:01:29.510903] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-49: Server lk version = 1</div>
<div>[2017-05-03 07:01:29.511062] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-51: changing port to 49466 (from 0)</div>
<div>[2017-05-03 07:01:29.512828] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-50: Using Program GlusterFS 3.3, Num (1298437), Version (330)</div>
<div>[2017-05-03 07:01:29.513197] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-50: Connected to ovirt_imgs-client-50, attached to remote volume '/mnt/ovirt_disk13/ovirt_imgs'.</div>
<div>[2017-05-03 07:01:29.513214] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-50: Server and Client lk-version numbers are not same, reopening the fds</div>
<div>[2017-05-03 07:01:29.513236] I [MSGID: 108005] [afr-common.c:4387:afr_notify] 0-ovirt_imgs-replicate-25: Subvolume 'ovirt_imgs-client-50' came back up; going online.</div>
<div>[2017-05-03 07:01:29.513314] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-50: Server lk version = 1</div>
<div>[2017-05-03 07:01:29.515127] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-51: Using Program GlusterFS 3.3, Num (1298437), Version (330)</div>
<div>[2017-05-03 07:01:29.515520] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-51: Connected to ovirt_imgs-client-51, attached to remote volume '/mnt/ovirt_disk13/ovirt_imgs'.</div>
<div>[2017-05-03 07:01:29.515530] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-51: Server and Client lk-version numbers are not same, reopening the fds</div>
<div>[2017-05-03 07:01:29.515628] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-51: Server lk version = 1</div>
<div>[2017-05-03 07:01:30.009624] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-40: Connected to ovirt_imgs-client-40, attached to remote volume '/mnt/ovirt_disk11/ovirt_imgs'.</div>
<div>[2017-05-03 07:01:30.009653] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-40: Server and Client lk-version numbers are not same, reopening the fds</div>
<div>[2017-05-03 07:01:30.234722] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-40: Server lk version = 1</div>
<div>[2017-05-03 07:01:30.235633] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-0: selecting local read_child ovirt_imgs-client-0</div>
<div>[2017-05-03 07:01:30.236983] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-2: selecting local read_child ovirt_imgs-client-4</div>
<div>[2017-05-03 07:01:30.237492] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-4: selecting local read_child ovirt_imgs-client-8</div>
<div>[2017-05-03 07:01:30.238310] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-6: selecting local read_child ovirt_imgs-client-12</div>
<div>[2017-05-03 07:01:30.238553] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-8: selecting local read_child ovirt_imgs-client-16</div>
<div>[2017-05-03 07:01:30.238670] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-10: selecting local read_child ovirt_imgs-client-20</div>
<div>[2017-05-03 07:01:30.238791] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-12: selecting local read_child ovirt_imgs-client-24</div>
<div>[2017-05-03 07:01:30.238881] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-14: selecting local read_child ovirt_imgs-client-28</div>
<div>[2017-05-03 07:01:30.238961] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-16: selecting local read_child ovirt_imgs-client-32</div>
<div>[2017-05-03 07:01:30.239014] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-18: selecting local read_child ovirt_imgs-client-36</div>
<div>[2017-05-03 07:01:30.239100] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-22: selecting local read_child ovirt_imgs-client-44</div>
<div>[2017-05-03 07:01:30.239140] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-20: selecting local read_child ovirt_imgs-client-40</div>
<div>[2017-05-03 07:01:30.239150] I [MSGID: 104041] [glfs-resolve.c:885:__glfs_active_subvol] 0-ovirt_imgs: switched to graph 676c7573-7465-7230-312d-31333836322d (0)</div>
<div>[2017-05-03 07:01:30.239200] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-24: selecting local read_child ovirt_imgs-client-48</div>
<br>
<p></p>
<p><br>
</p>
<p>I appreciate the help.</p>
<p><br>
</p>
<p>Thanks</p>
<div id="Signature"><br>
<div class="ecxmoz-signature">-- <br>
<br>
<font color="#3366ff"><font color="#000000">Respectfully<b><br>
</b><b>Mahdi A. Mahdi</b></font></font><font color="#3366ff"><br>
<br>
</font><font color="#3366ff"></font></div>
</div>
</div>
</div>
</body>
</html>