On 08/05/2017 15:49, Alessandro Briosi wrote:
> On 08/05/2017 12:57, Jesper Led Lauridsen TS Infra server wrote:
>> I don't know if this has any relation to your issue, but I have seen it
>> several times during gluster healing that my VMs fail or are marked
>> unresponsive in RHEV. My conclusion is that the load gluster puts on the
>> VM images during checksumming while healing results in too much latency,
>> and the VMs fail.
>>
>> My plan is to try using sharding, so the VM images/files are split into
>> smaller files, changing the number of allowed concurrent heals
>> ('cluster.background-self-heal-count'), and disabling
>> 'cluster.self-heal-daemon'.
>
> The thing is that there are no heal processes running, and no log entries
> either.
> A few days ago I had a failure, and the heal process started and finished
> without any problems.
>
> I do not use sharding yet.
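
As a side note on the sharding / self-heal tuning suggested above (which I have not tried yet): if I read the option names correctly, on my volume it would be roughly the following. The 64MB block size and the heal count of 4 are just guesses on my part, not tested values:

  # enable sharding (as far as I understand, this only affects newly created files)
  gluster volume set datastore1 features.shard on
  # shard block size; 64MB is, I believe, the default
  gluster volume set datastore1 features.shard-block-size 64MB
  # limit the number of concurrent background heals
  gluster volume set datastore1 cluster.background-self-heal-count 4
  # disable the self-heal daemon
  gluster volume set datastore1 cluster.self-heal-daemon off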

Well, it happened again, on a different volume and a different VM.

This time a self-heal process was started.

Why is this happening? There are no network problems on the hosts, and they
all have bonded 2x1Gbit NICs dedicated to gluster...

Is there any information I can give you to find out what happened?

This is the only mention of a heal in the logs:
[2017-05-08 17:34:40.474774] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-datastore1-replicate-0: Completed data selfheal on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f. sources=[1] sinks=0 2

The VM went down about an hour and a half before that:
[2017-05-08 15:54:11.781749] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting connection from srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781749] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting connection from srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
[2017-05-08 15:54:11.781840] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup on /images/101/vm-101-disk-2.qcow2
[2017-05-08 15:54:11.781838] W [inodelk.c:399:pl_inodelk_log_cleanup] 0-datastore1-server: releasing lock on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f held by {client=0x7ffa7c0051f0, pid=0 lk-owner=5c600023827f0000}
[2017-05-08 15:54:11.781863] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup on /images/101/vm-101-disk-1.qcow2
[2017-05-08 15:54:11.781947] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down connection srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781971] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down connection srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
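
If it helps, I can collect and attach more state from the hosts; I am only guessing at what would be useful, but I was thinking of something like:

  gluster volume info datastore1
  gluster volume status datastore1
  gluster volume heal datastore1 info

plus the brick and glustershd logs from /var/log/glusterfs/ on the hosts.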

Any hint would be greatly appreciated.

Alessandro