[Gluster-users] VM going down

Tue May 9 07:29:52 UTC 2017

On 08/05/2017 15:49, Alessandro Briosi wrote:
> On 08/05/2017 12:57, Jesper Led Lauridsen TS Infra server wrote:
>>
>> I don't know if this has any relation to your issue, but I have seen
>> several times during gluster healing that my VMs fail or are marked
>> unresponsive in RHEV. My conclusion is that the load gluster puts on
>> the VM images during checksumming while healing results in too much
>> latency, and the VMs fail.
>>
>> My plan is to try sharding, so the VM images are split into smaller
>> files, changing the number of allowed concurrent heals
>> (‘cluster.background-self-heal-count’), and disabling
>> ‘cluster.self-heal-daemon’.
>>
>
> The thing is that there are no heal processes running, and no log
> entries either.
> A few days ago I had a failure, and the heal process started and
> finished without any problems.
>
> I do not use sharding yet.

Well, it happened again on a different volume and a different VM.

This time a self heal process was started.

Why is this happening? There are no network problems on the hosts, and
they all have bonded 2x1Gbit NICs dedicated to gluster.

Is there any information I can give you to find out what happened?

This is the only mention of healing in the logs:
[2017-05-08 17:34:40.474774] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal] 0-datastore1-replicate-0:
Completed data selfheal on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f.
sources=[1]  sinks=0 2

The VM went down about 1 hour 40 minutes before that:
[2017-05-08 15:54:11.781749] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting
connection from
srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781749] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting
connection from
srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
[2017-05-08 15:54:11.781840] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup on
/images/101/vm-101-disk-2.qcow2
[2017-05-08 15:54:11.781838] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore1-server: releasing lock on
bc8f6a7e-31e5-4b48-946c-f779a4b2e64f held by {client=0x7ffa7c0051f0,
pid=0 lk-owner=5c600023827f0000}
[2017-05-08 15:54:11.781863] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup on
/images/101/vm-101-disk-1.qcow2
[2017-05-08 15:54:11.781947] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down
connection srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781971] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down
connection srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
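
To double-check the timing between the two events, here is a minimal
sketch that computes the gap between the client disconnect and the
completed self-heal. The helper name parse_gluster_ts is made up for
illustration, and the two timestamps are copied from the log lines
quoted above; it assumes every log line starts with a bracketed
timestamp in this format.

```python
from datetime import datetime


def parse_gluster_ts(line: str) -> datetime:
    """Parse the leading [YYYY-MM-DD HH:MM:SS.ffffff] stamp of a log line.

    Hypothetical helper, not part of any gluster tooling.
    """
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


# Timestamps taken from the brick log excerpts above.
disconnect = parse_gluster_ts("[2017-05-08 15:54:11.781749] I [MSGID: 115036]")
heal_done = parse_gluster_ts("[2017-05-08 17:34:40.474774] I [MSGID: 108026]")

# Elapsed time between the disconnect and the end of the data self-heal.
gap = heal_done - disconnect
print(gap)
```

The same helper could be run over the whole brick log to line up
disconnects against heal messages, if that is useful to anyone.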

Any hint would be greatly appreciated.

Alessandro
