[Gluster-users] VM going down

Ravishankar N ravishankar at redhat.com
Tue May 9 08:16:56 UTC 2017


On 05/09/2017 12:59 PM, Alessandro Briosi wrote:
> On 08/05/2017 15:49, Alessandro Briosi wrote:
>> On 08/05/2017 12:57, Jesper Led Lauridsen TS Infra server wrote:
>>>
>>> I don't know if this has any relation to your issue, but I have seen 
>>> several times during Gluster healing that my VMs fail or are marked 
>>> unresponsive in RHEV. My conclusion is that the load Gluster puts on 
>>> the VM images during checksumming while healing results in too much 
>>> latency, and the VMs fail.
>>>
>>> My plan is to try using sharding, so the VM images/files are split 
>>> into smaller files, changing the number of allowed concurrent heals 
>>> ('cluster.background-self-heal-count') and disabling 
>>> 'cluster.self-heal-daemon'.
>>>
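For context, the tunables mentioned above are ordinary volume options. A
minimal sketch of what such a change could look like, assuming a volume
named 'datastore1' as in the logs below; the option names are standard
Gluster options, but the values are only illustrative and should be checked
against your Gluster version before applying:

  # store large VM images as many smaller shard files
  gluster volume set datastore1 features.shard on
  # shard size; 64MB is the usual default, larger values are common for VM images
  gluster volume set datastore1 features.shard-block-size 64MB
  # reduce the number of concurrent background heals per client
  gluster volume set datastore1 cluster.background-self-heal-count 1
  # stop the self-heal daemon from proactively healing
  gluster volume set datastore1 cluster.self-heal-daemon off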
>>
>> The thing is that there are no heal processes running, and no log 
>> entries either.
>> A few days ago I had a failure, and the heal process started and 
>> finished without any problems.
>>
>> I do not use sharding yet.
>
> Well, it happened again on a different volume and a different VM.
>
> This time a self heal process was started.
>
> Why is this happening? There are no network problems on the hosts, and 
> they all have bonded 2x1Gbit NICs dedicated to Gluster...
>
> Is there any information I can give you to find out what happened?
>
> This is the only mention of healing in the logs:
> [2017-05-08 17:34:40.474774] I [MSGID: 108026] 
> [afr-self-heal-common.c:1254:afr_log_selfheal] 
> 0-datastore1-replicate-0: Completed data selfheal on 
> bc8f6a7e-31e5-4b48-946c-f779a4b2e64f. sources=[1]  sinks=0 2
>
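A minimal sketch of how pending heals on that volume could be inspected
from the CLI, assuming the volume name 'datastore1' taken from the log
line above:

  # list entries still needing heal, per brick
  gluster volume heal datastore1 info
  # list entries that are in split-brain, if any
  gluster volume heal datastore1 info split-brain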
> The VM went down 1 1/2 hours earlier:
> [2017-05-08 15:54:11.781749] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting 
> connection from 
> srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
> [2017-05-08 15:54:11.781749] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting 
> connection from 
> srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
> [2017-05-08 15:54:11.781840] I [MSGID: 115013] 
> [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup 
> on /images/101/vm-101-disk-2.qcow2
> [2017-05-08 15:54:11.781838] W [inodelk.c:399:pl_inodelk_log_cleanup] 
> 0-datastore1-server: releasing lock on 
> bc8f6a7e-31e5-4b48-946c-f779a4b2e64f held by {client=0x7ffa7c0051f0, 
> pid=0 lk-owner=5c600023827f0000}
> [2017-05-08 15:54:11.781863] I [MSGID: 115013] 
> [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup 
> on /images/101/vm-101-disk-1.qcow2
> [2017-05-08 15:54:11.781947] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down 
> connection 
> srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
> [2017-05-08 15:54:11.781971] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down 
> connection 
> srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
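A hedged aside on the disconnect messages above: whether that client
re-established its connections afterwards can be checked from any server
node; 'datastore1' is again the volume name from the logs:

  # show the clients currently connected to each brick of the volume
  gluster volume status datastore1 clients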
>
>
> Any hint would be greatly appreciated.

Can you share the log of the fuse mount 
(/var/log/glusterfs/<path-to-mount-point>.log) on which the VM was 
running? When you say 'VM going down', do you mean it paused / became 
unresponsive?
-Ravi
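For reference, the FUSE client names its log file after the mount point,
with the slashes replaced by dashes. The path below is only an
illustration, assuming a hypothetical mount point of /mnt/pve/datastore1;
substitute your actual mount point:

  /var/log/glusterfs/mnt-pve-datastore1.log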
>
> Alessandro

