[Gluster-users] Freezing during heal

Krutika Dhananjay kdhananj at redhat.com
Sun Apr 17 15:56:37 UTC 2016


Could you share the client logs and information about the approx time/day
when you saw this issue?
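
The FUSE mount log on the Proxmox nodes would be the most useful one; it is
normally under /var/log/glusterfs/, named after the mount point. For example
(the exact path below is just a guess for your setup):

  less /var/log/glusterfs/mnt-pve-vm-storage.log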

-Krutika

On Sat, Apr 16, 2016 at 12:57 AM, Kevin Lemonnier <lemonnierk at ulrar.net>
wrote:

> Hi,
>
> We have a small GlusterFS 3.7.6 cluster with 3 nodes, running Proxmox VMs
> on it. I did set up the recommended options (the virt group), but by hand
> since it's on Debian. The shards are 256 MB, if that matters.
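>
> For reference, what I applied by hand was roughly this (a sketch based on
> the options visible in the volume info below; the exact virt group list
> may differ depending on the version):
>
>   gluster volume set vm-storage performance.quick-read off
>   gluster volume set vm-storage performance.read-ahead off
>   gluster volume set vm-storage performance.io-cache off
>   gluster volume set vm-storage performance.stat-prefetch off
>   gluster volume set vm-storage cluster.eager-lock enable
>   gluster volume set vm-storage network.remote-dio enable
>   gluster volume set vm-storage cluster.quorum-type auto
>   gluster volume set vm-storage cluster.server-quorum-type server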
>
> This morning the second node crashed, and as it came back up it started a
> heal, which basically froze all the VMs running on that volume. Since we
> really can't afford 40 minutes of downtime in the middle of the day, I
> just removed the node from the network, which stopped the heal and let the
> VMs access their disks again. The plan was to reconnect the node in a
> couple of hours and let it heal during the night.
> But a VM crashed just now, and it can't boot up again: it seems to freeze
> trying to access its disks.
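>
> (An alternative I was wondering about, assuming the option does what I
> think, would be to temporarily disable the self-heal daemon instead of
> pulling the node off the network, something like:
>
>   gluster volume set vm-storage cluster.self-heal-daemon off
>
> and turning it back on tonight.)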
>
> Looking at the heal info for the volume, the count has gone way up since
> this morning; it looks like the VMs aren't writing to both remaining
> nodes, only to the one they are running on.
> That seems pretty bad: with 2 of the 3 nodes up I would expect the volume
> to keep working just fine, since it still has quorum. What am I missing?
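>
> (I'm watching it with something along these lines, in case that helps;
> the exact output format may vary with the version:
>
>   gluster volume heal vm-storage info
>   gluster volume heal vm-storage statistics heal-count
> )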
>
> It is still too early to start the heal; is there a way to start that VM
> anyway right now? I mean, it was running a moment ago, so the data is
> there; it just needs to be made accessible to the VM again.
>
>
>
> Volume Name: vm-storage
> Type: Replicate
> Volume ID: a5b19324-f032-4136-aaac-5e9a4c88aaef
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: first_node:/mnt/vg1-storage
> Brick2: second_node:/mnt/vg1-storage
> Brick3: third_node:/mnt/vg1-storage
> Options Reconfigured:
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> features.shard: on
> features.shard-block-size: 256MB
> cluster.server-quorum-ratio: 51%
>
>
> Thanks for your help
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>

