[Gluster-users] Freezing during heal

Krutika Dhananjay kdhananj at redhat.com
Mon Apr 18 06:58:28 UTC 2016


Sorry, I was referring to the glusterfs client logs.

Assuming you are using a FUSE mount, your log file will be in
/var/log/glusterfs/<hyphenated-mount-point-path>.log
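
For example, assuming the volume is FUSE-mounted at /mnt/pve/vm-storage (an
illustrative path, substitute your own), the log would be:

  tail -f /var/log/glusterfs/mnt-pve-vm-storage.log

i.e. the mount path with the leading '/' dropped and the remaining '/'
replaced by '-'.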

-Krutika

On Sun, Apr 17, 2016 at 9:37 PM, Kevin Lemonnier <lemonnierk at ulrar.net>
wrote:

> I believe Proxmox is just an interface to KVM that uses the library, so if I'm
> not mistaken there aren't any client logs?
>
> It's not the first time I've had this issue; it happens on every heal on the
> two clusters I have.
>
> I did let the heal finish that night and the VMs are working now, but it is
> pretty scary with future crashes or brick replacements in mind.
> Should I maybe lower the shard size? That won't solve the fact that two bricks
> out of three aren't keeping the filesystem usable, but it might make the
> healing quicker, right?
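>
> For reference, a rough sketch of how the shard size could be checked or
> changed with the gluster CLI (the 64MB value is just an example; as far as I
> understand, a new shard-block-size only applies to files created after the
> change, existing images keep their current shard size):
>
>   # current value
>   gluster volume get vm-storage features.shard-block-size
>   # lower it, e.g. to 64MB (example value)
>   gluster volume set vm-storage features.shard-block-size 64MB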
>
> Thanks
>
> On 17 April 2016 at 17:56:37 GMT+02:00, Krutika Dhananjay <
> kdhananj at redhat.com> wrote:
> >Could you share the client logs and information about the approximate
> >time/day when you saw this issue?
> >
> >-Krutika
> >
> >On Sat, Apr 16, 2016 at 12:57 AM, Kevin Lemonnier
> ><lemonnierk at ulrar.net>
> >wrote:
> >
> >> Hi,
> >>
> >> We have a small GlusterFS 3.7.6 cluster with 3 nodes running Proxmox VMs
> >> on it. I did set up the various recommended options, like the ones from the
> >> virt group, but by hand since it's on Debian. The shards are 256MB, if that
> >> matters.
> >>
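> >> For reference, setting those options by hand comes down to a series of
> >> gluster volume set commands, something like this (matching the options
> >> shown in the volume info below):
> >>
> >>   gluster volume set vm-storage performance.quick-read off
> >>   gluster volume set vm-storage performance.read-ahead off
> >>   gluster volume set vm-storage performance.io-cache off
> >>   gluster volume set vm-storage performance.stat-prefetch off
> >>   gluster volume set vm-storage network.remote-dio enable
> >>   gluster volume set vm-storage cluster.eager-lock enable
> >>   gluster volume set vm-storage cluster.quorum-type auto
> >>   gluster volume set vm-storage cluster.server-quorum-type server
> >>   gluster volume set vm-storage features.shard on
> >>   gluster volume set vm-storage features.shard-block-size 256MB
> >>   # server-quorum-ratio is a cluster-wide option, set on "all"
> >>   gluster volume set all cluster.server-quorum-ratio 51%
> >>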
> >> This morning the second node crashed, and as it came back up it started a
> >> heal, but that basically froze all the VMs running on that volume. Since we
> >> really can't have 40 minutes of downtime in the middle of the day, I just
> >> removed the node from the network, which stopped the heal and allowed the
> >> VMs to access their disks again. The plan was to re-connect the node in a
> >> couple of hours and let it heal at night.
> >> But a VM has crashed now, and it can't boot up again: it seems to freeze
> >> trying to access its disks.
> >>
> >> Looking at the heal info for the volume, it has gone way up since this
> >> morning; it looks like the VMs aren't writing to both nodes, just the one
> >> they are on.
> >> That seems pretty bad: we have two nodes out of three up, and I would expect
> >> the volume to work just fine since it still has quorum. What am I missing?
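> >> (The heal numbers I'm referring to come from commands like:
> >>
> >>   gluster volume heal vm-storage info
> >>   gluster volume heal vm-storage statistics heal-count
> >>
> >> the second one only prints per-brick pending counts, which is quicker to
> >> read.)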
> >>
> >> It is still too early to start the heal; is there a way to start the VM
> >> anyway right now? I mean, it was running a moment ago, so the data is there,
> >> the volume just needs to let the VM access it.
> >>
> >>
> >>
> >> Volume Name: vm-storage
> >> Type: Replicate
> >> Volume ID: a5b19324-f032-4136-aaac-5e9a4c88aaef
> >> Status: Started
> >> Number of Bricks: 1 x 3 = 3
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: first_node:/mnt/vg1-storage
> >> Brick2: second_node:/mnt/vg1-storage
> >> Brick3: third_node:/mnt/vg1-storage
> >> Options Reconfigured:
> >> cluster.quorum-type: auto
> >> cluster.server-quorum-type: server
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.readdir-ahead: on
> >> performance.quick-read: off
> >> performance.read-ahead: off
> >> performance.io-cache: off
> >> performance.stat-prefetch: off
> >> features.shard: on
> >> features.shard-block-size: 256MB
> >> cluster.server-quorum-ratio: 51%
> >>
> >>
> >> Thanks for your help
> >>
> >> --
> >> Kevin Lemonnier
> >> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> >>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-users
> >>
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>