[Gluster-users] Freezing during heal

Kevin Lemonnier lemonnierk at ulrar.net
Sun Apr 17 16:07:22 UTC 2016


I believe Proxmox is just an interface to KVM that uses libgfapi, so if I'm not mistaken there are no client logs?

It's not the first time I've had this issue; it happens on every heal on both of the clusters I have.

I did let the heal finish that night and the VMs are working now, but it is pretty scary to think about future crashes or brick replacements.
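
Would it help to keep the clients out of the healing path, so that only the self-heal daemon does the work? If I'm reading the 3.7 docs right it would be something like the following, but I haven't tested it, so treat it as a sketch:

    # Untested sketch: leave the self-heal daemon running, but stop the
    # fuse/gfapi clients from healing files themselves, since the
    # client-side heals seem to be what blocks VM I/O during a heal.
    gluster volume set vm-storage cluster.data-self-heal off
    gluster volume set vm-storage cluster.metadata-self-heal off
    gluster volume set vm-storage cluster.entry-self-heal off
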
Should I maybe lower the shard size? It won't solve the fact that 2 bricks out of 3 aren't keeping the filesystem usable, but it might make the healing quicker, right?
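
If lowering the shard size is worth trying, I assume it would just be this; from what I read, the new value only applies to newly created files and existing images keep their current shard size, so again a sketch rather than something I've run:

    # Check the current value (assuming "volume get" is available on
    # this version), then lower it for newly created images only.
    gluster volume get vm-storage features.shard-block-size
    gluster volume set vm-storage features.shard-block-size 64MB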

Thanks

On 17 April 2016 at 17:56:37 GMT+02:00, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>Could you share the client logs and information about the approx
>time/day when you saw this issue?
>
>-Krutika
>
>On Sat, Apr 16, 2016 at 12:57 AM, Kevin Lemonnier <lemonnierk at ulrar.net>
>wrote:
>
>> Hi,
>>
>> We have a small GlusterFS 3.7.6 cluster with 3 nodes running Proxmox
>> VMs on it. I did set up the recommended options like the virt group,
>> but by hand since it's on Debian. The shards are 256MB, if that matters.
>>
>> This morning the second node crashed, and as it came back up it started
>> a heal, but that basically froze all the VMs running on that volume.
>> Since we really, really can't have 40 minutes of downtime in the middle
>> of the day, I just removed the node from the network, which stopped the
>> heal and let the VMs access their disks again. The plan was to
>> re-connect the node in a couple of hours and let it heal at night.
>> But a VM has crashed now, and it can't boot up again: it seems to
>> freeze trying to access its disks.
>>
>> Looking at the heal info for the volume, the count has gone way up
>> since this morning; it looks like the VMs aren't writing to both
>> remaining nodes, just the one they are on.
>> That seems pretty bad. We have 2 of the 3 nodes up, so I would expect
>> the volume to work just fine since it has quorum. What am I missing?
>>
>> It is still too early to start the heal, so is there a way to start the
>> VM anyway right now? I mean, it was running a moment ago, so the data
>> is there; the volume just needs to let the VM access it.
>>
>>
>>
>> Volume Name: vm-storage
>> Type: Replicate
>> Volume ID: a5b19324-f032-4136-aaac-5e9a4c88aaef
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: first_node:/mnt/vg1-storage
>> Brick2: second_node:/mnt/vg1-storage
>> Brick3: third_node:/mnt/vg1-storage
>> Options Reconfigured:
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.readdir-ahead: on
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> features.shard: on
>> features.shard-block-size: 256MB
>> cluster.server-quorum-ratio: 51%
>>
>>
>> Thanks for your help
>>
>> --
>> Kevin Lemonnier
>> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
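
For reference, the heal backlog I mentioned above is what the standard
heal commands report; the statistics variant is one I haven't used much,
so consider that part a sketch:

    # List the entries each brick still needs to heal.
    gluster volume heal vm-storage info
    # Per-brick count of pending heals, to watch the backlog shrink.
    gluster volume heal vm-storage statistics heal-count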

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

