[Gluster-users] VM disks corruption on 3.7.11

Wed May 18 13:41:08 UTC 2016

Hi,

I will try to recreate this issue tomorrow on my machines with the steps
that Lindsay provided in this thread. I will let you know the result soon
after that.

-Krutika

On Wednesday, May 18, 2016, Kevin Lemonnier <lemonnierk at ulrar.net> wrote:
> Hi,
>
> Some news on this.
> Over the weekend the RAID card of node ipvr2 died, and I thought
> that maybe that was the problem all along. The RAID card was replaced
> and yesterday I reinstalled everything.
> Same problem just now.
>
> My test is simple: while continuously using the website hosted on the VMs,
> I reboot ipvr50, wait for the heal to complete, migrate all the VMs off
> ipvr2 then reboot it, wait for the heal to complete, then migrate all
> the VMs off ipvr3 and reboot it.
> Every time, the first database VM (the only one really using the disk
> during the heal) starts showing I/O errors on its disk.
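
For anyone trying to reproduce the steps above, the "wait for the heal to complete" part could be scripted roughly as follows. This is only a sketch, not something used in this thread: it assumes the `gluster` CLI is available on the node, and that `gluster volume heal <VOLNAME> info` prints one `Number of entries: N` line per brick. The function names `heal_is_complete` and `wait_for_heal` are made up for illustration.

```python
import re
import subprocess
import time


def heal_is_complete(heal_info_output: str) -> bool:
    """Return True only if every brick reports zero pending entries.

    Assumes each brick section contains a line like 'Number of entries: N'.
    A non-numeric count (for example when a brick is unreachable) is treated
    as "not complete" rather than silently ignored.
    """
    counts = re.findall(
        r"^Number of entries:\s*(\S+)", heal_info_output, flags=re.MULTILINE
    )
    if not counts:
        return False
    return all(c.isdigit() and int(c) == 0 for c in counts)


def wait_for_heal(volume: str, poll_seconds: int = 10, clean_polls: int = 3) -> None:
    """Block until `clean_polls` consecutive polls show no pending entries."""
    clean = 0
    while clean < clean_polls:
        result = subprocess.run(
            ["gluster", "volume", "heal", volume, "info"],
            capture_output=True, text=True, check=True,
        )
        # Reset the counter as soon as any brick shows pending entries again.
        clean = clean + 1 if heal_is_complete(result.stdout) else 0
        if clean < clean_polls:
            time.sleep(poll_seconds)
```

Calling `wait_for_heal("gluster")` between reboots would match the volume name used in this thread. Several consecutive clean polls are required because entries can appear in heal info and vanish again within seconds, so a single empty reading may not mean much.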
>
> Am I really the only one with that problem?
> Maybe one of the drives is dying too, who knows, but SMART isn't reporting
> anything.
>
>
> On Thu, May 12, 2016 at 04:03:02PM +0200, Kevin Lemonnier wrote:
>> Hi,
>>
>> I had a problem some time ago with 3.7.6 freezing during heals,
>> and several people advised me to use 3.7.11 instead. Indeed, with that
>> version the freeze problem is fixed, it works like a dream! You can
>> hardly tell that a node is down or healing; everything keeps working
>> except for a short freeze when the node has just gone down and, I assume,
>> hasn't timed out yet, but that's fine.
>>
>> Now I have a 3.7.11 volume on 3 nodes for testing, and the VMs are Proxmox
>> VMs with qcow2 disks stored on the gluster volume.
>> Here is the config:
>>
>> Volume Name: gluster
>> Type: Replicate
>> Volume ID: e4f01509-beaf-447d-821f-957cc5c20c0a
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: ipvr2.client:/mnt/storage/gluster
>> Brick2: ipvr3.client:/mnt/storage/gluster
>> Brick3: ipvr50.client:/mnt/storage/gluster
>> Options Reconfigured:
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> features.shard: on
>> features.shard-block-size: 64MB
>> cluster.data-self-heal-algorithm: full
>> performance.readdir-ahead: on
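
To rebuild a test volume with the same settings, the "Options Reconfigured" list above could be turned into `gluster volume set <VOLNAME> <KEY> <VALUE>` commands. A minimal sketch, assuming the options are pasted in as `key: value` lines (with or without the `>>` quote prefix); the helper name `volume_set_commands` is hypothetical and the commands are only printed, not executed.

```python
from typing import List


def volume_set_commands(volume: str, options_text: str) -> List[str]:
    """Build one 'gluster volume set' command per 'key: value' line."""
    commands = []
    for line in options_text.splitlines():
        # Drop mail quote markers and surrounding whitespace.
        line = line.lstrip("> ").strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        commands.append(
            f"gluster volume set {volume} {key.strip()} {value.strip()}"
        )
    return commands


if __name__ == "__main__":
    options = """
    >> features.shard: on
    >> features.shard-block-size: 64MB
    >> cluster.data-self-heal-algorithm: full
    """
    for command in volume_set_commands("gluster", options):
        print(command)
```

Printing the commands first makes it easy to review them before running anything against a real volume.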
>>
>>
>> As mentioned, I rebooted one of the nodes to test the freezing issue I had
>> on previous versions and, apart from the initial timeout, nothing happened:
>> the website hosted on the VMs keeps working like a charm even during the heal.
>> Since it's a test setup, there isn't any load on it though. I just tried to
>> refresh the database by importing the production one on the two MySQL VMs,
>> and both of them started showing I/O errors. I tried shutting them down and
>> powering them on again, but same thing. Even starting full heals by hand
>> doesn't solve the problem: the disks are corrupted. They still work, but
>> sometimes they remount their partitions read-only.
>>
>> I believe there are a few people already using 3.7.11; has no one noticed
>> corruption problems? Anyone using Proxmox? As already mentioned in multiple
>> other threads on this mailing list by other users, I also pretty much always
>> have shards in heal info, but nothing "stuck" there: they always go away
>> within a few seconds, replaced by other shards.
>>
>> Thanks
>>
>> --
>> Kevin Lemonnier
>> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
>
>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>