[Gluster-users] VM disks corruption on 3.7.11
Kevin Lemonnier
lemonnierk at ulrar.net
Tue May 24 10:24:44 UTC 2016
So the VMs were configured with cache set to none; I just tried with
cache=directsync and it seems to fix the issue. I still need to run
more tests, but I did a couple already with that option and saw no I/O errors.
I never had to do this before, is it a known issue? I found the clue in an old mail
from this mailing list, did I miss some doc saying you should be using
directsync with GlusterFS?
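
For reference, this is roughly what that change looks like in Proxmox; the VM ID, storage name, and disk path below are hypothetical, only the `cache=directsync` disk option matters:

```
# Switch an existing disk to directsync via the Proxmox CLI (hypothetical IDs):
qm set 101 --virtio0 gluster-storage:101/vm-101-disk-1.qcow2,cache=directsync

# Resulting line in /etc/pve/qemu-server/101.conf:
virtio0: gluster-storage:101/vm-101-disk-1.qcow2,cache=directsync

# Proxmox passes this through to qemu as a drive option: -drive file=...,cache=directsync
```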
On Tue, May 24, 2016 at 11:33:28AM +0200, Kevin Lemonnier wrote:
> Hi,
>
> Some news on this.
> I actually don't need to trigger a heal to get corruption, so the problem
> is not the healing itself. Live migrating the VM seems to trigger corruption every
> time, and even without that, just doing a database import, rebooting, then
> doing another import seems to corrupt the disk as well.
>
> To check, I created local storage on each node on the same partition as the
> gluster bricks (on XFS), moved the VM disk onto each local storage in turn, and tested
> the same procedure: no corruption. It seems to happen only on
> GlusterFS, so I'm not so sure it's hardware anymore: if it were hardware, using local storage
> would corrupt too, right?
> Could I be missing some critical configuration for VM storage on my gluster volume?
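
The per-storage comparison above can be sketched as a small integrity probe; this is a minimal, hypothetical script (the `MOUNT` path is an assumption, point it at the glusterfs fuse mount or the local XFS directory under test) that writes a file, lets you trigger the suspect event, then re-checks the checksum:

```shell
#!/bin/sh
# Minimal corruption probe: write data, then re-verify its checksum later.
# MOUNT is an assumption; point it at the storage under test.
MOUNT="${MOUNT:-/tmp}"
TESTFILE="$MOUNT/corruption-probe.bin"

# Write 64 MiB of random data and flush it to disk
# (64 MiB matches the shard-block-size used on the volume).
dd if=/dev/urandom of="$TESTFILE" bs=1M count=64 conv=fsync 2>/dev/null

BEFORE=$(md5sum "$TESTFILE" | cut -d' ' -f1)

# ... trigger the suspect event here: live migration, node reboot, heal ...

AFTER=$(md5sum "$TESTFILE" | cut -d' ' -f1)

if [ "$BEFORE" = "$AFTER" ]; then
    echo "OK: checksums match"
else
    echo "CORRUPTION: $BEFORE != $AFTER"
fi
```

Running it once on each storage type around the same migrate/reboot/heal sequence makes the comparison repeatable instead of relying on the database import to hit a bad block.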
>
>
> On Mon, May 23, 2016 at 01:54:30PM +0200, Kevin Lemonnier wrote:
> > Hi,
> >
> > I didn't specify it, but I use "localhost" to add the storage in proxmox.
> > My thinking is that every proxmox node is also a GlusterFS node, so that
> > should work fine.
> >
> > I don't want to use the "normal" way of setting a regular address in there
> > because you can't change it afterwards in proxmox, but could that be the source of
> > the problem? Maybe during live migration there are writes coming from
> > two different servers at the same time?
> >
> >
> >
> > On Wed, May 18, 2016 at 07:11:08PM +0530, Krutika Dhananjay wrote:
> > > Hi,
> > >
> > > I will try to recreate this issue tomorrow on my machines with the steps
> > > that Lindsay provided in this thread. I will let you know the result soon
> > > after that.
> > >
> > > -Krutika
> > >
> > > On Wednesday, May 18, 2016, Kevin Lemonnier <lemonnierk at ulrar.net> wrote:
> > > > Hi,
> > > >
> > > > Some news on this.
> > > > Over the week end the RAID Card of the node ipvr2 died, and I thought
> > > > that maybe that was the problem all along. The RAID Card was changed
> > > > and yesterday I reinstalled everything.
> > > > Same problem just now.
> > > >
> > > > My test is simple: while using the website hosted on the VMs the whole time,
> > > > I reboot ipvr50, wait for the heal to complete, migrate all the VMs off
> > > > ipvr2 then reboot it, wait for the heal to complete, then migrate all
> > > > the VMs off ipvr3 then reboot it.
> > > > Every time, the first database VM (which is the only one really using the
> > > > disk during the heal) starts showing I/O errors on its disk.
> > > >
> > > > Am I really the only one with this problem?
> > > > Maybe one of the drives is dying too, who knows, but SMART isn't saying
> > > > anything...
> > > >
> > > >
> > > > On Thu, May 12, 2016 at 04:03:02PM +0200, Kevin Lemonnier wrote:
> > > >> Hi,
> > > >>
> > > >> I had a problem some time ago with 3.7.6 and freezing during heals,
> > > >> and multiple people advised to use 3.7.11 instead. Indeed, with that
> > > >> version the freeze problem is fixed, it works like a dream! You can
> > > >> hardly tell that a node is down or healing, everything keeps working
> > > >> except for a little freeze when the node has just gone down and, I assume,
> > > >> hasn't timed out yet, but that's fine.
> > > >>
> > > >> Now I have a 3.7.11 volume on 3 nodes for testing, and the VMs are proxmox
> > > >> VMs with qcow2 disks stored on the gluster volume.
> > > >> Here is the config:
> > > >>
> > > >> Volume Name: gluster
> > > >> Type: Replicate
> > > >> Volume ID: e4f01509-beaf-447d-821f-957cc5c20c0a
> > > >> Status: Started
> > > >> Number of Bricks: 1 x 3 = 3
> > > >> Transport-type: tcp
> > > >> Bricks:
> > > >> Brick1: ipvr2.client:/mnt/storage/gluster
> > > >> Brick2: ipvr3.client:/mnt/storage/gluster
> > > >> Brick3: ipvr50.client:/mnt/storage/gluster
> > > >> Options Reconfigured:
> > > >> cluster.quorum-type: auto
> > > >> cluster.server-quorum-type: server
> > > >> network.remote-dio: enable
> > > >> cluster.eager-lock: enable
> > > >> performance.quick-read: off
> > > >> performance.read-ahead: off
> > > >> performance.io-cache: off
> > > >> performance.stat-prefetch: off
> > > >> features.shard: on
> > > >> features.shard-block-size: 64MB
> > > >> cluster.data-self-heal-algorithm: full
> > > >> performance.readdir-ahead: on
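
For anyone trying to reproduce this setup, the reconfigured options above map one-to-one onto `gluster volume set` commands, run after the volume is created (volume name "gluster" as in the output above):

```
gluster volume set gluster cluster.quorum-type auto
gluster volume set gluster cluster.server-quorum-type server
gluster volume set gluster network.remote-dio enable
gluster volume set gluster cluster.eager-lock enable
gluster volume set gluster performance.quick-read off
gluster volume set gluster performance.read-ahead off
gluster volume set gluster performance.io-cache off
gluster volume set gluster performance.stat-prefetch off
gluster volume set gluster features.shard on
gluster volume set gluster features.shard-block-size 64MB
gluster volume set gluster cluster.data-self-heal-algorithm full
```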
> > > >>
> > > >>
> > > >> As mentioned, I rebooted one of the nodes to test the freezing issue I had
> > > >> on previous versions, and apart from the initial timeout, nothing: the website
> > > >> hosted on the VMs keeps working like a charm even during heal.
> > > >> Since it's a test setup there isn't any load on it though, and when I tried
> > > >> to refresh the database by importing the production one on the two MySQL VMs,
> > > >> both of them started doing I/O errors. I tried shutting them down and powering
> > > >> them on again, but same thing; even starting full heals by hand doesn't solve
> > > >> the problem, the disks are corrupted. They still work, but sometimes they
> > > >> remount their partitions read only...
> > > >>
> > > >> I believe there are a few people already using 3.7.11; has no one noticed
> > > >> corruption problems? Anyone using Proxmox? As already mentioned in multiple
> > > >> other threads on this mailing list by other users, I also pretty much always
> > > >> have shards in heal info, but nothing "stuck" there; they always go away in
> > > >> a few seconds, replaced by other shards.
> > > >>
> > > >> Thanks
> > > >>
> > > >> --
> > > >> Kevin Lemonnier
> > > >> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> > > >
> > > >
> > > >
> > > >> _______________________________________________
> > > >> Gluster-users mailing list
> > > >> Gluster-users at gluster.org
> > > >> http://www.gluster.org/mailman/listinfo/gluster-users
> > > >
> > > >
> > > >
> >
>
>
>
>
>
--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111