[Gluster-users] self-heal stops some vms (virtual machines)

Pranith Kumar Karampuri pkarampu at redhat.com
Fri Feb 28 04:34:43 UTC 2014


Nick, Joao,
    Data self-heal in AFR is designed to be I/O friendly. Could you please help us identify the root cause of the I/O lock-up so we can fix it if possible?
It seems this problem only happens for some of the VMs, not all of them. Do you think we can find steps to re-create the problem consistently, maybe with a predefined workload inside the VM that triggers it while self-heal is in progress?
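
For example, something along these lines might be a starting point (the volume name, paths and the dd workload below are only placeholders; the exact way of taking a brick offline will depend on your setup):

# on one of the replica nodes: take its brick process down so pending heals accumulate
killall glusterfsd

# inside the VM: keep a sustained write load running while the brick is down
dd if=/dev/zero of=/var/tmp/healtest.img bs=1M count=4096 oflag=direct

# on the gluster node: bring the brick back and watch the heal queue
gluster volume start <VOLNAME> force
gluster volume heal <VOLNAME> info

# then check whether the VM's I/O stalls while its image is being healed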

Pranith.
----- Original Message -----
> From: "Nick Majeran" <nmajeran at gmail.com>
> To: "João Pagaime" <joao.pagaime at gmail.com>
> Cc: Gluster-users at gluster.org
> Sent: Thursday, February 27, 2014 6:43:07 PM
> Subject: Re: [Gluster-users] self-heal stops some vms (virtual machines)
> 
> I've had similar issues adding bricks and running a fix-layout as well.
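> 
> For context, that was the usual add-brick plus fix-layout sequence, roughly as
> follows (volume and brick names here are just placeholders, and the new bricks
> were added as a replica pair):
> 
> gluster volume add-brick <VOLNAME> <host3>:/bricks/brick1 <host4>:/bricks/brick1
> gluster volume rebalance <VOLNAME> fix-layout start
> gluster volume rebalance <VOLNAME> status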
> 
> > On Feb 27, 2014, at 3:56 AM, João Pagaime <joao.pagaime at gmail.com> wrote:
> > 
> > yes, it is a real problem, enough to start thinking hard about architecture
> > scenarios
> > 
> > sorry but I can't share any solutions at this time
> > 
> > one (complicated) workaround would be to "medically induce a coma" on a VM
> > as the self-heal starts on it, and resurrect it afterwards.
> > I mean something like this:
> > $ virsh suspend <vm-id>
> > (let self-heal run on the VM's disks)
> > $ virsh resume <vm-id>
> > Problems: several, including for the VM's users, but still better than a
> > kernel lock-up. Feasibility problem: how to efficiently detect when
> > self-heal starts on a specific file on the brick; a rough sketch follows.
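> > 
> > Something like the following, purely illustrative, script is what I have
> > in mind (VOL, the image file name, the domain name and the polling
> > interval are placeholders, and grepping the output of
> > "gluster volume heal ... info" like this is fragile):
> > 
> > #!/bin/bash
> > VOL=VOL                 # gluster volume holding the disk image
> > IMG=vm-disk.qcow2       # image file name to watch for
> > DOM=my-vm               # virsh domain name
> > 
> > # wait until the image shows up in the list of entries needing heal
> > while ! gluster volume heal "$VOL" info | grep -q "$IMG"; do
> >     sleep 10
> > done
> > virsh suspend "$DOM"    # "induce the coma" while the image is healed
> > 
> > # wait until the image no longer appears in the heal queue
> > while gluster volume heal "$VOL" info | grep -q "$IMG"; do
> >     sleep 10
> > done
> > virsh resume "$DOM"     # wake the VM up again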
> > 
> > another related problem may be how to mitigate I/O starvation on the brick
> > when self-healing kicks in, since that process may be an I/O hog. But I
> > think this is a lesser problem
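> > 
> > The only mitigation that comes to mind, untested under this workload, is
> > throttling self-heal via the volume options, for example (the last two
> > are already set on my volume, as shown further below):
> > 
> > # heal fewer files in the background at once
> > gluster volume set VOL cluster.background-self-heal-count 1
> > # keep the per-file heal window small and only transfer changed blocks
> > gluster volume set VOL cluster.self-heal-window-size 1
> > gluster volume set VOL cluster.data-self-heal-algorithm diff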
> > 
> > best regards
> > Joao
> > 
> > 
> > On 27-02-2014 09:08, Fabio Rosati wrote:
> >> Hi All,
> >> 
> >> I ran into exactly the same problem encountered by Joao.
> >> After rebooting one of the GlusterFS nodes, self-heal starts and some VMs
> >> can't access their disk images anymore.
> >> 
> >> Logs from one of the VMs after one gluster node has rebooted:
> >> 
> >> Feb 25 23:35:47 fwrt2 kernel: EXT4-fs error (device dm-2):
> >> __ext4_get_inode_loc: unable to read inode block - inode=2145, block=417
> >> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector
> >> 15032608
> >> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector
> >> 15307504
> >> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector
> >> 15307552
> >> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector
> >> 15307568
> >> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector
> >> 15307504
> >> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector
> >> 12972672
> >> Feb 25 23:35:47 fwrt2 kernel: EXT4-fs error (device dm-1):
> >> ext4_find_entry: reading directory #123 offset 0
> >> Feb 25 23:35:47 fwrt2 kernel: Core dump to |/usr/libexec/abrt-hook-ccpp 7
> >> 0 2757 0 23 1393367747 e pipe failed
> >> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector
> >> 9250632
> >> Feb 25 23:35:47 fwrt2 kernel: Read-error on swap-device (253:0:30536)
> >> Feb 25 23:35:47 fwrt2 kernel: Read-error on swap-device (253:0:30544)
> >> [...]
> >> 
> >> 
> >> A few hours later the VM appeared to be frozen and I had to kill and
> >> restart it; there were no more problems after the reboot.
> >> 
> >> This is the volume layout:
> >> 
> >> Volume Name: gv_pri
> >> Type: Distributed-Replicate
> >> Volume ID: 3d91b91e-4d72-484f-8655-e5ed8d38bb28
> >> Status: Started
> >> Number of Bricks: 2 x 2 = 4
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: nw1glus.gem.local:/glustexp/pri1/brick
> >> Brick2: nw2glus.gem.local:/glustexp/pri1/brick
> >> Brick3: nw3glus.gem.local:/glustexp/pri2/brick
> >> Brick4: nw4glus.gem.local:/glustexp/pri2/brick
> >> Options Reconfigured:
> >> storage.owner-gid: 107
> >> storage.owner-uid: 107
> >> server.allow-insecure: on
> >> network.remote-dio: on
> >> performance.write-behind-window-size: 16MB
> >> performance.cache-size: 128MB
> >> 
> >> OS: CentOS 6.5
> >> GlusterFS version: 3.4.2
> >> 
> >> The qemu-kvm VMs access their qcow2 disk images using the native Gluster
> >> support (libgfapi, no FUSE mount).
> >> In the Gluster logs I didn't find anything special logged during
> >> self-heal, but I can post them if needed.
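> >> 
> >> For reference, the disks are attached roughly like this (the image path
> >> below is just an example, not the real one):
> >> 
> >> qemu-kvm ... -drive file=gluster://nw1glus.gem.local/gv_pri/images/vm1.qcow2,if=virtio,format=qcow2,cache=none
> >> 
> >> As I understand it, that is also why server.allow-insecure is on in the
> >> options above: libgfapi clients connect from unprivileged ports.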
> >> 
> >> Does anyone have an idea of what could cause these problems?
> >> 
> >> Thank you
> >> Fabio
> >> 
> >> 
> >> ----- Original Message -----
> >> From: "João Pagaime" <joao.pagaime at gmail.com>
> >> To: Gluster-users at gluster.org
> >> Sent: Friday, February 7, 2014 13:13:59
> >> Subject: [Gluster-users] self-heal stops some vms (virtual machines)
> >> 
> >> hello all
> >> 
> >> I have a replicated volume that holds KVM VMs (virtual machines)
> >> 
> >> I had to stop one gluster server for maintenance. That part of the
> >> operation went well: no VM problems after the shutdown
> >> 
> >> the problems started after booting the gluster server. Self-healing
> >> started as expected, but some VMs locked up with disk problems
> >> (time-outs) as self-healing passed over them.
> >> Some VMs did survive the self-healing; I suppose the ones with low I/O
> >> activity or those less sensitive to disk problems
> >> 
> >> is there some specific Gluster configuration to let running VMs ride
> >> through self-healing? (cluster.data-self-heal-algorithm is already set
> >> to diff)
> >> 
> >> are there any recommended tweaks for VMs running on top of Gluster?
> >> 
> >> current config:
> >> 
> >> gluster:   3.3.0-1.el6.x86_64
> >> 
> >> --------------------- volume:
> >> # gluster volume info VOL
> >> 
> >> Volume Name: VOL
> >> Type: Distributed-Replicate
> >> Volume ID: f44182d9-24eb-4953-9cdd-71464f9517e0
> >> Status: Started
> >> Number of Bricks: 2 x 2 = 4
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: one-gluster01:/san02-v2
> >> Brick2: one-gluster02:/san02-v2
> >> Brick3: one-gluster01:/san03
> >> Brick4: one-gluster02:/san04
> >> Options Reconfigured:
> >> diagnostics.count-fop-hits: on
> >> diagnostics.latency-measurement: on
> >> nfs.disable: on
> >> auth.allow:x
> >> performance.flush-behind: off
> >> cluster.self-heal-window-size: 1
> >> performance.cache-size: 67108864
> >> cluster.data-self-heal-algorithm: diff
> >> performance.io-thread-count: 32
> >> cluster.min-free-disk: 250GB
> >> 
> >> thanks,
> >> best regards,
> >> joao
> >> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users


