[Gluster-users] XEN VPS unresponsive because of selfhealing
Tomas Corej
corej at websupport.sk
Mon Apr 18 12:56:38 UTC 2011
Hi,
sorry for the mess with the email addresses. I've got this reply:
> Please keep us updated on how you make out using Gluster as storage
> for VPS.
> It seems like most people have settled on iSCSI for VPS.
> thanks,
> -Drew
I have blade servers running Xen on Debian Lenny. The storage
consists of 6 nodes (another four are being prepared), where every
brick is mirrored with its pair, e.g.:
gnode002.local:/data1/images <-> gnode004.local:/data1/images
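For reference, a volume laid out like that is created with the bricks
listed in replica pairs, roughly like this (an illustrative sketch,
not the exact command we originally ran):

gluster volume create images replica 2 transport tcp \
    gnode002.local:/data1/images gnode004.local:/data1/images \
    gnode002.local:/data2/images gnode004.local:/data2/images \
    ...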
Then I mount the storage using the standard GlusterFS client:
mount -t glusterfs wsp.local:/images /mnt/wsp
where wsp.local is a DNS round-robin record across all nodes.
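By DNS round-robin I just mean that wsp.local resolves to every node
in turn, roughly like this (the addresses here are made up):

wsp.local.    IN A    10.0.0.2    ; gnode002
wsp.local.    IN A    10.0.0.4    ; gnode004
wsp.local.    IN A    10.0.0.6    ; gnode006
...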
Each VPS then boots from an image file on /mnt/wsp,
e.g. /mnt/wsp/img/uid10019/uuid/disk.img, using the tap:aio driver.
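In the domU config that looks roughly like this (the target device
and write mode below are just an example):

disk = [ 'tap:aio:/mnt/wsp/img/uid10019/uuid/disk.img,xvda,w' ]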
On Mon, 2011-04-18 at 14:36 +0200, Tomas Corej wrote:
> Hello,
>
> I've been actively watching this project since its early 2.0 releases
> and think it has made great progress. Personally, the problems it
> solves and the way it solves them are interesting to me.
>
> We are a webhosting company and have been using GlusterFS to serve
> some of our hosted sites because of their size.
>
> While serving Xen domUs from GlusterFS, yesterday we tried to
> upgrade GlusterFS 3.1.2 to the latest version, 3.1.4. Our
> setup is pretty much the standard distribute-replicate:
>
> Volume Name: images
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 12 x 2 = 24
> Transport-type: tcp
> Bricks:
> Brick1: gnode002.local:/data1/images
> Brick2: gnode004.local:/data1/images
> Brick3: gnode002.local:/data2/images
> Brick4: gnode004.local:/data2/images
> Brick5: gnode002.local:/data3/images
> Brick6: gnode004.local:/data3/images
> Brick7: gnode002.local:/data4/images
> Brick8: gnode004.local:/data4/images
> Brick9: gnode006.local:/data1/images
> Brick10: gnode008.local:/data1/images
> Brick11: gnode006.local:/data2/images
> Brick12: gnode008.local:/data2/images
> Brick13: gnode006.local:/data3/images
> Brick14: gnode008.local:/data3/images
> Brick15: gnode006.local:/data4/images
> Brick16: gnode008.local:/data4/images
> Brick17: gnode010.local:/data1/images
> Brick18: gnode012.local:/data1/images
> Brick19: gnode010.local:/data2/images
> Brick20: gnode012.local:/data2/images
> Brick21: gnode010.local:/data3/images
> Brick22: gnode012.local:/data3/images
> Brick23: gnode010.local:/data4/images
> Brick24: gnode012.local:/data4/images
> Options Reconfigured:
> performance.quick-read: off
> network.ping-timeout: 30
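>
> For the record, those options were set with the standard CLI, i.e.
> something along the lines of:
>
> gluster volume set images performance.quick-read off
> gluster volume set images network.ping-timeout 30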
>
> The Xen servers have the images mounted through the GlusterFS native
> client and serve them using the tap:aio driver.
>
> We wanted to upgrade Gluster on each node, one at a time (but we
> only got as far as gnode002). So we ran:
>
> root@gnode002.local: /etc/init.d/glusterd stop && killall glusterfsd
> && /etc/init.d/glusterd start
>
> We had to kill the processes because glusterd didn't shut down
> properly. The problem was that after this, self-healing immediately
> started checking consistency. The glusterfsd processes were down for
> maybe 5-6 seconds, so we expected self-healing not to trigger, but it
> did. That would not be a problem on its own if self-healing didn't
> make our VPS totally unresponsive for 90 minutes, until it finished,
> because Gluster had locked the image (or was access to the image
> simply that slow?).
>
> So the question is: is there a way to avoid this or minimize these
> effects? Has anyone had the same experience with self-healing in a
> GlusterFS+Xen environment?
>
> Regards,
> Tomas Corej
>
Regards,
--
Tomáš Čorej | admin section
+421 (0)2 20 60 80 89
+421 (0)2 20 60 80 80
http://WebSupport.sk