[Gluster-devel] problems running glusterfs 2.5 patch 800 and xen

Jordi Moles Blanco jordi at cdmon.com
Wed Dec 10 08:39:04 UTC 2008


Anand Avati wrote:
>> Right now, with this new version, when I ask Xen to create a new machine in
>> "/mnt/gluster", it fails.
>>     
>
> Can you describe the failure better?
>
>   
>> Even if it doesn't, the machine will appear
>> "broken" on boot without any chance of recovering it.
>>
>> What I have to do then is create it on my local disk and then move it to
>> /mnt/gluster, where it will start and "freeze" at the point that I mentioned in
>> the previous mails.
>>
>> Also, if I copy an image file of a VM from another Xen server straight into
>> /mnt/gluster with the "scp" command, it won't work. Instead, I have to "scp"
>> the file to my local hard disk from an external machine and then move it into
>> /mnt/gluster.
>>
>> I don't know if this helps you identify the problem.
>>     
>
> Can you post your client log files somewhere? That would help us debug your issue.
>
> avati
>   

Hi,

The information you are asking for was posted in the first message of
this thread. In that mail I included the spec files for both the nodes
and the Xen servers (clients), as well as the log files for both.

I'm pasting the content of my first mail below, and I'll update it with
the new things I've experienced since then.


*****************************************************************************************************************

I'm having trouble running Xen virtual machines on GlusterFS 2.5,
patch 800.

I've got two Xen servers, version 3.2, that store their machines on
Gluster; they are Debian Lenny distros. I also have 3 nodes which
provide the storage unit with GlusterFS, also Lenny distros.

The thing is that when I ran "./configure --enable-kernel-module" for
fuse 2.7.3glfs10 on the server side, I got this:

***********
warning: fuse module is already present on kernel, it won't compile
***********

So...

I ran:

*********
./configure
make
make install
**********
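By the way, this is the kind of check (just a sketch; the fallback
messages are mine, and the output will obviously differ per machine) that
could confirm which fuse module the kernel actually has, given that
./configure refused to build the patched one:

```shell
# Diagnostic sketch: report the version of the fuse module the running
# kernel would load, and whether it is currently loaded.
modinfo -F version fuse 2>/dev/null || echo "no fuse module info available"
lsmod | grep '^fuse' || echo "fuse module not currently loaded"
```

Both commands fall back to an explanatory message, so the check never
aborts on a machine where fuse is absent.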



When compiling glusterfs--patch-800 I didn't get any error or warning
message at all.

On the Xen side, I ran the proposed configure with "enable-fuse-client"
and so on, and I had no problems.

Anyway...

The nodes have these specs:

***************

volume esp
    type storage/posix
    option directory /glu0/data
end-volume

volume espai
    type performance/io-threads
    option thread-count 15
    option cache-size 512MB
    subvolumes esp
end-volume

volume nm
    type storage/posix
    option directory /glu0/ns
end-volume

volume ultim
   type protocol/server
   subvolumes espai nm
   option transport-type tcp/server
   option auth.ip.espai.allow *
   option auth.ip.nm.allow *
end-volume


***************

and the Xen servers have these specs:

***********

volume espai1
       type protocol/client
       option transport-type tcp/client
       option remote-host 10.0.0.3
       option remote-subvolume espai
end-volume

volume espai2
       type protocol/client
       option transport-type tcp/client
       option remote-host 10.0.0.4
       option remote-subvolume espai
end-volume

volume espai3
       type protocol/client
       option transport-type tcp/client
       option remote-host 10.0.0.5
       option remote-subvolume espai
end-volume

volume namespace1
       type protocol/client
       option transport-type tcp/client
       option remote-host 10.0.0.4
       option remote-subvolume nm
end-volume

volume namespace2
       type protocol/client
       option transport-type tcp/client
       option remote-host 10.0.0.5
       option remote-subvolume nm
end-volume

volume grup1
       type cluster/afr
       subvolumes espai1 espai3
end-volume

volume grup2
       type cluster/afr
       subvolumes espai2
end-volume

volume nm
       type cluster/afr
       subvolumes namespace1 namespace2
end-volume

volume g01
       type cluster/unify
       subvolumes grup1 grup2
       option scheduler rr
       option namespace nm
end-volume

volume io-cache       
    type performance/io-cache       
    option cache-size 512MB       
    option page-size 1MB
    option force-revalidate-timeout 2       
    subvolumes g01
end-volume 

***********

So... everything seems to work fine at first; the Xen servers are able
to mount the GlusterFS volume, but after a few seconds I keep getting
this on the Xen side:


*********
2008-12-04 18:48:56 E [client-protocol.c:4579:client_checksum] espai2: 
/domains: returning EINVAL
2008-12-04 18:48:56 E [client-protocol.c:4579:client_checksum] espai2: 
/domains/xen-gluton02: returning EINVAL
*********

There's nothing else in the log about the problem, only that.

On the node side:

************
2008-12-04 19:48:50 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:48:51 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:48:53 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
[... the same "bound_xl is null" line repeats every few seconds, from 19:48 through 20:26 ...]
2008-12-04 20:26:13 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null

***********

At first I didn't pay attention to that, because we can operate on the
storage unit; we can, for example, move some GB into it. But when I try
to run a virtual machine, it freezes after a few seconds and the error
I'm reporting appears more often than before. However, I don't have to
run a machine to make it appear; it shows up from the beginning.

Finally... when I mount Gluster from Xen, I do it this way:

**********
glusterfs -l /var/log/glusterfs/glusterfs.log -L WARNING -d disable -f 
/etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
**********

I mean, with the "-d disable" option, which as I understand it disables
direct I/O mode and is supposed to be the thing to do for Xen.
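In case it's useful, this is the kind of sanity check I have in mind
before starting any domU (just a sketch; the mountpoint path matches my
setup, and the fallback message is only illustrative):

```shell
# Confirm the glusterfs fuse mount is actually present at the mountpoint
# (prints a fallback message when nothing is mounted there).
grep ' /mnt/glusterfs ' /proc/mounts || echo "/mnt/glusterfs is not mounted"
```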

And this is the point where my virtual machine freezes:

***********
[    1.104884] blkfront: sda2: barriers enabled
[    1.189612] XENBUS: Device with no driver: device/console/0
[    1.189620] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[    1.189630] Freeing unused kernel memory: 216k freed
[    1.461468] thermal: Unknown symbol acpi_processor_set_thermal_limit
[    2.047300] md: raid1 personality registered for level 1
[    2.089518] md: md0 stopped.
[    2.091932] md: md1 stopped.
[    2.096341] md: md2 stopped.
[    2.244344] EXT3-fs: INFO: recovery required on readonly filesystem.
[    2.244358] EXT3-fs: write access will be enabled during recovery.
[    2.286384] kjournald starting.  Commit interval 5 seconds
[    2.286398] EXT3-fs: recovery complete.
[    2.287274] EXT3-fs: mounted filesystem with ordered data mode.
[    3.128883] Adding 524280k swap on /dev/sda1.  Priority:-1 extents:1 
across:524280k
[    3.208470] EXT3 FS on sda2, internal journal
[    3.641153] device-mapper: uevent: version 1.0.3
[    3.641208] device-mapper: ioctl: 4.13.0-ioctl (2007-10-18) 
initialised: dm-devel at redhat.com
[    4.756035] NET: Registered protocol family 10
[    4.756035] lo: Disabled Privacy Extensions
***********

*********************

So... I've also seen that I can't create virtual machines directly on
/mnt/glusterfs; I have to create them on my local disk and then move
them to the Gluster mount point. Is this normal behaviour? It's not
really a problem, but it would be much easier if I could create them
directly on the GlusterFS storage unit.

Finally... the main issue is that virtual machines won't work as they
used to in earlier versions of fuse/gluster; they just freeze, and
Gluster logs the error "bound_xl is null" all the time.

I've been trying to find out why, on the node side, I can't run
"./configure --enable-kernel-module". Is it possible that Lenny's kernel
ships a fuse module by default? And could that be causing the
"bound_xl is null" problem?
I mean, you say this is mainly caused by using different versions of
Gluster or fuse. Well... the only thing I can think of is that the
default fuse module in Lenny is from an older version and interacts
badly with the fuse client on the Xen side, causing this message. Does
that make any sense?
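To make the version theory concrete, something like this comparison is
what I have in mind (the version strings here are made up for
illustration; "2.7.2" stands in for whatever Lenny's kernel ships, and
"2.7.3glfs10" is the patched tarball I built):

```shell
# Compare the base version of the in-kernel fuse module against the base
# version of the glfs-patched fuse release (example values, not real output).
kernel_fuse="2.7.2"          # e.g. taken from: modinfo -F version fuse
patched_fuse="2.7.3glfs10"   # the fuse-2.7.3glfs10 tarball
patched_base="${patched_fuse%glfs*}"   # strip the "glfsNN" suffix -> 2.7.3

if [ "$kernel_fuse" = "$patched_base" ]; then
    echo "base fuse versions match"
else
    echo "mismatch: kernel=$kernel_fuse patched-base=$patched_base"
fi
```

With these example values it would print
"mismatch: kernel=2.7.2 patched-base=2.7.3", which is the scenario I
suspect is happening here.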

I would be very pleased if anyone could shed some light on this.

Thanks.




