[Gluster-devel] problems running glusterfs 2.5 patch 800 and xen
Jordi Moles Blanco
jordi at cdmon.com
Wed Dec 10 08:39:04 UTC 2008
Anand Avati wrote:
>> Right now, with this new version, when I ask Xen to create a new machine in
>> "/mnt/gluster", it fails.
>>
>
> Can you describe the failure better?
>
>
>> Even if it doesn't, the machine will appear
>> "broken" on boot without any chance of recovering it.
>>
>> What i have to do then is create it in my local disk and then move it to
>> /mnt/gluster, where it will start and "freeze" at the point that i said in
>> the previous mails.
>>
>> Also, if I copy a VM image file from another Xen server straight into
>> /mnt/gluster with the "scp" command, it won't work. Instead, I have to "scp"
>> the file to my local hard disk from an external machine and then move it into
>> /mnt/gluster.
>>
>> I don't know if this helps you identify the problem.
>>
>
> can you post your client log files somewhere? that can help us debug your issue.
>
> avati
>
Hi,
the information you are asking for was posted in the first message of
this thread. In that mail I included the spec files for both the nodes and
the Xen servers (clients), and also the log files for both.
I'm pasting the content of my first mail below, and I'll update it with the
new things I've experienced since then.
*****************************************************************************************************************
I'm having trouble running Xen virtual machines on glusterfs 2.5,
patch 800.
I've got two Xen servers, version 3.2, that store their machines on
gluster. They run Debian lenny. I also have 3 nodes that
provide the storage with glusterfs, also running lenny.
The thing is that when I ran "./configure --enable-kernel-module" for
fuse 2.7.3glfs10 on the server side, I got this:
***********
warning: fuse module is already present on kernel, it won't compile
***********
so...
i ran:
*********
./configure
make
make install
**********
When compiling glusterfs--patch-800 I didn't get any error or warning
message at all.
On the Xen side, I ran the proposed configure with "enable-fuse-client"
and so on, and I had no problems.
Anyway...
the nodes have these specs:
***************
volume esp
type storage/posix
option directory /glu0/data
end-volume
volume espai
type performance/io-threads
option thread-count 15
option cache-size 512MB
subvolumes esp
end-volume
volume nm
type storage/posix
option directory /glu0/ns
end-volume
volume ultim
type protocol/server
subvolumes espai nm
option transport-type tcp/server
option auth.ip.espai.allow *
option auth.ip.nm.allow *
end-volume
***************
and the Xen servers have these specs:
***********
volume espai1
type protocol/client
option transport-type tcp/client
option remote-host 10.0.0.3
option remote-subvolume espai
end-volume
volume espai2
type protocol/client
option transport-type tcp/client
option remote-host 10.0.0.4
option remote-subvolume espai
end-volume
volume espai3
type protocol/client
option transport-type tcp/client
option remote-host 10.0.0.5
option remote-subvolume espai
end-volume
volume namespace1
type protocol/client
option transport-type tcp/client
option remote-host 10.0.0.4
option remote-subvolume nm
end-volume
volume namespace2
type protocol/client
option transport-type tcp/client
option remote-host 10.0.0.5
option remote-subvolume nm
end-volume
volume grup1
type cluster/afr
subvolumes espai1 espai3
end-volume
volume grup2
type cluster/afr
subvolumes espai2
end-volume
volume nm
type cluster/afr
subvolumes namespace1 namespace2
end-volume
volume g01
type cluster/unify
subvolumes grup1 grup2
option scheduler rr
option namespace nm
end-volume
volume io-cache
type performance/io-cache
option cache-size 512MB
option page-size 1MB
option force-revalidate-timeout 2
subvolumes g01
end-volume
***********
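A quick sanity check on spec files like the ones above can be sketched in Python. This is only an illustration, not part of glusterfs: the helper names `parse_volumes` and `single_replica_afr` are made up here, and the parser assumes the simple "volume / type / subvolumes / end-volume" layout shown in this mail. It lists cluster/afr volumes that name fewer than two subvolumes (in the client spec above that is grup2, which replicates onto espai2 only). Whether that has anything to do with the errors below is not known.

```python
def parse_volumes(spec_text):
    """Parse a volume spec into {name: {"type": str, "subvolumes": [str]}}."""
    volumes = {}
    current = None
    for raw in spec_text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        if words[0] == "volume" and len(words) > 1:
            current = {"type": None, "subvolumes": []}
            volumes[words[1]] = current
        elif words[0] == "end-volume":
            current = None
        elif current is not None and words[0] == "type" and len(words) > 1:
            current["type"] = words[1]
        elif current is not None and words[0] == "subvolumes":
            current["subvolumes"] = words[1:]
    return volumes


def single_replica_afr(volumes):
    """Return the names of cluster/afr volumes with fewer than two subvolumes."""
    return sorted(
        name
        for name, vol in volumes.items()
        if vol["type"] == "cluster/afr" and len(vol["subvolumes"]) < 2
    )


# Excerpt of the client spec from this mail.
SAMPLE = """
volume grup1
type cluster/afr
subvolumes espai1 espai3
end-volume
volume grup2
type cluster/afr
subvolumes espai2
end-volume
"""

if __name__ == "__main__":
    print(single_replica_afr(parse_volumes(SAMPLE)))
```

In practice the text would be read from the client spec file (here /etc/glusterfs/glusterfs-client.vol) instead of the embedded sample.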
So... everything seems to work fine at first: the Xen servers are able to
mount the glusterfs unit, but after a few seconds I keep getting this on
the Xen side:
*********
2008-12-04 18:48:56 E [client-protocol.c:4579:client_checksum] espai2:
/domains: returning EINVAL
2008-12-04 18:48:56 E [client-protocol.c:4579:client_checksum] espai2:
/domains/xen-gluton02: returning EINVAL
*********
There's nothing else in the log about the problem, only that.
On the node side:
************
2008-12-04 19:48:50 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:48:51 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:48:53 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:48:56 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:49:01 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:49:09 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:49:22 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:49:43 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:50:17 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:51:00 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:51:01 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:51:12 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:51:12 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:51:15 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:52:41 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:55:05 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:55:59 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:56:00 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:56:10 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:56:14 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:58:58 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:00:59 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:00:59 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:01:03 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:01:06 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:05:15 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:05:59 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:06:00 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:06:09 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:06:13 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:10:59 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:11:00 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:11:08 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:11:12 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:15:25 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:15:59 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:16:00 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:16:05 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:16:12 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:20:59 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:21:00 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:21:07 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:21:12 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:26:00 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:26:00 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:26:07 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:26:13 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
***********
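To back up the impression that the error shows up more often at certain times, the log above can be counted per hour. The following is a rough Python sketch, assuming only the line format visible in this mail (date, time, level, bracketed source location, then the message, possibly wrapped onto a second line by the mail client). The names `join_wrapped` and `count_per_hour` are illustrative, not glusterfs tools.

```python
import re
from collections import Counter

# Start of a log entry as it appears above, for example:
# 2008-12-04 19:48:50 E [server-protocol.c:6050:server_protocol_interpret]
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2} (\w) \[([^\]]+)\]\s*(.*)$"
)


def join_wrapped(lines):
    """Rejoin entries whose message was wrapped onto a following line."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY.match(line) or not entries:
            entries.append(line)       # a new timestamped entry
        else:
            entries[-1] += " " + line  # continuation of the previous entry
    return entries


def count_per_hour(lines, needle):
    """Count entries containing `needle`, keyed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    for entry in join_wrapped(lines):
        match = ENTRY.match(entry)
        if match and needle in entry:
            counts[f"{match.group(1)} {match.group(2)}"] += 1
    return dict(counts)


# Three entries copied from the node log in this mail.
SAMPLE = """\
2008-12-04 19:48:50 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 19:48:51 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
2008-12-04 20:00:59 E [server-protocol.c:6050:server_protocol_interpret]
ultim: bound_xl is null
""".splitlines()

if __name__ == "__main__":
    print(count_per_hour(SAMPLE, "bound_xl is null"))
```

Running it over the full server log before and after starting a virtual machine would give numbers to compare, rather than an impression.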
At first I didn't pay attention to that because we can operate on the
storage unit; for example, we can move several GB into it. But when I try
to run a virtual machine, it freezes after a few seconds, and the error
I'm reporting appears more often than before. However, I don't have to
run a machine to make it appear; it appears from the very beginning.
Finally... when I mount gluster from Xen, I do it this way:
**********
glusterfs -l /var/log/glusterfs/glusterfs.log -L WARNING -d disable -f
/etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
**********
That is, with the "-d disable" option, which is supposed to be the right
thing to do with Xen.
And this is the point where my virtual machine freezes:
***********
[ 1.104884] blkfront: sda2: barriers enabled
[ 1.189612] XENBUS: Device with no driver: device/console/0
[ 1.189620] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 1.189630] Freeing unused kernel memory: 216k freed
[ 1.461468] thermal: Unknown symbol acpi_processor_set_thermal_limit
[ 2.047300] md: raid1 personality registered for level 1
[ 2.089518] md: md0 stopped.
[ 2.091932] md: md1 stopped.
[ 2.096341] md: md2 stopped.
[ 2.244344] EXT3-fs: INFO: recovery required on readonly filesystem.
[ 2.244358] EXT3-fs: write access will be enabled during recovery.
[ 2.286384] kjournald starting. Commit interval 5 seconds
[ 2.286398] EXT3-fs: recovery complete.
[ 2.287274] EXT3-fs: mounted filesystem with ordered data mode.
[ 3.128883] Adding 524280k swap on /dev/sda1. Priority:-1 extents:1
across:524280k
[ 3.208470] EXT3 FS on sda2, internal journal
[ 3.641153] device-mapper: uevent: version 1.0.3
[ 3.641208] device-mapper: ioctl: 4.13.0-ioctl (2007-10-18)
initialised: dm-devel at redhat.com
[ 4.756035] NET: Registered protocol family 10
[ 4.756035] lo: Disabled Privacy Extensions
***********
*********************
So... I've also seen that I can't create virtual machines directly in
/mnt/glusterfs; I have to create them on my local disk and then move
them to the gluster mount point. Is this normal behaviour? It's not
really a problem, but it would be much easier if I could create them
directly on the glusterfs storage unit.
Finally, the main issue is that virtual machines won't work as they
used to with earlier versions of fuse/gluster: they just freeze, and
gluster logs the error "bound_xl is null" all the time.
I've been trying to find out why I can't run "./configure
--enable-kernel-module" on the node side. Is it possible that there's a
fuse module in lenny's kernel by default? Could that be causing the
"bound_xl is null" problem?
I mean, you say that this is mainly caused by using different
versions of gluster or fuse. The only thing I can think of is
that the default fuse module in lenny is from an older version and
interacts badly with the fuse on the Xen servers, causing this message.
Does that make any sense?
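One way to see whether the running kernel already provides fuse, and whether it comes from a loadable module or is compiled in, is to compare /proc/filesystems with /proc/modules. The Python sketch below assumes the standard Linux formats of those two files; the function name `fuse_status` is made up for illustration. It only reports presence, not the fuse version, so it cannot by itself confirm or rule out a version mismatch.

```python
def fuse_status(proc_filesystems, proc_modules):
    """Classify fuse support from the text of /proc/filesystems and /proc/modules.

    Returns "module" if fuse is listed as a loaded kernel module,
    "builtin" if the fuse filesystem is registered without such a module,
    and "absent" if the kernel does not currently offer fuse at all.
    """
    registered = any(
        line.split()[-1] == "fuse"
        for line in proc_filesystems.splitlines()
        if line.strip()
    )
    loaded = any(
        line.split()[0] == "fuse"
        for line in proc_modules.splitlines()
        if line.strip()
    )
    if loaded:
        return "module"
    if registered:
        return "builtin"
    return "absent"


def read_text(path):
    """Return the file contents, or an empty string if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    status = fuse_status(read_text("/proc/filesystems"), read_text("/proc/modules"))
    print("fuse:", status)
```

If this prints "module", running "modinfo fuse" on the node shows which file the module was loaded from, which helps tell the distribution's module apart from one built from the patched fuse sources.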
I would be very grateful if anyone could shed some light on this.
Thanks.