[Gluster-devel] AFR setup with Virtual Servers crashes
Urban Loesch
ul at enas.net
Thu May 10 10:15:13 UTC 2007
Hi,
I'm new to this list. First: sorry for my bad English.
I was searching for an easy and transparent cluster filesystem with a
failover feature, and on Wikipedia I found the GlusterFS project.
It's a nice project, and I tried it in my test environment. I thought
that if it works well, I would use it in production too.
A very nice feature for me is the AFR setup: with it I can replicate
all the data across two servers in RAID-1 mode.
But it seems that I am doing something wrong, because "glusterfsd"
crashes on both nodes.
Let me explain from the beginning.
Here's my setup:
Hardware:
2 different servers for storage
1 server as client
On top of the servers I use a virtual server setup (details:
http://linux-vserver.org).
OS:
Debian Sarge with a self-compiled 2.6.19.2 kernel (uname -r: 2.6.19.2-vs2.2.0)
and the latest stable virtual server patch.
glusterfs-1.3.0-pre3.tar.gz
What I'm trying to do:
- Create an AFR mirror across the two servers.
- Mount the volume on server 3 (the client).
- Install the whole virtual server (Apache, MySQL, and so on) on the
mounted volume.
That way I would have a fully redundant virtual server mirrored over two bricks.
Here is my current configuration:
- Server config on Server 1 (brick)
### Export volume "brick" with the contents of the "/gluster" directory.
volume brick
type storage/posix # POSIX FS translator
option directory /gluster # Export this directory
end-volume
### File Locking
volume locks
type features/posix-locks
subvolumes brick
end-volume
### Add network serving capability to above brick.
volume server
type protocol/server
option transport-type tcp/server # For TCP/IP transport
option listen-port 6996 # Default is 6996
subvolumes locks
option auth.ip.locks.allow * # access to "locks" volume
end-volume
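As a side note on the auth lines: I currently allow every IP with `*`. My assumption (untested on my side) is that this could be tightened to the client's address only, for example:

```
### Assumption: restrict access to the client at 192.168.0.3
option auth.ip.locks.allow 192.168.0.3
```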
- Server config on Server 2 (brick-afr)
### Export volume "brick-afr" with the contents of the "/gluster-afr" directory.
volume brick-afr
type storage/posix # POSIX FS translator
option directory /gluster-afr # Export this directory
end-volume
### File Locking
volume locks-afr
type features/posix-locks
subvolumes brick-afr
end-volume
### Add network serving capability to above brick.
volume server
type protocol/server
option transport-type tcp/server # For TCP/IP transport
option listen-port 6996 # Default is 6996
subvolumes locks-afr
option auth.ip.locks-afr.allow * # access to "locks-afr" volume
end-volume
- Client configuration on Server 3 (client)
### Add client feature and attach to remote subvolume of server1
volume brick
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 192.168.0.1 # IP address of the remote brick
option remote-port 6996 # default server port is 6996
option remote-subvolume locks # name of the remote volume
end-volume
### Add client feature and attach to remote subvolume of server2
volume brick-afr
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 192.168.0.2 # IP address of the remote brick
option remote-port 6996 # default server port is 6996
option remote-subvolume locks-afr # name of the remote volume
end-volume
### Add AFR feature to brick
volume afr
type cluster/afr
subvolumes brick brick-afr
option replicate *:2 # All files 2 copies (RAID-1)
end-volume
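One detail from the client debug log further down: it defaults the transport-timeout to 120 seconds. If it is useful for testing, I assume this could also be set explicitly in each client volume, something like:

```
### Assumption: explicit transport timeout (the log shows a default of 120)
option transport-timeout 120
```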
----------------------------------------------------------------------------------------------------------------------
I started the two bricks in debug mode, and they start without problems.
- Server1
glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
....
[May 10 11:52:11] [DEBUG/proto-srv.c:2919/init()]
protocol/server:protocol/server xlator loaded
[May 10 11:52:11] [DEBUG/transport.c:83/transport_load()]
libglusterfs/transport:attempt to load type tcp/server
[May 10 11:52:11] [DEBUG/transport.c:88/transport_load()]
libglusterfs/transport:attempt to load file
/usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so
- Server2
glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
....
[May 10 11:51:44] [DEBUG/proto-srv.c:2919/init()]
protocol/server:protocol/server xlator loaded
[May 10 11:51:44] [DEBUG/transport.c:83/transport_load()]
libglusterfs/transport:attempt to load type tcp/server
[May 10 11:51:44] [DEBUG/transport.c:88/transport_load()]
libglusterfs/transport:attempt to load file
/usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so
------------------------------------------------------------------------------------------------------------------------------
So far, so good.
Then I mounted the volume on server 3 (the client). It mounts without any
problems.
glusterfs --no-daemon --log-file=/dev/stdout --log-level=DEBUG
--spec-file=/etc/glusterfs/glusterfs-client.vol /var/lib/vservers/mastersql
...
[May 10 13:59:00] [DEBUG/client-protocol.c:2796/init()]
protocol/client:defaulting transport-timeout to 120
[May 10 13:59:00] [DEBUG/transport.c:83/transport_load()]
libglusterfs/transport:attempt to load type tcp/client
[May 10 13:59:00] [DEBUG/transport.c:88/transport_load()]
libglusterfs/transport:attempt to load file
/usr/lib/glusterfs/1.3.0-pre3/transport/tcp/client.so
[May 10 13:59:00] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
:try_connect: socket fd = 8
[May 10 13:59:00] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
:try_connect: finalized on port `1022'
[May 10 13:59:00] [DEBUG/tcp-client.c:255/tcp_connect()]
tcp/client:connect on 8 in progress (non-blocking)
[May 10 13:59:00] [DEBUG/tcp-client.c:293/tcp_connect()]
tcp/client:connection on 8 still in progress - try later
OK. Nice.
A short check on the client:
df -HT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext3 13G 2.6G 8.9G 23% /
tmpfs tmpfs 1.1G 0 1.1G 0% /lib/init/rw
udev tmpfs 11M 46k 11M 1% /dev
tmpfs tmpfs 1.1G 0 1.1G 0% /dev/shm
glusterfs:24914
fuse 9.9G 2.5G 6.9G 27% /var/lib/vservers/mastersql
Wow, it works. Now I can add, remove, or edit files and directories
without problems. The files are written to both bricks without
problems, and performance is good too.
But then I tried to start my virtual server (called mastersql).
The virtual server does not start, and I get a lot of the following
debug output on the client:
[May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
:try_connect: socket fd = 4
[May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
:try_connect: finalized on port `1023'
[May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()]
tcp/client:connect on 4 in progress (non-blocking)
[May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()]
tcp/client:connection on 4 still in progress - try later
[May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()]
protocol/client:transport_submit failed
[May 10 14:04:43]
[DEBUG/client-protocol.c:2604/client_protocol_cleanup()]
protocol/client:cleaning up state in transport object 0x8076cf0
[May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
:try_connect: socket fd = 7
[May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
:try_connect: finalized on port `1022'
[May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()]
tcp/client:connect on 7 in progress (non-blocking)
[May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()]
tcp/client:connection on 7 still in progress - try later
[May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()]
protocol/client:transport_submit failed
[May 10 14:04:43]
[DEBUG/client-protocol.c:2604/client_protocol_cleanup()]
protocol/client:cleaning up state in transport object 0x80762d0
The two mirror servers crash with the following debug output:
[May 10 11:54:26] [DEBUG/tcp-server.c:134/tcp_server_notify()]
tcp/server:Registering socket (5) for new transport object of 192.168.0.3
[May 10 11:55:22] [DEBUG/proto-srv.c:2418/mop_setvolume()]
server-protocol:mop_setvolume: received port = 1022
[May 10 11:55:22] [DEBUG/proto-srv.c:2434/mop_setvolume()]
server-protocol:mop_setvolume: IP addr = *, received ip addr = 192.168.0.3
[May 10 11:55:22] [DEBUG/proto-srv.c:2444/mop_setvolume()]
server-protocol:mop_setvolume: accepted client from 192.168.0.3
Trying to set: READ     Is grantable: READ     Inserting: READ
Trying to set: UNLOCK   Is grantable: UNLOCK   Conflict with: READ
Trying to set: WRITE    Is grantable: WRITE    Inserting: WRITE
Trying to set: UNLOCK   Is grantable: UNLOCK   Conflict with: WRITE
Trying to set: WRITE    Is grantable: WRITE    Inserting: WRITE
Trying to set: UNLOCK   Is grantable: UNLOCK   Conflict with: WRITE
Trying to set: WRITE    Is grantable: WRITE    Inserting: WRITE
[May 10 12:00:09]
[CRITICAL/common-utils.c:215/gf_print_trace()] debug-backtrace:Got
signal (11), printing backtrace
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7f53a7e]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:[0xb7f60420]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
[0xb75d1192]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
[0xb75cded7]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/usr/lib/libglusterfs.so.0(transport_notify+0x1d)
[0xb7f54ecd]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xe9)
[0xb7f55b79]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/usr/lib/libglusterfs.so.0(poll_iteration+0x1d) [0xb7f54f7d]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:glusterfsd [0x804924e]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8)
[0xb7e17ea8]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
debug-backtrace:glusterfsd [0x8048c51]
Segmentation fault (core dumped)
It seems that there are some conflicts with "READ", "WRITE", and "UNLOCK",
but I'm not an expert on filesystems and locking features.
As you can see, the filesystem is still mounted but no longer connected
to the two bricks:
df -HT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext3 13G 2.6G 8.9G 23% /
tmpfs tmpfs 1.1G 0 1.1G 0% /lib/init/rw
udev tmpfs 11M 46k 11M 1% /dev
tmpfs tmpfs 1.1G 0 1.1G 0% /dev/shm
df: `/var/lib/vservers/mastersql': Transport endpoint is not connected
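To narrow this down, I could also test a server configuration without the posix-locks translator, to see whether the crash is related to locking. A minimal sketch for server 1 (my assumption, not yet tested):

```
### Test config (assumption): export the brick directly, without posix-locks
volume brick
  type storage/posix
  option directory /gluster
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option listen-port 6996
  subvolumes brick
  option auth.ip.brick.allow *
end-volume
```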
I'm not sure whether I'm doing something wrong (configuration) or
whether this is a bug.
Can you experts please help me?
If you need any further information, please let me know.
Thanks and regards
Urban