[Gluster-devel] AFR setup with Virtual Servers crashes
Anand Avati
avati at zresearch.com
Thu May 10 10:25:58 UTC 2007
Urban,
Which version of glusterfs are you using? If it is from a TLA checkout,
what is the patchset number?
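(If you are not sure, the patch log inside the checked-out tree shows it;
with GNU Arch, something like this prints the latest patchset entry. The
directory name below is only an example:

  $ cd glusterfs--mainline--2.5
  $ tla logs | tail -1    # last entry, e.g. patch-123, is the patchset number
)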
If you have a core dump generated from the segfault, can you please get a
backtrace from it? (Run 'gdb glusterfsd -c core.<pid>' or 'gdb glusterfsd
-c core', type the 'bt' command, and paste the output.)
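For example (the core file name below is only illustrative):

  $ gdb glusterfsd -c core.12345
  (gdb) bt
  ...then paste everything 'bt' prints...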
Is this easily reproducible? Have you checked with the latest TLA checkout?
thanks,
avati
2007/5/10, Urban Loesch <ul at enas.net>:
> Hi,
>
> I'm new to this list.
> First: sorry for my bad English.
>
> I was searching for an easy and transparent cluster filesystem with a
> failover feature, and on Wikipedia I found the GlusterFS project.
> It's a nice project, so I tried it in my test environment. If it works
> well, I would like to use it in production too.
>
> A very nice feature for me is the AFR setup, with which I can replicate
> all the data over 2 servers in RAID-1 mode.
> But it seems that I am doing something wrong, because "glusterfsd"
> crashes on both nodes.
> Let me explain from the beginning.
>
> Here's my setup:
> Hardware:
> 2 different servers for storage
> 1 server as client
> On top of the servers I use a virtual server setup (details at
> http://linux-vserver.org).
>
> OS:
> Debian Sarge with a self-compiled 2.6.19.2 kernel (uname -r:
> 2.6.19.2-vs2.2.0) and the latest stable virtual server patch.
> glusterfs-1.3.0-pre3.tar.gz
>
> What I'm trying to do:
> - Create an AFR mirror over the 2 servers.
> - Mount the volume on server 3 (the client).
> - Install the whole virtual server (Apache, MySQL, and so on) on the
> mounted volume.
> So I have a fully redundant virtual server mirrored over two bricks.
>
> Here is my current configuration:
> - Server config on server 1 (brick)
>
> ### Export volume "brick" with the contents of the "/gluster" directory.
> volume brick
> type storage/posix # POSIX FS translator
> option directory /gluster # Export this directory
> end-volume
>
> ### File Locking
> volume locks
> type features/posix-locks
> subvolumes brick
> end-volume
>
> ### Add network serving capability to above brick.
> volume server
> type protocol/server
> option transport-type tcp/server # For TCP/IP transport
> option listen-port 6996 # Default is 6996
> subvolumes locks
> option auth.ip.locks.allow * # access to "locks" volume
> end-volume
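>
> (Note: "auth.ip.locks.allow *" lets every host connect. As far as I
> understand the auth.ip option, a pattern can restrict this, e.g.:
>
>    option auth.ip.locks.allow 192.168.0.*   # allow only 192.168.0.x hosts
>
> For the test I left it at "*".)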
>
> - Server config on server 2 (brick-afr)
> ### Export volume "brick-afr" with the contents of the "/gluster-afr" directory.
> volume brick-afr
> type storage/posix # POSIX FS translator
> option directory /gluster-afr # Export this directory
> end-volume
>
> ### File Locking
> volume locks-afr
> type features/posix-locks
> subvolumes brick-afr
> end-volume
>
> ### Add network serving capability to above brick.
> volume server
> type protocol/server
> option transport-type tcp/server # For TCP/IP transport
> option listen-port 6996 # Default is 6996
> subvolumes locks-afr
> option auth.ip.locks-afr.allow * # access to "locks-afr" volume
> end-volume
>
> - Client configuration on server 3 (client)
> ### Add client feature and attach to remote subvolume of server1
> volume brick
> type protocol/client
> option transport-type tcp/client # for TCP/IP transport
> option remote-host 192.168.0.1 # IP address of the remote brick
> option remote-port 6996 # default server port is 6996
> option remote-subvolume locks # name of the remote volume
> end-volume
>
> ### Add client feature and attach to remote subvolume of server2
> volume brick-afr
> type protocol/client
> option transport-type tcp/client # for TCP/IP transport
> option remote-host 192.168.0.2 # IP address of the remote brick
> option remote-port 6996 # default server port is 6996
> option remote-subvolume locks-afr # name of the remote volume
> end-volume
>
> ### Add AFR feature to brick
> volume afr
> type cluster/afr
> subvolumes brick brick-afr
> option replicate *:2 # All files 2 copies (RAID-1)
> end-volume
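>
> (If I read the AFR docs correctly, the replicate option takes a
> comma-separated list of pattern:count pairs, so for example:
>
>    option replicate *.tmp:1,*:2   # hypothetical: 1 copy of *.tmp, 2 of the rest
>
> Here I simply want 2 copies of everything.)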
>
> ----------------------------------------------------------------------------------------------------------------------
> I started the two bricks in debug mode, and they start without problems.
>
> - Server1
> glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
> ....
> [May 10 11:52:11] [DEBUG/proto-srv.c:2919/init()]
> protocol/server:protocol/server xlator loaded
> [May 10 11:52:11] [DEBUG/transport.c:83/transport_load()]
> libglusterfs/transport:attempt to load type tcp/server
> [May 10 11:52:11] [DEBUG/transport.c:88/transport_load()]
> libglusterfs/transport:attempt to load file
> /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so
>
> - Server2
> glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
> ....
> [May 10 11:51:44] [DEBUG/proto-srv.c:2919/init()]
> protocol/server:protocol/server xlator loaded
> [May 10 11:51:44] [DEBUG/transport.c:83/transport_load()]
> libglusterfs/transport:attempt to load type tcp/server
> [May 10 11:51:44] [DEBUG/transport.c:88/transport_load()]
> libglusterfs/transport:attempt to load file
> /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so
> ------------------------------------------------------------------------------------------------------------------------------
>
> So far so good.
>
> Then I mounted the volume on server 3 (the client). It mounts without any
> problems.
> glusterfs --no-daemon --log-file=/dev/stdout --log-level=DEBUG
> --spec-file=/etc/glusterfs/glusterfs-client.vol /var/lib/vservers/mastersql
> ...
> [May 10 13:59:00] [DEBUG/client-protocol.c:2796/init()]
> protocol/client:defaulting transport-timeout to 120
> [May 10 13:59:00] [DEBUG/transport.c:83/transport_load()]
> libglusterfs/transport:attempt to load type tcp/client
> [May 10 13:59:00] [DEBUG/transport.c:88/transport_load()]
> libglusterfs/transport:attempt to load file
> /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/client.so
> [May 10 13:59:00] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
> :try_connect: socket fd = 8
> [May 10 13:59:00] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
> :try_connect: finalized on port `1022'
> [May 10 13:59:00] [DEBUG/tcp-client.c:255/tcp_connect()]
> tcp/client:connect on 8 in progress (non-blocking)
> [May 10 13:59:00] [DEBUG/tcp-client.c:293/tcp_connect()]
> tcp/client:connection on 8 still in progress - try later
>
> OK. Nice.
> A short check on the client:
> df -HT
> Filesystem Type Size Used Avail Use% Mounted on
> /dev/sda1 ext3 13G 2.6G 8.9G 23% /
> tmpfs tmpfs 1.1G 0 1.1G 0% /lib/init/rw
> udev tmpfs 11M 46k 11M 1% /dev
> tmpfs tmpfs 1.1G 0 1.1G 0% /dev/shm
> glusterfs:24914
> fuse 9.9G 2.5G 6.9G 27% /var/lib/vservers/mastersql
>
> Wow, it works. Now I can add, remove, or edit files and directories
> without problems. The files are written to both bricks, and performance
> is good too.
>
> But then I tried to start my virtual server (called mastersql).
> The virtual server does not start, and I get a lot of the following debug
> output on the client:
>
> [May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
> :try_connect: socket fd = 4
> [May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
> :try_connect: finalized on port `1023'
> [May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()]
> tcp/client:connect on 4 in progress (non-blocking)
> [May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()]
> tcp/client:connection on 4 still in progress - try later
> [May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> protocol/client:transport_submit failed
> [May 10 14:04:43]
> [DEBUG/client-protocol.c:2604/client_protocol_cleanup()]
> protocol/client:cleaning up state in transport object 0x8076cf0
> [May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
> :try_connect: socket fd = 7
> [May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
> :try_connect: finalized on port `1022'
> [May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()]
> tcp/client:connect on 7 in progress (non-blocking)
> [May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()]
> tcp/client:connection on 7 still in progress - try later
> [May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> protocol/client:transport_submit failed
> [May 10 14:04:43]
> [DEBUG/client-protocol.c:2604/client_protocol_cleanup()]
> protocol/client:cleaning up state in transport object 0x80762d0
>
> The two mirror servers crash with the following debug output:
>
> [May 10 11:54:26] [DEBUG/tcp-server.c:134/tcp_server_notify()]
> tcp/server:Registering socket (5) for new transport object of 192.168.0.3
> [May 10 11:55:22] [DEBUG/proto-srv.c:2418/mop_setvolume()]
> server-protocol:mop_setvolume: received port = 1022
> [May 10 11:55:22] [DEBUG/proto-srv.c:2434/mop_setvolume()]
> server-protocol:mop_setvolume: IP addr = *, received ip addr = 192.168.0.3
> [May 10 11:55:22] [DEBUG/proto-srv.c:2444/mop_setvolume()]
> server-protocol:mop_setvolume: accepted client from 192.168.0.3
>
> Trying to set: READ    Is grantable: READ    Inserting: READ
> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: READ
> Trying to set: WRITE   Is grantable: WRITE   Inserting: WRITE
> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: WRITE
> Trying to set: WRITE   Is grantable: WRITE   Inserting: WRITE
> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: WRITE
> Trying to set: WRITE   Is grantable: WRITE   Inserting: WRITE
> [May 10 12:00:09]
> [CRITICAL/common-utils.c:215/gf_print_trace()] debug-backtrace:Got
> signal (11), printing backtrace
> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7f53a7e]
> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:[0xb7f60420]
> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
> [0xb75d1192]
> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
> [0xb75cded7]
> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/lib/libglusterfs.so.0(transport_notify+0x1d)
> [0xb7f54ecd]
> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xe9)
> [0xb7f55b79]
> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/lib/libglusterfs.so.0(poll_iteration+0x1d) [0xb7f54f7d]
> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:glusterfsd [0x804924e]
> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8)
> [0xb7e17ea8]
> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:glusterfsd [0x8048c51]
> Segmentation fault (core dumped)
>
> It seems that there are some conflicts with "READ, WRITE, UNLOCK". But
> I'm not an expert on filesystems and locking features.
>
> As you can see, the filesystem is still mounted but no longer connected
> to the two bricks.
> df -HT
> Filesystem Type Size Used Avail Use% Mounted on
> /dev/sda1 ext3 13G 2.6G 8.9G 23% /
> tmpfs tmpfs 1.1G 0 1.1G 0% /lib/init/rw
> udev tmpfs 11M 46k 11M 1% /dev
> tmpfs tmpfs 1.1G 0 1.1G 0% /dev/shm
> df: `/var/lib/vservers/mastersql': Transport endpoint is not connected
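>
> (To retry the mount I first have to clean up the dead mount point with a
> normal FUSE unmount, if I am not mistaken:
>
>    fusermount -u /var/lib/vservers/mastersql
> )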
>
> I'm not sure if I did something wrong (configuration) or if it is a bug!
> Can you experts please help me?
>
> If you need any further information or something please let me know.
>
> Thanks and regards
> Urban
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
--
Anand V. Avati