[Gluster-devel] AFR setup with Virtual Servers crashes
Urban Loesch
ul at enas.net
Thu May 10 11:40:21 UTC 2007
Hi Avati,
Many thanks. I will try out the latest code from the source repository.
thanks,
Urban
Anand Avati wrote:
> Urban,
> this bug has already been fixed in the source repository.
> thanks,
> avati
>
> 2007/5/10, Urban Loesch <ul at enas.net>:
>> Hi Avati,
>>
>> thanks for your fast answer.
>>
>> I use the version glusterfs-1.3.0-pre3 downloaded from your server
>> (http://ftp.zresearch.com/pub/gluster/glusterfs/1.3-pre/).
>> I will try the latest version from TLA this afternoon and let you know
>> what happens.
>>
>> Here's the backtrace from the core dump:
>> # gdb glusterfsd -c core.15160
>> ..
>> Core was generated by `glusterfsd --no-daemon --log-file=/dev/stdout
>> --log-level=DEBUG'.
>> Program terminated with signal 11, Segmentation fault.
>> #0 0xb75d8fd3 in posix_locks_flush () from
>> /usr/lib/glusterfs/1.3.0-pre3/xlator/features/posix-locks.so
>> (gdb) bt
>> #0 0xb75d8fd3 in posix_locks_flush () from
>> /usr/lib/glusterfs/1.3.0-pre3/xlator/features/posix-locks.so
>> #1 0xb75d1192 in fop_flush () from
>> /usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
>> #2 0xb75cded7 in proto_srv_notify () from
>> /usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
>> #3 0xb7f54ecd in transport_notify (this=0x804b1a0, event=1) at
>> transport.c:148
>> #4 0xb7f55b79 in sys_epoll_iteration (ctx=0xbfbc2ff0) at epoll.c:53
>> #5 0xb7f54f7d in poll_iteration (ctx=0xbfbc2ff0) at transport.c:251
>> #6 0x0804924e in main ()
>>
>> Yes, it is reproducible. It happens every time I try to start my
>> virtual server.
>>
>> Thanks
>> Urban
>>
>>
>> Anand Avati wrote:
>> > Urban,
>> > which version of glusterfs are you using? if it is from a TLA checkout,
>> > what is the patchset number?
>> >
>> > if you have a core dump generated from the segfault, can you please get
>> > a backtrace from it? (run gdb glusterfsd -c core.<pid> or gdb glusterfsd
>> > -c core, type the 'bt' command and paste the output)
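>> >
>> > (a minimal session sketch, assuming core dumps are enabled in the shell
>> > that started glusterfsd; <pid> is the pid of the crashed process:
>> >
>> >   ulimit -c unlimited            # enable core dumps before reproducing
>> >   gdb glusterfsd -c core.<pid>   # load the binary and the core file
>> >   (gdb) bt                       # print the stack trace
>> >   (gdb) quit
>> >
>> > the exact core file name depends on your kernel.core_pattern setting.)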
>> >
>> > is this easily reproducible? have you checked with the latest TLA
>> > checkout?
>> >
>> > thanks,
>> > avati
>> >
>> > 2007/5/10, Urban Loesch <ul at enas.net>:
>> >> Hi,
>> >>
>> >> I'm new to this list.
>> >> First: sorry for my bad English.
>> >>
>> >> I was searching for an easy and transparent cluster filesystem with a
>> >> failover feature and found the GlusterFS project on Wikipedia.
>> >> It's a nice project and I tried it in my test environment. I thought
>> >> that if it works well, I would use it in production too.
>> >>
>> >> A very nice feature for me is the AFR setup, because with it I can
>> >> replicate all the data over 2 servers in RAID-1 mode.
>> >> But it seems that I am doing something wrong, because "glusterfsd"
>> >> crashes on both nodes.
>> >> Let me explain from the beginning.
>> >>
>> >> Here's my setup:
>> >> Hardware:
>> >> 2 different servers for storage
>> >> 1 server as client
>> >> On top of the server I use a virtual server setup (details
>> >> http://linux-vserver.org).
>> >>
>> >> OS:
>> >> Debian Sarge with a self-compiled 2.6.19.2 kernel (uname -r:
>> >> 2.6.19.2-vs2.2.0) and the latest stable virtual server patch.
>> >> glusterfs-1.3.0-pre3.tar.gz
>> >>
>> >> What I'm trying to do:
>> >> - Create an AFR mirror over the 2 servers.
>> >> - Mount the volume on server 3 (client).
>> >> - Install the whole virtual server (Apache, MySQL and so on) on the
>> >> mounted volume.
>> >> That way I have a fully redundant virtual server mirrored over two bricks.
>> >>
>> >> Here is my current configuration:
>> >> - Server config on Server 1 (brick):
>> >>
>> >> ### Export volume "brick" with the contents of the "/gluster" directory.
>> >> volume brick
>> >> type storage/posix # POSIX FS translator
>> >> option directory /gluster # Export this directory
>> >> end-volume
>> >>
>> >> ### File Locking
>> >> volume locks
>> >> type features/posix-locks
>> >> subvolumes brick
>> >> end-volume
>> >>
>> >> ### Add network serving capability to above brick.
>> >> volume server
>> >> type protocol/server
>> >> option transport-type tcp/server # For TCP/IP transport
>> >> option listen-port 6996 # Default is 6996
>> >> subvolumes locks
>> >> option auth.ip.locks.allow * # access to "locks" volume
>> >> end-volume
>> >>
>> >> - Server config on Server 2 (brick-afr):
>> >> ### Export volume "brick-afr" with the contents of the "/gluster-afr" directory.
>> >> volume brick-afr
>> >> type storage/posix # POSIX FS translator
>> >> option directory /gluster-afr # Export this directory
>> >> end-volume
>> >>
>> >> ### File Locking
>> >> volume locks-afr
>> >> type features/posix-locks
>> >> subvolumes brick-afr
>> >> end-volume
>> >>
>> >> ### Add network serving capability to above brick.
>> >> volume server
>> >> type protocol/server
>> >> option transport-type tcp/server # For TCP/IP transport
>> >> option listen-port 6996 # Default is 6996
>> >> subvolumes locks-afr
>> >> option auth.ip.locks-afr.allow * # access to "locks-afr" volume
>> >> end-volume
>> >>
>> >> - Client config on Server 3 (client):
>> >> ### Add client feature and attach to remote subvolume of server1
>> >> volume brick
>> >> type protocol/client
>> >> option transport-type tcp/client # for TCP/IP transport
>> >> option remote-host 192.168.0.1 # IP address of the remote brick
>> >> option remote-port 6996 # default server port is 6996
>> >> option remote-subvolume locks # name of the remote volume
>> >> end-volume
>> >>
>> >> ### Add client feature and attach to remote subvolume of server2
>> >> volume brick-afr
>> >> type protocol/client
>> >> option transport-type tcp/client # for TCP/IP transport
>> >> option remote-host 192.168.0.2 # IP address of the remote brick
>> >> option remote-port 6996 # default server port is 6996
>> >> option remote-subvolume locks-afr # name of the remote volume
>> >> end-volume
>> >>
>> >> ### Add AFR feature to brick
>> >> volume afr
>> >> type cluster/afr
>> >> subvolumes brick brick-afr
>> >> option replicate *:2 # All files 2 copies (RAID-1)
>> >> end-volume
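>> >>
>> >> (If I read the AFR docs right, "option replicate" accepts a
>> >> comma-separated pattern:count list, so a hypothetical sketch like
>> >>
>> >> option replicate *.tmp:1,*:2 # hypothetical: keep temp files single-copy
>> >>
>> >> would replicate selectively; for a full mirror, *:2 as above is what
>> >> I want.)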
>> >>
>> >>
>> ----------------------------------------------------------------------------------------------------------------------
>>
>> >>
>> >> I started the two bricks in debug mode and they start without problems.
>> >>
>> >> - Server1
>> >> glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
>> >> ....
>> >> [May 10 11:52:11] [DEBUG/proto-srv.c:2919/init()]
>> >> protocol/server:protocol/server xlator loaded
>> >> [May 10 11:52:11] [DEBUG/transport.c:83/transport_load()]
>> >> libglusterfs/transport:attempt to load type tcp/server
>> >> [May 10 11:52:11] [DEBUG/transport.c:88/transport_load()]
>> >> libglusterfs/transport:attempt to load file
>> >> /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so
>> >>
>> >> - Server2
>> >> glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
>> >> ....
>> >> [May 10 11:51:44] [DEBUG/proto-srv.c:2919/init()]
>> >> protocol/server:protocol/server xlator loaded
>> >> [May 10 11:51:44] [DEBUG/transport.c:83/transport_load()]
>> >> libglusterfs/transport:attempt to load type tcp/server
>> >> [May 10 11:51:44] [DEBUG/transport.c:88/transport_load()]
>> >> libglusterfs/transport:attempt to load file
>> >> /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so
>> >>
>> ------------------------------------------------------------------------------------------------------------------------------
>>
>> >>
>> >>
>> >> So far so good.
>> >>
>> >> Then I mounted the volume on server 3 (the client). It mounts without
>> >> any problems.
>> >> glusterfs --no-daemon --log-file=/dev/stdout --log-level=DEBUG
>> >> --spec-file=/etc/glusterfs/glusterfs-client.vol
>> >> /var/lib/vservers/mastersql
>> >> ...
>> >> [May 10 13:59:00] [DEBUG/client-protocol.c:2796/init()]
>> >> protocol/client:defaulting transport-timeout to 120
>> >> [May 10 13:59:00] [DEBUG/transport.c:83/transport_load()]
>> >> libglusterfs/transport:attempt to load type tcp/client
>> >> [May 10 13:59:00] [DEBUG/transport.c:88/transport_load()]
>> >> libglusterfs/transport:attempt to load file
>> >> /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/client.so
>> >> [May 10 13:59:00] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
>> >> :try_connect: socket fd = 8
>> >> [May 10 13:59:00] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
>> >> :try_connect: finalized on port `1022'
>> >> [May 10 13:59:00] [DEBUG/tcp-client.c:255/tcp_connect()]
>> >> tcp/client:connect on 8 in progress (non-blocking)
>> >> [May 10 13:59:00] [DEBUG/tcp-client.c:293/tcp_connect()]
>> >> tcp/client:connection on 8 still in progress - try later
>> >>
>> >> OK. Nice.
>> >> A short check on the client:
>> >> df -HT
>> >> Filesystem Type Size Used Avail Use% Mounted on
>> >> /dev/sda1 ext3 13G 2.6G 8.9G 23% /
>> >> tmpfs tmpfs 1.1G 0 1.1G 0% /lib/init/rw
>> >> udev tmpfs 11M 46k 11M 1% /dev
>> >> tmpfs tmpfs 1.1G 0 1.1G 0% /dev/shm
>> >> glusterfs:24914
>> >> fuse 9.9G 2.5G 6.9G 27%
>> >> /var/lib/vservers/mastersql
>> >>
>> >> Wow, it works. Now I can add, remove or edit files and directories
>> >> without problems. The files are written to both bricks without
>> >> problems. Performance is good too.
>> >>
>> >> But then I tried to start my virtual server (called mastersql).
>> >> The virtual server does not start and I get a lot of the following
>> >> debug output on the client:
>> >>
>> >> [May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
>> >> :try_connect: socket fd = 4
>> >> [May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
>> >> :try_connect: finalized on port `1023'
>> >> [May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()]
>> >> tcp/client:connect on 4 in progress (non-blocking)
>> >> [May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()]
>> >> tcp/client:connection on 4 still in progress - try later
>> >> [May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()]
>> >> protocol/client:transport_submit failed
>> >> [May 10 14:04:43]
>> >> [DEBUG/client-protocol.c:2604/client_protocol_cleanup()]
>> >> protocol/client:cleaning up state in transport object 0x8076cf0
>> >> [May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
>> >> :try_connect: socket fd = 7
>> >> [May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
>> >> :try_connect: finalized on port `1022'
>> >> [May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()]
>> >> tcp/client:connect on 7 in progress (non-blocking)
>> >> [May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()]
>> >> tcp/client:connection on 7 still in progress - try later
>> >> [May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()]
>> >> protocol/client:transport_submit failed
>> >> [May 10 14:04:43]
>> >> [DEBUG/client-protocol.c:2604/client_protocol_cleanup()]
>> >> protocol/client:cleaning up state in transport object 0x80762d0
>> >>
>> >> The two mirror servers crash with the following debug output:
>> >>
>> >> [May 10 11:54:26] [DEBUG/tcp-server.c:134/tcp_server_notify()]
>> >> tcp/server:Registering socket (5) for new transport object of
>> >> 192.168.0.3
>> >> [May 10 11:55:22] [DEBUG/proto-srv.c:2418/mop_setvolume()]
>> >> server-protocol:mop_setvolume: received port = 1022
>> >> [May 10 11:55:22] [DEBUG/proto-srv.c:2434/mop_setvolume()]
>> >> server-protocol:mop_setvolume: IP addr = *, received ip addr =
>> >> 192.168.0.3
>> >> [May 10 11:55:22] [DEBUG/proto-srv.c:2444/mop_setvolume()]
>> >> server-protocol:mop_setvolume: accepted client from 192.168.0.3
>> >>
>> >> Trying to set: READ  Is grantable: READ  Inserting: READ
>> >> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: READ
>> >> Trying to set: WRITE  Is grantable: WRITE  Inserting: WRITE
>> >> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: WRITE
>> >> Trying to set: WRITE  Is grantable: WRITE  Inserting: WRITE
>> >> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: WRITE
>> >> Trying to set: WRITE  Is grantable: WRITE  Inserting: WRITE
>> >> [May 10 12:00:09] [CRITICAL/common-utils.c:215/gf_print_trace()]
>> >> debug-backtrace:Got signal (11), printing backtrace
>> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
>> >> debug-backtrace:/usr/lib/libglusterfs.so.0(gf_print_trace+0x2e)
>> >> [0xb7f53a7e]
>> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
>> >> debug-backtrace:[0xb7f60420]
>> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
>> >> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
>> >> [0xb75d1192]
>> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
>> >> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
>> >> [0xb75cded7]
>> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
>> >> debug-backtrace:/usr/lib/libglusterfs.so.0(transport_notify+0x1d)
>> >> [0xb7f54ecd]
>> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
>> >> debug-backtrace:/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xe9)
>> >> [0xb7f55b79]
>> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
>> >> debug-backtrace:/usr/lib/libglusterfs.so.0(poll_iteration+0x1d)
>> >> [0xb7f54f7d]
>> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
>> >> debug-backtrace:glusterfsd [0x804924e]
>> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
>> >> debug-backtrace:/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8)
>> >> [0xb7e17ea8]
>> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
>> >> debug-backtrace:glusterfsd [0x8048c51]
>> >> Segmentation fault (core dumped)
>> >>
>> >> It seems that there are some conflicts with "READ, WRITE, UNLOCK", but
>> >> I'm not an expert on filesystems and locking features.
>> >>
>> >> As you can see, the filesystem is still mounted but no longer connected
>> >> to the two bricks:
>> >> df -HT
>> >> Filesystem Type Size Used Avail Use% Mounted on
>> >> /dev/sda1 ext3 13G 2.6G 8.9G 23% /
>> >> tmpfs tmpfs 1.1G 0 1.1G 0% /lib/init/rw
>> >> udev tmpfs 11M 46k 11M 1% /dev
>> >> tmpfs tmpfs 1.1G 0 1.1G 0% /dev/shm
>> >> df: `/var/lib/vservers/mastersql': Transport endpoint is not connected
>> >>
>> >> I'm not sure if I am doing something wrong (configuration) or if it
>> >> is a bug!
>> >> Can you experts please help me?
>> >>
>> >> If you need any further information or something please let me know.
>> >>
>> >> Thanks and regards
>> >> Urban
>> >>
>> >>
>> >> _______________________________________________
>> >> Gluster-devel mailing list
>> >> Gluster-devel at nongnu.org
>> >> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>> >>
>> >
>> >
>>
>>
>
>