[Gluster-devel] AFR setup with Virtual Servers crashes

Anand Avati avati at zresearch.com
Thu May 10 11:22:34 UTC 2007


Urban,
  this bug has already been fixed in the source repository.
thanks,
avati

2007/5/10, Urban Loesch <ul at enas.net>:
> Hi Avati,
>
> thanks for your fast answer.
>
> I use the version glusterfs-1.3.0-pre3 downloaded from your server
> (http://ftp.zresearch.com/pub/gluster/glusterfs/1.3-pre/).
> I will try the latest version from TLA this afternoon and let you know
> what happens.
>
> Here's the backtrace from the core dump
> # gdb glusterfsd -c core.15160
> ..
> Core was generated by `glusterfsd --no-daemon --log-file=/dev/stdout
> --log-level=DEBUG'.
> Program terminated with signal 11, Segmentation fault.
> #0  0xb75d8fd3 in posix_locks_flush () from
> /usr/lib/glusterfs/1.3.0-pre3/xlator/features/posix-locks.so
> (gdb) bt
> #0  0xb75d8fd3 in posix_locks_flush () from
> /usr/lib/glusterfs/1.3.0-pre3/xlator/features/posix-locks.so
> #1  0xb75d1192 in fop_flush () from
> /usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
> #2  0xb75cded7 in proto_srv_notify () from
> /usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
> #3  0xb7f54ecd in transport_notify (this=0x804b1a0, event=1) at
> transport.c:148
> #4  0xb7f55b79 in sys_epoll_iteration (ctx=0xbfbc2ff0) at epoll.c:53
> #5  0xb7f54f7d in poll_iteration (ctx=0xbfbc2ff0) at transport.c:251
> #6  0x0804924e in main ()
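>
> In case more detail helps, here are a few additional gdb commands I can
> run against the same core (they only show line numbers and locals if
> glusterfs was compiled with debugging symbols, e.g. CFLAGS="-g"):
>
> # gdb glusterfsd -c core.15160
> (gdb) bt full          # backtrace including local variables
> (gdb) frame 0          # select the crashing frame, posix_locks_flush ()
> (gdb) info registers   # CPU register state at the fault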
>
> Yes, it is reproducible. It happens every time I try to start my
> virtual server.
>
> Thanks
> Urban
>
>
> Anand Avati wrote:
> > Urban,
> > which version of glusterfs are you using? if it is from a TLA checkout,
> > what is the patchset number?
> >
> > you have a core dump generated from the segfault; can you please get a
> > backtrace from it? (gdb glusterfsd -c core.<pid> or gdb glusterfsd -c
> > core, then type the 'bt' command and paste the output)
> >
> > is this easily reproducible? have you checked with the latest TLA
> > checkout?
> >
> > thanks,
> > avati
> >
> > 2007/5/10, Urban Loesch <ul at enas.net>:
> >> Hi,
> >>
> >> I'm new to this list.
> >> First: sorry for my bad English.
> >>
> >> I was searching for an easy and transparent cluster filesystem with a
> >> failover feature, and on Wikipedia I found the GlusterFS project.
> >> It's a nice project and I tried it in my test environment. If it works
> >> well, I would like to use it in production too.
> >>
> >> A very nice feature for me is the AFR setup, because I can replicate
> >> all the data over 2 servers in RAID-1 mode.
> >> But it seems that I am doing something wrong, because "glusterfsd"
> >> crashes on both nodes.
> >> Let me explain from the beginning.
> >>
> >> Here's my setup:
> >> Hardware:
> >> 2 different servers for storage
> >> 1 server as client
> >> On top of the servers I use a virtual server setup (details at
> >> http://linux-vserver.org).
> >>
> >> OS:
> >> Debian Sarge with a self-compiled 2.6.19.2 kernel (uname -r:
> >> 2.6.19.2-vs2.2.0) and the latest stable virtual server patch.
> >> glusterfs-1.3.0-pre3.tar.gz
> >>
> >> What I'm trying to do:
> >> - Create an AFR mirror over the 2 servers.
> >> - Mount the volume on server 3 (the client).
> >> - Install the whole virtual server (Apache, MySQL, and so on) on the
> >> mounted volume.
> >> This way I have a fully redundant virtual server mirrored over two bricks.
> >>
> >> Here is my current configuration:
> >> - Serverconfig on Server 1 (brick)
> >>
> >> ### Export volume "brick" with the contents of the "/gluster" directory.
> >> volume brick
> >>   type storage/posix                   # POSIX FS translator
> >>   option directory /gluster        # Export this directory
> >> end-volume
> >>
> >> ### File Locking
> >> volume locks
> >>   type features/posix-locks
> >>   subvolumes brick
> >> end-volume
> >>
> >> ### Add network serving capability to above brick.
> >> volume server
> >>   type protocol/server
> >>   option transport-type tcp/server     # For TCP/IP transport
> >>   option listen-port 6996              # Default is 6996
> >>   subvolumes locks
> >>   option auth.ip.locks.allow *         # allow access to "locks" volume
> >> end-volume
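> >>
> >> (For this test I allow every IP with "*". If I read the docs right, the
> >> allow option also accepts IP wildcard patterns, so for production I
> >> would restrict it to the storage LAN, something like:
> >>
> >> option auth.ip.locks.allow 192.168.0.*   # only hosts on 192.168.0.x
> >>
> >> but that should not matter for this crash.)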
> >>
> >> - Serverconfig on Server 2 (brick-afr)
> >> ### Export volume "brick-afr" with the contents of the "/gluster-afr" directory.
> >> volume brick-afr
> >>   type storage/posix                   # POSIX FS translator
> >>   option directory /gluster-afr        # Export this directory
> >> end-volume
> >>
> >> ### File Locking
> >> volume locks-afr
> >>   type features/posix-locks
> >>   subvolumes brick-afr
> >> end-volume
> >>
> >> ### Add network serving capability to above brick.
> >> volume server
> >>   type protocol/server
> >>   option transport-type tcp/server     # For TCP/IP transport
> >>   option listen-port 6996              # Default is 6996
> >>   subvolumes locks-afr
> >>   option auth.ip.locks-afr.allow *     # allow access to "locks-afr" volume
> >> end-volume
> >>
> >> - Clientconfig on Server 3 (client)
> >> ### Add client feature and attach to remote subvolume of server1
> >> volume brick
> >>   type protocol/client
> >>   option transport-type tcp/client     # for TCP/IP transport
> >>   option remote-host 192.168.0.1      # IP address of the remote brick
> >>   option remote-port 6996              # default server port is 6996
> >>   option remote-subvolume locks        # name of the remote volume
> >> end-volume
> >>
> >> ### Add client feature and attach to remote subvolume of server2
> >> volume brick-afr
> >>   type protocol/client
> >>   option transport-type tcp/client     # for TCP/IP transport
> >>   option remote-host 192.168.0.2      # IP address of the remote brick
> >>   option remote-port 6996              # default server port is 6996
> >>   option remote-subvolume locks-afr    # name of the remote volume
> >> end-volume
> >>
> >> ### Add AFR feature to brick
> >> volume afr
> >>   type cluster/afr
> >>   subvolumes brick brick-afr
> >>   option replicate *:2                 # All files 2 copies (RAID-1)
> >> end-volume
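> >>
> >> (If I understand the AFR documentation right, the replicate option can
> >> also take a comma-separated list of per-pattern counts, something like:
> >>
> >> option replicate *.db:2,*.tmp:1      # 2 copies of *.db, 1 of *.tmp
> >>
> >> but here I simply mirror everything with "*:2".)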
> >>
> >> ----------------------------------------------------------------------------------------------------------------------
> >>
> >> I started the two bricks in debug mode and they start without problems.
> >>
> >> - Server1
> >> glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
> >> ....
> >> [May 10 11:52:11] [DEBUG/proto-srv.c:2919/init()]
> >> protocol/server:protocol/server xlator loaded
> >> [May 10 11:52:11] [DEBUG/transport.c:83/transport_load()]
> >> libglusterfs/transport:attempt to load type tcp/server
> >> [May 10 11:52:11] [DEBUG/transport.c:88/transport_load()]
> >> libglusterfs/transport:attempt to load file
> >> /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so
> >>
> >> - Server2
> >> glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
> >> ....
> >> [May 10 11:51:44] [DEBUG/proto-srv.c:2919/init()]
> >> protocol/server:protocol/server xlator loaded
> >> [May 10 11:51:44] [DEBUG/transport.c:83/transport_load()]
> >> libglusterfs/transport:attempt to load type tcp/server
> >> [May 10 11:51:44] [DEBUG/transport.c:88/transport_load()]
> >> libglusterfs/transport:attempt to load file
> >> /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so
> >> ------------------------------------------------------------------------------------------------------------------------------
> >>
> >>
> >> So far so good.
> >>
> >> Then I mounted the volume on server 3 (the client). It mounts without
> >> any problems.
> >> glusterfs --no-daemon --log-file=/dev/stdout --log-level=DEBUG
> >> --spec-file=/etc/glusterfs/glusterfs-client.vol
> >> /var/lib/vservers/mastersql
> >> ...
> >> [May 10 13:59:00] [DEBUG/client-protocol.c:2796/init()]
> >> protocol/client:defaulting transport-timeout to 120
> >> [May 10 13:59:00] [DEBUG/transport.c:83/transport_load()]
> >> libglusterfs/transport:attempt to load type tcp/client
> >> [May 10 13:59:00] [DEBUG/transport.c:88/transport_load()]
> >> libglusterfs/transport:attempt to load file
> >> /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/client.so
> >> [May 10 13:59:00] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
> >> :try_connect: socket fd = 8
> >> [May 10 13:59:00] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
> >> :try_connect: finalized on port `1022'
> >> [May 10 13:59:00] [DEBUG/tcp-client.c:255/tcp_connect()]
> >> tcp/client:connect on 8 in progress (non-blocking)
> >> [May 10 13:59:00] [DEBUG/tcp-client.c:293/tcp_connect()]
> >> tcp/client:connection on 8 still in progress - try later
> >>
> >> OK. Nice.
> >> A short check on the client:
> >> df -HT
> >> Filesystem    Type     Size   Used  Avail Use% Mounted on
> >> /dev/sda1     ext3      13G   2.6G   8.9G  23% /
> >> tmpfs        tmpfs     1.1G      0   1.1G   0% /lib/init/rw
> >> udev         tmpfs      11M    46k    11M   1% /dev
> >> tmpfs        tmpfs     1.1G      0   1.1G   0% /dev/shm
> >> glusterfs:24914
> >>               fuse     9.9G   2.5G   6.9G  27%
> >> /var/lib/vservers/mastersql
> >>
> >> Wow, it works. Now I can add, remove, or edit files and directories
> >> without problems. The files are written to both bricks, and performance
> >> is good too.
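> >>
> >> A quick way to double-check the mirroring is to compare the file lists
> >> of the two bricks directly on the storage servers (this compares names
> >> only, not contents; the paths are from my configs above):
> >>
> >> server1# cd /gluster && find . -type f | sort | md5sum
> >> server2# cd /gluster-afr && find . -type f | sort | md5sum
> >>
> >> If the two checksums match, both bricks hold the same set of files.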
> >>
> >> But then I tried to start my virtual server (called mastersql).
> >> The virtual server does not start, and I get a lot of the following
> >> debug output on the client:
> >>
> >> [May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
> >> :try_connect: socket fd = 4
> >> [May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
> >> :try_connect: finalized on port `1023'
> >> [May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()]
> >> tcp/client:connect on 4 in progress (non-blocking)
> >> [May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()]
> >> tcp/client:connection on 4 still in progress - try later
> >> [May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> >> protocol/client:transport_submit failed
> >> [May 10 14:04:43]
> >> [DEBUG/client-protocol.c:2604/client_protocol_cleanup()]
> >> protocol/client:cleaning up state in transport object 0x8076cf0
> >> [May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp:
> >> :try_connect: socket fd = 7
> >> [May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp:
> >> :try_connect: finalized on port `1022'
> >> [May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()]
> >> tcp/client:connect on 7 in progress (non-blocking)
> >> [May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()]
> >> tcp/client:connection on 7 still in progress - try later
> >> [May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> >> protocol/client:transport_submit failed
> >> [May 10 14:04:43]
> >> [DEBUG/client-protocol.c:2604/client_protocol_cleanup()]
> >> protocol/client:cleaning up state in transport object 0x80762d0
> >>
> >> The two mirror servers crash with the following debug output:
> >>
> >> [May 10 11:54:26] [DEBUG/tcp-server.c:134/tcp_server_notify()]
> >> tcp/server:Registering socket (5) for new transport object of
> >> 192.168.0.3
> >> [May 10 11:55:22] [DEBUG/proto-srv.c:2418/mop_setvolume()]
> >> server-protocol:mop_setvolume: received port = 1022
> >> [May 10 11:55:22] [DEBUG/proto-srv.c:2434/mop_setvolume()]
> >> server-protocol:mop_setvolume: IP addr = *, received ip addr =
> >> 192.168.0.3
> >> [May 10 11:55:22] [DEBUG/proto-srv.c:2444/mop_setvolume()]
> >> server-protocol:mop_setvolume: accepted client from 192.168.0.3
> >>
> >> Trying to set: READ    Is grantable: READ    Inserting: READ
> >> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: READ
> >> Trying to set: WRITE   Is grantable: WRITE   Inserting: WRITE
> >> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: WRITE
> >> Trying to set: WRITE   Is grantable: WRITE   Inserting: WRITE
> >> Trying to set: UNLOCK  Is grantable: UNLOCK  Conflict with: WRITE
> >> Trying to set: WRITE   Is grantable: WRITE   Inserting: WRITE
> >> [May 10 12:00:09] [CRITICAL/common-utils.c:215/gf_print_trace()]
> >> debug-backtrace:Got signal (11), printing backtrace
> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> >> debug-backtrace:/usr/lib/libglusterfs.so.0(gf_print_trace+0x2e)
> >> [0xb7f53a7e]
> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> >> debug-backtrace:[0xb7f60420]
> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> >> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
> >> [0xb75d1192]
> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> >> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so
> >> [0xb75cded7]
> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> >> debug-backtrace:/usr/lib/libglusterfs.so.0(transport_notify+0x1d)
> >> [0xb7f54ecd]
> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> >> debug-backtrace:/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xe9)
> >> [0xb7f55b79]
> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> >> debug-backtrace:/usr/lib/libglusterfs.so.0(poll_iteration+0x1d)
> >> [0xb7f54f7d]
> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> >> debug-backtrace:glusterfsd [0x804924e]
> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> >> debug-backtrace:/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8)
> >> [0xb7e17ea8]
> >> [May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()]
> >> debug-backtrace:glusterfsd [0x8048c51]
> >> Segmentation fault (core dumped)
> >>
> >> It seems that there are some conflicts with READ, WRITE, and UNLOCK,
> >> but I'm not an expert on filesystems and locking features.
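> >>
> >> If I read the backtrace right, the crash happens when a file is closed
> >> while it still holds a POSIX lock: close() sends a FLUSH, which ends up
> >> in posix_locks_flush(). A one-liner that should trigger the same path on
> >> the mount (the test file name here is made up; Python's lockf uses fcntl
> >> record locks):
> >>
> >> python -c "
> >> import fcntl
> >> f = open('/var/lib/vservers/mastersql/locktest', 'w')
> >> fcntl.lockf(f, fcntl.LOCK_EX)  # take a POSIX write lock
> >> f.close()                      # close() flushes while the lock is held
> >> "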
> >>
> >> As you can see, the filesystem is still mounted but no longer connected
> >> to the two bricks:
> >> df -HT
> >> Filesystem    Type     Size   Used  Avail Use% Mounted on
> >> /dev/sda1     ext3      13G   2.6G   8.9G  23% /
> >> tmpfs        tmpfs     1.1G      0   1.1G   0% /lib/init/rw
> >> udev         tmpfs      11M    46k    11M   1% /dev
> >> tmpfs        tmpfs     1.1G      0   1.1G   0% /dev/shm
> >> df: `/var/lib/vservers/mastersql': Transport endpoint is not connected
> >>
> >> I'm not sure if I am doing something wrong (configuration) or if it is
> >> a bug!
> >> Can you experts please help me?
> >>
> >> If you need any further information or something please let me know.
> >>
> >> Thanks and regards
> >> Urban
> >>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at nongnu.org
> >> http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>
> >
> >
>
>


-- 
Anand V. Avati




