[Gluster-devel] GlusterFS 1.3 mainline 2.5-patch-317 crashes when second afr-node goes down
Anand Avati
avati at zresearch.com
Thu Jul 19 06:44:31 UTC 2007
Urban,
this bug fix will be available in mainline when glusterfs--tmp--2.5 is
merged (very soon).
thanks,
avati
2007/7/18, Urban Loesch <ul at enas.net>:
>
> Hi,
>
> I checked out the latest source and compiled it with debian testing.
> I tried the setup with 2 Servers as described on
>
> http://www.gluster.org/docs/index.php/GlusterFS_High_Availability_Storage_with_GlusterFS
>
> (thanks to Paul England).
>
> Attached you can find server and client configuration.
> - I use 3 Servers with debian testing installed
> - fuse version:
> ii  fuse-utils   2.6.5-1   Filesystem in USErspace (utilities)
> ii  libfuse-dev  2.6.5-1   Filesystem in USErspace (development files)
> ii  libfuse2     2.6.5-1   Filesystem in USErspace library
> fuse init (API version 7.8)
> - 2 servers for Storage
> - 1 server as a client
>
> Everything works until I shut down the second storage server. When I then
> try to write a file or run an ls on the client, the "glusterfsd" on server
> one crashes and the client gives me the error message:
>
> ls: /mnt/gluster/data1/: Transport endpoint is not connected
> ls: /mnt/gluster/data1/: Transport endpoint is not connected
>
> The log from server 1 is attached (error-server1.txt)
>
> Have you any idea where the error could be?
> If you need further information please let me know.
>
> Thanks and regards
> Urban Loesch
>
> 2007-07-18 16:50:14 E [tcp-client.c:170:tcp_connect] data1-gluster2-ds:
> non-blocking connect() returned: 111 (Connection refused)
> 2007-07-18 16:50:14 W [client-protocol.c:340:client_protocol_xfer]
> data1-gluster2-ds: not connected at the moment to submit frame type(0)
> op(22)
> 2007-07-18 16:50:14 D [tcp-client.c:70:tcp_connect] data1-gluster2-ds:
> socket fd = 3
> 2007-07-18 16:50:14 D [tcp-client.c:88:tcp_connect] data1-gluster2-ds:
> finalized on port `1023'
> 2007-07-18 16:50:14 D [tcp-client.c:109:tcp_connect] data1-gluster2-ds:
> defaulting remote-port to 6996
> 2007-07-18 16:50:14 D [tcp-client.c:141:tcp_connect] data1-gluster2-ds:
> connect on 3 in progress (non-blocking)
> 2007-07-18 16:50:14 E [tcp-client.c:170:tcp_connect] data1-gluster2-ds:
> non-blocking connect() returned: 111 (Connection refused)
> 2007-07-18 16:50:14 W [client-protocol.c:340:client_protocol_xfer]
> data1-gluster2-ds: not connected at the moment to submit frame type(0)
> op(20)
> 2007-07-18 16:50:14 E [afr.c:576:afr_getxattr_cbk] data1-ds-afr: (path=)
> op_ret=-1 op_errno=107
> 2007-07-18 16:50:14 C [common-utils.c:208:gf_print_trace] debug-backtrace:
> Got signal (11), printing backtrace
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/libglusterfs.so.0(gf_print_trace+0x2b) [0xb7fc44fb]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /lib/i686/cmov/libc.so.6 [0xb7e7e208]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/protocol/server.so [0xb7607e1f]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/libglusterfs.so.0 [0xb7fc32a7]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/cluster/unify.so [0xb762410c]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/cluster/afr.so [0xb762cf4c]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/protocol/client.so [0xb7645da2]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/protocol/client.so [0xb763f160]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/protocol/client.so [0xb7640c9c]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/cluster/afr.so [0xb762d0da]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/cluster/unify.so(unify_getxattr+0x140)
> [0xb7624253]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/libglusterfs.so.0(default_getxattr+0xe1) [0xb7fc338f]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/protocol/server.so [0xb760ce6f]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/libglusterfs.so.0 [0xb7fcc151]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/libglusterfs.so.0(call_resume+0x33) [0xb7fce6b5]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/protocol/server.so [0xb760d067]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/protocol/server.so [0xb76114f1]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/glusterfs/1.3.pre6/xlator/protocol/server.so(notify+0xc9)
> [0xb7611e8b]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/libglusterfs.so.0(transport_notify+0x62) [0xb7fc631f]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/libglusterfs.so.0 [0xb7fc6a28]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/libglusterfs.so.0(sys_epoll_iteration+0x147) [0xb7fc6d0c]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /usr/lib/libglusterfs.so.0(poll_iteration+0x1d) [0xb7fc654c]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> [glusterfsd] [0x8049340]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> /lib/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7e6a030]
> 2007-07-18 16:50:14 C [common-utils.c:210:gf_print_trace] debug-backtrace:
> [glusterfsd] [0x8048d51]
>
> ### Add client feature and attach to remote subvolume
> volume gluster1
> type protocol/client
> option transport-type tcp/client # for TCP/IP transport
>  option remote-host 10.137.252.137    # IP address of the remote brick
> option remote-subvolume data1 # name of the remote volume
> end-volume
>
> volume gluster2
> type protocol/client
> option transport-type tcp/client # for TCP/IP transport
> option remote-host 10.137.252.138 # IP address of the remote brick
> option remote-subvolume data1 # name of the remote volume
> end-volume
>
> ### Add writeback feature
> volume writeback
> type performance/write-behind
> option aggregate-size 131072 # unit in bytes
> subvolumes gluster1
> end-volume
>
> ### Add readahead feature
> volume readahead
> type performance/read-ahead
> option page-size 65536 # unit in bytes
>  option page-count 16       # cache per file = (page-count x page-size)
> subvolumes writeback
> end-volume
>
> volume data1-ds
>  type storage/posix                    # POSIX FS translator
>  option directory /glusterfs/data1    # Export this directory
> end-volume
>
> volume data1-ns
>  type storage/posix                        # POSIX FS translator
>  option directory /glusterfs/namespace1   # Export this directory
> end-volume
>
> volume data1-gluster1-ds
> type protocol/client
> option transport-type tcp/client
> option remote-host 127.0.0.1
> option remote-subvolume data1-ds
> end-volume
>
> volume data1-gluster1-ns
> type protocol/client
> option transport-type tcp/client
> option remote-host 127.0.0.1
> option remote-subvolume data1-ns
> end-volume
>
> volume data1-gluster2-ds
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.0.138
> option remote-subvolume data1-ds
> end-volume
>
> volume data1-gluster2-ns
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.0.138
> option remote-subvolume data1-ns
> end-volume
>
> # Add AFR to Datastorage
> volume data1-ds-afr
> type cluster/afr
> # There appears to be a bug with AFR and Local Posix Volumes.
>  # To get around this we pretend the local volume is remote with an
>  # extra client volume named mailspool-santa1-ds.
> subvolumes data1-gluster1-ds data1-gluster2-ds
> option replicate *:2
> end-volume
>
> # Add AFR to Namespacestorage
> volume data1-ns-afr
> type cluster/afr
> # There appears to be a bug with AFR and Local Posix Volumes.
>  # To get around this we pretend the local volume is remote with an
>  # extra client volume named mailspool-santa1-ns.
>  # subvolumes mailspool-ns mailspool-santa2-ns mailspool-santa3-ns
> subvolumes data1-gluster1-ns data1-gluster2-ns
> option replicate *:2
> end-volume
>
> # Unify
> volume data1-unify
> type cluster/unify
> subvolumes data1-ds-afr
> option namespace data1-ns-afr
> option scheduler rr
> end-volume
>
> # Performance
> volume data1
> type performance/io-threads
> option thread-count 8
> option cache-size 64MB
> subvolumes data1-unify
> end-volume
>
> ### Add network serving capability to above brick.
> volume server
> type protocol/server
> option transport-type tcp/server # For TCP/IP transport
> subvolumes data1
>  option auth.ip.data1-ds.allow 192.168.0.*,127.0.0.1  # Allow access to "brick" volume
>  option auth.ip.data1-ns.allow 192.168.0.*,127.0.0.1  # Allow access to "brick" volume
> option auth.ip.data1.allow * # Allow access to "brick" volume
> end-volume
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
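A side note on the client volfile you attached: write-behind there stacks
only on gluster1, so the mount still depends on server 1 being reachable
even though the bricks replicate server-side. One possible variant (an
untested sketch, using only the volume names already defined in your client
config) would replicate on the client instead:

```
### Hypothetical variant: client-side AFR over both bricks
volume client-afr
 type cluster/afr
 subvolumes gluster1 gluster2
 option replicate *:2
end-volume

volume writeback
 type performance/write-behind
 option aggregate-size 131072   # unit in bytes
 subvolumes client-afr
end-volume
```

With that stacking, the client keeps a live path to the surviving brick
when either storage server goes down, independent of the server-side fix.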
--
Anand V. Avati