[Gluster-devel] Self Heal/Recovery Problem

Krishna Srinivas krishna at zresearch.com
Wed Oct 17 14:47:23 UTC 2007


Kamil,
I was looking at the servers.
Can you install tla, gcc and the other tools needed to compile
glusterfs on both of your machines? Compile glusterfs from source,
with debugging symbols, on both machines instead of copying the
binaries from one machine to the other (just to be extra sure).
Then try to reproduce the problem with a smaller directory tree so
that it is easier to pinpoint the problem.
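Something along these lines once the tree is checked out should do
(the directory name here is only an example, and the CFLAGS are just
the usual way to keep debugging symbols and turn off optimisation):

  cd glusterfs--mainline--2.5
  ./autogen.sh                 # only if configure is not generated yet
  ./configure CFLAGS="-g -O0"
  make && make install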
Catch you over IRC.
Thanks
Krishna

On 10/15/07, Kamil Srot <kamil.srot at nlogy.com> wrote:
> Dear Gluster developers, fans,
>
> First of all, I want to say a big THANK YOU for the work you do. Of
> everything I have looked at and tried out (OCFS2, GFS2, CODA, NFS),
> your system is the first one I like and whose logic I can, in some
> way, understand. The others seem too complex, hard to understand and
> hard to reconfigure when something goes wrong.
> I spent about two months with my test setup of OCFS (the simplest of
> the "other" FS clustering solutions) and don't have nearly as good a
> feeling about it as after just a few days with GlusterFS...
>
> Well, it wouldn't be a good post to a devel list without questions -
> I'm composing a few questions about performance/tuning of my setup
> in another window - but recently I ran into an issue.
>
> I have a quite simple setup: two servers mirroring data with afr
> *:2, unify and io-threads...
> The setup worked fine through several days of stress testing, but
> recently I found an article recommending a certain format parameter
> for the underlying XFS filesystem...
> So I stopped glfs and glfsd on one of the servers and reformatted
> the device... recreated the exported directories and started glfsd &
> glfs again... then I tried to kick-start the self heal to re-mirror
> the testing data with find -mountpoint -type f ... oops, glfsd
> segfaults after a few seconds.
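> 
> (For completeness: after recreating the export directories I kicked
> the self heal by walking the mount point; the variant below with
> head -c 1 is the one I have seen recommended, since reading a byte
> forces the open() that makes afr re-mirror a file - the mount point
> name is illustrative, the export paths are from the server spec
> below:)
> 
>     mkdir -p /data/mailspool-ds /data/mailspool-ns
>     find /mnt/mailspool -type f -exec head -c 1 {} \; >/dev/null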
>
> The glusterfs version is mainline--2.5--patch-518. In the log, I have:
>
> ---------
> got signal (11), printing backtrace
> ---------
> [0xb7f7f420]
> /cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so[0xb7604432]
> /cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so[0xb7606a4b]
> /cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so(notify+0xe5)[0xb7607666]
> /cluster/lib/libglusterfs.so.0(transport_notify+0x62)[0xb7f70a92]
> /cluster/lib/libglusterfs.so.0[0xb7f712fc]
> /cluster/lib/libglusterfs.so.0(sys_epoll_iteration+0x16b)[0xb7f71642]
> /cluster/lib/libglusterfs.so.0(poll_iteration+0x3b)[0xb7f70dce]
> [glusterfsd](main+0x4e3)[0x804991d]
> /lib/tls/libc.so.6(__libc_start_main+0xc8)[0xb7e27ea8]
> [glusterfsd][0x8048e51]
> ---------
>
> And a core file in the root directory... the backtrace is:
> #0  0xb75574f8 in afr_sync_ownership_permission ()
>    from /cluster/lib/glusterfs/1.3.5/xlator/cluster/afr.so
> #1  0xb7576432 in client_closedir_cbk ()
>    from /cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so
> #2  0xb7578a4b in client_protocol_interpret ()
>    from /cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so
> #3  0xb7579666 in notify () from
> /cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so
> #4  0xb7edfa92 in transport_notify (this=0x8053848, event=1) at
> transport.c:154
> #5  0xb7ee02fc in epoll_notify (eevent=1, data=0x8053848) at epoll.c:53
> #6  0xb7ee0642 in sys_epoll_iteration (ctx=0xbfb026d4) at epoll.c:155
> #7  0xb7edfdce in poll_iteration (ctx=0xbfb026d4) at transport.c:300
> #8  0x0804991d in main ()
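> 
> (For reference, the backtrace above was pulled from the core roughly
> like this - the glusterfsd path follows my /cluster prefix and may
> be different elsewhere:)
> 
>     gdb /cluster/sbin/glusterfsd /core
>     (gdb) bt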
>
> It seems to be some problem with permissions?
>
> Any hints/help is greatly appreciated!
>
> *glusterfs-server.vol*
> volume mailspool-ds
>     type storage/posix
>     option directory /data/mailspool-ds
> end-volume
>
> volume mailspool-ns
>     type storage/posix
>     option directory /data/mailspool-ns
> end-volume
>
> volume mailspool-san1-ds
>     type protocol/client
>     option transport-type tcp/client
>     option remote-host 10.0.0.110
>     option remote-subvolume mailspool-ds
> end-volume
>
> volume mailspool-san1-ns
>     type protocol/client
>     option transport-type tcp/client
>     option remote-host 10.0.0.110
>     option remote-subvolume mailspool-ns
> end-volume
>
> volume mailspool-ns-afr
>     type cluster/afr
>     subvolumes mailspool-ns mailspool-san1-ns
>     option replicate *:2
> end-volume
>
> volume mailspool-ds-afr
>     type cluster/afr
>     subvolumes mailspool-ds mailspool-san1-ds
>     option replicate *:2
> end-volume
>
> volume mailspool-unify
>     type cluster/unify
>     subvolumes mailspool-ds-afr
>     option namespace mailspool-ns-afr
>     option scheduler random
> end-volume
> volume mailspool
>     type performance/io-threads
>     option thread-count 8
>     option cache-size 64MB
>     subvolumes mailspool-unify
> end-volume
>
> volume server
>     type protocol/server
>     option transport-type tcp/server
>     subvolumes mailspool
>     option auth.ip.mailspool-ds.allow 10.0.0.*,127.0.0.1
>     option auth.ip.mailspool-ns.allow 10.0.0.*,127.0.0.1
>     option auth.ip.mailspool.allow *
> end-volume
>
> *glusterfs-client.vol*
> volume client
>     type protocol/client
>     option transport-type tcp/client
>     option remote-host 127.0.0.1
>     option remote-subvolume mailspool
> end-volume
>
> volume writebehind
>     type performance/write-behind
>     option aggregate-size 131072 # aggregate block size in bytes
>     subvolumes client
> end-volume
>
> volume readahead
>     type performance/read-ahead
>     option page-size 131072
>     option page-count 2
>     subvolumes writebehind
> end-volume
>
> volume iothreads    #iothreads can give performance a boost
>     type performance/io-threads
>     option thread-count 8
>     option cache-size 64MB
>     subvolumes readahead
> end-volume
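> 
> (And in case it matters, both daemons are started with their spec
> files in the usual way - the paths are simply where I keep mine:)
> 
>     glusterfsd -f /etc/glusterfs/glusterfs-server.vol
>     glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/mailspool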
>
> Best Regards,
> --
> Kamil
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>




