[Gluster-devel] Self Heal/Recovery Problem

Kamil Srot kamil.srot at nlogy.com
Mon Oct 15 16:36:34 UTC 2007


Dear Gluster developers, fans,

first of all, I want to say a big THANK YOU for the work you do. Of
everything I have looked at and tried out (OCFS2, GFS2, CODA, NFS), your
system is the first one I like and whose logic I can, at least in some way,
understand. The others seem too complex, hard to understand, and hard to
reconfigure when something goes wrong.
I worked for about two months with my test setup of OCFS2 (the simplest of
the "other" clustered filesystems) and never got as good a feeling about it
as I did after just a few days with GlusterFS...

Well, it wouldn't be a good post to a devel list without questions - so in
another window I'm composing a few questions regarding the performance and
tuning of my setup, but just recently I ran into an issue.

I have a quite simple setup: two servers mirroring the data with afr *:2,
plus unify and io-threads...
The setup worked fine through several days of stress testing, but recently I
found an article recommending some format parameter for the underlying XFS
filesystem...
So I stopped glusterfs and glusterfsd on one of the servers and reformatted
the device... recreated the exported directories and started glusterfsd &
glusterfs again... then I tried to kick-start the self-heal to re-mirror the
test data with the find -mountpoint -type f ... and oops, glusterfsd
segfaults after a few seconds.
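The full command was roughly the usual "read a byte of every file through
the mountpoint" trick; the mount path and the -exec part here are from
memory, so treat them as approximate:

    # mountpoint assumed to be /mnt/glusterfs on my test client
    find /mnt/glusterfs -type f -exec head -c1 '{}' \; > /dev/null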

The glusterfs version is mainline--2.5--patch-518; in the log, I have:

---------
got signal (11), printing backtrace
---------
[0xb7f7f420]
/cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so[0xb7604432]
/cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so[0xb7606a4b]
/cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so(notify+0xe5)[0xb7607666]
/cluster/lib/libglusterfs.so.0(transport_notify+0x62)[0xb7f70a92]
/cluster/lib/libglusterfs.so.0[0xb7f712fc]
/cluster/lib/libglusterfs.so.0(sys_epoll_iteration+0x16b)[0xb7f71642]
/cluster/lib/libglusterfs.so.0(poll_iteration+0x3b)[0xb7f70dce]
[glusterfsd](main+0x4e3)[0x804991d]
/lib/tls/libc.so.6(__libc_start_main+0xc8)[0xb7e27ea8]
[glusterfsd][0x8048e51]
---------

And there is a core file in the root directory... the backtrace is:
#0  0xb75574f8 in afr_sync_ownership_permission ()
   from /cluster/lib/glusterfs/1.3.5/xlator/cluster/afr.so
#1  0xb7576432 in client_closedir_cbk ()
   from /cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so
#2  0xb7578a4b in client_protocol_interpret ()
   from /cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so
#3  0xb7579666 in notify () from 
/cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so
#4  0xb7edfa92 in transport_notify (this=0x8053848, event=1) at 
transport.c:154
#5  0xb7ee02fc in epoll_notify (eevent=1, data=0x8053848) at epoll.c:53
#6  0xb7ee0642 in sys_epoll_iteration (ctx=0xbfb026d4) at epoll.c:155
#7  0xb7edfdce in poll_iteration (ctx=0xbfb026d4) at transport.c:300
#8  0x0804991d in main ()
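
If it helps, the trace above came out of gdb roughly like this; the
glusterfsd binary path is just a guess at my /cluster prefix install, and
/core is the dump mentioned above:

    # binary path assumed; swap 'bt' for 'bt full' to also dump locals
    gdb -batch -ex 'bt' /cluster/sbin/glusterfsd /core

I can send the 'bt full' output as well if that would be useful.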

Could it be some problem with permissions? afr_sync_ownership_permission is
at the top of the trace.
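
If ownership is the culprit, this is roughly how I'd compare the backend
directories on the two servers (paths are the ones from the spec file below):

    # run on each server; shows owner/group/mode of the export directories
    ls -ldn /data/mailspool-ds /data/mailspool-ns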

Any hints/help is greatly appreciated!

glusterfs-server.vol:
# local posix backends on this server
volume mailspool-ds
    type storage/posix
    option directory /data/mailspool-ds
end-volume

volume mailspool-ns
    type storage/posix
    option directory /data/mailspool-ns
end-volume

# client connections to the same exports on the second server (san1, 10.0.0.110)
volume mailspool-san1-ds
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.0.0.110
    option remote-subvolume mailspool-ds
end-volume

volume mailspool-san1-ns
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.0.0.110
    option remote-subvolume mailspool-ns
end-volume

# mirror namespace and data across both servers
volume mailspool-ns-afr
    type cluster/afr
    subvolumes mailspool-ns mailspool-san1-ns
    option replicate *:2
end-volume

volume mailspool-ds-afr
    type cluster/afr
    subvolumes mailspool-ds mailspool-san1-ds
    option replicate *:2
end-volume

volume mailspool-unify
    type cluster/unify
    subvolumes mailspool-ds-afr
    option namespace mailspool-ns-afr
    option scheduler random
end-volume
volume mailspool
    type performance/io-threads
    option thread-count 8
    option cache-size 64MB
    subvolumes mailspool-unify
end-volume

volume server
    type protocol/server
    option transport-type tcp/server
    subvolumes mailspool
    option auth.ip.mailspool-ds.allow 10.0.0.*,127.0.0.1
    option auth.ip.mailspool-ns.allow 10.0.0.*,127.0.0.1
    option auth.ip.mailspool.allow *
end-volume
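
For completeness, glusterfsd gets started on each server roughly like this;
the spec-file path is just where I keep it on my boxes:

    # spec-file location assumed; -f/--spec-file points glusterfsd at the file above
    glusterfsd -f /etc/glusterfs/glusterfs-server.vol

I can restart it with a more verbose log level if that would help debugging.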

glusterfs-client.vol:
volume client
    type protocol/client
    option transport-type tcp/client
    option remote-host 127.0.0.1
    option remote-subvolume mailspool
end-volume

volume writebehind
    type performance/write-behind
    option aggregate-size 131072 # aggregate block size in bytes
    subvolumes client
end-volume

volume readahead
    type performance/read-ahead
    option page-size 131072
    option page-count 2
    subvolumes writebehind
end-volume

volume iothreads    #iothreads can give performance a boost
    type performance/io-threads
    option thread-count 8
    option cache-size 64MB
    subvolumes readahead
end-volume
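
And the client side gets mounted roughly like this; again, the spec-file
path and the mountpoint are just my local choices:

    # spec-file and mountpoint assumed; adjust to your layout
    glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs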

Best Regards,
-- 
Kamil





