[Gluster-devel] Crashing glusterfs server / sync not working

Guido Smit guido at comlog.nl
Thu Jan 31 13:19:18 UTC 2008


I upgraded to the latest tla (644) and added posix/locks to the server 
configs.

Self-heal works now and the server no longer crashes; it's still up and 
running. ls is a bit slow, but I think that's because self-heal is 
working very hard in the background (I can see a lot of messages in the 
logs).
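
For anyone hitting the same crash, the locks change can be sketched 
roughly as below. This is only a sketch: the exact file was not posted in 
this thread, the translator type features/posix-locks is assumed from the 
1.3.x releases, and the name pop2-mail-ds-posix is made up for 
illustration.

```
# Hypothetical name: the raw storage volume, renamed so the locks
# translator can take over the exported name.
volume pop2-mail-ds-posix
        type storage/posix
        option directory /home/export/mailspool
end-volume

# Assumed translator type (features/posix-locks in 1.3.x).
# It keeps the original name pop2-mail-ds.
volume pop2-mail-ds
        type features/posix-locks
        subvolumes pop2-mail-ds-posix
end-volume
```

Keeping pop2-mail-ds as the name of the locks volume means the existing 
remote-subvolume and auth.ip.pop2-mail-ds.allow lines would not need to 
change. The namespace volume would presumably be wrapped the same way.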


Guido Smit wrote:
> Anand,
>
> The client log was
> 2008-01-31 10:04:24 W [client-protocol.c:288:client_protocol_xfer] mailspool: attempting to pipeline request type(0) op(22) with handshake
> 2008-01-31 10:08:00 W [client-protocol.c:209:call_bail] mailspool: activating bail-out. pending frames = 2. last sent = 2008-01-31 10:04:24. last received = 1970-01-01 01:00:00 transport-timeout = 108
> 2008-01-31 10:08:00 C [client-protocol.c:217:call_bail] mailspool: bailing transport
> 2008-01-31 10:08:00 W [client-protocol.c:4503:client_protocol_cleanup] mailspool: cleaning up state in transport object 0x96a9930
> 2008-01-31 10:08:00 E [client-protocol.c:4555:client_protocol_cleanup] mailspool: forced unwinding frame type(0) op(34) reply=@0x9738e00
> 2008-01-31 10:08:00 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 18395: / => -1 (107)
> 2008-01-31 10:08:00 E [client-protocol.c:4555:client_protocol_cleanup] mailspool: forced unwinding frame type(0) op(22) reply=@0x9738e00
> 2008-01-31 10:08:00 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 18396: /blockbox.nl => -1 (107)
> 2008-01-31 10:08:00 C [tcp.c:81:tcp_disconnect] mailspool: connection disconnected
>
> How do I get a backtrace from gdb?
>
> Anand Avati wrote:
>> Guido,
>>  Can you get a gdb backtrace of the core? Also, what was the 
>> client log at that time?
>>
>> avati
>>
>> 2008/1/31, Guido Smit <guido at comlog.nl>:
>>
>>     Hi all,
>>
>>     I have fuse2.7.2gls8 and glusterfs tla on my 2 CentOS 5 machines.
>>     Everything works fine as long as I don't sync the machines.
>>     One of the things I see all the time is that most files are not
>>     synced on both servers. When I try to force a sync using
>>
>>         find /mail -type f -exec head -c 1 {} \; >/dev/null
>>
>>     I get the following crash after a few minutes:
>>
>>     2008-01-31 10:08:00 E [server-protocol.c:178:generic_reply] server: transport_writev failed
>>     2008-01-31 10:08:00 D [inode.c:308:__destroy_inode] mail/inode: destroy inode(0) [@0xb7e23a60]
>>
>>     ---------
>>     got signal (11), printing backtrace
>>     ---------
>>     [0x537420]
>>     //lib/libglusterfs.so.0[0xd4390c]
>>     //lib/libglusterfs.so.0[0xd4390c]
>>     //lib/libglusterfs.so.0[0xd4390c]
>>     //lib/glusterfs/1.3.8/xlator/cluster/unify.so(unify_opendir_cbk+0xa3)[0x2e07f3]
>>     //lib/glusterfs/1.3.8/xlator/cluster/afr.so(afr_opendir_cbk+0x138)[0x91c9f8]
>>     //lib/glusterfs/1.3.8/xlator/protocol/client.so[0x1125b8]
>>     //lib/glusterfs/1.3.8/xlator/protocol/client.so(notify+0xa97)[0x116717]
>>     //lib/libglusterfs.so.0(transport_notify+0x37)[0xd47aa7]
>>     //lib/libglusterfs.so.0(sys_epoll_iteration+0xd7)[0xd487e7]
>>     //lib/libglusterfs.so.0(poll_iteration+0x7c)[0xd47bdc]
>>     [glusterfsd][0x8049432]
>>     //lib/libc.so.6(__libc_start_main+0xdc)[0xbe0dec]
>>     [glusterfsd][0x8048cf1]
>>     ---------
>>
>>     My glusterfs-server.vol:
>>
>>     volume pop1-mail-ns
>>             type protocol/client
>>             option transport-type tcp/client
>>             option remote-host 62.59.252.41
>>             option remote-subvolume pop1-mail-ns
>>             option transport-timeout 10
>>     end-volume
>>
>>     volume pop1-mail-ds
>>             type protocol/client
>>             option transport-type tcp/client
>>             option remote-host 62.59.252.41
>>             option remote-subvolume pop1-mail-ds
>>             option transport-timeout 10
>>     end-volume
>>
>>     volume pop2-mail-ns
>>             type storage/posix
>>             option directory /home/export/namespace
>>     end-volume
>>
>>     volume pop2-mail-ds
>>             type storage/posix
>>             option directory /home/export/mailspool
>>     end-volume
>>
>>     volume ns-afr
>>             type cluster/afr
>>             subvolumes pop1-mail-ns pop2-mail-ns
>>             option scheduler random
>>     end-volume
>>
>>     volume ds-afr
>>             type cluster/afr
>>             subvolumes pop1-mail-ds pop2-mail-ds
>>             option scheduler random
>>     end-volume
>>
>>     volume mail-unify
>>             type cluster/unify
>>             subvolumes ds-afr
>>             option namespace ns-afr
>>             option scheduler alu
>>             option alu.limits.max-open-files 10000   # Don't create files on a volume with more than 10000 files open
>>             option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
>>             option alu.disk-usage.entry-threshold 2GB   # Kick in if the discrepancy in disk-usage between volumes is more than 2GB
>>             option alu.disk-usage.exit-threshold  60MB   # Don't stop writing to the least-used volume until the discrepancy is 1988MB
>>             option alu.open-files-usage.entry-threshold 1024   # Kick in if the discrepancy in open files is 1024
>>             option alu.open-files-usage.exit-threshold 32   # Don't stop until 992 files have been written to the least-used volume
>>             option alu.stat-refresh.interval 10sec   # Refresh the statistics used for decision-making every 10 seconds
>>     end-volume
>>
>>     volume mail-iothreads
>>             type performance/io-threads
>>             option thread-count 8
>>             option cache-size 64MB
>>             subvolumes mail-unify
>>     end-volume
>>
>>     volume mail-wb
>>             type performance/write-behind
>>             subvolumes mail-iothreads
>>     end-volume
>>
>>     volume mail
>>             type performance/read-ahead
>>             subvolumes mail-wb
>>     end-volume
>>
>>     volume server
>>             type protocol/server
>>             option transport-type tcp/server
>>             subvolumes mail
>>             option auth.ip.pop2-mail-ds.allow 62.59.252.*,127.0.0.1
>>             option auth.ip.pop2-mail-ns.allow 62.59.252.*,127.0.0.1
>>             option auth.ip.mail.allow 62.59.252.*,127.0.0.1
>>     end-volume
>>
>>
>>     My glusterfs-client.vol:
>>
>>     volume mailspool
>>             type protocol/client
>>             option transport-type tcp/client
>>             option remote-host 127.0.0.1
>>             option remote-subvolume mail
>>     end-volume
>>
>>     volume writeback
>>             type performance/write-behind
>>             option aggregate-size 131072
>>             subvolumes mailspool
>>     end-volume
>>
>>     volume readahead
>>             type performance/read-ahead
>>             option page-size 65536
>>             option page-count 16
>>             subvolumes writeback
>>     end-volume
>>
>>     --
>>     Regards,
>>
>>     Guido Smit
>>     DevInet
>>
>> -- 
>> If I traveled to the end of the rainbow
>> As Dame Fortune did intend,
>> Murphy would be there to tell me
>> The pot's at the other end.
>

-- 
Met vriendelijke groet,

Guido Smit
ComLog B.V.

Televisieweg 133
1322 BE Almere
T. 036 5470500
F. 036 5470481





