[Gluster-devel] Segfault in read_ahead in 1.3.0_pre4

Harris Landgarten harrisl at lhjonline.com
Thu May 24 23:25:47 UTC 2007


Brent,

I have now switched all clients and servers to 2.4-Mainline. The read-ahead crash is fixed. The spurious disconnect errors remain, but I suspect they may be tied to large reads. I cannot get through a large mailbox reindex without a disconnect in the middle, which is now recovered from but leaves 300-1000 unindexed files. I turned off stat-prefetch because my application didn't need it, but I didn't notice any errors with it on.
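
If it really is large reads, a simpler way to provoke it (the path below is just
a stand-in for any multi-GB file already on the mount) might be a plain
sequential read while watching the client log:

  dd if=/mnt/gluster/folder/bigfile of=/dev/null bs=1M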

One thing I am noticing. When I run:

  find /mnt/gluster/folder -type f -exec md5sum {} \;

on a tree with over 50,000 files, it does not run to completion. After about 30,000 files or so it fails with "cannot find file or folder" errors. The failures do not seem to be tied to any errors in the logs. find without the -exec, and du, both run correctly on the same tree. Do you have a way of duplicating this test?
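
If it helps, a rough way to recreate my setup (the directory name is just an
example) is to build a tree of about 60,000 small files on the mount and then
run the same find:

  mkdir -p /mnt/gluster/folder/md5test
  for i in $(seq 1 60000); do echo "$i" > /mnt/gluster/folder/md5test/f$i; done
  find /mnt/gluster/folder -type f -exec md5sum {} \; > /dev/null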

Harris
 
----- Original Message -----
From: "Brent A Nelson" <brent at phys.ufl.edu>
To: "Anand Avati" <avati at zresearch.com>
Cc: "Harris Landgarten" <harrisl at lhjonline.com>, gluster-devel at nongnu.org
Sent: Thursday, May 24, 2007 11:30:16 AM (GMT-0500) America/New_York
Subject: Re: [Gluster-devel] Segfault in read_ahead in 1.3.0_pre4

I was getting the same behavior in my testing.  After I reported it, the 
readahead crash was quickly patched, but the random disconnect is still 
very much a mystery...

I noticed that you are using stat-prefetch; have you encountered any 
issues? I was finding that du's on complex directories could return 
abnormal results and/or errors, so it seemed that heavy metadata queries 
were occasionally glitchy.  Without stat-prefetch, it's been fine.  If 
you've been having good luck with it, maybe I should try again.
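
A quick way to see what I mean (the path is just an example) is to repeat a du
on a reasonably deep tree and compare; with stat-prefetch loaded, the two runs
would occasionally disagree or error out:

  du -s /mnt/gluster/some-deep-tree
  du -s /mnt/gluster/some-deep-tree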

Thanks,

Brent

On Thu, 24 May 2007, Anand Avati wrote:

> Harris,
> this bug was fixed a few days back; the fix is available in the latest
> checkout of the glusterfs--mainline--2.4 repository.
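>
> A minimal checkout sketch (assuming the usual tla workflow; the archive
> location below is only a placeholder for the project's published one):
>
>   tla register-archive <gluster-archive-location>
>   tla get <gluster-archive-name>/glusterfs--mainline--2.4 glusterfs--mainline--2.4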
>
> thanks,
> avati
>
> 2007/5/24, Harris Landgarten <harrisl at lhjonline.com>:
>> I am running glusterfs in a very basic configuration on Amazon EC2
>> instances. I have a 2-brick cluster and 2 clients. One of the clients is
>> running Zimbra, and I am using the cluster as secondary storage for the mail
>> store. I have repeatedly tried to reindex a mailbox with 31000 items; most
>> of the email is on the cluster. The entire process takes about 2 hours.
>> Partway through I get at least one TCP disconnect, which seems random. With
>> read_ahead enabled on the client, the disconnect results in a segfault and
>> the mount point disappears. When I disabled read_ahead on the client, the
>> disconnect was recovered from and the process completed. This is the
>> backtrace from the read_ahead segfault:
>> 
>> [May 23 20:00:37] [CRITICAL/client-protocol.c:218/call_bail()] 
>> client/protocol:bailing transport
>> [May 23 20:00:37] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write 
>> to break on blocked socket (if any)
>> [May 23 20:00:37] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 
>> 0 bytes r/w instead of 113 (errno=115)
>> [May 23 20:00:37] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] 
>> libglusterfs/protocol:gf_block_unserialize_transport: full_read of header 
>> failed
>> [May 23 20:00:37] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
>> protocol/client:cleaning up state in transport object 0x8078a08
>> [May 23 20:00:37] [CRITICAL/common-utils.c:215/gf_print_trace()] 
>> debug-backtrace:Got signal (11), printing backtrace
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:/usr/lib/libglusterfs.so.0(gf_print_trace+0x2d) 
>> [0xb7f2584d]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:[0xbfffe420]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/performance/read-ahead.so(ra_page_error+0x47) 
>> [0xb755e587]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/performance/read-ahead.so 
>> [0xb755ecf0]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/performance/write-behind.so 
>> [0xb7561809]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/cluster/unify.so 
>> [0xb7564919]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/protocol/client.so 
>> [0xb756d17b]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/protocol/client.so 
>> [0xb75717a5]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:/usr/lib/libglusterfs.so.0(transport_notify+0x1d) 
>> [0xb7f26d2d]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xe7) 
>> [0xb7f279d7]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:/usr/lib/libglusterfs.so.0(poll_iteration+0x1d) 
>> [0xb7f26ddd]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:glusterfs [0x804a15e]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:/lib/libc.so.6(__libc_start_main+0xdc) [0xb7dca8cc]
>> [May 23 20:00:37] [CRITICAL/common-utils.c:217/gf_print_trace()] 
>> debug-backtrace:glusterfs [0x8049e71]
>> Segmentation fault (core dumped)
>> 
>> This is a sample of the debug log with read_ahead turned off:
>> 
>> [May 24 05:35:05] [CRITICAL/client-protocol.c:218/call_bail()] 
>> client/protocol:bailing transport
>> [May 24 05:35:05] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write 
>> to break on blocked socket (if any)
>> [May 24 05:35:05] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 
>> 0 bytes r/w instead of 113 (errno=115)
>> [May 24 05:35:05] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] 
>> libglusterfs/protocol:gf_block_unserialize_transport: full_read of header 
>> failed
>> [May 24 05:35:05] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
>> protocol/client:cleaning up state in transport object 0x80783d0
>> [May 24 05:35:05] [CRITICAL/tcp.c:81/tcp_disconnect()] 
>> transport/tcp:client1: connection to server disconnected
>> [May 24 05:35:05] [DEBUG/tcp-client.c:180/tcp_connect()] transport: tcp: 
>> :try_connect: socket fd = 4
>> [May 24 05:35:05] [DEBUG/tcp-client.c:202/tcp_connect()] transport: tcp: 
>> :try_connect: finalized on port `1022'
>> [May 24 05:35:05] [DEBUG/tcp-client.c:226/tcp_connect()] 
>> tcp/client:try_connect: defaulting remote-port to 6996
>> [May 24 05:35:05] [DEBUG/tcp-client.c:262/tcp_connect()] tcp/client:connect 
>> on 4 in progress (non-blocking)
>> [May 24 05:35:05] [DEBUG/tcp-client.c:301/tcp_connect()] 
>> tcp/client:connection on 4 still in progress - try later
>> [May 24 05:35:05] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
>> protocol/client:transport_submit failed
>> [May 24 05:35:05] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
>> protocol/client:cleaning up state in transport object 0x80783d0
>> [May 24 05:35:11] [DEBUG/tcp-client.c:310/tcp_connect()] 
>> tcp/client:connection on 4 success, attempting to handshake
>> [May 24 05:35:11] [DEBUG/tcp-client.c:54/do_handshake()] 
>> transport/tcp-client:dictionary length = 50
>> [May 24 07:20:10] [DEBUG/stat-prefetch.c:58/stat_prefetch_cache_flush()] 
>> stat-prefetch:flush on: /
>> [May 24 07:20:20] [DEBUG/stat-prefetch.c:58/stat_prefetch_cache_flush()] 
>> stat-prefetch:flush on: /backups/sessions
>> [May 24 07:57:12] [CRITICAL/client-protocol.c:218/call_bail()] 
>> client/protocol:bailing transport
>> [May 24 07:57:12] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write 
>> to break on blocked socket (if any)
>> [May 24 07:57:12] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 
>> 0 bytes r/w instead of 113 (errno=115)
>> [May 24 07:57:12] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] 
>> libglusterfs/protocol:gf_block_unserialize_transport: full_read of header 
>> failed
>> [May 24 07:57:12] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
>> protocol/client:cleaning up state in transport object 0x80783d0
>> [May 24 07:57:12] [CRITICAL/tcp.c:81/tcp_disconnect()] 
>> transport/tcp:client1: connection to server disconnected
>> [May 24 07:57:12] [DEBUG/tcp-client.c:180/tcp_connect()] transport: tcp: 
>> :try_connect: socket fd = 4
>> [May 24 07:57:12] [DEBUG/tcp-client.c:202/tcp_connect()] transport: tcp: 
>> :try_connect: finalized on port `1023'
>> [May 24 07:57:12] [DEBUG/tcp-client.c:226/tcp_connect()] 
>> tcp/client:try_connect: defaulting remote-port to 6996
>> [May 24 07:57:12] [DEBUG/tcp-client.c:262/tcp_connect()] tcp/client:connect 
>> on 4 in progress (non-blocking)
>> [May 24 07:57:12] [DEBUG/tcp-client.c:301/tcp_connect()] 
>> tcp/client:connection on 4 still in progress - try later
>> [May 24 07:57:12] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
>> protocol/client:transport_submit failed
>> [May 24 07:57:12] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
>> protocol/client:cleaning up state in transport object 0x80783d0
>> [May 24 07:57:12] [DEBUG/tcp-client.c:310/tcp_connect()] 
>> tcp/client:connection on 4 success, attempting to handshake
>> [May 24 07:57:12] [DEBUG/tcp-client.c:54/do_handshake()] 
>> transport/tcp-client:dictionary length = 50
>> 
>> This is the client config with read_ahead:
>> 
>> ### Add client feature and attach to remote subvolume
>> volume client1
>>   type protocol/client
>>   option transport-type tcp/client     # for TCP/IP transport
>> # option ibv-send-work-request-size  131072
>> # option ibv-send-work-request-count 64
>> # option ibv-recv-work-request-size  131072
>> # option ibv-recv-work-request-count 64
>> # option transport-type ib-sdp/client  # for Infiniband transport
>> # option transport-type ib-verbs/client # for ib-verbs transport
>>   option remote-host xx.xxx.xx.xxx     # IP address of the remote brick
>> # option remote-port 6996              # default server port is 6996
>> 
>> # option transport-timeout 30          # seconds to wait for a reply
>>                                        # from server for each request
>>   option remote-subvolume brick        # name of the remote volume
>> end-volume
>> 
>> ### Add client feature and attach to remote subvolume
>> volume client2
>>   type protocol/client
>>   option transport-type tcp/client     # for TCP/IP transport
>> # option ibv-send-work-request-size  131072
>> # option ibv-send-work-request-count 64
>> # option ibv-recv-work-request-size  131072
>> # option ibv-recv-work-request-count 64
>> # option transport-type ib-sdp/client  # for Infiniband transport
>> # option transport-type ib-verbs/client # for ib-verbs transport
>>   option remote-host yy.yyy.yy.yyy     # IP address of the remote brick
>> # option remote-port 6996              # default server port is 6996
>>
>> # option transport-timeout 30          # seconds to wait for a reply
>>                                        # from server for each request
>>   option remote-subvolume brick        # name of the remote volume
>> end-volume
>> 
>> volume bricks
>>   type cluster/unify
>>   subvolumes client1 client2
>>   option scheduler alu
>>   option alu.limits.min-free-disk 4GB
>>   option alu.limits.max-open-files 10000
>>
>>   option alu.order disk-usage:read-usage:write-usage:open-files-usage
>>   option alu.disk-usage.entry-threshold 2GB
>>   option alu.disk-usage.exit-threshold 10GB
>>   option alu.open-files-usage.entry-threshold 1024
>>   option alu.open-files-usage.exit-threshold 32
>>   option alu.stat-refresh.interval 10sec
>> end-volume
>> #
>> 
>> ### Add writeback feature
>> volume writeback
>>   type performance/write-behind
>>   option aggregate-size 131072 # unit in bytes
>>   subvolumes bricks
>> end-volume
>> 
>> ### Add readahead feature
>> volume readahead
>>   type performance/read-ahead
>>   option page-size 65536     # unit in bytes
>>   option page-count 16       # cache per file  = (page-count x page-size)
>>   subvolumes writeback
>> end-volume
>> 
>> ### Add stat-prefetch feature
>> ### If you are not concerned about performance of interactive commands
>> ### like "ls -l", you wouln't need this translator.
>> volume statprefetch
>>    type performance/stat-prefetch
>>    option cache-seconds 2   # timeout for stat cache
>>    subvolumes readahead
>> end-volume
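>>
>> (For the runs with read_ahead disabled, the read-ahead layer is simply dropped
>> and statprefetch points straight at writeback; roughly:)
>>
>> volume statprefetch
>>    type performance/stat-prefetch
>>    option cache-seconds 2   # timeout for stat cache
>>    subvolumes writeback     # readahead volume removed from the stack
>> end-volume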
>> 
>> This is the brick config:
>> 
>> ### Export volume "brick" with the contents of "/home/export" directory.
>> volume brick
>>   type storage/posix                   # POSIX FS translator
>>   option directory /mnt/export        # Export this directory
>> end-volume
>> 
>> volume iothreads
>>   type performance/io-threads
>>   option thread-count 8
>>   subvolumes brick
>> end-volume
>> 
>> ### Add network serving capability to above brick.
>> volume server
>>   type protocol/server
>>   option transport-type tcp/server     # For TCP/IP transport
>> # option ibv-send-work-request-size  131072
>> # option ibv-send-work-request-count 64
>> # option ibv-recv-work-request-size  131072
>> # option ibv-recv-work-request-count 64
>> # option transport-type ib-sdp/server  # For Infiniband transport
>> # option transport-type ib-verbs/server # For ib-verbs transport
>> # option bind-address 192.168.1.10     # Default is to listen on all interfaces
>> # option listen-port 6996              # Default is 6996
>> # option client-volume-filename /etc/glusterfs/glusterfs-client.vol
>>   subvolumes iothreads
>> # NOTE: Access to any volume through protocol/server is denied by
>> # default. You need to explicitly grant access through the "auth"
>> # option.
>>   option auth.ip.brick.allow * # Allow access to "brick" volume
>> end-volume
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>
>
> -- 
> Anand V. Avati
>
>
>





