[Gluster-devel] Could this be a bug in GlusterFS? The file system is unstable and hangs

Alpha Electronics myitouchs at gmail.com
Wed Jun 3 21:48:32 UTC 2009


We applied the patch mentioned in the thread and switched to a fixed thread
count in the server config (see the sketch after the log below).
Unfortunately, we got the same errors:

[2009-06-03 04:57:36] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse:
22347008: ERR => -1 (Resource temporarily unavailable)
[2009-06-03 07:55:04] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse:
23431094: ERR => -1 (Resource temporarily unavailable)
[2009-06-03 15:58:25] E [client-protocol.c:292:call_bail] brick1: bailing
out frame LOOKUP(32) frame sent = 2009-06-03 15:28:23. frame-timeout = 1800
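
For reference, here is roughly what each io-threads volume looks like after
the change. This is only a sketch of our edit; we picked 16 threads, assuming
the 2.0.x io-threads translator accepts a fixed thread-count option once
autoscaling is removed:

volume brick1
 type performance/io-threads
 option thread-count 16 # fixed pool size; autoscaling removed as suggested
 subvolumes brick1-locks
end-volume

brick2 is identical apart from the names.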

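One thing we may try next, in case it helps surface the hang sooner: lowering
frame-timeout on the client volumes so the bail-out fires before 30 minutes.
A minimal sketch, assuming frame-timeout is settable per protocol/client
volume in 2.0.1 (600 is an arbitrary illustrative value, not a
recommendation):

volume brick_1
 type protocol/client
 option transport-type tcp/client
 option remote-port 7777 # Non-default port
 option remote-host server1
 option remote-subvolume brick
 option frame-timeout 600 # illustrative: bail after 10 minutes instead of the default 1800 seconds
end-volume
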
John


On Tue, Jun 2, 2009 at 12:25 AM, Shehjar Tikoo <shehjart at gluster.com> wrote:

>
> Hi
>
> >
> >     Also, avoid using autoscaling in io-threads for now.
> >
> >     -Shehjar
> >
> >
>
> -Shehjar
>
> Alpha Electronics wrote:
>
>> Thanks for looking into this. We do use io-threads. Here is the server
>> config:
>> volume brick1-posix
>>  type storage/posix
>>  option directory /mnt/brick1
>> end-volume
>>
>> volume brick2-posix
>>  type storage/posix
>>  option directory /mnt/brick2
>> end-volume
>>
>> volume brick1-locks
>>  type features/locks
>>  subvolumes brick1-posix
>> end-volume
>>
>> volume brick2-locks
>>  type features/locks
>>  subvolumes brick2-posix
>> end-volume
>>
>> volume brick1
>>  type performance/io-threads
>>  option min-threads 16
>>  option autoscaling on
>>  subvolumes brick1-locks
>> end-volume
>>
>> volume brick2
>>  type performance/io-threads
>>  option min-threads 16
>>  option autoscaling on
>>  subvolumes brick2-locks
>> end-volume
>>
>> volume server
>>  type protocol/server
>>  option transport-type tcp
>>  option auth.addr.brick1.allow *
>>  option auth.addr.brick2.allow *
>>  subvolumes brick1 brick2
>> end-volume
>>
>>
>>
>> On Sun, May 31, 2009 at 11:44 PM, Shehjar Tikoo <shehjart at gluster.com> wrote:
>>
>>    Alpha Electronics wrote:
>>
>>        We are testing GlusterFS before recommending it to
>>        enterprise clients. We found that the file system always hangs
>>        after running for about 2 days. After killing the server-side
>>        process and restarting it, everything goes back to normal.
>>
>>
>>    What is the server config?
>>    If you're not using io-threads on the server, I suggest you do,
>>    because it does basic load-balancing to avoid timeouts.
>>
>>    Also, avoid using autoscaling in io-threads for now.
>>
>>    -Shehjar
>>
>>
>>        Here is the spec and the errors logged:
>>        GlusterFS version:  v2.0.1
>>
>>        Client volume:
>>        volume brick_1
>>         type protocol/client
>>         option transport-type tcp/client
>>         option remote-port 7777 # Non-default port
>>         option remote-host server1
>>         option remote-subvolume brick
>>        end-volume
>>
>>        volume brick_2
>>         type protocol/client
>>         option transport-type tcp/client
>>         option remote-port 7777 # Non-default port
>>         option remote-host server2
>>         option remote-subvolume brick
>>        end-volume
>>
>>        volume bricks
>>         type cluster/distribute
>>         subvolumes brick_1 brick_2
>>        end-volume
>>
>>        Errors logged on the client side in /var/log/glusterfs.log:
>>        [2009-05-29 14:58:55] E [client-protocol.c:292:call_bail]
>>        brick_1: bailing out frame LK(28) frame sent = 2009-05-29
>>        14:28:54. frame-timeout = 1800
>>        [2009-05-29 14:58:55] W [fuse-bridge.c:2284:fuse_setlk_cbk]
>>        glusterfs-fuse: 106850788: ERR => -1 (Transport endpoint is not
>>        connected)
>>        Errors logged on the server:
>>        [2009-05-29 14:59:15] E [client-protocol.c:292:call_bail]
>>        brick_2: bailing out frame LK(28) frame sent = 2009-05-29
>>        14:29:05. frame-timeout = 1800
>>        [2009-05-29 14:59:15] W [fuse-bridge.c:2284:fuse_setlk_cbk]
>>        glusterfs-fuse: 106850860: ERR => -1 (Transport endpoint is not
>>        connected)
>>
>>        There are error messages logged on the server side after 1 hour
>>        in /var/log/messages:
>>        May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
>>        lib/util_sock.c:write_data(564)
>>        May 29 16:04:16 server2 winbindd[3649]:   write_data: write
>>        failure. Error = Connection reset by peer
>>        May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
>>        libsmb/clientgen.c:write_socket(158)
>>        May 29 16:04:16 server2 winbindd[3649]:   write_socket: Error
>>        writing 104 bytes to socket 18: ERRNO = Connection reset by peer
>>        May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
>>        libsmb/clientgen.c:cli_send_smb(188)
>>        May 29 16:04:16 server2 winbindd[3649]:   Error writing 104
>>        bytes to client. -1 (Connection reset by peer)
>>        May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
>>        libsmb/cliconnect.c:cli_session_setup_spnego(859)
>>        May 29 16:04:16 server2 winbindd[3649]:   Kinit failed: Cannot
>>        contact any KDC for requested realm
>>
>>
>>
>