[Gluster-devel] Unify behaviour if one of the servers disconnected

Amar S. Tumballi amar at zresearch.com
Thu Jun 19 18:00:49 UTC 2008


NovA,
 Another valid point, and one that has been on our to-do list for a long
time. It will be done soon.

Regards,
Amar
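
For readers of the archive: the log-flood concern in the quoted message below (the same "not connected" lines repeated for every lookup while a node is down) is commonly handled by throttling identical messages. The following is only a minimal sketch of that general idea. It is not GlusterFS code, and the class name `LogThrottle`, the `check` method, and the 60-second interval are hypothetical choices made for illustration.

```python
class LogThrottle:
    """Allow one message per key every `interval` seconds; count the rest.

    Hypothetical helper for illustration only, not part of GlusterFS.
    """

    def __init__(self, interval=60.0):
        self.interval = interval
        self._last = {}        # key -> time the key was last emitted
        self._suppressed = {}  # key -> messages dropped since then

    def check(self, key, now):
        """Return (emit, dropped).

        emit    -- True if the message should be written now
        dropped -- how many identical messages were suppressed since the
                   previously emitted one (only non-zero when emit is True)
        """
        last = self._last.get(key)
        if last is not None and now - last < self.interval:
            # Still inside the quiet window: count the message, do not log it.
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False, 0
        dropped = self._suppressed.pop(key, 0)
        self._last[key] = now
        return True, dropped
```

A caller would key on something like the volume name plus the message text, and on `emit` being true could append "(N similar messages suppressed)" so that no information about the outage is lost.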

2008/6/18 NovA <av.nova at gmail.com>:

> Hi!
>
> 2008/6/18 Amar S. Tumballi <amar at zresearch.com>:
> >  The fix for this issue is in the source repo now. You can try
> > beating on it hard again. Let me know if you run into any problems.
> Oh, what a fast response! :) Thanks a lot!
> Unify is working now as expected. But not without problems,
> unfortunately...
>
> When I disconnect one of the nodes and immediately run "ls /home", the
> command hangs and can't be killed even with SIGKILL. The client log
> contains:
> ----
> 2008-06-18 17:05:15 W [client-protocol.c:205:call_bail] c54:
> activating bail-out. pending frames = 2. last sent = 2008-06-18
> 17:04:29. last received = 2008-06-18 17:03:35 transport-timeout = 42
> 2008-06-18 17:05:15 C [client-protocol.c:212:call_bail] c54: bailing
> transport
> 2008-06-18 17:05:15 W [client-protocol.c:4777:client_protocol_cleanup]
> c54: cleaning up state in transport object 0x63d790
> 2008-06-18 17:05:15 E [client-protocol.c:4827:client_protocol_cleanup]
> c54: forced unwinding frame type(1) op(34) reply=@0x657320
> 2008-06-18 17:05:15 E [client-protocol.c:4423:client_lookup_cbk] c54:
> no proper reply from server, returning ENOTCONN
> 2008-06-18 17:05:15 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned
> 107
> 2008-06-18 17:05:15 E [unify.c:265:unify_lookup_cbk] bricks:
> Revalidate failed for /
> 2008-06-18 17:05:15 E [fuse-bridge.c:468:fuse_entry_cbk]
> glusterfs-fuse: 15: (34) / => -1 (107)
> 2008-06-18 17:05:15 E [client-protocol.c:325:client_protocol_xfer]
> c54: transport_submit failed
> 2008-06-18 17:05:15 E [client-protocol.c:4827:client_protocol_cleanup]
> c54: forced unwinding frame type(1) op(34) reply=@0x657320
> 2008-06-18 17:05:15 E [client-protocol.c:4423:client_lookup_cbk] c54:
> no proper reply from server, returning ENOTCONN
> 2008-06-18 17:05:15 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned
> 107
> 2008-06-18 17:05:15 E [unify.c:265:unify_lookup_cbk] bricks:
> Revalidate failed for /
> 2008-06-18 17:05:15 E [fuse-bridge.c:468:fuse_entry_cbk]
> glusterfs-fuse: 16: (34) / => -1 (107)
> 2008-06-18 17:05:15 E [client-protocol.c:325:client_protocol_xfer]
> c54: transport_submit failed
> 2008-06-18 17:05:19 E [tcp-client.c:190:tcp_connect] c54: non-blocking
> connect() returned: 113 (No route to host)
> 2008-06-18 17:05:27 E [tcp-client.c:190:tcp_connect] c54: non-blocking
> connect() returned: 113 (No route to host)
> 2008-06-18 17:05:48 E [tcp-client.c:190:tcp_connect] c54: non-blocking
> connect() returned: 113 (No route to host)
> 2008-06-18 17:06:43 E [tcp-client.c:190:tcp_connect] c54: non-blocking
> connect() returned: 113 (No route to host)
> 2008-06-18 17:07:55 E [tcp-client.c:190:tcp_connect] c54: non-blocking
> connect() returned: 113 (No route to host)
> 2008-06-18 17:07:55 W [client-protocol.c:332:client_protocol_xfer]
> c54: not connected at the moment to submit frame type(1) op(34)
> 2008-06-18 17:07:55 E [client-protocol.c:4423:client_lookup_cbk] c54:
> no proper reply from server, returning ENOTCONN
> 2008-06-18 17:07:55 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned
> 107
> 2008-06-18 17:07:55 E [unify.c:265:unify_lookup_cbk] bricks:
> Revalidate failed for /
> 2008-06-18 17:07:55 E [fuse-bridge.c:468:fuse_entry_cbk]
> glusterfs-fuse: 19: (34) / => -1 (107)
> 2008-06-18 17:07:55 W [client-protocol.c:332:client_protocol_xfer]
> c54: not connected at the moment to submit frame type(1) op(34)
> 2008-06-18 17:07:55 E [client-protocol.c:4423:client_lookup_cbk] c54:
> no proper reply from server, returning ENOTCONN
> 2008-06-18 17:07:55 E [client-protocol.c:4572:client_checksum] c54: /:
> returning EINVAL
> ------
>
> After some time (apparently related to the transport timeout), any
> further "ls /home" commands succeed. But the log is flooded with
> messages like:
> -----
> 2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer]
> c54: not connected at the moment to submit frame type(1) op(34)
> 2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54:
> no proper reply from server, returning ENOTCONN
> 2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54:
> /danilov/public_html: returning EINVAL
> 2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer]
> c54: not connected at the moment to submit frame type(1) op(34)
> 2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54:
> no proper reply from server, returning ENOTCONN
> 2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54:
> /danilov/.mc: returning EINVAL
> ----
> This is not as important as the first issue. But if, for
> example, I turn the node off for a couple of days, then the log will
> grow enormously...
>
> WBR,
>  Andrey
>
>
> >
> > On Tue, Jun 17, 2008 at 6:49 AM, Amar S. Tumballi <amar at zresearch.com>
> > wrote:
> >>
> >>  I just noticed this behavior, which ideally should not happen. You
> >> will have a fix for it tomorrow.
> >>
> >>
> >> On Tue, Jun 17, 2008 at 6:21 AM, Amar S. Tumballi <amar at zresearch.com>
> >> wrote:
> >>>
> >>> Currently, if the server that got disconnected also has the Namespace
> >>> export, then lookups return ENOENT (file not found). Otherwise it
> >>> behaves as you described (the whole filesystem stays online, minus a
> >>> few files).
> >>>
> >>>
> >>> On Tue, Jun 17, 2008 at 4:57 AM, NovA <av.nova at gmail.com> wrote:
> >>>>
> >>>> I'm continuing to stress-test glusterFS 1.3.8+ series. Just upgraded
> >>>> to tla781. It seems stable in my setup by now, no lockups yet. ;)
> >>>> Great!
> >>>> But I still can't get the behaviour named in the subject line. So I
> >>>> have a concrete question. :) What is the intended behaviour of the
> >>>> unify translator (without AFR) when one of the servers is disconnected?
> >>>> I assumed that in this case the glusterFS volume would remain online,
> >>>> with the files on the disconnected server being inaccessible. But now,
> >>>> if I unplug the network cable from a cluster node,
> >>>> then "ls <unify_volume>" says that it cannot open the directory:
> >>>> "Transport endpoint is not connected". Am I just believing what I
> >>>> wish for? Is the unify volume supposed to come back online only
> >>>> after the disconnected server returns?
>
>
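
A note on the numeric codes in the quoted logs: "returned 107" and "connect() returned: 113" are raw errno values. On Linux, 107 is ENOTCONN (which the log itself names) and 113 is EHOSTUNREACH, matching the "No route to host" text. These numbers are platform-specific, so the sketch below looks the codes up by symbolic name instead of hard-coding them. The helper name `describe` is made up for this example.

```python
import errno
import os


def describe(code):
    """Return 'number NAME: message' for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} {name}: {os.strerror(code)}"


# The numeric values printed here depend on the operating system;
# 107 and 113 are the Linux values seen in the logs above.
for code in (errno.ENOTCONN, errno.EHOSTUNREACH):
    print(describe(code))
```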


-- 
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Super Storage!