[Gluster-devel] Unify behaviour if one of the servers disconnected
NovA
av.nova at gmail.com
Wed Jun 18 14:17:25 UTC 2008
Hi!
2008/6/18 Amar S. Tumballi <amar at zresearch.com>:
> The fix for this issue is in the source repo now. You can try out beating
> it hard again. Let me know about the problems if you have any.
Oh, what a fast response! :) Thanks a lot!
Unify now works as expected, though unfortunately not without problems...
When I disconnect one of the nodes and immediately run "ls /home", the
command hangs and can't be killed, even with SIGKILL. The client log
contains:
----
2008-06-18 17:05:15 W [client-protocol.c:205:call_bail] c54: activating bail-out. pending frames = 2. last sent = 2008-06-18 17:04:29. last received = 2008-06-18 17:03:35 transport-timeout = 42
2008-06-18 17:05:15 C [client-protocol.c:212:call_bail] c54: bailing transport
2008-06-18 17:05:15 W [client-protocol.c:4777:client_protocol_cleanup] c54: cleaning up state in transport object 0x63d790
2008-06-18 17:05:15 E [client-protocol.c:4827:client_protocol_cleanup] c54: forced unwinding frame type(1) op(34) reply=@0x657320
2008-06-18 17:05:15 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:05:15 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned 107
2008-06-18 17:05:15 E [unify.c:265:unify_lookup_cbk] bricks: Revalidate failed for /
2008-06-18 17:05:15 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 15: (34) / => -1 (107)
2008-06-18 17:05:15 E [client-protocol.c:325:client_protocol_xfer] c54: transport_submit failed
2008-06-18 17:05:15 E [client-protocol.c:4827:client_protocol_cleanup] c54: forced unwinding frame type(1) op(34) reply=@0x657320
2008-06-18 17:05:15 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:05:15 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned 107
2008-06-18 17:05:15 E [unify.c:265:unify_lookup_cbk] bricks: Revalidate failed for /
2008-06-18 17:05:15 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 16: (34) / => -1 (107)
2008-06-18 17:05:15 E [client-protocol.c:325:client_protocol_xfer] c54: transport_submit failed
2008-06-18 17:05:19 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
2008-06-18 17:05:27 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
2008-06-18 17:05:48 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
2008-06-18 17:06:43 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
2008-06-18 17:07:55 E [tcp-client.c:190:tcp_connect] c54: non-blocking connect() returned: 113 (No route to host)
2008-06-18 17:07:55 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
2008-06-18 17:07:55 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:07:55 E [unify.c:182:unify_lookup_cbk] bricks: c54 returned 107
2008-06-18 17:07:55 E [unify.c:265:unify_lookup_cbk] bricks: Revalidate failed for /
2008-06-18 17:07:55 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 19: (34) / => -1 (107)
2008-06-18 17:07:55 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
2008-06-18 17:07:55 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:07:55 E [client-protocol.c:4572:client_checksum] c54: /: returning EINVAL
------
After a while (apparently related to the transport timeout), further
"ls /home" commands succeed. However, the log gets flooded with
messages like:
-----
2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54: /danilov/public_html: returning EINVAL
2008-06-18 17:08:10 W [client-protocol.c:332:client_protocol_xfer] c54: not connected at the moment to submit frame type(1) op(34)
2008-06-18 17:08:10 E [client-protocol.c:4423:client_lookup_cbk] c54: no proper reply from server, returning ENOTCONN
2008-06-18 17:08:10 E [client-protocol.c:4572:client_checksum] c54: /danilov/.mc: returning EINVAL
----
This is less important than the first issue. But if, for example, I
turn the node off for a couple of days, the log will grow
enormously...
WBR,
Andrey
>
> On Tue, Jun 17, 2008 at 6:49 AM, Amar S. Tumballi <amar at zresearch.com>
> wrote:
>>
>> I just noticed this behavior, which ideally should not happen. You
>> will have a fix for it tomorrow.
>>
>>
>> On Tue, Jun 17, 2008 at 6:21 AM, Amar S. Tumballi <amar at zresearch.com>
>> wrote:
>>>
>>> Currently, if the disconnected server also exports the namespace, lookups
>>> return ENOENT (file not found). Otherwise, you get what you described
>>> (the whole filesystem stays online, minus a few files).
>>>
>>>
>>> On Tue, Jun 17, 2008 at 4:57 AM, NovA <av.nova at gmail.com> wrote:
>>>>
>>>> I'm continuing to stress-test the glusterFS 1.3.8+ series. I just
>>>> upgraded to tla781. It seems stable in my setup so far, no lockups yet. ;)
>>>> Great!
>>>> But I still can't get the behaviour mentioned in the subject, so I
>>>> have a concrete question. :) What is the intended behaviour of the
>>>> unify translator (without AFR) when one of the servers disconnects?
>>>> I assumed that in this case the glusterFS volume should remain online,
>>>> with some files (those on the disconnected server) being inaccessible.
>>>> But now, if I unplug the network cable from a cluster node,
>>>> "ls <unify_volume>" says that it cannot open the directory:
>>>> "Transport endpoint is not connected". Am I just believing what I
>>>> want to believe? Is the unify volume supposed to come back online only
>>>> after the disconnected server returns?