[Gluster-users] servers with multiple bricks do not get removed from LOOKUP function

Wed Mar 7 16:42:44 UTC 2012

Dear all,

First of all, apologies if this question has been answered before. In that
case, I would really appreciate a pointer to where the information can be
found.

I am using Gluster 3.2.5 on 2 nodes, each with 2 bricks, in a
distributed-replicated topology:

(d1a)(d1b) (d2a)(d2b)

Disks d1a and d2a form one replica set, and d1b and d2b form another. This
layout tolerates the failure of one whole server.
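To make the pairing concrete, here is a minimal sketch. It assumes the usual Gluster rule that, for a replicated volume, each group of `replica` consecutive bricks in the list passed at volume creation forms one replica set. The host names and brick paths (`node1:/d1a` and so on) and the helpers `replica_sets` and `survives` are hypothetical. The sketch only models whether data stays available. It says nothing about the slowdown described below.

```python
def replica_sets(bricks, replica):
    """Group bricks into replica sets in list order: every `replica`
    consecutive bricks form one set (assumed Gluster pairing rule)."""
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def survives(sets, failed):
    """True if every replica set keeps at least one brick not in `failed`."""
    return all(any(b not in failed for b in s) for s in sets)


# Hypothetical layout: node1 holds d1a/d1b, node2 holds d2a/d2b.
bricks = ["node1:/d1a", "node2:/d2a", "node1:/d1b", "node2:/d2b"]
sets = replica_sets(bricks, 2)
print(sets)

# Single disk failure (d2b): its partner d1b still holds the data.
print(survives(sets, {"node2:/d2b"}))

# Whole server 2 halted: d1a and d1b still cover both replica sets.
print(survives(sets, {"node2:/d2a", "node2:/d2b"}))
```

Listing the bricks as d1a, d2a, d1b, d2b is what places each pair on different servers. Listing both bricks of one node next to each other would put a whole replica set on a single server.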

The scenario I'm testing is a simulated disk failure, for instance of disk
d2b. The data stays consistent (from a customer's point of view, all the
files are in the right directory and it is possible to read, modify and
delete them, etc.).
The problem is that all these operations take far too long to complete: a
single ls of a 4-file directory takes almost a minute. The main risk is I/O
errors caused by timeouts in more complex operations.

Halting a whole server, on the other hand, keeps the cluster working without
any performance penalty.

My guess is that, since only one disk is down and server 2 is still up, the
client (which uses the Gluster native library) keeps trying to reach the
missing brick on node 2. When that LOOKUP times out, the client sends a
"broadcast" LOOKUP and the other node (1) replies with the information about
the running bricks. That would explain why operations take so long to
complete while the information they return is still correct.

Is there any configuration option to solve this issue? Is my guess close to
reality? Is there any workaround besides using nodes with a single brick
each?

Thank you in advance for any help,

Samuel.