[Gluster-devel] Problem with clients that goes down..
Antonio González
antonio.gonzalez at libera.net
Mon Apr 21 12:30:23 UTC 2008
Thanks, Krishna. Don't worry about the late reply; I imagine maintaining
this list is hard work!
The main problem is the first one you mention. I have run some tests on
GlusterFS to check how it behaves when a client goes down. Sometimes, if a
client hangs in the middle of an operation (read or write), the other
clients stop working correctly.
I tested this in several scenarios and saw the problem in every one of them.
My last test illustrates the problem. I have 4 machines: two servers and
two clients.
One server exports one brick for storage (posix storage); the other server
exports a brick for the namespace and a brick for storage. The unify
translator is placed on the client side.
The test is as follows. From one client I cp a file (from local disk to
GlusterFS, or vice versa), and while the cp is still running I power down
that client. Then, from the other client, I run "ls" (I also tried sha1sum
on a file in the Gluster volume, cp, cat ...). The second client stays
blocked for a long time. Sometimes the command eventually finishes (for
example, "ls" after 2 or 3 minutes), and other times it returns an error
message.
Note: sometimes the client is not blocked and GlusterFS works fine. It is
difficult to predict when the client will block and when it will not.
As I mentioned previously, I tested this in several scenarios: with and
without AFR (I think the problem is caused by the unify translator), with
the unify translator on the client side and on the server side, and with
one server and two clients, two servers and two clients, and three servers
and two clients.
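For the AFR scenarios, a client-side AFR volume could look roughly like
the sketch below. This is only an illustration: it assumes the cluster/afr
translator replicating across the two storage bricks named brick1 and
brick2 in the client config, and it is not the exact spec used in those
tests.

```
# Hypothetical AFR volume, shown for illustration only
volume afr
type cluster/afr
subvolumes brick1 brick2
end-volume
```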
The timeout question is related to this problem. I repeated the same tests
with the timeout option set to see its effect. When I define a timeout and
a client runs ls (or cp, sha1sum ...), the recovery time is shorter than
when I do not define one. I don't know exactly how the two are related, but
it seems that when the timeout expires the client retries the command, and
the retry finishes successfully. I am not sure about this, though.
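As an illustration only, a timeout of this kind would typically be set on
each protocol/client volume in the client spec file. The sketch below
assumes the option is transport-timeout, which this message does not name,
and uses the 20-second value from the earlier test. It is not part of the
config files that follow.

```
volume brick1
type protocol/client
option transport-type tcp/client
option remote-host 10.1.0.45
option remote-subvolume brick
option transport-timeout 20 # assumed option name; value in seconds
end-volume
```

If the timeout is per connection, the same option would presumably have to
be repeated on brick2 and ns1 for it to apply to every server.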
The config files of this last test:
Server1
volume brick
type storage/posix
option directory /home/pruebaD
end-volume
volume brick-ns
type storage/posix
option directory /home/namespace
end-volume
volume server
type protocol/server
subvolumes brick brick-ns
option transport-type tcp/server
option auth.ip.brick.allow *
option auth.ip.brick-ns.allow *
option listen-port 6996 # Default is 6996
option client-volume-filename etc/glusterfs/pruebaDistribuida/glusterfs-client.vol
end-volume
Server2
volume brick
type storage/posix
option directory /home/pruebaD
end-volume
volume server
type protocol/server
subvolumes brick
option transport-type tcp/server
option auth.ip.brick.allow *
end-volume
Clients
volume brick1
type protocol/client
option transport-type tcp/client
option remote-host 10.1.0.45
option remote-subvolume brick
end-volume
volume brick2
type protocol/client
option transport-type tcp/client
option remote-host 10.1.0.40
option remote-subvolume brick
end-volume
volume ns1
type protocol/client
option transport-type tcp/client
option remote-host 10.1.0.45
option remote-subvolume brick-ns
end-volume
volume unify
type cluster/unify
subvolumes brick1 brick2
option namespace ns1
option scheduler rr
end-volume
The GlusterFS version is 1.3.8pre5, with fuse 2.7.2glfs9. The OS is Gentoo
with kernel 2.6.23-r6.
Thanks for the reply,
-----Original Message-----
From: krishna.srinivas at gmail.com [mailto:krishna.srinivas at gmail.com] On behalf
of Krishna Srinivas
Sent: Monday, April 21, 2008 13:09
To: Antonio González
CC: gluster-devel at nongnu.org
Subject: Re: [Gluster-devel] Problem with clients that goes down..
Hi Antonio,
Excuse us, somehow your issue was not responded to.
If I understand correctly, you are facing two problems:
1) plugging out the cable on one client will make other clients hang
2) the timeout value you specify in spec file does not reflect
in the actual timeout you see when you access glusterfs.
Is that correct? I have lost track of your setup details. Searching mail
archives did not give me the exact picture. Can you give the setup
details with config files? And also the tests?
Surely the problem you are facing should be fixed.
Regards
Krishna
On Mon, Apr 21, 2008 at 3:58 PM, Antonio González
<antonio.gonzalez at libera.net> wrote:
> Hello all,
>
> I have run a lot of tests on GlusterFS to verify its viability. I wrote
> to this list one or two weeks ago about an issue where a client that
> goes down causes problems for other clients, which can no longer access
> the Gluster file system.
>
> Are the developers of GlusterFS aware of this issue? I think it is a
> serious problem, and I need an answer in order to recommend for or
> against the use of GlusterFS in a project.
>
> I tested this in several scenarios (AFR/unify on the server side, on the
> client side, without AFR ...), and I think that the problem is the unify
> translator. I ran a test with one server and two clients. Without the
> unify translator everything works fine: a client that goes down while
> reading or copying a file does not affect the other clients. With the
> unify translator, a client that goes down while reading or writing a
> file causes the problem (other clients that try an "ls" command are
> blocked).
>
> I also ran a test with two servers (without AFR, unify on the client
> side). I located files on each server, blocked one server, and tried to
> access a file on the other server (cp command). Access to the server
> that is not blocked depends on the timeout option. If I don't set a
> timeout, the client takes 2 or 3 minutes and does not finish the
> command. If I set a timeout of 20 sec, the client takes 32 sec and
> finishes the command. For a timeout of 40 sec, the client takes
> approximately 60 sec.
>
> I would like to know at least whether this problem is recognized by the
> developers of Gluster. Do they know what the problem is? Are they
> working to solve it?
>
> Thanks,
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>