[Gluster-devel] Problem with clients that goes down..
Krishna Srinivas
krishna at zresearch.com
Mon Apr 21 13:01:27 UTC 2008
Your description says that you are powering the client down. I will
try to reproduce this bug and get back to you.
Krishna
On Mon, Apr 21, 2008 at 6:22 PM, Krishna Srinivas <krishna at zresearch.com> wrote:
> One question: are you sure you are not stopping the server that
> hosts the namespace?
>
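
The client spec quoted further down helps with this question: ns1 points at 10.1.0.45, the same host that serves brick1. A minimal sketch of reading that mapping out of a spec file is shown here. The parser is an illustration written for this note, not part of GlusterFS, and it assumes only the simple volume / type / option / subvolumes / end-volume lines that appear in this thread.

```python
# Client spec copied from the quoted message in this thread.
CLIENT_SPEC = """
volume brick1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.1.0.45
  option remote-subvolume brick
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.1.0.40
  option remote-subvolume brick
end-volume

volume ns1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.1.0.45
  option remote-subvolume brick-ns
end-volume

volume unify
  type cluster/unify
  subvolumes brick1 brick2
  option namespace ns1
  option scheduler rr
end-volume
"""


def parse_spec(text):
    """Return {volume_name: {"type", "subvolumes", "options"}} for a spec."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        words = line.split()
        if words[0] == "volume" and len(words) == 2:
            current = {"type": None, "subvolumes": [], "options": {}}
            volumes[words[1]] = current
        elif words[0] == "end-volume":
            current = None
        elif current is None:
            continue  # ignore anything outside a volume block
        elif words[0] == "type" and len(words) == 2:
            current["type"] = words[1]
        elif words[0] == "subvolumes":
            current["subvolumes"] = words[1:]
        elif words[0] == "option" and len(words) >= 3:
            current["options"][words[1]] = " ".join(words[2:])
    return volumes


def hosts_by_role(volumes, unify_name="unify"):
    """Map the unify volume's namespace and storage children to hosts."""
    unify = volumes[unify_name]

    def host(name):
        return volumes[name]["options"]["remote-host"]

    return {
        "namespace": host(unify["options"]["namespace"]),
        "storage": {name: host(name) for name in unify["subvolumes"]},
    }


roles = hosts_by_role(parse_spec(CLIENT_SPEC))
print(roles)
```

With the spec from this thread, the namespace and one storage brick resolve to the same address, so that is the machine that has to stay up while a client is powered off.
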
> On Mon, Apr 21, 2008 at 6:00 PM, Antonio González <antonio.gonzalez at libera.net> wrote:
> > Thanks Krishna, don't worry about not responding; I think it is hard
> > work to maintain this list!
> >
> > Well, the main problem is the first one you note. I have run some tests
> > on GlusterFS to check its viability when a client goes down. I can see
> > that sometimes, if a client hangs in the middle of an operation
> > (read/write), the other clients stop working correctly.
> >
> > I tested this issue in several scenarios, and I always see the problem.
> > My last test illustrates it. I have 4 machines: two servers and two
> > clients.
> >
> > One server exports one brick for storage (posix storage); the other
> > server exports a brick for the namespace and a brick for storage. The
> > unify translator is placed on the client side.
> >
> > The test is: from one client I cp a file (from local disk to GlusterFS
> > and vice versa), and while the cp is still in progress I power down that
> > client. Then from the other client I try an "ls" command (I also tried
> > sha1sum on a file in the Gluster volume, cp, cat ...), and that client
> > stays blocked for a long time. Sometimes the command finishes (for
> > example "ls" after 2 or 3 minutes) and other times it returns an error
> > message.
> >
> > Note: sometimes the client is not blocked and GlusterFS works fine. It
> > is difficult to predict when the client will be blocked and when it will
> > not.
> >
> > As I mentioned previously, I tested this issue in several scenarios:
> > with and without AFR (I think the problem is caused by the unify
> > translator), with the unify translator on the client side and on the
> > server side, and with one server and two clients, 2 servers and 2
> > clients, and 3 servers and two clients.
> >
> > The timeout issue is related to this problem. I ran the same tests with
> > the timeout option set to see its impact. I can see that if I define a
> > timeout, when a client tries an ls command (or cp, sha1sum ...) the
> > recovery time is shorter than if I do not define a timeout. I don't know
> > how the two are related, but it seems that when the timeout expires the
> > client retries the command, and this time the command finishes
> > successfully. I am not sure about this.
> >
> > The config files of this last test:
> >
> > Server1
> >
> > volume brick
> >   type storage/posix
> >   option directory /home/pruebaD
> > end-volume
> >
> > volume brick-ns
> >   type storage/posix
> >   option directory /home/namespace
> > end-volume
> >
> > volume server
> >   type protocol/server
> >   subvolumes brick brick-ns
> >   option transport-type tcp/server
> >   option auth.ip.brick.allow *
> >   option auth.ip.brick-ns.allow *
> >   option listen-port 6996 # Default is 6996
> >   option client-volume-filename etc/glusterfs/pruebaDistribuida/glusterfs-client.vol
> > end-volume
> >
> > Server2
> >
> > volume brick
> >   type storage/posix
> >   option directory /home/pruebaD
> > end-volume
> >
> > volume server
> >   type protocol/server
> >   subvolumes brick
> >   option transport-type tcp/server
> >   option auth.ip.brick.allow *
> > end-volume
> >
> > Clients
> >
> > volume brick1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.1.0.45
> >   option remote-subvolume brick
> > end-volume
> >
> > volume brick2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.1.0.40
> >   option remote-subvolume brick
> > end-volume
> >
> > volume ns1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.1.0.45
> >   option remote-subvolume brick-ns
> > end-volume
> >
> > volume unify
> >   type cluster/unify
> >   subvolumes brick1 brick2
> >   option namespace ns1
> >   option scheduler rr
> > end-volume
> >
> > The GlusterFS version is 1.3.8pre5, with fuse 2.7.2glfs9. The OS is
> > Gentoo with kernel 2.6.23-r6.
> >
> > Thanks for the reply,
> >
> > -----Original Message-----
> > From: krishna.srinivas at gmail.com [mailto:krishna.srinivas at gmail.com] On behalf of Krishna Srinivas
> > Sent: Monday, April 21, 2008 13:09
> > To: Antonio González
> > CC: gluster-devel at nongnu.org
> > Subject: Re: [Gluster-devel] Problem with clients that goes down..
> >
> > Hi Antonio,
> >
> > Apologies, somehow your issue did not get a response.
> >
> > If I understand correctly, you are facing two problems:
> > 1) unplugging the cable on one client makes the other clients hang
> > 2) the timeout value you specify in the spec file is not reflected
> >    in the actual timeout you see when you access glusterfs.
> >
> > Is that correct? I have lost track of your setup details. Searching the
> > mail archives did not give me the exact picture. Can you give the setup
> > details with config files? And also the tests?
> >
> > Surely the problem you are facing should be fixed.
> >
> > Regards
> > Krishna
> >
> > On Mon, Apr 21, 2008 at 3:58 PM, Antonio González <antonio.gonzalez at libera.net> wrote:
> > > Hello all,
> > >
> > > I have run a lot of tests on GlusterFS to verify its viability. I
> > > wrote to this list one or two weeks ago asking about an issue with
> > > clients that go down and cause problems for other clients, which then
> > > cannot access the Gluster file system.
> > >
> > > Are the GlusterFS developers aware of this issue? I think it is a
> > > serious problem, and I need an answer in order to recommend for or
> > > against using GlusterFS in a project.
> > >
> > > I tested this issue in several scenarios (AFR/unify on the server
> > > side, on the client side, without AFR…), and I think the problem is
> > > the unify translator. I ran a test with one server and two clients.
> > > Without the unify translator it works fine: a client that goes down
> > > while reading or copying a file does not affect other clients. With
> > > the unify translator, a client that goes down while reading or writing
> > > a file causes the problem (other clients that try an "ls" command are
> > > blocked).
> > >
> > > I ran a test with two servers (without AFR, unify on the client side).
> > > I placed files on each server, then blocked one server and tried to
> > > access a file on the other server (cp command). I can see that access
> > > to this (non-blocked) server depends on the timeout option. If I don't
> > > set a timeout, the client takes 2 or 3 minutes and does not finish the
> > > command. If I set a timeout of 20 sec, the client takes 32 sec and
> > > finishes the command. For a timeout of 40 sec, the client takes
> > > approximately 60 sec.
> > >
> > > I would like to know at least whether this problem is recognized by
> > > the Gluster developers. Do they know what the problem is? Are they
> > > working to solve it?
> > >
> > > Thanks,
> > >
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at nongnu.org
> > > http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
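
The test described in the quoted message (start a cp on one client, power that client off, then run "ls" from the other client) is easier to compare across runs if the second client records how long the command blocks. A minimal sketch follows, assuming Python is available on the second client. The helper name time_command and the mount point /mnt/glusterfs mentioned afterwards are hypothetical and do not come from the thread.

```python
import subprocess
import sys
import time


def time_command(cmd, limit_seconds):
    """Run cmd and return (status, elapsed_seconds).

    status is the exit code, or None if the command was still running
    when limit_seconds expired and an attempt was made to kill it.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit_seconds,
        )
        status = result.returncode
    except subprocess.TimeoutExpired:
        status = None
    return status, time.monotonic() - start


# Self-check with a harmless command. On the second client one would pass
# something like ["ls", "/mnt/glusterfs"] (hypothetical mount point).
status, elapsed = time_command([sys.executable, "-c", "pass"], limit_seconds=30)
print(f"exit status {status}, took {elapsed:.1f} s")
```

Running it once with no timeout set in the client spec and once with each timeout value would give numbers directly comparable to the 32 sec and 60 sec figures reported above. The limit is best effort: a process stuck in an uninterruptible call on a FUSE mount may not respond to the kill, in which case the helper itself blocks until the call returns.
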