[Gluster-devel] replicate with 5 nodes - and adding more nodes in the future
Shai DB
dbshai at gmail.com
Wed Jun 27 06:02:24 UTC 2007
After copying a few thousand files, then deleting and copying them again,
I get a lot of errors:
File descriptor in bad state
No such file or directory
and a lot of
[Jun 26 05:45:13] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
connection to server disconnected
[Jun 26 05:45:13] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw:
0 bytes r/w instead of 113 (errno=9)
in the glusterd.log
I have set it up like this:
1.3-pre4
5 servers + 5 clients (running on the same boxes as the servers).
What could cause the disconnection?
server:
volume gfs
type storage/posix
option directory /mnt/gluster/gfs1
end-volume
volume gfs-afr
type storage/posix
option directory /mnt/gluster/afr-gfs1
end-volume
volume server
type protocol/server
option transport-type tcp/server
option listen-port 6996
subvolumes gfs gfs-afr
option auth.ip.gfs.allow *
option auth.ip.gfs-afr.allow *
end-volume
client:
[root at hd-t1157cl etc]# cat cluster-client.vol
volume a1
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.10
option remote-port 6996
option remote-subvolume gfs
end-volume
volume a2
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.10
option remote-port 6996
option remote-subvolume gfs-afr
end-volume
volume b1
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.11
option remote-port 6996
option remote-subvolume gfs
end-volume
volume b2
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.11
option remote-port 6996
option remote-subvolume gfs-afr
end-volume
volume c1
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.12
option remote-port 6996
option remote-subvolume gfs
end-volume
volume c2
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.12
option remote-port 6996
option remote-subvolume gfs-afr
end-volume
volume d1
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.13
option remote-port 6996
option remote-subvolume gfs
end-volume
volume d2
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.13
option remote-port 6996
option remote-subvolume gfs-afr
end-volume
volume e1
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.14
option remote-port 6996
option remote-subvolume gfs
end-volume
volume e2
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.14
option remote-port 6996
option remote-subvolume gfs-afr
end-volume
volume afr1
type cluster/afr
subvolumes a1 e2
option replicate *:2
end-volume
volume afr2
type cluster/afr
subvolumes b1 d2
option replicate *:2
end-volume
volume afr3
type cluster/afr
subvolumes c1 a2
option replicate *:2
end-volume
volume afr4
type cluster/afr
subvolumes d1 b2
option replicate *:2
end-volume
volume afr5
type cluster/afr
subvolumes e1 c2
option replicate *:2
end-volume
volume gfstest
type cluster/unify
subvolumes afr1 afr2 afr3 afr4 afr5
option scheduler rr
option rr.limits.min-free-disk 5GB
end-volume
On 6/26/07, Sebastien LELIEVRE <slelievre at tbs-internet.com> wrote:
>
> Hi again !
>
> Shai DB wrote:
> > Another question:
> > I notice that 1.2 doesn't have AFR in its source.
> > How can I use/install it anyway?
> > I saw 1.3-pre has it...
> > Is 1.3-pre OK for production?
> > Thanks
> >
>
> I had forgotten this point ! :)
>
> Yes, the 1.3-pre4 archive is stable enough for production, but you can also
> use the tla repository with the 2.4 branch, which is stable enough (to
> me) to be used in production.
>
> Just note that the 1.3 stable release will be based on the 2.5 main
> branch and will include the self-heal feature (and many more!)
>
> Cheers,
>
> Sebastien LELIEVRE
> slelievre at tbs-internet.com Services to ISP
> TBS-internet http://www.TBS-internet.com
>
> > I need it for replication (to have 2 copies of data in case of crash)
> >
> >
> > On 6/26/07, *Sebastien LELIEVRE* <slelievre at tbs-internet.com
> > <mailto:slelievre at tbs-internet.com>> wrote:
> >
> > Hi,
> >
> > I just wanted to stress this :
> >
> > Shai wrote:
> > > Hello, we are testing glusterfs 1.2 and I have a few questions -
> >
> > 1.2 doesn't bring "self-heal" with it, so keep in mind that if a drive
> > crashes, you would have to sync your new drive "manually" with the
> > others.
> >
> >
> > So just copy all the data to the replaced disk from its AFR 'pair'?
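(i.e. something like this run from the rebuilt server once the new disk is in - assuming, per my spec above, that afr1 is a1 + e2, so the surviving copy sits under /mnt/gluster/afr-gfs1 on 10.47.0.14 and the empty export is /mnt/gluster/gfs1:)

# hypothetical manual resync of afr1 after replacing server a's disk
rsync -a 10.47.0.14:/mnt/gluster/afr-gfs1/ /mnt/gluster/gfs1/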
> >
> >
> > BUT, 1.3 is going to correct this, and this is good :)
> >
> > That's all I had to add
> >
> > Cheers,
> >
> > Sebastien LELIEVRE
> > slelievre at tbs-internet.com
> > <mailto:slelievre at tbs-internet.com> Services to ISP
> > TBS-internet http://www.TBS-internet.com
> >
> > Krishna Srinivas wrote:
> > > As of now you need to restart glusterfs if there is any change
> > > in the config spec file. However, in future versions you won't need
> > > to remount (this is on our roadmap).
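(For what it's worth, the restart I run on each client after a spec change is roughly the following - the paths are just where I keep things locally, and I'm assuming the -f/--spec-file flag is still the right way to point the 1.3 client at its volume spec:)

umount /mnt/glusterfs
glusterfs -f /etc/glusterfs/cluster-client.vol /mnt/glusterfs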
> > >
> > > On 6/25/07, Shai DB <dbshai at gmail.com <mailto:dbshai at gmail.com>>
> > wrote:
> > >> Thanks for the answer.
> > >> This seems easy and neat to set up.
> > >>
> > >> Another question: if I add 2 more nodes to the gang,
> > >> how can I set up all the clients with the new configuration, without
> > >> needing to 'remount' the glusterfs?
> > >>
> > >> Thanks
> > >>
> > >>
> > >> On 6/25/07, Krishna Srinivas <krishna at zresearch.com
> > <mailto:krishna at zresearch.com>> wrote:
> > >> >
> > >> > On 6/25/07, Shai DB < dbshai at gmail.com
> > <mailto:dbshai at gmail.com>> wrote:
> > >> > > Hello, we are testing glusterfs 1.2 and I have a few questions -
> > >> > >
> > >> > >
> > >> > > 1. we are going to store millions of small jpg files that will be
> > >> > > read by a webserver - is glusterfs a good solution for this?
> > >> >
> > >> > Yes, definitely.
> > >> >
> > >> > > 2. we are going to run both server+client on each node, together
> > >> > > with apache
> > >> > >
> > >> > > 3. replicate *:2
> > >> > >
> > >> > > the way I think of doing replicate is defining 2 volumes on each
> > >> > > server and using AFR:
> > >> > >
> > >> > > server1: a1, a2
> > >> > > server2: b1, b2
> > >> > > server3: c1, c2
> > >> > > server4: d1, d2
> > >> > > server5: e1, e2
> > >> > >
> > >> > > afr1: a1+b2
> > >> > > afr2: b1+c2
> > >> > > afr3: c1+d2
> > >> > > afr4: d1+e2
> > >> > > afr5: e1+a2
> > >> > >
> > >> > > and then unify = afr1+afr2+afr3+afr4+afr5 with the replicate option
> > >> > >
> > >> > > is this the correct way?
> > >> > > and what do we do in the future when we add more nodes? when
> > >> > > changing the afr (adding and changing the couples), will glusterfs
> > >> > > redistribute the files the new way?
> > >> >
> > >> > Yes, this is the right way. If you add one more server f, the one
> > >> > solution is to move the contents of a2 to f2, clean up a2, and have
> > >> > it as follows:
> > >> >
> > >> > afr5: e1 + f2
> > >> > afr6: f1 + a2
> > >> >
> > >> > Can't think of an easier solution.
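(If I read that right, in volume-spec terms the new server f - 10.47.0.15 is a made-up address - would add two protocol/client volumes to every client spec, and afr5/afr6 would end up roughly like this once a2's old contents have been moved over to f2 by hand:)

volume f1
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.15
option remote-port 6996
option remote-subvolume gfs
end-volume

volume f2
type protocol/client
option transport-type tcp/client
option remote-host 10.47.0.15
option remote-port 6996
option remote-subvolume gfs-afr
end-volume

volume afr5
type cluster/afr
subvolumes e1 f2
option replicate *:2
end-volume

volume afr6
type cluster/afr
subvolumes f1 a2
option replicate *:2
end-volume

(with afr6 also appended to the unify subvolumes line)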
> > >> >
> > >> > But if we assume that you will always add 2 servers when you want
> > >> > to add, we can have the setup in the following way:
> > >> > afr1: a1 + b2
> > >> > afr2: b1 + a2
> > >> > afr3: c1 + d2
> > >> > afr4: d1 + c2
> > >> > afr5: e1 + f2
> > >> > afr6: f1 + e2
> > >> >
> > >> > Now when you add a pair of servers to this (g, h):
> > >> > afr7: g1 + h2
> > >> > afr8: h1 + g2
> > >> >
> > >> > Which is very easy, but you will have to add 2 servers every time.
> > >> > The advantage is that it is easier to visualize the setup and add
> > >> > new nodes.
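(So if I follow, each new pair g/h just means two more protocol/client volumes per new server plus two AFR volumes like these in every client spec, with afr7 and afr8 appended to the unify subvolumes line - the exact wording is my guess at the volfile form of Krishna's sketch:)

volume afr7
type cluster/afr
subvolumes g1 h2
option replicate *:2
end-volume

volume afr8
type cluster/afr
subvolumes h1 g2
option replicate *:2
end-volume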
> > >> >
> > >> > Thinking further, if we assume that you will replicate all the files
> > >> > twice (option replicate *:2), you can have the following setup:
> > >> > afr1: a + b
> > >> > afr2: c + d
> > >> > afr3: e + f
> > >> >
> > >> > This is a very easy setup. It is simple to add a fresh pair
> > >> > (afr4: g + h).
> > >> >
> > >> > You can have whatever setup you want depending on your
> > >> > convenience and requirements.
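(My reading of that simpler layout as a client spec, assuming each server exports just one posix volume instead of two - a rough sketch, not tested:)

volume afr1
type cluster/afr
subvolumes a b
option replicate *:2
end-volume

volume afr2
type cluster/afr
subvolumes c d
option replicate *:2
end-volume

volume afr3
type cluster/afr
subvolumes e f
option replicate *:2
end-volume

volume gfstest
type cluster/unify
subvolumes afr1 afr2 afr3
option scheduler rr
end-volume

(where a..f are the usual protocol/client volumes, one per server)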
> > >> >
> > >> > >
> > >> > > 4. what happens when a hard drive goes down and is replaced - does
> > >> > > the cluster also redistribute the files?
> > >> >
> > >> > When a hard drive is replaced, missing files will be replicated from
> > >> > the AFR's other child.
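(So presumably, once self-heal arrives in 1.3, forcing every file to be opened once after the drive swap would pull the copies back - something like the following on a client mount, assuming healing is triggered on open() and /mnt/glusterfs is the mount point:)

find /mnt/glusterfs -type f -exec head -c1 '{}' \; > /dev/null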
> > >> >
> > >> > Regards
> > >> > Krishna
> > >> >