[Gluster-users] building 4-nodes cluster

Keith Freedman freedman at FreeFormIT.com
Mon Aug 11 13:20:58 UTC 2008


I haven't had time to do any thorough testing, and unfortunately I 
won't for a couple of weeks.

Here's what "seems" to be going on.

  I have 4 systems which monitor each other: every 2 minutes each one 
pulls a page from each of the other 3 servers (a PHP script that 
returns the hostname & a timestamp).
When things get stuck, most of the websites keep working fine, with 
the following exceptions.
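A rough sketch of that kind of cross-check, for anyone who wants to 
reproduce it -- the peer hostnames, URL path, and staleness threshold 
here are assumptions, not my actual setup:

```python
import time
import urllib.request

# Hypothetical peer list; each node checks the other three.
PEERS = ["web1.example.com", "web2.example.com", "web3.example.com"]
STALE_AFTER = 180.0  # seconds; the page is pulled every 2 minutes

def is_stale(reported_ts, now, threshold=STALE_AFTER):
    """True if a peer's reported timestamp is older than the threshold."""
    return (now - reported_ts) > threshold

def check_peer(host, timeout=10):
    """Fetch the peer's status page; a hang here is the symptom described above.

    Assumes the page prints "<hostname> <unix-timestamp>".
    """
    url = "http://%s/status.php" % host
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        hostname, ts = resp.read().decode().split()
        return hostname, float(ts)

def check_all(now=None):
    """Return a dict of peer -> "OK" / "STALE" / "DOWN: <reason>"."""
    now = time.time() if now is None else now
    status = {}
    for peer in PEERS:
        try:
            _, ts = check_peer(peer)
            status[peer] = "STALE" if is_stale(ts, now) else "OK"
        except Exception as exc:  # a timeout counts as a failure too
            status[peer] = "DOWN: %s" % exc
    return status
```

The timeout matters: without it, the checker itself joins the pile of 
hung PHP/HTTP requests instead of reporting them.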

Since it takes a few minutes for my pager to go off, there is 
usually a stack of PHP processes hung accessing the status.php script 
by the time I look.

Sometimes other virtualhosts' scripts are lingering too, but 
usually only one or two if so.  These are generally the index.php file 
for some of the busy hosts, or the cart.php file for a busy shopping site.

gluster itself seems to be pretty happy during all this, so I'm not 
sure whether the problem is in the underlying filesystem or in fuse. 
(I'm not using the gluster-patched fuse at the moment--I don't have 
all the kernel sources to build it.)

I did realize, after reading your original email, that I didn't have 
"option mandatory on" in the locks brick.  I've enabled that and am 
thinking it might solve the problem.
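For reference, the fragment in question -- a posix-locks volume with 
mandatory locking turned on (posix1 here is just a placeholder 
subvolume name; adjust to your own bricks):

volume locks1
        type features/posix-locks
        option mandatory on
        subvolumes posix1
end-volume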

Since I have the io-threads brick enabled, I'm now wondering if there 
was some strange interaction with the semaphore files: they were 
getting removed while something was still trying to take a lock on 
them.  The file goes away, the lock request doesn't know what to do 
with itself, and it just sits there waiting forever.
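One way to test that theory next time it happens is to look for PHP 
processes stuck in uninterruptible sleep (state "D", i.e. iowait). A 
small helper sketch, assuming input lines in `ps -eo pid,stat,comm` 
format:

```python
def find_hung(ps_lines, name="php"):
    """Return PIDs of processes whose command matches `name` and whose
    state starts with "D" (uninterruptible sleep, typically iowait).

    `ps_lines` are output lines from `ps -eo pid,stat,comm`, header removed.
    """
    hung = []
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        pid, stat, comm = fields[0], fields[1], fields[2]
        if name in comm and stat.startswith("D"):
            hung.append(int(pid))
    return hung
```

If that list keeps growing while gluster itself looks healthy, the 
block is sitting below the filesystem interface, which matches what 
I've been seeing.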

I'm speculating, but that's the behavior I've been able to observe.

Make sure that when you run your tests, you include some scripts that 
take a while to process and some that are really fast.

I think the really fast ones cause most of the problem.  If my 
suspicion about the semaphores and the locks is correct, that is 
likely where you'll get tripped up.
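Something like this mix is what I have in mind -- the two handlers 
just stand in for a trivial status.php and a slower cart.php, and the 
durations are arbitrary:

```python
import threading
import time

def fast_request():
    """Stands in for a near-instant script like status.php."""
    return "fast"

def slow_request(delay=0.05):
    """Stands in for a heavier script like cart.php."""
    time.sleep(delay)
    return "slow"

def run_mixed_load(n_requests=20):
    """Fire fast and slow requests concurrently; it's this mix of very
    short and longer requests that seems to trigger the lock-ups."""
    results = {}

    def worker(i):
        handler = fast_request if i % 2 == 0 else slow_request
        results[i] = handler()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```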

Keep me posted -- I'd love to hear the results of your testing.

Keith

At 02:40 AM 8/11/2008, Roman Hlynovskiy wrote:
>Hello Keith,
>
>ok thanks, we will try to make stress tests with php and check if the
>same situation apply to our configuration.
>did this semaphore issue occur only at some specific number of
>simultaneous connections, or was it a matter of "luck" :) ?
>
>
>2008/8/11 Keith Freedman <freedman at freeformit.com>:
> > I'll let one of the devs respond to your specific config.
> >
> > There are a couple cautions ...
> > if you're running PHP, you'll want to modify your php.ini to put
> > session_save_path on shared storage.  If someone's session starts on
> > server1 and the browser directs them to server2, their session is missing
> >  (either that, or use DB-based sessions).
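> > A minimal php.ini fragment for that (the mount point here is just
> > an example):
> >
> >         session.save_path = "/mnt/glusterfs/sessions"
> >
> > Or, for DB-backed sessions, set session.save_handler = user and
> > register your own handler with session_set_save_handler().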
> >
> > I've noticed some problems with this configuration, in that PHP
> > seems to like to create semaphore files all the time.  These get created
> > in session_save_path.  There seem to be cases where processes sometimes
> > block on a semaphore from the other server.
> >
> > I haven't been able to figure out exactly why, and it may be exclusive
> > to my configuration, but it's something to watch out for.
> > You might end up with unkillable PHP processes blocked in iowait.  The
> > only solution has been to kill gluster and remount the filesystem.  This
> > only takes a second, but it's inconvenient, and until you realize it's
> > happening, any process which tries to access the same files will block
> > too, thus eventually consuming all your spare httpd processes.
> >
> > Keith
> >
> > At 10:51 PM 8/10/2008, Roman Hlynovskiy wrote:
> >>
> >> Hello everyone,
> >>
> >> We want to build a cluster of 4 web-servers. ftp and http will be
> >> load-balanced, so we will never know which node will serve ftp/http
> >> traffic.
> >> Since we don't want to lose any functionality if one of the
> >> servers goes out of order, we have devised the
> >> following architecture:
> >>  - each server will have 2 data bricks and 1 namespace brick
> >>  - each second data brick is AFRed with first data brick of the next
> >> server
> >> - all namespace bricks are AFRed
> >>
> >> we've tried to follow the recommendations from the wiki and created
> >> the following configs:
> >> ------------------------------- begin server config
> >> -------------------------------------------
> >>
> >> #
> >> # Object Storage Brick 1
> >> #
> >>
> >> # low-level brick pointing to physical folder
> >> volume posix1
> >>        type storage/posix
> >>        option directory /mnt/os1/export
> >> end-volume
> >>
> >> # put support for fcntl over brick
> >> volume locks1
> >>        type features/posix-locks
> >>        subvolumes posix1
> >>        option mandatory on
> >> end-volume
> >>
> >> # put additional io threads for this brick
> >> volume brick1
> >>        type performance/io-threads
> >>        option thread-count 4
> >>        option cache-size 32MB
> >>        subvolumes locks1
> >> end-volume
> >>
> >> #
> >> # Object Storage Brick 2
> >> #
> >>
> >> # low-level brick pointing to physical folder
> >> volume posix2
> >>        type storage/posix
> >>        option directory /mnt/os2/export
> >> end-volume
> >>
> >> # put support for fcntl over brick
> >> volume locks2
> >>        type features/posix-locks
> >>        subvolumes posix2
> >>        option mandatory on
> >> end-volume
> >>
> >> # put additional io threads for this brick
> >> volume brick2
> >>        type performance/io-threads
> >>        option thread-count 4
> >>        option cache-size 32MB
> >>        subvolumes locks2
> >> end-volume
> >>
> >> #
> >> # Metadata Storage
> >> #
> >>
> >> volume brick1ns
> >>        type storage/posix
> >>        option directory /mnt/ms1
> >> end-volume
> >>
> >> #
> >> # Volume to export
> >> #
> >>
> >> volume server
> >>        type protocol/server
> >>        subvolumes brick1 brick2 brick1ns
> >>        option transport-type tcp/server
> >>        option auth.ip.brick1.allow *
> >>        option auth.ip.brick2.allow *
> >>        option auth.ip.brick1ns.allow *
> >> end-volume
> >>
> >> ------------------------------- end server config
> >> -------------------------------------------
> >>
> >> and client config from one of the nodes
> >>
> >> ------------------------------- begin client config
> >> -------------------------------------------
> >>
> >> ### begin x-346-01 ###
> >>
> >> volume brick01
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.11
> >>  option remote-subvolume brick1
> >> end-volume
> >>
> >> volume brick02
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.11
> >>  option remote-subvolume brick2
> >> end-volume
> >>
> >> volume brick01ns
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.11
> >>  option remote-subvolume brick1ns
> >> end-volume
> >>
> >> ### end x-346-01 ###
> >>
> >>
> >>
> >> ### begin x-346-02 ###
> >>
> >> volume brick03
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.21
> >>  option remote-subvolume brick1
> >> end-volume
> >>
> >> volume brick04
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.21
> >>  option remote-subvolume brick2
> >> end-volume
> >>
> >> volume brick03ns
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.21
> >>  option remote-subvolume brick1ns
> >> end-volume
> >>
> >> ### end x-346-02 ###
> >>
> >>
> >>
> >> ### begin x-346-03 ###
> >>
> >> volume brick05
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.31
> >>  option remote-subvolume brick1
> >> end-volume
> >>
> >> volume brick06
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.31
> >>  option remote-subvolume brick2
> >> end-volume
> >>
> >> volume brick05ns
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.31
> >>  option remote-subvolume brick1ns
> >> end-volume
> >>
> >> ### end x-346-03 ###
> >>
> >>
> >>
> >> ### begin x-346-04 ###
> >>
> >> volume brick07
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.41
> >>  option remote-subvolume brick1
> >> end-volume
> >>
> >> volume brick08
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.41
> >>  option remote-subvolume brick2
> >> end-volume
> >>
> >> volume brick07ns
> >>  type protocol/client
> >>  option transport-type tcp/client
> >>  option remote-host 192.168.252.41
> >>  option remote-subvolume brick1ns
> >> end-volume
> >>
> >> ### end x-346-04 ###
> >>
> >>
> >>
> >> ### afr bricks ###
> >>
> >> volume afr01
> >>  type cluster/afr
> >>  subvolumes brick02 brick03
> >> end-volume
> >>
> >> volume afr02
> >>  type cluster/afr
> >>  subvolumes brick04 brick05
> >> end-volume
> >>
> >> volume afr03
> >>  type cluster/afr
> >>  subvolumes brick06 brick07
> >> end-volume
> >>
> >> volume afr04
> >>  type cluster/afr
> >>  subvolumes brick08 brick01
> >> end-volume
> >>
> >> volume afrns
> >>  type cluster/afr
> >>  subvolumes brick01ns brick03ns brick05ns brick07ns
> >> end-volume
> >>
> >> ### unify ###
> >>
> >> volume unify
> >>  type cluster/unify
> >>  option namespace afrns
> >>  option scheduler nufa
> >>  option nufa.local-volume-name brick03
> >>  option nufa.local-volume-name brick04
> >>  option nufa.limits.min-free-disk 5%
> >>  subvolumes afr01 afr02 afr03 afr04
> >> end-volume
> >>
> >> ------------------------------- end client config
> >> -------------------------------------------
> >>
> >> Everything seems to be working fine, but we'd like to know whether
> >> there are any alternatives to such a configuration, and whether some
> >> additional optimizations could be applied.
> >> Is there any mechanism to stripe one file over more than 2 nodes?
> >> Do we need the read-ahead translator if we use nufa with the
> >> local-volume option? What about write-behind? Did we miss anything else?
> >>
> >>
> >> --
> >> ...WBR, Roman Hlynovskiy
> >>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
> >
> >
>
>
>
>--
>...WBR, Roman Hlynovskiy
>
>_______________________________________________
>Gluster-users mailing list
>Gluster-users at gluster.org
>http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users




