[Gluster-devel] Confused on AFR, where does it happen client or server

Sascha Ottolski ottolski at web.de
Tue Jan 8 07:55:21 UTC 2008


On Tuesday, 8 January 2008, 06:06:30, Anand Avati wrote:
> > > Brandon,
> > > who does the copy is decided by where the AFR translator is loaded. If
> > > you have AFR loaded on the client side, then the client does the two
> > > writes. You can also have AFR loaded on the server side, and have the
> > > server do the replication. Translators can be loaded anywhere (client
> > > or server, anywhere in the graph). You need to think more along the
> > > lines of how you can 'program glusterfs' rather than how to 'configure
> > > glusterfs'.
> >
> > Performance-wise, which is better?  Or does it make sense one way vs.
> > the other based on number of clients?
>
> It depends: if the interconnect between server and client is precious, then
> have the servers replicate (load AFR on the server side), with replication
> happening on a separate network. This is also good if you have servers
> interconnected with high-speed networks like InfiniBand.
>
> If your servers have just one network interface (no separate network
> for replication) and your client apps are I/O bound, then it does not
> matter where you load AFR; they would all give the same performance.
>
> avati
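
To make the first variant concrete, here is a minimal sketch of what server-side
AFR with replication over a dedicated network could look like. The addresses,
directories and volume names are made up for illustration only; this is not the
setup I benchmarked below (there, everything runs over the one 10.10.1.x network):

  # hypothetical server vol file: the local brick is mirrored to the peer over a
  # dedicated replication subnet (192.168.100.0/24 is assumed); the peer would run
  # the same file with remote-host pointing back at this host's replication address
  volume brick
    type storage/posix
    option directory /data
  end-volume

  volume peer-brick
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.100.2     # peer's replication interface, not its client-facing one
    option remote-subvolume brick
  end-volume

  volume afr
    type cluster/afr
    subvolumes brick peer-brick          # local copy + remote copy, mirrored by the server
  end-volume

  volume server
    type protocol/server
    option transport-type tcp/server
    option listen-port 6996
    subvolumes afr
    option auth.ip.afr.allow *           # clients mount 'afr'
    option auth.ip.brick.allow *         # the peer server connects to 'brick' for its replication
  end-volume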

I did a simple test recently which suggests that there is a significant
performance difference: a comparison of client- vs. server-side AFR with
bonnie, on a one-client, two-server setup running tla patch628, connected
over Gigabit Ethernet; please see my results below.

There was also a posting on this list with a lot of test results suggesting
that server-side AFR is fastest:
http://lists.nongnu.org/archive/html/gluster-devel/2007-08/msg00136.html

In my own results, though, client-side AFR seems to be better in most of the
tests. I should note that I'm not sure whether the chosen setup (two servers
AFR-ing each other) has a negative impact on performance, so any comments on
this would be highly appreciated (the configs for the tests are included
below).

server-side AFR (I hope the output stays readable):

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
stf-db22     31968M 31438  43 35528   0   990   0 32375  43 41107   1  38.1   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    34   0   416   0   190   0    35   0   511   0   227   0


client-side AFR:

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
stf-db22     31968M 27583  38 31518   0   862   0 49522  63 56388   2  28.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   418   0  2225   1   948   1   455   0  2305   1   947   0



server-side AFR config:


glusterfs-server.vol.server_afr:

  volume fsbrick1
    type storage/posix
    option directory /data1
  end-volume
  
  volume fsbrick2
    type storage/posix
    option directory /data2
  end-volume
  
  volume nsfsbrick1
    type storage/posix
    option directory /data-ns1
  end-volume
  
  volume brick1
    type performance/io-threads
    option thread-count 8
    option queue-limit 1024
    subvolumes fsbrick1
  end-volume
  
  volume brick2
    type performance/io-threads
    option thread-count 8
    option queue-limit 1024
    subvolumes fsbrick2
  end-volume
  
  volume brick1r
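    # the peer server's (10.10.1.99) brick2, used as the mirror target; the peer
    # presumably has a matching brick1r pointing back at this host's brick2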
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.99
    option remote-subvolume brick2
  end-volume
  
  volume afr1
    type cluster/afr
    subvolumes brick1 brick1r
    # option replicate *:2 # obsolete with tla snapshot
  end-volume
  
  ### Add network serving capability to above bricks.
  volume server
    type protocol/server
    option transport-type tcp/server     # For TCP/IP transport
    option listen-port 6996              # Default is 6996
    option client-volume-filename /etc/glusterfs/glusterfs-client.vol
    subvolumes afr1 nsfsbrick1
    option auth.ip.afr1.allow * # Allow access to the "afr1" volume
    option auth.ip.brick2.allow * # Allow access to "brick2" (needed by the peer server's AFR)
    option auth.ip.nsfsbrick1.allow * # Allow access to the "nsfsbrick1" namespace volume
  end-volume






glusterfs-client.vol.test.server_afr:

  volume fsc1
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.10
    option remote-subvolume afr1
  end-volume

  volume fsc2
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.99
    option remote-subvolume afr1
  end-volume

  volume ns1
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.10
    option remote-subvolume nsfsbrick1
  end-volume
  
  volume ns2
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.99
    option remote-subvolume nsfsbrick1
  end-volume
  
  volume afrns
    type cluster/afr
    subvolumes ns1 ns2
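    # note: even in this server-side setup, the unify namespace is mirrored client-side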
  end-volume
  
  volume bricks
    type cluster/unify
    subvolumes fsc1 fsc2
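    # fsc1 and fsc2 are the afr1 exports of the two servers; unify only distributes
    # files between them, the mirroring itself already happened on the servers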
    option namespace afrns
    option scheduler alu
    option alu.limits.min-free-disk  5%              # Stop creating files when free space drops below 5%
    option alu.limits.max-open-files 10000
    option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
    option alu.disk-usage.entry-threshold 2GB          # Units in KB, MB and GB are allowed
    option alu.disk-usage.exit-threshold  60MB         # Units in KB, MB and GB are allowed
    option alu.open-files-usage.entry-threshold 1024
    option alu.open-files-usage.exit-threshold 32
    option alu.stat-refresh.interval 10sec
  end-volume
  
  volume readahead
    type performance/read-ahead
    option page-size 256KB
    option page-count 2
    subvolumes bricks
  end-volume
  
  volume write-behind
    type performance/write-behind
    option aggregate-size 1MB
    subvolumes readahead
  end-volume


-----------------------------------------------------------------------

client-side AFR config:


glusterfs-server.vol.client_afr:

  volume fsbrick1
    type storage/posix
    option directory /data1
  end-volume
  
  volume fsbrick2
    type storage/posix
    option directory /data2
  end-volume
  
  volume nsfsbrick1
    type storage/posix
    option directory /data-ns1
  end-volume
  
  volume brick1
    type performance/io-threads
    option thread-count 8
    option queue-limit 1024
    subvolumes fsbrick1
  end-volume
  
  volume brick2
    type performance/io-threads
    option thread-count 8
    option queue-limit 1024
    subvolumes fsbrick2
  end-volume
  
  ### Add network serving capability to above bricks.
  volume server
    type protocol/server
    option transport-type tcp/server     # For TCP/IP transport
    option listen-port 6996              # Default is 6996
    option client-volume-filename /etc/glusterfs/glusterfs-client.vol
    subvolumes brick1 brick2 nsfsbrick1
    option auth.ip.brick1.allow * # Allow access to the "brick1" volume
    option auth.ip.brick2.allow * # Allow access to the "brick2" volume
    option auth.ip.nsfsbrick1.allow * # Allow access to the "nsfsbrick1" namespace volume
  end-volume






glusterfs-client.vol.test.client_afr:

  volume fsc1
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.10
    option remote-subvolume brick1
  end-volume
  
  volume fsc1r
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.10
    option remote-subvolume brick2
  end-volume
  
  volume fsc2
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.99
    option remote-subvolume brick1
  end-volume
  
  volume fsc2r
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.99
    option remote-subvolume brick2
  end-volume
  
  volume afr1
    type cluster/afr
    subvolumes fsc1 fsc2r
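    # mirroring is crossed: afr1 pairs server .10's brick1 with server .99's brick2,
    # and afr2 below pairs server .99's brick1 with server .10's brick2, so each
    # file ends up once on every server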
    # option replicate *:2 # obsolete with tla snapshot
  end-volume
  
  volume afr2
    type cluster/afr
    subvolumes fsc2 fsc1r
    # option replicate *:2 # obsolete with tla snapshot
  end-volume
  
  volume ns1
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.10
    option remote-subvolume nsfsbrick1
  end-volume
  
  volume ns2
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.99
    option remote-subvolume nsfsbrick1
  end-volume
  
  volume afrns
    type cluster/afr
    subvolumes ns1 ns2
    # option replicate *:2 # obsolete with tla snapshot
  end-volume
  
  volume bricks
    type cluster/unify
    subvolumes afr1 afr2
    option namespace afrns
    option scheduler alu
    option alu.limits.min-free-disk  5%              # Stop creating files when free space drops below 5%
    option alu.limits.max-open-files 10000
    option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
    option alu.disk-usage.entry-threshold 2GB          # Units in KB, MB and GB are allowed
    option alu.disk-usage.exit-threshold  60MB         # Units in KB, MB and GB are allowed
    option alu.open-files-usage.entry-threshold 1024
    option alu.open-files-usage.exit-threshold 32
    option alu.stat-refresh.interval 10sec
  end-volume
  
  volume readahead
    type performance/read-ahead
    option page-size 256KB
    option page-count 2
    subvolumes bricks
  end-volume
  
  volume write-behind
    type performance/write-behind
    option aggregate-size 1MB
    subvolumes readahead
  end-volume


Cheers, Sascha




