[Gluster-devel] Confused on AFR, where does it happen client or server

Sascha Ottolski ottolski at web.de
Tue Jan 8 09:53:37 UTC 2008


On Tuesday, 8 January 2008 09:44:34, Anand Avati wrote:
> was it just one file being read/written? if not please use rr (a more
> deterministic scheduler) and share the numbers please.
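
(For reference, a minimal sketch of what switching the unify volume to the
round-robin scheduler would look like, based on the server-afr client spec
further down; the alu.* tuning options simply go away:)

  volume bricks
    type cluster/unify
    subvolumes fsc1 fsc2
    option namespace afrns
    option scheduler rr
  end-volume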

I only ran the bonnie tests; the posting I put the link to wasn't written by me.

Cheers, Sascha


> about the create and delete rate, client side afr is definitely faster since
> the create operations happen in parallel (w.r.t. the network - 1x the
> time), but if you have afr on the server, it happens serially across the
> machines (2x the time: 1x up to the 1st server, and 1x to the remaining N-1
> servers).
>
> note that for such a configuration, unify is not needed, and removing unify
> and having plain AFR will increase your create rate further, since unify
> serializes creates across the namespace and storage.
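
(For illustration only: dropping unify would leave the client spec with little
more than plain AFR over the two remote bricks, roughly like the untested
sketch below; the volume names and addresses are taken from the client-side
afr config further down.)

  volume fsc1
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.10
    option remote-subvolume brick1
  end-volume

  volume fsc2r
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.99
    option remote-subvolume brick2
  end-volume

  volume afr1
    type cluster/afr
    subvolumes fsc1 fsc2r
  end-volume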
>
> about throughput, the writes seem to be faster with server AFR (or did my
> eyes deceive me with that indentation?) and reads faster with client AFR.
>
> the faster writes might be because of a good job done by the full-duplex
> switch. when afr is on the client side, both copies go out on the same
> outbound channel of the client NIC, effectively writing the two copies
> serially, but when afr is on the server side, the replication copy uses the
> outbound channel of the server NIC while the main loop is fetching the next
> write block in parallel on the server's inbound channel. using
> io-threads between afr and protocol/client on the replication path might
> help further.
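
(If I read that suggestion correctly, it means wrapping the replication-side
protocol/client in io-threads in the server spec, so afr1 hands its second
copy to a thread pool instead of writing it inline. An untested sketch, with
"brick1r-threads" being a made-up volume name:)

  volume brick1r
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.99
    option remote-subvolume brick2
  end-volume

  volume brick1r-threads
    type performance/io-threads
    option thread-count 8
    option queue-limit 1024
    subvolumes brick1r
  end-volume

  volume afr1
    type cluster/afr
    subvolumes brick1 brick1r-threads
  end-volume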
>
> the slower reads might be because.. well i'm still not sure, maybe you have
> secretly applied the striped readv patch for afr? :)
>
> avati
>
> 2008/1/8, Sascha Ottolski <ottolski at web.de>:
> > On Tuesday, 8 January 2008 06:06:30, Anand Avati wrote:
> > > > > Brandon,
> > > > > who does the copy is decided by where the AFR translator is loaded.
> > > > > if you have AFR loaded on the client side, then the client does the
> > > > > two writes. you can also have AFR loaded on the server side, and have
> > > > > the server do the replication. Translators can be loaded anywhere
> > > > > (client or server, anywhere in the graph). You need to think more
> > > > > along the lines of how you can 'program glusterfs' rather than how to
> > > > > 'configure glusterfs'.
> > > >
> > > > Performance-wise, which is better?  Or does it make sense one way vs.
> > > > the other based on number of clients?
> > >
> > > Depends: if the interconnect between server and client is precious,
> > > then have the servers replicate (load afr on the server side) with
> > > replication happening on a separate network. This is also good if you
> > > have servers interconnected with high speed networks like infiniband.
> > >
> > > If your servers have just one network interface (no separate network
> > > for replication), and your client apps are IO bound, then it does not
> > > matter where you load AFR; they all would give the same performance.
> > >
> > > avati
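
(Concretely, replication over a separate network with server-side afr would
just mean pointing the replication client at the peer's address on the second
NIC; in the server spec below that is the brick1r volume. 192.168.1.99 here is
a made-up address on a hypothetical dedicated replication network:)

  volume brick1r
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.1.99     # peer's second NIC, dedicated to replication (hypothetical)
    option remote-subvolume brick2
  end-volume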
> >
> > I did a simple test recently which suggests that there is a significant
> > performance difference: a comparison of client-side vs. server-side afr
> > with bonnie, for a setup with one client and two servers running tla
> > patch628, connected over GB Ethernet; please see my results below.
> >
> > There also was a posting on this list with a lot of test results,
> > suggesting that server side afr is fastest:
> > http://lists.nongnu.org/archive/html/gluster-devel/2007-08/msg00136.html
> >
> > In my own results, though, client-side afr seems to be better in most of
> > the tests. I should note that I'm not sure whether the chosen setup has a
> > negative impact on performance (two servers afr-ing each other), so any
> > comments on this would be highly appreciated (I've added the configs for
> > the tests below).
> >
> > server side afr (I hope it stays readable):
> >
> > Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
> >                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> > Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> > stf-db22     31968M 31438  43 35528   0   990   0 32375  43 41107   1  38.1   0
> >                     ------Sequential Create------ --------Random Create--------
> >                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> >               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> >                  16    34   0   416   0   190   0    35   0   511   0   227   0
> >
> >
> > client side afr:
> >
> > Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
> >                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> > Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> > stf-db22     31968M 27583  38 31518   0   862   0 49522  63 56388   2  28.0   0
> >                     ------Sequential Create------ --------Random Create--------
> >                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> >               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> >                  16   418   0  2225   1   948   1   455   0  2305   1   947   0
> >
> >
> >
> > server side afr config:
> >
> >
> > glusterfs-server.vol.server_afr:
> >
> >   volume fsbrick1
> >     type storage/posix
> >     option directory /data1
> >   end-volume
> >
> >   volume fsbrick2
> >     type storage/posix
> >     option directory /data2
> >   end-volume
> >
> >   volume nsfsbrick1
> >     type storage/posix
> >     option directory /data-ns1
> >   end-volume
> >
> >   volume brick1
> >     type performance/io-threads
> >     option thread-count 8
> >     option queue-limit 1024
> >     subvolumes fsbrick1
> >   end-volume
> >
> >   volume brick2
> >     type performance/io-threads
> >     option thread-count 8
> >     option queue-limit 1024
> >     subvolumes fsbrick2
> >   end-volume
> >
> >   volume brick1r
> >     type protocol/client
> >     option transport-type tcp/client
> >     option remote-host 10.10.1.99
> >     option remote-subvolume brick2
> >   end-volume
> >
> >   volume afr1
> >     type cluster/afr
> >     subvolumes brick1 brick1r
> >     # option replicate *:2 # obsolete with tla snapshot
> >   end-volume
> >
> >   ### Add network serving capability to above bricks.
> >   volume server
> >     type protocol/server
> >     option transport-type tcp/server     # For TCP/IP transport
> >     option listen-port 6996              # Default is 6996
> >     option client-volume-filename /etc/glusterfs/glusterfs-client.vol
> >     subvolumes afr1 nsfsbrick1
> >     option auth.ip.afr1.allow * # Allow access to the "afr1" volume
> >     option auth.ip.brick2.allow * # Allow access to the "brick2" volume
> >     option auth.ip.nsfsbrick1.allow * # Allow access to the "nsfsbrick1" volume
> >   end-volume
> >
> >
> >
> >
> >
> >
> > glusterfs-client.vol.test.server_afr:
> >
> >   volume fsc1
> >     type protocol/client
> >     option transport-type tcp/client
> >     option remote-host 10.10.1.10
> >     option remote-subvolume afr1
> >   end-volume
> >
> >   volume fsc2
> >     type protocol/client
> >     option transport-type tcp/client
> >     option remote-host 10.10.1.99
> >     option remote-subvolume afr1
> >   end-volume
> >
> >   volume ns1
> >     type protocol/client
> >     option transport-type tcp/client
> >     option remote-host 10.10.1.10
> >     option remote-subvolume nsfsbrick1
> >   end-volume
> >
> >   volume ns2
> >     type protocol/client
> >     option transport-type tcp/client
> >     option remote-host 10.10.1.99
> >     option remote-subvolume nsfsbrick1
> >   end-volume
> >
> >   volume afrns
> >     type cluster/afr
> >     subvolumes ns1 ns2
> >   end-volume
> >
> >   volume bricks
> >     type cluster/unify
> >     subvolumes fsc1 fsc2
> >     option namespace afrns
> >     option scheduler alu
> >     option alu.limits.min-free-disk  5%               # Stop creating files when free-space lt 5%
> >     option alu.limits.max-open-files 10000
> >     option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
> >     option alu.disk-usage.entry-threshold 2GB         # Units in KB, MB and GB are allowed
> >     option alu.disk-usage.exit-threshold  60MB        # Units in KB, MB and GB are allowed
> >     option alu.open-files-usage.entry-threshold 1024
> >     option alu.open-files-usage.exit-threshold 32
> >     option alu.stat-refresh.interval 10sec
> >   end-volume
> >
> >   volume readahead
> >     type performance/read-ahead
> >     option page-size 256KB
> >     option page-count 2
> >     subvolumes bricks
> >   end-volume
> >
> >   volume write-behind
> >     type performance/write-behind
> >     option aggregate-size 1MB
> >     subvolumes readahead
> >   end-volume
> >
> >
> > -----------------------------------------------------------------------
> >
> > client side afr config:
> >
> >
> > glusterfs-server.vol.client_afr:
> >
> >   volume fsbrick1
> >     type storage/posix
> >     option directory /data1
> >   end-volume
> >
> >   volume fsbrick2
> >     type storage/posix
> >     option directory /data2
> >   end-volume
> >
> >   volume nsfsbrick1
> >     type storage/posix
> >     option directory /data-ns1
> >   end-volume
> >
> >   volume brick1
> >     type performance/io-threads
> >     option thread-count 8
> >     option queue-limit 1024
> >     subvolumes fsbrick1
> >   end-volume
> >
> >   volume brick2
> >     type performance/io-threads
> >     option thread-count 8
> >     option queue-limit 1024
> >     subvolumes fsbrick2
> >   end-volume
> >
> >   ### Add network serving capability to above bricks.
> >   volume server
> >     type protocol/server
> >     option transport-type tcp/server     # For TCP/IP transport
> >     option listen-port 6996              # Default is 6996
> >     option client-volume-filename /etc/glusterfs/glusterfs-client.vol
> >     subvolumes brick1 brick2 nsfsbrick1
> >     option auth.ip.brick1.allow * # Allow access to the "brick1" volume
> >     option auth.ip.brick2.allow * # Allow access to the "brick2" volume
> >     option auth.ip.nsfsbrick1.allow * # Allow access to the "nsfsbrick1" volume
> >   end-volume
> >
> >
> >
> >
> >
> >
> > glusterfs-client.vol.test.client_afr:
> >
> >   volume fsc1
> >     type protocol/client
> >     option transport-type tcp/client
> >     option remote-host 10.10.1.10
> >     option remote-subvolume brick1
> >   end-volume
> >
> >   volume fsc1r
> >     type protocol/client
> >     option transport-type tcp/client
> >     option remote-host 10.10.1.10
> >     option remote-subvolume brick2
> >   end-volume
> >
> >   volume fsc2
> >     type protocol/client
> >     option transport-type tcp/client
> >     option remote-host 10.10.1.99
> >     option remote-subvolume brick1
> >   end-volume
> >
> >   volume fsc2r
> >     type protocol/client
> >     option transport-type tcp/client
> >     option remote-host 10.10.1.99
> >     option remote-subvolume brick2
> >   end-volume
> >
> >   volume afr1
> >     type cluster/afr
> >     subvolumes fsc1 fsc2r
> >     # option replicate *:2 # obsolete with tla snapshot
> >   end-volume
> >
> >   volume afr2
> >     type cluster/afr
> >     subvolumes fsc2 fsc1r
> >     # option replicate *:2 # obsolete with tla snapshot
> >   end-volume
> >
> >   volume ns1
> >     type protocol/client
> >     option transport-type tcp/client
> >     option remote-host 10.10.1.10
> >     option remote-subvolume nsfsbrick1
> >   end-volume
> >
> >   volume ns2
> >     type protocol/client
> >     option transport-type tcp/client
> >     option remote-host 10.10.1.99
> >     option remote-subvolume nsfsbrick1
> >   end-volume
> >
> >   volume afrns
> >     type cluster/afr
> >     subvolumes ns1 ns2
> >     # option replicate *:2 # obsolete with tla snapshot
> >   end-volume
> >
> >   volume bricks
> >     type cluster/unify
> >     subvolumes afr1 afr2
> >     option namespace afrns
> >     option scheduler alu
> >     option alu.limits.min-free-disk  5%               # Stop creating files when free-space lt 5%
> >     option alu.limits.max-open-files 10000
> >     option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
> >     option alu.disk-usage.entry-threshold 2GB         # Units in KB, MB and GB are allowed
> >     option alu.disk-usage.exit-threshold  60MB        # Units in KB, MB and GB are allowed
> >     option alu.open-files-usage.entry-threshold 1024
> >     option alu.open-files-usage.exit-threshold 32
> >     option alu.stat-refresh.interval 10sec
> >   end-volume
> >
> >   volume readahead
> >     type performance/read-ahead
> >     option page-size 256KB
> >     option page-count 2
> >     subvolumes bricks
> >   end-volume
> >
> >   volume write-behind
> >     type performance/write-behind
> >     option aggregate-size 1MB
> >     subvolumes readahead
> >   end-volume
> >
> >
> > Cheers, Sascha
> >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at nongnu.org
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel






