[Gluster-devel] Confused on AFR, where does it happen client or server
Sascha Ottolski
ottolski at web.de
Tue Jan 8 09:53:37 UTC 2008
On Tuesday, 8 January 2008 at 09:44:34, Anand Avati wrote:
> was it just one file being read/written? if not, please use rr (a more
> deterministic scheduler) and share the numbers.
I only ran the bonnie tests; the posting I linked to wasn't by me.
Cheers, Sascha
> about the create and delete rate: client-side afr is definitely faster, since
> the create operations happen in parallel (w.r.t. the network - 1x the
> time), but if you have afr on the server, it happens serially across the
> machines (2x the time: 1x up to the 1st server, and 1x to the remaining N-1
> servers).
>
> note that for such a configuration, unify is not needed; removing unify
> and having plain AFR will increase your create rate further, since unify
> serializes creates across the namespace and storage.
>
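(A minimal sketch of the plain client-side AFR layout suggested above, assuming
each server exports a brick; the volume names and addresses here are purely
illustrative and not taken from the configs quoted later in this mail:)

  # hypothetical client volfile: plain AFR, no unify
  volume remote1
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.0.1        # first server (example address)
    option remote-subvolume posix-brick
  end-volume

  volume remote2
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.0.2        # second server (example address)
    option remote-subvolume posix-brick
  end-volume

  volume mirror
    type cluster/afr
    subvolumes remote1 remote2            # the client issues both writes in parallel
  end-volume
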
> about throughput: the writes seem to be faster with server AFR (or did my
> eyes deceive me with that indentation?) and reads faster with client AFR.
>
> the faster writes might be because of a good job done by the full-duplex
> switch. when afr is on the client side, both copies go out on the same
> outbound channel of the client NIC, effectively writing the two copies
> serially; but when afr is on the server side, the replication copy uses the
> outbound channel of the server NIC while the main loop fetches the next
> write block in parallel on the server's inbound channel. using io-threads
> between afr and protocol/client on the replication path might help
> further.
>
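(A rough sketch of that io-threads suggestion, reusing the volume names from the
server-side config quoted further down; this is untested, and the thread count
is only a guess:)

  volume brick1r
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.10.1.99         # peer server, as in the config below
    option remote-subvolume brick2
  end-volume

  volume brick1r-iot                      # io-threads on the replication path
    type performance/io-threads
    option thread-count 4
    subvolumes brick1r
  end-volume

  volume afr1
    type cluster/afr
    subvolumes brick1 brick1r-iot         # brick1 = local brick from the server volfile
  end-volume
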
> the slower reads might be because... well, I'm still not sure; maybe you
> have secretly applied the striped readv patch for afr? :)
>
> avati
>
> 2008/1/8, Sascha Ottolski <ottolski at web.de>:
> > On Tuesday, 8 January 2008 at 06:06:30, Anand Avati wrote:
> > > > > Brandon,
> > > > > who does the copy is decided by where the AFR translator is loaded.
> > > > > If you have AFR loaded on the client side, then the client does the
> > > > > two writes. You can also have AFR loaded on the server side, and
> > > > > have the server do the replication. Translators can be loaded
> > > > > anywhere (client or server, anywhere in the graph). You need to
> > > > > think more along the lines of how you can 'program glusterfs'
> > > > > rather than how to 'configure glusterfs'.
> > > >
> > > > Performance-wise, which is better? Or does it make sense one way vs.
> > > > the other based on number of clients?
> > >
> > > Depends: if the interconnect between server and client is precious,
> > > then have the servers replicate (load afr on the server side), with
> > > replication happening on a separate network. This is also good if you
> > > have servers interconnected with high-speed networks like InfiniBand.
> > >
> > > If your servers have just one network interface (no separate network
> > > for replication) and your client apps are IO bound, then it does not
> > > matter where you load AFR; either placement would give the same
> > > performance.
> > >
> > > avati
> >
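(To illustrate the "separate replication network" idea: on each server, the
replication protocol/client would simply point at the peer's address on the
dedicated link. The 10.20.x.x addresses and volume names below are made up:)

  # hypothetical server-side volfile fragment on server A
  volume local-brick
    type storage/posix
    option directory /data1
  end-volume

  volume peer-replica
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.20.1.2          # server B, reached over the replication-only network
    option remote-subvolume local-brick
  end-volume

  volume afr-export
    type cluster/afr
    subvolumes local-brick peer-replica   # client traffic stays on the front network
  end-volume
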
> > I did a simple test recently which suggests that there is a significant
> > performance difference: a comparison of client-side vs. server-side AFR
> > with bonnie, for a setup with one client and two servers running tla
> > patch628, connected over Gigabit Ethernet; please see my results below.
> >
> > There was also a posting on this list with a lot of test results,
> > suggesting that server-side AFR is fastest:
> > http://lists.nongnu.org/archive/html/gluster-devel/2007-08/msg00136.html
> >
> > In my own results, though, client-side AFR seems to be better in most of
> > the tests. I should note that I'm not sure whether the chosen setup has a
> > negative impact on performance (two servers AFR-ing each other), so any
> > comments on this would be highly appreciated (I've added the configs for
> > the tests below).
> >
> > server side afr (I hope it stays readable):
> >
> > Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
> >                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> > Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> > stf-db22    31968M 31438  43 35528   0   990   0 32375  43 41107   1  38.1   0
> >                    ------Sequential Create------ --------Random Create--------
> >                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> >              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> >                 16    34   0   416   0   190   0    35   0   511   0   227   0
> >
> >
> > client side afr:
> >
> > Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
> >                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> > Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> > stf-db22    31968M 27583  38 31518   0   862   0 49522  63 56388   2  28.0   0
> >                    ------Sequential Create------ --------Random Create--------
> >                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> >              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> >                 16   418   0  2225   1   948   1   455   0  2305   1   947   0
> >
> >
> >
> > server side afr config:
> >
> >
> > glusterfs-server.vol.server_afr:
> >
> > volume fsbrick1
> > type storage/posix
> > option directory /data1
> > end-volume
> >
> > volume fsbrick2
> > type storage/posix
> > option directory /data2
> > end-volume
> >
> > volume nsfsbrick1
> > type storage/posix
> > option directory /data-ns1
> > end-volume
> >
> > volume brick1
> > type performance/io-threads
> > option thread-count 8
> > option queue-limit 1024
> > subvolumes fsbrick1
> > end-volume
> >
> > volume brick2
> > type performance/io-threads
> > option thread-count 8
> > option queue-limit 1024
> > subvolumes fsbrick2
> > end-volume
> >
> > volume brick1r
> > type protocol/client
> > option transport-type tcp/client
> > option remote-host 10.10.1.99
> > option remote-subvolume brick2
> > end-volume
> >
> > volume afr1
> > type cluster/afr
> > subvolumes brick1 brick1r
> > # option replicate *:2 # obsolete with tla snapshot
> > end-volume
> >
> > ### Add network serving capability to above bricks.
> > volume server
> > type protocol/server
> > option transport-type tcp/server # For TCP/IP transport
> > option listen-port 6996 # Default is 6996
> > option client-volume-filename /etc/glusterfs/glusterfs-client.vol
> > subvolumes afr1 nsfsbrick1
> > option auth.ip.afr1.allow * # Allow access to the afr1 volume
> > option auth.ip.brick2.allow * # Allow access to the brick2 volume
> > option auth.ip.nsfsbrick1.allow * # Allow access to the nsfsbrick1 volume
> > end-volume
> >
> >
> >
> >
> >
> >
> > glusterfs-client.vol.test.server_afr:
> >
> > volume fsc1
> > type protocol/client
> > option transport-type tcp/client
> > option remote-host 10.10.1.10
> > option remote-subvolume afr1
> > end-volume
> >
> > volume fsc2
> > type protocol/client
> > option transport-type tcp/client
> > option remote-host 10.10.1.99
> > option remote-subvolume afr1
> > end-volume
> >
> > volume ns1
> > type protocol/client
> > option transport-type tcp/client
> > option remote-host 10.10.1.10
> > option remote-subvolume nsfsbrick1
> > end-volume
> >
> > volume ns2
> > type protocol/client
> > option transport-type tcp/client
> > option remote-host 10.10.1.99
> > option remote-subvolume nsfsbrick1
> > end-volume
> >
> > volume afrns
> > type cluster/afr
> > subvolumes ns1 ns2
> > end-volume
> >
> > volume bricks
> > type cluster/unify
> > subvolumes fsc1 fsc2
> > option namespace afrns
> > option scheduler alu
> > option alu.limits.min-free-disk 5%   # Stop creating files when free space < 5%
> > option alu.limits.max-open-files 10000
> > option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
> > option alu.disk-usage.entry-threshold 2GB    # Units in KB, MB and GB are allowed
> > option alu.disk-usage.exit-threshold 60MB    # Units in KB, MB and GB are allowed
> > option alu.open-files-usage.entry-threshold 1024
> > option alu.open-files-usage.exit-threshold 32
> > option alu.stat-refresh.interval 10sec
> > end-volume
> >
> > volume readahead
> > type performance/read-ahead
> > option page-size 256KB
> > option page-count 2
> > subvolumes bricks
> > end-volume
> >
> > volume write-behind
> > type performance/write-behind
> > option aggregate-size 1MB
> > subvolumes readahead
> > end-volume
> >
> >
> > -----------------------------------------------------------------------
> >
> > client side afr config:
> >
> >
> > glusterfs-server.vol.client_afr:
> >
> > volume fsbrick1
> > type storage/posix
> > option directory /data1
> > end-volume
> >
> > volume fsbrick2
> > type storage/posix
> > option directory /data2
> > end-volume
> >
> > volume nsfsbrick1
> > type storage/posix
> > option directory /data-ns1
> > end-volume
> >
> > volume brick1
> > type performance/io-threads
> > option thread-count 8
> > option queue-limit 1024
> > subvolumes fsbrick1
> > end-volume
> >
> > volume brick2
> > type performance/io-threads
> > option thread-count 8
> > option queue-limit 1024
> > subvolumes fsbrick2
> > end-volume
> >
> > ### Add network serving capability to above bricks.
> > volume server
> > type protocol/server
> > option transport-type tcp/server # For TCP/IP transport
> > option listen-port 6996 # Default is 6996
> > option client-volume-filename /etc/glusterfs/glusterfs-client.vol
> > subvolumes brick1 brick2 nsfsbrick1
> > option auth.ip.brick1.allow * # Allow access to "brick" volume
> > option auth.ip.brick2.allow * # Allow access to "brick" volume
> > option auth.ip.nsfsbrick1.allow * # Allow access to "brick" volume
> > end-volume
> >
> >
> >
> >
> >
> >
> > glusterfs-client.vol.test.client_afr:
> >
> > volume fsc1
> > type protocol/client
> > option transport-type tcp/client
> > option remote-host 10.10.1.10
> > option remote-subvolume brick1
> > end-volume
> >
> > volume fsc1r
> > type protocol/client
> > option transport-type tcp/client
> > option remote-host 10.10.1.10
> > option remote-subvolume brick2
> > end-volume
> >
> > volume fsc2
> > type protocol/client
> > option transport-type tcp/client
> > option remote-host 10.10.1.99
> > option remote-subvolume brick1
> > end-volume
> >
> > volume fsc2r
> > type protocol/client
> > option transport-type tcp/client
> > option remote-host 10.10.1.99
> > option remote-subvolume brick2
> > end-volume
> >
> > volume afr1
> > type cluster/afr
> > subvolumes fsc1 fsc2r
> > # option replicate *:2 # obsolete with tla snapshot
> > end-volume
> >
> > volume afr2
> > type cluster/afr
> > subvolumes fsc2 fsc1r
> > # option replicate *:2 # obsolete with tla snapshot
> > end-volume
> >
> > volume ns1
> > type protocol/client
> > option transport-type tcp/client
> > option remote-host 10.10.1.10
> > option remote-subvolume nsfsbrick1
> > end-volume
> >
> > volume ns2
> > type protocol/client
> > option transport-type tcp/client
> > option remote-host 10.10.1.99
> > option remote-subvolume nsfsbrick1
> > end-volume
> >
> > volume afrns
> > type cluster/afr
> > subvolumes ns1 ns2
> > # option replicate *:2 # obsolete with tla snapshot
> > end-volume
> >
> > volume bricks
> > type cluster/unify
> > subvolumes afr1 afr2
> > option namespace afrns
> > option scheduler alu
> > option alu.limits.min-free-disk 5%   # Stop creating files when free space < 5%
> > option alu.limits.max-open-files 10000
> > option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
> > option alu.disk-usage.entry-threshold 2GB    # Units in KB, MB and GB are allowed
> > option alu.disk-usage.exit-threshold 60MB    # Units in KB, MB and GB are allowed
> > option alu.open-files-usage.entry-threshold 1024
> > option alu.open-files-usage.exit-threshold 32
> > option alu.stat-refresh.interval 10sec
> > end-volume
> >
> > volume readahead
> > type performance/read-ahead
> > option page-size 256KB
> > option page-count 2
> > subvolumes bricks
> > end-volume
> >
> > volume write-behind
> > type performance/write-behind
> > option aggregate-size 1MB
> > subvolumes readahead
> > end-volume
> >
> >
> > Cheers, Sascha
> >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at nongnu.org
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel