[Gluster-devel] write-behind tuning
Jordan Mendler
jmendler at ucla.edu
Wed May 14 07:15:35 UTC 2008
Hi all,
I am in the process of implementing write-behind and had some questions.
1) I've been told aggregate-size should be between 0 and 4MB. What is the
downside to making it large? In our case (a backup server) I would think the
bigger the better, since we are doing lots of consecutive/parallel rsyncs of a
mix of tons of small files and some very large files. The only downside I can
see is that small transfers would not be distributed as evenly, since a large
aggregated write would go to only one brick instead of half of the write going
to each brick. Perhaps someone can clarify.
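For concreteness, the variant I am considering would simply raise the
aggregate-size in the write-behind volume from my config below to the top of
the suggested range (the 4MB value is my own guess at a reasonable upper
bound, not something I have tested):

volume write-back
type performance/write-behind
option aggregate-size 4MB   # up from 1MB; upper end of the suggested 0-4MB range
option flush-behind on
subvolumes bricks
end-volume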
2) What does flush-behind do? What is the advantage of having it on, and what
is the advantage of leaving it off?
3) Write-behind on the client aggregates small writes into larger ones. Is
there any purpose to also doing it on the server side? If so, how is that
helpful?
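To make sure I am asking about the right thing: I assume "server side" would
mean layering write-behind over the storage volume in the server spec, roughly
like the sketch below (the volume names and export path are just placeholders,
since my real server config simply exports a volume named 'brick'):

volume posix
type storage/posix
option directory /data/export   # placeholder path, not my actual export
end-volume

volume brick
type performance/write-behind   # write-behind on the server, over the posix store
option aggregate-size 1MB
subvolumes posix
end-volume

(protocol/server and io-threads omitted for brevity)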
4) Should write-behind be done on a brick-by-brick basis on the client, or is
it fine to do it after the unify? (It seems like it would be fine, since this
would consolidate small writes before sending them to the scheduler.)
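For reference, the per-brick alternative I am asking about would look
something like this on the client (wb1/wb2 are made-up names; unify would then
take them as its subvolumes instead of brick-io1/brick-io2):

volume wb1
type performance/write-behind
option aggregate-size 1MB
subvolumes brick-io1
end-volume

volume wb2
type performance/write-behind
option aggregate-size 1MB
subvolumes brick-io2
end-volume

whereas my config below puts a single write-behind volume on top of the unify
volume instead.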
Hardware-wise, we currently have two 16x1TB hardware RAID6 servers (each is
8-core with 8GB of RAM). Each acts as both a server and a unify client. The
underlying filesystem is currently XFS on Linux, ~13TB each. The interconnect
is GigE, and eventually we will have more external clients, though for now we
are just using the servers as clients. My current client config is below.
Any other suggestions are also appreciated.
Thanks, Jordan
----
### Client config
### Import storage volumes and thread for performance
volume brick1
type protocol/client
option transport-type tcp/client
option remote-host storage-0-1
option remote-subvolume brick
end-volume
volume brick2
type protocol/client
option transport-type tcp/client
option remote-host storage-0-2
option remote-subvolume brick
end-volume
volume brick-io1
type performance/io-threads
option thread-count 8
option cache-size 4096MB
subvolumes brick1
end-volume
volume brick-io2
type performance/io-threads
option thread-count 8
option cache-size 4096MB
subvolumes brick2
end-volume
### Import namespace volumes and mirror them for redundancy
volume brick-ns1
type protocol/client
option transport-type tcp/client
option remote-host storage-0-1
option remote-subvolume brick-ns
end-volume
volume brick-ns2
type protocol/client
option transport-type tcp/client
option remote-host storage-0-2
option remote-subvolume brick-ns
end-volume
volume brick-ns
type cluster/afr
subvolumes brick-ns1 brick-ns2
end-volume
### Unify bricks into a single logical volume, and use ALU for scheduling
volume bricks
type cluster/unify
subvolumes brick-io1 brick-io2
option namespace brick-ns
# Use ALU scheduling algorithm
option scheduler alu # use the ALU scheduler
option alu.limits.min-free-disk 5%    # Don't create files on a volume with less than 5% free disk space
option alu.limits.max-open-files 10000    # Don't create files on a volume with more than 10000 files open
# When deciding where to place a file, first look at the disk-usage, then at
# read-usage, write-usage, open files, and finally the disk-speed-usage.
option alu.order disk-usage:write-usage:read-usage:open-files-usage:disk-speed-usage
option alu.disk-usage.entry-threshold 100GB    # Kick in if the discrepancy in disk-usage between volumes is more than 100GB
option alu.disk-usage.exit-threshold 50GB    # Don't stop writing to the least-used volume until the discrepancy is 50GB
option alu.open-files-usage.entry-threshold 1024    # Kick in if the discrepancy in open files is 1024
option alu.open-files-usage.exit-threshold 32    # Don't stop until 992 files have been written to the least-used volume
option alu.read-usage.entry-threshold 20%    # Kick in when the read-usage discrepancy is 20%
option alu.read-usage.exit-threshold 4%    # Don't stop until the discrepancy has been reduced to 16% (20% - 4%)
option alu.write-usage.entry-threshold 20%    # Kick in when the write-usage discrepancy is 20%
option alu.write-usage.exit-threshold 4%    # Don't stop until the discrepancy has been reduced to 16%
option alu.stat-refresh.interval 10sec    # Refresh the statistics used for decision-making every 10 seconds
# option alu.stat-refresh.num-file-create 10    # Refresh the statistics used for decision-making after creating 10 files
end-volume
volume write-back
type performance/write-behind
option aggregate-size 1MB
option flush-behind on # default is 'off'
subvolumes bricks
end-volume