[Gluster-devel] Question about unify over afr
Łukasz Mierzwa
l.mierzwa at grono.net
Thu Aug 28 13:29:06 UTC 2008
On Thursday, 28 August 2008 12:39:03, you wrote:
> On Thu, Aug 28, 2008 at 3:01 PM, Łukasz Mierzwa <l.mierzwa at grono.net> wrote:
> > On Thursday, 28 August 2008 07:06:30, Krishna Srinivas wrote:
> >> On Wed, Aug 27, 2008 at 10:55 PM, Łukasz Mierzwa <l.mierzwa at grono.net> wrote:
> >> > On Tuesday, 26 August 2008 16:28:41, Łukasz Mierzwa wrote:
> >> >> Hi,
> >> >>
> >> >> I'm testing glusterfs as storage for small files. First I set up a
> >> >> single-disk gluster server, connected to it from another machine
> >> >> and served those files with nginx. That worked OK and I got good
> >> >> performance; each request was on average about 20ms slower, but
> >> >> that's acceptable. Now I've set up unify over afr (2 afr groups
> >> >> with 3 servers each, unify and afr on the client side; the
> >> >> namespace dir is on every server and, like everything else, afr'ed
> >> >> on the client side), and this is mounted on one of those 6 servers.
> >> >> After writing ~200GB of files from a production server I started to
> >> >> do some tests and noticed that a simple ls on the mount point
> >> >> causes as many writes as reads. This must have something to do with
> >> >> either unify or afr; I suspect the writes are due to the namespace,
> >> >> but I need to do more debugging. It's very annoying that simple
> >> >> reads cause so many writes. All my servers are in sync, so there
> >> >> should be no need for self-healing. Before I start debugging it I
> >> >> wanted to ask: is this normal? Should afr or unify generate so many
> >> >> writes to the namespace, or maybe to xattrs, during reads (storage
> >> >> is on ext3 with user_xattr on)?
> >> >
> >> > I tested it a bit today and found that with 1 or 2 nodes in my afr
> >> > group for the namespace there are no writes at all while doing ls;
> >> > if I add one or more nodes, the writes start. WTF?
> >>
> >> Do you mean that your NS is getting write() calls when you do "ls"?
> >
> > It seems so. I will split my NS and DATA bricks onto different disks
> > today so I can be 100% sure. What I am sure of now is that I am getting
> > as many writes as reads when I do "ls" and have more than 2 NS bricks
> > in AFR.
>
> Reads/writes should not happen when you do an 'ls'. Where are you seeing
> reads and writes being done? How are you seeing it? Are you strace'ing
> the glusterfsd?
>
> Krishna
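One simple way to see such reads and writes (a minimal sketch, not taken
from this thread; the mount point is an example) is to watch the extended
per-device I/O counters on a namespace server while running ls on the
client:

    # on an NS server: watch the writes/s (w/s) column for the NS disk
    iostat -x 1
    # meanwhile, on the client
    ls -lR /mnt/glusterfs > /dev/null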
I first noticed them when I looked at the rrd graphs for those machines; I
wanted to see whether AFR was balancing reads. I can see the writes in rrd
graphs generated from collectd, and also with dstat, iotop and iostat, so
they are definitely happening. I first tried to find something in my config
and forgot about such an obvious step as strace'ing glusterfs. I'm
attaching a log from one of the servers where I straced the gluster server
process. You can see a lot of mkdir/chown/chmod calls on files that are
already there; all bricks were online when I was writing the files through
the gluster client, so no self-heal should be needed. I've also attached
the client and server configs.
--
Best regards,
Łukasz Mierzwa
Network Administrator
Grono.net Spółka Akcyjna
ul. Szturmowa 2a, 02-678 Warszawa
http://grono.net
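The exact strace invocation behind the attached log isn't quoted in the
thread; a plausible reconstruction (the syscall filter, the output file,
and the assumption of a single glusterfsd process are not from the
original mail) would be:

    # assumes exactly one glusterfsd process on the server
    strace -f -tt -e trace=mkdir,chown,chmod,write \
        -p $(pidof glusterfsd) -o glusterfsd.strace

Here -f follows the worker threads and -tt adds timestamps, which makes it
easy to correlate the mkdir/chown/chmod calls with the ls run on the client.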
-------------- next part --------------
volume brick_40
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.40
  option remote-subvolume brick-writebehind
end-volume

volume ns_40
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.40
  option remote-subvolume brick-ns
end-volume

volume brick_41
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.41
  option remote-subvolume brick-writebehind
end-volume

volume ns_41
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.41
  option remote-subvolume brick-ns
end-volume

volume brick_42
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.42
  option remote-subvolume brick-writebehind
end-volume

volume ns_42
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.42
  option remote-subvolume brick-ns
end-volume

volume brick_43
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.43
  option remote-subvolume brick-writebehind
end-volume

volume ns_43
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.43
  option remote-subvolume brick-ns
end-volume

volume brick_44
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.44
  option remote-subvolume brick-writebehind
end-volume

volume ns_44
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.44
  option remote-subvolume brick-ns
end-volume

volume brick_45
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.45
  option remote-subvolume brick-writebehind
end-volume

volume ns_45
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.45
  option remote-subvolume brick-ns
end-volume

volume afr_1
  type cluster/afr
  option self-heal on
  subvolumes brick_40 brick_41 brick_42
end-volume

volume afr_2
  type cluster/afr
  option self-heal on
  subvolumes brick_43 brick_44 brick_45
end-volume

volume afr_ns
  type cluster/afr
  subvolumes ns_40 ns_41 ns_42 ns_43 ns_44 ns_45
end-volume

volume unify
  type cluster/unify
  option namespace afr_ns
  option scheduler alu
  option alu.limits.min-free-disk 5%
  option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
  option alu.disk-usage.entry-threshold 4GB
  option alu.disk-usage.exit-threshold 500MB
  option alu.read-usage.entry-threshold 20%
  option alu.read-usage.exit-threshold 5%
  option alu.write-usage.entry-threshold 20%
  option alu.write-usage.exit-threshold 5%
  option alu.stat-refresh.interval 30sec
  option alu.stat-refresh.num-file-create 200
  subvolumes afr_1 afr_2
end-volume

volume iothreads
  type performance/io-threads
  option thread-count 2   # default is 1
  option cache-size 64MB
  subvolumes unify
end-volume

volume readahead
  type performance/read-ahead
  option page-size 128kB
  option page-count 4
  subvolumes iothreads
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 1MB
  option flush-behind on
  subvolumes readahead
end-volume

volume io-cache
  type performance/io-cache
  option cache-size 256MB
  option force-revalidate-timeout 600
  subvolumes writebehind
end-volume
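
A client volume file like the one above would typically be mounted with
something along these lines (the spec-file, log and mount paths are
examples, not taken from this thread):

    glusterfs -f /etc/glusterfs/client.vol -l /var/log/glusterfs/client.log /mnt/glusterfs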
-------------- next part --------------
volume brick-data
  type storage/posix
  option directory /home/gluster/data
end-volume

volume brick-ns
  type storage/posix
  option directory /home/gluster/ns
end-volume

volume brick-locks
  type features/posix-locks
  subvolumes brick-data
end-volume

volume brick-iothreads
  type performance/io-threads
  option thread-count 4
  option cache-size 64MB
  subvolumes brick-locks
end-volume

volume brick-readahead
  type performance/read-ahead
  subvolumes brick-iothreads
end-volume

volume brick-writebehind
  type performance/write-behind
  option aggregate-size 1MB
  option flush-behind on
  subvolumes brick-readahead
end-volume

volume server
  type protocol/server
  subvolumes brick-ns brick-writebehind
  option transport-type tcp/server
  option auth.ip.brick-ns.allow *
  option auth.ip.brick-writebehind.allow *
end-volume
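
The matching server process for this volume file would be started with
something like the following (again, the paths are examples):

    glusterfsd -f /etc/glusterfs/server.vol -l /var/log/glusterfs/server.log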