[Gluster-devel] Question about unify over afr

Łukasz Mierzwa l.mierzwa at grono.net
Thu Aug 28 13:29:06 UTC 2008


On Thursday, 28 August 2008, at 12:39:03, you wrote:
> On Thu, Aug 28, 2008 at 3:01 PM, Łukasz Mierzwa <l.mierzwa at grono.net> wrote:
> >> On Thursday, 28 August 2008, at 07:06:30, Krishna Srinivas wrote:
> >> On Wed, Aug 27, 2008 at 10:55 PM, Łukasz Mierzwa
> >> <l.mierzwa at grono.net> wrote:
> >> > On Tuesday, 26 August 2008, at 16:28:41, Łukasz Mierzwa wrote:
> >> >> Hi,
> >> >>
> >> >> I'm testing glusterfs for small-file storage. First I set up a
> >> >> single-disk gluster server, connected to it from another machine
> >> >> and served those files with nginx. That worked ok; I got good
> >> >> performance, on average about 20ms slower per request, but
> >> >> that's ok. Now I've set up unify over afr (2 afr groups with 3
> >> >> servers each, unify and afr on the client side; the namespace
> >> >> directory is on every server and, like everything else, afr'ed
> >> >> on the client side), and this is mounted on one of those 6
> >> >> servers. After writing ~200GB of files from a production server
> >> >> I started running some tests and noticed that doing a simple ls
> >> >> on that mount point causes as many writes as reads. This must
> >> >> have something to do with either unify or afr; I suspect the
> >> >> writes are due to the namespace, but I need to do more
> >> >> debugging. It's very annoying that simple reads cause so many
> >> >> writes. All my servers are in sync, so there should be no need
> >> >> for self-healing. Before I start debugging it I wanted to ask:
> >> >> is this normal? Should afr or unify generate so many writes to
> >> >> the namespace, or maybe to xattrs, during reads (storage is on
> >> >> ext3 with user_xattr on)?
> >> >
> >> > I tested it a little bit today and found out that with 1 or 2
> >> > nodes in my namespace afr group there are no writes at all while
> >> > doing ls; if I add one or more nodes, they start getting writes.
> >> > WTF?
> >>
> >> Do you mean that your NS is getting write() calls when you do "ls"?
> >
> > It seems so. I will split my NS and DATA bricks onto different disks
> > today so I can be 100% sure. What I am sure of now is that I get as
> > many writes as reads when I do "ls" and have more than 2 NS bricks
> > in AFR.
>
> Reads/writes should not happen when you do an 'ls'. Where are you
> seeing reads and writes being done? How are you seeing it? Are you
> strace'ing the glusterfsd?
>
> Krishna

I first noticed them when I looked at the rrd graphs for those machines;
I wanted to see whether AFR was balancing reads. I can see the writes in
the rrd graphs generated from collectd, and in dstat, iotop and iostat,
so they are definitely happening. I first tried to find something in my
config and forgot about such an obvious step as stracing glusterfs. I'm
attaching a log from one of the servers, where I straced the gluster
server process. You can see that there are a lot of mkdir/chown/chmod
calls on files that are already there; all bricks were online while I
was writing files through the gluster client, so no self-heal should be
needed. I've also attached the client and server configs.
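
For reference, the trace can be captured roughly like this (a sketch;
the syscall filter is illustrative, and a single glusterfsd server
process on the machine is assumed):

    # attach to the running server, follow threads (-f), timestamp each
    # call (-tt) and log only write() plus the metadata-modifying calls
    strace -f -tt -e trace=write,mkdir,chown,chmod,setxattr \
        -p $(pidof glusterfsd) -o /tmp/glusterfsd.strace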

-- 
Regards,

Łukasz Mierzwa
Network Administrator

Grono.net Spółka Akcyjna

http://grono.net

-------------- next part: client config --------------
volume brick_40
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.40
        option remote-subvolume brick-writebehind
end-volume

volume ns_40
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.40
        option remote-subvolume brick-ns
end-volume

volume brick_41
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.41
        option remote-subvolume brick-writebehind
end-volume

volume ns_41
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.41
        option remote-subvolume brick-ns
end-volume

volume brick_42
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.42
        option remote-subvolume brick-writebehind
end-volume

volume ns_42
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.42
        option remote-subvolume brick-ns
end-volume

volume brick_43
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.43
        option remote-subvolume brick-writebehind
end-volume

volume ns_43
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.43
        option remote-subvolume brick-ns
end-volume

volume brick_44
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.44
        option remote-subvolume brick-writebehind
end-volume

volume ns_44
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.44
        option remote-subvolume brick-ns
end-volume

volume brick_45
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.45
        option remote-subvolume brick-writebehind
end-volume

volume ns_45
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.1.45
        option remote-subvolume brick-ns
end-volume

volume afr_1
        type cluster/afr
        option self-heal on
        subvolumes brick_40 brick_41 brick_42
end-volume

volume afr_2
        type cluster/afr
        option self-heal on
        subvolumes brick_43 brick_44 brick_45
end-volume

volume afr_ns
        type cluster/afr
        subvolumes ns_40 ns_41 ns_42 ns_43 ns_44 ns_45
end-volume

volume unify
        type cluster/unify
        option namespace afr_ns
        option scheduler alu
        option alu.limits.min-free-disk 5%
        option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
        option alu.disk-usage.entry-threshold 4GB
        option alu.disk-usage.exit-threshold 500MB
        option alu.read-usage.entry-threshold 20%
        option alu.read-usage.exit-threshold 5%
        option alu.write-usage.entry-threshold 20%
        option alu.write-usage.exit-threshold 5%
        option alu.stat-refresh.interval 30sec
        option alu.stat-refresh.num-file-create 200
        subvolumes afr_1 afr_2
end-volume


volume iothreads
        type performance/io-threads
        option thread-count 2  # default is 1
        option cache-size 64MB
        subvolumes unify
end-volume

volume readahead
        type performance/read-ahead
        option page-size 128kB
        option page-count 4
        subvolumes iothreads
end-volume

volume writebehind
        type performance/write-behind
        option aggregate-size 1MB
        option flush-behind on
        subvolumes readahead
end-volume

volume io-cache
        type performance/io-cache
        option cache-size 256MB
        option force-revalidate-timeout 600
        subvolumes writebehind
end-volume
-------------- next part: server config --------------
volume brick-data
    type storage/posix
    option directory /home/gluster/data
end-volume

volume brick-ns
    type storage/posix
    option directory /home/gluster/ns
end-volume

volume brick-locks
    type features/posix-locks
    subvolumes brick-data
end-volume

volume brick-iothreads
    type performance/io-threads
    option thread-count 4
    option cache-size 64MB
    subvolumes brick-locks
end-volume

volume brick-readahead
    type performance/read-ahead
    subvolumes brick-iothreads
end-volume

volume brick-writebehind
    type performance/write-behind
    option aggregate-size 1MB
    option flush-behind on
    subvolumes brick-readahead
end-volume

volume server
    type protocol/server
    subvolumes brick-ns brick-writebehind
    option transport-type tcp/server
    option auth.ip.brick-ns.allow *
    option auth.ip.brick-writebehind.allow *
end-volume
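
For completeness, the two spec files above are started roughly like
this (paths are illustrative, and this assumes the 1.3-era
-f/--spec-file syntax):

    # on each of the six servers
    glusterfsd -f /etc/glusterfs/server.vol

    # on the client (here, one of the same six machines)
    glusterfs -f /etc/glusterfs/client.vol /mnt/glusterfs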


