[Gluster-users] Horrible performance with small files (DHT/AFR)
Vahriç Muhtaryan
vahric at doruk.net.tr
Wed Jun 3 20:33:17 UTC 2009
To better understand the issue, did you try 4 servers with DHT only, or 2
servers with DHT only, or two servers with replication only, to find out
where the real problem is? Maybe replication or DHT has a bug?
-----Original Message-----
From: gluster-users-bounces at gluster.org
[mailto:gluster-users-bounces at gluster.org] On Behalf Of Benjamin Krein
Sent: Wednesday, June 03, 2009 11:00 PM
To: Jasper van Wanrooy - Chatventure
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Horrible performance with small files (DHT/AFR)
The current boxes I'm using for testing are as follows:
* 2x dual-core Opteron ~2GHz (x86_64)
* 4GB RAM
* 4x 7200 RPM 73GB SATA - RAID1+0 w/3ware hardware controllers
The server storage directories live in /home/clusterfs where /home is
an ext3 partition mounted with noatime.
These servers are not virtualized. They are running Ubuntu 8.04 LTS
Server x86_64.
The files I'm copying are all <2k javascript files (plain text) stored
in 100 hash directories in each of 3 parent directories:
/home/clusterfs/
+ parentdir1/
| + 00/
| | ...
| + 99/
+ parentdir2/
| + 00/
| | ...
| + 99/
+ parentdir3/
  + 00/
  | ...
  + 99/
There are ~10k of these <2k javascript files distributed throughout
the above directory structure totaling approximately 570MB. My tests
have been copying that entire directory structure from a client
machine into the glusterfs mountpoint on the client.
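
(For anyone who wants to try to reproduce this, a rough sketch of the
workload and the test is below. The paths and filenames are
placeholders, not our real data; the real files are our javascript
cache.)

# create 3 parent dirs, each with 100 hash dirs, then ~10k small files
for p in parentdir1 parentdir2 parentdir3; do
    for h in $(seq -w 0 99); do
        mkdir -p ~/cache/$p/$h
    done
done
for i in $(seq 1 10000); do
    p=parentdir$(( (i % 3) + 1 ))
    h=$(printf '%02d' $(( i % 100 )))
    head -c 2048 /dev/urandom > ~/cache/$p/$h/file$i.js
done

# then time the copy into the glusterfs mountpoint on the client
cd ~/cache && time cp -rp * /mnt/
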
Observing IO on both the client box & all the server boxes via iostat
shows that the disks are doing *very* little work. Observing the CPU/
memory load with top or htop shows that none of the boxes are CPU or
memory bound. Observing the bandwidth in/out of the network interface
shows <1MB/s throughput (we have a fully gigabit LAN!), which usually
drops to <150KB/s during the copy.
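
(Roughly the kind of monitoring meant above; the exact flags here are
just one way of watching it:)

iostat -x 1      # per-disk utilization, refreshed every second
top              # or htop, for CPU/memory load
sar -n DEV 1     # per-interface network throughput (sysstat)
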
As a comparison, scp'ing the same directory structure from the same
client to one of the same servers sustains ~40-50MB/s.
Here are the results of copying the same directory structure directly
to the same partition on one of the servers using rsync:
# time rsync -ap * benk@cfs1:~/cache/
benk@cfs1's password:
real 0m23.566s
user 0m8.433s
sys 0m4.580s
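
Rough arithmetic on the numbers in this thread:

    570MB / 23.6s           ~= 24MB/s      (rsync straight to the server)
    570MB at <1MB/s         >  9.5 min     (through glusterfs at the observed rate)
    102,440 files / 30m59s  ~= 55 files/s  (~18ms per file, 4-server run quoted below)
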
Ben
On Jun 3, 2009, at 3:16 PM, Jasper van Wanrooy - Chatventure wrote:
> Hi Benjamin,
>
> That's not good news. What kind of hardware do you use? Is it
> virtualised? Or do you use real boxes?
> What kind of files are you copying in your test? What performance do
> you have when copying it to a local dir?
>
> Best regards Jasper
>
> ----- Original Message -----
> From: "Benjamin Krein" <superbenk at superk.org>
> To: "Jasper van Wanrooy - Chatventure" <jvanwanrooy at chatventure.nl>
> Cc: "Vijay Bellur" <vijay at gluster.com>, gluster-users at gluster.org
> Sent: Wednesday, 3 June, 2009 19:23:51 GMT +01:00 Amsterdam /
> Berlin / Bern / Rome / Stockholm / Vienna
> Subject: Re: [Gluster-users] Horrible performance with small files
> (DHT/AFR)
>
> I reduced my config to only 2 servers (had to donate 2 of the 4 to
> another project). I now have a single server using DHT (for future
> scaling) and AFR to a mirrored server. Copy times are much better,
> but still pretty horrible:
>
> # time cp -rp * /mnt/
>
> real 21m11.505s
> user 0m1.000s
> sys 0m6.416s
>
> Ben
>
> On Jun 3, 2009, at 3:13 AM, Jasper van Wanrooy - Chatventure wrote:
>
>> Hi Benjamin,
>>
>> Did you also try with a lower thread-count? I'm actually using 3
>> threads.
>>
>> Best Regards Jasper
>>
>>
>> On 2 jun 2009, at 18:25, Benjamin Krein wrote:
>>
>>> I do not see any difference with autoscaling removed. Current
>>> server config:
>>>
>>> # webform flat-file cache
>>>
>>> volume webform_cache
>>> type storage/posix
>>> option directory /home/clusterfs/webform/cache
>>> end-volume
>>>
>>> volume webform_cache_locks
>>> type features/locks
>>> subvolumes webform_cache
>>> end-volume
>>>
>>> volume webform_cache_brick
>>> type performance/io-threads
>>> option thread-count 32
>>> subvolumes webform_cache_locks
>>> end-volume
>>>
>>> <<snip>>
>>>
>>> # GlusterFS Server
>>> volume server
>>> type protocol/server
>>> option transport-type tcp
>>> subvolumes dns_public_brick dns_private_brick webform_usage_brick webform_cache_brick wordpress_uploads_brick subs_exports_brick
>>> option auth.addr.dns_public_brick.allow 10.1.1.*
>>> option auth.addr.dns_private_brick.allow 10.1.1.*
>>> option auth.addr.webform_usage_brick.allow 10.1.1.*
>>> option auth.addr.webform_cache_brick.allow 10.1.1.*
>>> option auth.addr.wordpress_uploads_brick.allow 10.1.1.*
>>> option auth.addr.subs_exports_brick.allow 10.1.1.*
>>> end-volume
>>>
>>> # time cp -rp * /mnt/
>>>
>>> real 70m13.672s
>>> user 0m1.168s
>>> sys 0m8.377s
>>>
>>> NOTE: the above test was also run during peak hours, when the LAN and
>>> dev server were in use, which would account for some of the extra
>>> time. This is still WAY too much, though.
>>>
>>> Ben
>>>
>>>
>>> On Jun 1, 2009, at 1:40 PM, Vijay Bellur wrote:
>>>
>>>> Hi Benjamin,
>>>>
>>>> Could you please try by turning autoscaling off?
>>>>
>>>> Thanks,
>>>> Vijay
>>>>
>>>> Benjamin Krein wrote:
>>>>> I'm seeing extremely poor performance writing small files to a
>>>>> glusterfs DHT/AFR mount point. Here are the stats I'm seeing:
>>>>>
>>>>> * Number of files:
>>>>> root@dev1|/home/aweber/cache|# find | wc -l
>>>>> 102440
>>>>>
>>>>> * Average file size (bytes):
>>>>> root@dev1|/home/aweber/cache|# ls -lR | awk '{sum += $5; n++;} END {print sum/n;}'
>>>>> 4776.47
>>>>>
>>>>> * Using scp:
>>>>> root@dev1|/home/aweber/cache|# time scp -rp * benk@cfs1:~/cache/
>>>>>
>>>>> real 1m38.726s
>>>>> user 0m12.173s
>>>>> sys 0m12.141s
>>>>>
>>>>> * Using cp to glusterfs mount point:
>>>>> root@dev1|/home/aweber/cache|# time cp -rp * /mnt
>>>>>
>>>>> real 30m59.101s
>>>>> user 0m1.296s
>>>>> sys 0m5.820s
>>>>>
>>>>> Here is my configuration (currently a single client writing to 4
>>>>> servers: two AFR mirror pairs distributed with DHT):
>>>>>
>>>>> SERVER:
>>>>>
>>>>> # webform flat-file cache
>>>>>
>>>>> volume webform_cache
>>>>> type storage/posix
>>>>> option directory /home/clusterfs/webform/cache
>>>>> end-volume
>>>>>
>>>>> volume webform_cache_locks
>>>>> type features/locks
>>>>> subvolumes webform_cache
>>>>> end-volume
>>>>>
>>>>> volume webform_cache_brick
>>>>> type performance/io-threads
>>>>> option thread-count 32
>>>>> option max-threads 128
>>>>> option autoscaling on
>>>>> subvolumes webform_cache_locks
>>>>> end-volume
>>>>>
>>>>> <<snip>>
>>>>>
>>>>> # GlusterFS Server
>>>>> volume server
>>>>> type protocol/server
>>>>> option transport-type tcp
>>>>> subvolumes dns_public_brick dns_private_brick webform_usage_brick webform_cache_brick wordpress_uploads_brick subs_exports_brick
>>>>> option auth.addr.dns_public_brick.allow 10.1.1.*
>>>>> option auth.addr.dns_private_brick.allow 10.1.1.*
>>>>> option auth.addr.webform_usage_brick.allow 10.1.1.*
>>>>> option auth.addr.webform_cache_brick.allow 10.1.1.*
>>>>> option auth.addr.wordpress_uploads_brick.allow 10.1.1.*
>>>>> option auth.addr.subs_exports_brick.allow 10.1.1.*
>>>>> end-volume
>>>>>
>>>>> CLIENT:
>>>>>
>>>>> # Webform Flat-File Cache Volume client configuration
>>>>>
>>>>> volume srv1
>>>>> type protocol/client
>>>>> option transport-type tcp
>>>>> option remote-host cfs1
>>>>> option remote-subvolume webform_cache_brick
>>>>> end-volume
>>>>>
>>>>> volume srv2
>>>>> type protocol/client
>>>>> option transport-type tcp
>>>>> option remote-host cfs2
>>>>> option remote-subvolume webform_cache_brick
>>>>> end-volume
>>>>>
>>>>> volume srv3
>>>>> type protocol/client
>>>>> option transport-type tcp
>>>>> option remote-host cfs3
>>>>> option remote-subvolume webform_cache_brick
>>>>> end-volume
>>>>>
>>>>> volume srv4
>>>>> type protocol/client
>>>>> option transport-type tcp
>>>>> option remote-host cfs4
>>>>> option remote-subvolume webform_cache_brick
>>>>> end-volume
>>>>>
>>>>> volume afr1
>>>>> type cluster/afr
>>>>> subvolumes srv1 srv3
>>>>> end-volume
>>>>>
>>>>> volume afr2
>>>>> type cluster/afr
>>>>> subvolumes srv2 srv4
>>>>> end-volume
>>>>>
>>>>> volume dist
>>>>> type cluster/distribute
>>>>> subvolumes afr1 afr2
>>>>> end-volume
>>>>>
>>>>> volume writebehind
>>>>> type performance/write-behind
>>>>> option cache-size 4mb
>>>>> option flush-behind on
>>>>> subvolumes dist
>>>>> end-volume
>>>>>
>>>>> volume cache
>>>>> type performance/io-cache
>>>>> option cache-size 512mb
>>>>> subvolumes writebehind
>>>>> end-volume
>>>>>
>>>>> Benjamin Krein
>>>>> www.superk.org
>>>>>
>>>>>
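
(For completeness: the client volfile quoted above is what gets mounted
on the client. With the GlusterFS 2.x releases of this era that is done
with something along these lines; the volfile path is only an example:)

glusterfs -f /etc/glusterfs/client.vol /mnt
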
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users