[Gluster-users] Horrible performance with small files (DHT/AFR)

Benjamin Krein superbenk at superk.org
Thu Jun 4 20:21:01 UTC 2009


Here are some more details with different configs:

* Only AFR between cfs1 & cfs2:
root at dev1# time cp -rp * /mnt/

real	16m45.995s
user	0m1.104s
sys	0m5.528s

* Single server - cfs1:
root at dev1# time cp -rp * /mnt/

real	10m33.967s
user	0m0.764s
sys	0m5.516s

* Stats via bmon on cfs1 during above copy:
   #   Interface                RX Rate         RX #     TX Rate         TX #
  ────────────────────────────────────────────────────────────────────────────
  cfs1 (source: local)
   0   eth1                     951.25KiB       1892     254.00KiB       1633

It gets progressively better, but that's still a *long* way from the <2
minute times with scp and the <1 minute time with rsync!  And in that last
test I have no redundancy or distribution (DHT) at all.

* Client config for the last test:
-----
# Webform Flat-File Cache Volume client configuration

volume srv1
	type protocol/client
	option transport-type tcp
	option remote-host cfs1
	option remote-subvolume webform_cache_brick
end-volume

volume writebehind
	type performance/write-behind
	option cache-size 4mb
	option flush-behind on
	subvolumes srv1
end-volume

volume cache
	type performance/io-cache
	option cache-size 512mb
	subvolumes writebehind
end-volume
-----
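
(For reference, a client volfile like the one above would typically be mounted
with the GlusterFS FUSE client along these lines; the volfile path below is
only an example, not the actual path on dev1:)

# mount the single-server client volume on /mnt
glusterfs -f /etc/glusterfs/webform-cache-client.vol /mnt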

Ben

On Jun 3, 2009, at 4:33 PM, Vahriç Muhtaryan wrote:

> To better understand the issue, did you try 4 servers with DHT only,
> 2 servers with DHT only, or 2 servers with replication only, to find
> out where the real problem is?  Maybe replication or DHT has a bug?
>
> -----Original Message-----
> From: gluster-users-bounces at gluster.org
> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Benjamin Krein
> Sent: Wednesday, June 03, 2009 11:00 PM
> To: Jasper van Wanrooy - Chatventure
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Horrible performance with small files  
> (DHT/AFR)
>
> The current boxes I'm using for testing are as follows:
>
>  * 2x dual-core Opteron ~2GHz (x86_64)
>  * 4GB RAM
>  * 4x 7200 RPM 73GB SATA - RAID1+0 w/3ware hardware controllers
>
> The server storage directories live in /home/clusterfs where /home is
> an ext3 partition mounted with noatime.
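>
> (For reference, the corresponding /etc/fstab entry is along these lines;
> the device name and the options other than noatime are placeholders:)
>
> /dev/sda3   /home   ext3   defaults,noatime   0   2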
>
> These servers are not virtualized.  They are running Ubuntu 8.04 LTS
> Server x86_64.
>
> The files I'm copying are all <2k javascript files (plain text) stored
> in 100 hash directories in each of 3 parent directories:
>
> /home/clusterfs/
>   + parentdir1/
>   |   + 00/
>   |   | ...
>   |   + 99/
>   + parentdir2/
>   |   + 00/
>   |   | ...
>   |   + 99/
>   + parentdir3/
>       + 00/
>       | ...
>       + 99/
>
> There are ~10k of these <2k javascript files distributed throughout
> the above directory structure totaling approximately 570MB.  My tests
> have been copying that entire directory structure from a client
> machine into the glusterfs mountpoint on the client.
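>
> (For anyone wanting to reproduce a dataset of roughly this shape, something
> like the loop below works; the file counts and sizes are only illustrative
> and smaller than my actual dataset:)
>
> # 3 parent dirs x 100 hash dirs, each filled with ~2KB filler files
> for p in parentdir1 parentdir2 parentdir3; do
>   for h in $(seq -w 0 99); do
>     mkdir -p "$p/$h"
>     for i in $(seq 1 35); do
>       head -c 2048 /dev/urandom > "$p/$h/file_$i.js"
>     done
>   done
> done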
>
> Observing IO on both the client box & all the server boxes via iostat
> shows that the disks are doing *very* little work.  Observing the CPU/
> memory load with top or htop shows that none of the boxes are CPU or
> memory bound.  Observing the bandwidth in/out of the network interface
> shows <1MB/s throughput (we have a fully gigabit LAN!) which usually
> drops down to <150KB/s during the copy.
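>
> (Those observations came from standard tools along these lines; the exact
> flags are not important:)
>
>   iostat -x 2     # per-device utilization on the client and servers
>   top             # CPU / memory load
>   bmon            # per-interface bandwidth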
>
> scp'ing the same directory structure from the same client to one of
> the same servers will work at ~40-50MB/s sustained as a comparison.
> Here are the results of copying the same directory structure using
> rsync to the same partition:
>
> # time rsync -ap * benk at cfs1:~/cache/
> benk at cfs1's password:
>
> real	0m23.566s
> user	0m8.433s
> sys	0m4.580s
>
> Ben
>
> On Jun 3, 2009, at 3:16 PM, Jasper van Wanrooy - Chatventure wrote:
>
>> Hi Benjamin,
>>
>> That's not good news. What kind of hardware do you use? Is it
>> virtualised? Or do you use real boxes?
>> What kind of files are you copying in your test? What performance do
>> you have when copying it to a local dir?
>>
>> Best regards Jasper
>>
>> ----- Original Message -----
>> From: "Benjamin Krein" <superbenk at superk.org>
>> To: "Jasper van Wanrooy - Chatventure" <jvanwanrooy at chatventure.nl>
>> Cc: "Vijay Bellur" <vijay at gluster.com>, gluster-users at gluster.org
>> Sent: Wednesday, 3 June, 2009 19:23:51 GMT +01:00 Amsterdam /
>> Berlin / Bern / Rome / Stockholm / Vienna
>> Subject: Re: [Gluster-users] Horrible performance with small files
>> (DHT/AFR)
>>
>> I reduced my config to only 2 servers (had to donate 2 of the 4 to
>> another project).  I now have a single server using DHT (for future
>> scaling) and AFR to a mirrored server.  Copy times are much better,
>> but still pretty horrible:
>>
>> # time cp -rp * /mnt/
>>
>> real	21m11.505s
>> user	0m1.000s
>> sys	0m6.416s
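>>
>> (For clarity, the client side of that 2-server layout is just the 4-server
>> config quoted further down with the second AFR pair removed; a minimal
>> sketch, reusing the srv1/srv2 volume names from that config:)
>>
>> volume afr1
>> 	type cluster/afr
>> 	subvolumes srv1 srv2
>> end-volume
>>
>> volume dist
>> 	type cluster/distribute
>> 	subvolumes afr1
>> end-volume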
>>
>> Ben
>>
>> On Jun 3, 2009, at 3:13 AM, Jasper van Wanrooy - Chatventure wrote:
>>
>>> Hi Benjamin,
>>>
>>> Did you also try with a lower thread-count?  Actually I'm using 3
>>> threads.
>>>
>>> Best Regards Jasper
>>>
>>>
>>> On 2 jun 2009, at 18:25, Benjamin Krein wrote:
>>>
>>>> I do not see any difference with autoscaling removed.  Current
>>>> server config:
>>>>
>>>> # webform flat-file cache
>>>>
>>>> volume webform_cache
>>>> type storage/posix
>>>> option directory /home/clusterfs/webform/cache
>>>> end-volume
>>>>
>>>> volume webform_cache_locks
>>>> type features/locks
>>>> subvolumes webform_cache
>>>> end-volume
>>>>
>>>> volume webform_cache_brick
>>>> type performance/io-threads
>>>> option thread-count 32
>>>> subvolumes webform_cache_locks
>>>> end-volume
>>>>
>>>> <<snip>>
>>>>
>>>> # GlusterFS Server
>>>> volume server
>>>> type protocol/server
>>>> option transport-type tcp
>>>> subvolumes dns_public_brick dns_private_brick webform_usage_brick
>>>> webform_cache_brick wordpress_uploads_brick subs_exports_brick
>>>> option auth.addr.dns_public_brick.allow 10.1.1.*
>>>> option auth.addr.dns_private_brick.allow 10.1.1.*
>>>> option auth.addr.webform_usage_brick.allow 10.1.1.*
>>>> option auth.addr.webform_cache_brick.allow 10.1.1.*
>>>> option auth.addr.wordpress_uploads_brick.allow 10.1.1.*
>>>> option auth.addr.subs_exports_brick.allow 10.1.1.*
>>>> end-volume
>>>>
>>>> # time cp -rp * /mnt/
>>>>
>>>> real	70m13.672s
>>>> user	0m1.168s
>>>> sys	0m8.377s
>>>>
>>>> NOTE: the above test was also run during peak hours when the LAN/
>>>> dev server were in use, which would account for some of the extra
>>>> time.  This is still WAY too much, though.
>>>>
>>>> Ben
>>>>
>>>>
>>>> On Jun 1, 2009, at 1:40 PM, Vijay Bellur wrote:
>>>>
>>>>> Hi Benjamin,
>>>>>
>>>>> Could you please try by turning autoscaling off?
>>>>>
>>>>> Thanks,
>>>>> Vijay
>>>>>
>>>>> Benjamin Krein wrote:
>>>>>> I'm seeing extremely poor performance writing small files to a
>>>>>> glusterfs DHT/AFR mount point. Here are the stats I'm seeing:
>>>>>>
>>>>>> * Number of files:
>>>>>> root at dev1|/home/aweber/cache|# find |wc -l
>>>>>> 102440
>>>>>>
>>>>>> * Average file size (bytes):
>>>>>> root at dev1|/home/aweber/cache|# ls -lR | awk '{sum += $5; n++;}
>>>>>> END {print sum/n;}'
>>>>>> 4776.47
>>>>>>
>>>>>> * Using scp:
>>>>>> root at dev1|/home/aweber/cache|# time scp -rp * benk at cfs1:~/cache/
>>>>>>
>>>>>> real 1m38.726s
>>>>>> user 0m12.173s
>>>>>> sys 0m12.141s
>>>>>>
>>>>>> * Using cp to glusterfs mount point:
>>>>>> root at dev1|/home/aweber/cache|# time cp -rp * /mnt
>>>>>>
>>>>>> real 30m59.101s
>>>>>> user 0m1.296s
>>>>>> sys 0m5.820s
>>>>>>
>>>>>> Here is my configuration (currently a single client writing to 4
>>>>>> servers: 2 AFR mirrored pairs distributed via DHT):
>>>>>>
>>>>>> SERVER:
>>>>>>
>>>>>> # webform flat-file cache
>>>>>>
>>>>>> volume webform_cache
>>>>>> type storage/posix
>>>>>> option directory /home/clusterfs/webform/cache
>>>>>> end-volume
>>>>>>
>>>>>> volume webform_cache_locks
>>>>>> type features/locks
>>>>>> subvolumes webform_cache
>>>>>> end-volume
>>>>>>
>>>>>> volume webform_cache_brick
>>>>>> type performance/io-threads
>>>>>> option thread-count 32
>>>>>> option max-threads 128
>>>>>> option autoscaling on
>>>>>> subvolumes webform_cache_locks
>>>>>> end-volume
>>>>>>
>>>>>> <<snip>>
>>>>>>
>>>>>> # GlusterFS Server
>>>>>> volume server
>>>>>> type protocol/server
>>>>>> option transport-type tcp
>>>>>> subvolumes dns_public_brick dns_private_brick webform_usage_brick
>>>>>> webform_cache_brick wordpress_uploads_brick subs_exports_brick
>>>>>> option auth.addr.dns_public_brick.allow 10.1.1.*
>>>>>> option auth.addr.dns_private_brick.allow 10.1.1.*
>>>>>> option auth.addr.webform_usage_brick.allow 10.1.1.*
>>>>>> option auth.addr.webform_cache_brick.allow 10.1.1.*
>>>>>> option auth.addr.wordpress_uploads_brick.allow 10.1.1.*
>>>>>> option auth.addr.subs_exports_brick.allow 10.1.1.*
>>>>>> end-volume
>>>>>>
>>>>>> CLIENT:
>>>>>>
>>>>>> # Webform Flat-File Cache Volume client configuration
>>>>>>
>>>>>> volume srv1
>>>>>> type protocol/client
>>>>>> option transport-type tcp
>>>>>> option remote-host cfs1
>>>>>> option remote-subvolume webform_cache_brick
>>>>>> end-volume
>>>>>>
>>>>>> volume srv2
>>>>>> type protocol/client
>>>>>> option transport-type tcp
>>>>>> option remote-host cfs2
>>>>>> option remote-subvolume webform_cache_brick
>>>>>> end-volume
>>>>>>
>>>>>> volume srv3
>>>>>> type protocol/client
>>>>>> option transport-type tcp
>>>>>> option remote-host cfs3
>>>>>> option remote-subvolume webform_cache_brick
>>>>>> end-volume
>>>>>>
>>>>>> volume srv4
>>>>>> type protocol/client
>>>>>> option transport-type tcp
>>>>>> option remote-host cfs4
>>>>>> option remote-subvolume webform_cache_brick
>>>>>> end-volume
>>>>>>
>>>>>> volume afr1
>>>>>> type cluster/afr
>>>>>> subvolumes srv1 srv3
>>>>>> end-volume
>>>>>>
>>>>>> volume afr2
>>>>>> type cluster/afr
>>>>>> subvolumes srv2 srv4
>>>>>> end-volume
>>>>>>
>>>>>> volume dist
>>>>>> type cluster/distribute
>>>>>> subvolumes afr1 afr2
>>>>>> end-volume
>>>>>>
>>>>>> volume writebehind
>>>>>> type performance/write-behind
>>>>>> option cache-size 4mb
>>>>>> option flush-behind on
>>>>>> subvolumes dist
>>>>>> end-volume
>>>>>>
>>>>>> volume cache
>>>>>> type performance/io-cache
>>>>>> option cache-size 512mb
>>>>>> subvolumes writebehind
>>>>>> end-volume
>>>>>>
>>>>>> Benjamin Krein
>>>>>> www.superk.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
>




