[Gluster-users] gluster local vs local = gluster x4 slower

Jeremy Enos jenos at ncsa.uiuc.edu
Wed Mar 31 09:15:19 UTC 2010


Thanks for the links- those are interesting numbers.  Looks like small 
block i/o performance stinks there relative to NFS too.  Given the 
performance I'm seeing, I doubt much has changed, but it certainly would 
be interesting to see the tests re-run.

     Jeremy

On 3/29/2010 1:47 PM, Ian Rogers wrote:
>
> Have you guys seen the wiki page - 
> http://www.gluster.com/community/documentation/index.php/GlusterFS_2.0_Benchmark_Results_%28compared_with_NFS%29 
> - it would be interesting if you could replicate the "Single 
> GlusterFS" section to see how things have changed...
>
> Ian 
> <http://www.gluster.com/community/documentation/index.php/GlusterFS_2.0_Benchmark_Results_%28compared_with_NFS%29#Single_GlusterFS_.26_NFS_Test_Platform> 
>
>
>
> On 29/03/2010 19:21, Jeremy Enos wrote:
>> Got a chance to run your suggested test:
>>
>> ##############GLUSTER SINGLE DISK##############
>>
>> [root at ac33 gjenos]# dd bs=4096 count=32768 if=/dev/zero 
>> of=./filename.test
>> 32768+0 records in
>> 32768+0 records out
>> 134217728 bytes (134 MB) copied, 8.60486 s, 15.6 MB/s
>> [root at ac33 gjenos]#
>> [root at ac33 gjenos]# cd /export/jenos/
>>
>> ##############DIRECT SINGLE DISK##############
>>
>> [root at ac33 jenos]# dd bs=4096 count=32768 if=/dev/zero 
>> of=./filename.test
>> 32768+0 records in
>> 32768+0 records out
>> 134217728 bytes (134 MB) copied, 0.21915 s, 612 MB/s
>> [root at ac33 jenos]#
>>
>> For anything that can benefit from caching, Gluster's performance can't 
>> compare.  Is it even using the cache?
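>>
>> If the direct-disk number is mostly the page cache at work, forcing the 
>> data out to disk should make the comparison fairer.  Something along 
>> these lines (a rough sketch, same sizes as above) should take the 
>> write-back cache out of the picture:
>>
>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test conv=fdatasync
>> # or bypass the page cache entirely (may not work on the FUSE mount):
>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test oflag=direct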
>>
>> This is the client vol file I used for that test:
>>
>> [root at ac33 jenos]# cat /etc/glusterfs/ghome.vol
>> #-----------IB remotes------------------
>> volume ghome
>>   type protocol/client
>>   option transport-type tcp/client
>>   option remote-host ac33
>>   option remote-subvolume ibstripe
>> end-volume
>>
>> #------------Performance Options-------------------
>>
>> volume readahead
>>   type performance/read-ahead
>>   option page-count 4           # 2 is default option
>>   option force-atime-update off # default is off
>>   subvolumes ghome
>> end-volume
>>
>> volume writebehind
>>   type performance/write-behind
>>   option cache-size 1MB
>>   subvolumes readahead
>> end-volume
>>
>> volume cache
>>   type performance/io-cache
>>   option cache-size 2GB
>>   subvolumes writebehind
>> end-volume
>>
>>
>> Any suggestions appreciated.  thx-
>>
>>     Jeremy
>>
>> On 3/26/2010 6:09 PM, Bryan Whitehead wrote:
>>> One more thought: it looks like (from your emails) you always run the
>>> gluster test first. Maybe the tar file is being read from disk during
>>> the gluster test and then from cache during the direct-disk test.
>>>
>>> What if you just pull a chunk of 0's off /dev/zero?
>>>
>>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test
>>>
>>> or stick the tar in a ramdisk?
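>>>
>>> e.g. something like this (rough sketch; the size and mount point are
>>> just examples):
>>>
>>> mount -t tmpfs -o size=1g tmpfs /mnt/ramdisk
>>> cp /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz /mnt/ramdisk/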
>>>
>>> (or run the benchmark 10 times for each, drop the best and the worse,
>>> and average the remaining 8)
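>>>
>>> e.g. a quick loop along these lines (untested sketch, reusing the same
>>> tarball):
>>>
>>> for i in $(seq 1 10); do
>>>   rm -rf ./untar-test && mkdir ./untar-test
>>>   # sync; echo 3 > /proc/sys/vm/drop_caches   # uncomment to start each run cold
>>>   ( cd ./untar-test && time ( tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync ) )
>>> done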
>>>
>>> I'd also be curious whether adding another node would halve the time,
>>> and whether adding two more would halve it again. I guess that depends
>>> on whether striping or just replication is being used.
>>> (Unfortunately I don't have access to more than one test box right now.)
>>>
>>> On Wed, Mar 24, 2010 at 11:06 PM, Jeremy Enos<jenos at ncsa.uiuc.edu>  
>>> wrote:
>>>> For completeness:
>>>>
>>>> ##############GLUSTER SINGLE DISK NO PERFORMANCE OPTIONS##############
>>>> [root at ac33 gjenos]# time (tar xzf
>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz&&  sync )
>>>>
>>>> real    0m41.052s
>>>> user    0m7.705s
>>>> sys     0m3.122s
>>>> ##############DIRECT SINGLE DISK##############
>>>> [root at ac33 gjenos]# cd /export/jenos
>>>> [root at ac33 jenos]# time (tar xzf
>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz&&  sync )
>>>>
>>>> real    0m22.093s
>>>> user    0m6.932s
>>>> sys     0m2.459s
>>>> [root at ac33 jenos]#
>>>>
>>>> The performance options don't appear to be the problem.  So the question
>>>> stands: how do I get the disk-cache advantage through the Gluster-mounted
>>>> filesystem?  It seems to be the key to the large performance difference.
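>>>>
>>>> One more thing I want to try (the exact option name seems to vary between
>>>> glusterfs releases, so I'll check glusterfs --help first): mounting with
>>>> FUSE direct-io disabled, in case that is what keeps the kernel page cache
>>>> out of the read/write path.  Roughly one of:
>>>>
>>>> glusterfs --disable-direct-io-mode -f /etc/glusterfs/ghome.vol /mnt/ghome
>>>> mount -t glusterfs -o direct-io-mode=disable /etc/glusterfs/ghome.vol /mnt/ghome
>>>>
>>>> (/mnt/ghome is just a placeholder for the real mount point.)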
>>>>
>>>>     Jeremy
>>>>
>>>> On 3/24/2010 4:47 PM, Jeremy Enos wrote:
>>>>> Good suggestion- I hadn't tried that yet.  It brings them much 
>>>>> closer.
>>>>>
>>>>> ##############GLUSTER SINGLE DISK##############
>>>>> [root at ac33 gjenos]# time (tar xzf
>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz&&  sync )
>>>>>
>>>>> real    0m32.089s
>>>>> user    0m6.516s
>>>>> sys     0m3.177s
>>>>> ##############DIRECT SINGLE DISK##############
>>>>> [root at ac33 gjenos]# cd /export/jenos/
>>>>> [root at ac33 jenos]# time (tar xzf
>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz&&  sync )
>>>>>
>>>>> real    0m25.089s
>>>>> user    0m6.850s
>>>>> sys     0m2.058s
>>>>> ##############DIRECT SINGLE DISK CACHED##############
>>>>> [root at ac33 jenos]# time (tar xzf
>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz )
>>>>>
>>>>> real    0m8.955s
>>>>> user    0m6.785s
>>>>> sys     0m1.848s
>>>>>
>>>>>
>>>>> Oddly, I'm also seeing better performance on the gluster side than in
>>>>> previous tests (it used to be ~39 s).  The direct-disk time is obviously
>>>>> benefiting from cache.  There is still a gap, but most of it disappears
>>>>> once the cache advantage is removed.  That said, the relative performance
>>>>> issue still exists with Gluster.  What can be done to make it benefit
>>>>> from cache the same way direct disk does?
>>>>> thx-
>>>>>
>>>>>     Jeremy
>>>>>
>>>>> P.S.
>>>>> I'll be posting results w/ performance options completely removed 
>>>>> from
>>>>> gluster as soon as I get a chance.
>>>>>
>>>>>     Jeremy
>>>>>
>>>>> On 3/24/2010 4:23 PM, Bryan Whitehead wrote:
>>>>>> I'd like to see results with this:
>>>>>>
>>>>>> time ( tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz&&
>>>>>>   sync )
>>>>>>
>>>>>> I've found local filesystems use the cache very heavily. The untarred
>>>>>> files could mostly be sitting in RAM with a local fs, versus going
>>>>>> through FUSE (which might do many more synced flushes to disk?).
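>>>>>>
>>>>>> One quick way to see it would be to watch the page-cache counters while
>>>>>> the untar runs, e.g.:
>>>>>>
>>>>>> grep -E '^(Cached|Dirty|Writeback):' /proc/meminfo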
>>>>>>
>>>>>> On Wed, Mar 24, 2010 at 2:25 AM, Jeremy 
>>>>>> Enos<jenos at ncsa.uiuc.edu>    wrote:
>>>>>>> I also neglected to mention that the underlying filesystem is ext3.
>>>>>>>
>>>>>>> On 3/24/2010 3:44 AM, Jeremy Enos wrote:
>>>>>>>> I haven't tried it with all performance options disabled yet - I can
>>>>>>>> try that tomorrow when the resource frees up.  I was actually asking
>>>>>>>> first, before blindly trying different configuration matrices, in case
>>>>>>>> there's a clear direction I should take with it.  I'll let you know.
>>>>>>>>
>>>>>>>>     Jeremy
>>>>>>>>
>>>>>>>> On 3/24/2010 2:54 AM, Stephan von Krawczynski wrote:
>>>>>>>>> Hi Jeremy,
>>>>>>>>>
>>>>>>>>> have you tried to reproduce with all performance options disabled?
>>>>>>>>> They are possibly not a good idea on a local system.
>>>>>>>>> What local fs do you use?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Regards,
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 23 Mar 2010 19:11:28 -0500
>>>>>>>>> Jeremy Enos<jenos at ncsa.uiuc.edu>      wrote:
>>>>>>>>>
>>>>>>>>>> Stephan is correct - I primarily did this test to show a demonstrable
>>>>>>>>>> example of the overhead I'm trying to eliminate.  It's pronounced
>>>>>>>>>> enough that it can be seen on a single-disk / single-node
>>>>>>>>>> configuration, which is good in a way (so anyone can easily repro).
>>>>>>>>>>
>>>>>>>>>> My distributed/clustered solution would be ideal if it were fast
>>>>>>>>>> enough for small-block i/o as well as large-block - I was hoping that
>>>>>>>>>> single-node systems would achieve that, hence the single-node test.
>>>>>>>>>> Because the single-node test performed poorly, I eventually reduced
>>>>>>>>>> to a single disk to see if the overhead could still be seen, and it
>>>>>>>>>> clearly can be.  Perhaps it's something in my configuration?  I've
>>>>>>>>>> pasted my config files below.
>>>>>>>>>> thx-
>>>>>>>>>>
>>>>>>>>>>       Jeremy
>>>>>>>>>>
>>>>>>>>>> ######################glusterfsd.vol######################
>>>>>>>>>> volume posix
>>>>>>>>>>     type storage/posix
>>>>>>>>>>     option directory /export
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> volume locks
>>>>>>>>>>     type features/locks
>>>>>>>>>>     subvolumes posix
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> volume disk
>>>>>>>>>>     type performance/io-threads
>>>>>>>>>>     option thread-count 4
>>>>>>>>>>     subvolumes locks
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> volume server-ib
>>>>>>>>>>     type protocol/server
>>>>>>>>>>     option transport-type ib-verbs/server
>>>>>>>>>>     option auth.addr.disk.allow *
>>>>>>>>>>     subvolumes disk
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> volume server-tcp
>>>>>>>>>>     type protocol/server
>>>>>>>>>>     option transport-type tcp/server
>>>>>>>>>>     option auth.addr.disk.allow *
>>>>>>>>>>     subvolumes disk
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> ######################ghome.vol######################
>>>>>>>>>>
>>>>>>>>>> #-----------IB remotes------------------
>>>>>>>>>> volume ghome
>>>>>>>>>>     type protocol/client
>>>>>>>>>>     option transport-type ib-verbs/client
>>>>>>>>>> #  option transport-type tcp/client
>>>>>>>>>>     option remote-host acfs
>>>>>>>>>>     option remote-subvolume raid
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> #------------Performance Options-------------------
>>>>>>>>>>
>>>>>>>>>> volume readahead
>>>>>>>>>>     type performance/read-ahead
>>>>>>>>>>     option page-count 4           # 2 is default option
>>>>>>>>>>     option force-atime-update off # default is off
>>>>>>>>>>     subvolumes ghome
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> volume writebehind
>>>>>>>>>>     type performance/write-behind
>>>>>>>>>>     option cache-size 1MB
>>>>>>>>>>     subvolumes readahead
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> volume cache
>>>>>>>>>>     type performance/io-cache
>>>>>>>>>>     option cache-size 1GB
>>>>>>>>>>     subvolumes writebehind
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> ######################END######################
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/23/2010 6:02 AM, Stephan von Krawczynski wrote:
>>>>>>>>>>> On Tue, 23 Mar 2010 02:59:35 -0600 (CST)
>>>>>>>>>>> "Tejas N. Bhise"<tejas at gluster.com>       wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Out of curiosity, if you want to do stuff only on one machine,
>>>>>>>>>>>> why do you want to use a distributed, multi-node, clustered
>>>>>>>>>>>> file system?
>>>>>>>>>>>>
>>>>>>>>>>> Because what he does is a very good way to show the overhead produced
>>>>>>>>>>> only by glusterfs and nothing else (i.e. no network involved).
>>>>>>>>>>> A pretty relevant test scenario I would say.
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Am I missing something here?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Tejas.
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>> From: "Jeremy Enos"<jenos at ncsa.uiuc.edu>
>>>>>>>>>>>> To: gluster-users at gluster.org
>>>>>>>>>>>> Sent: Tuesday, March 23, 2010 2:07:06 PM GMT +05:30 Chennai,
>>>>>>>>>>>> Kolkata,
>>>>>>>>>>>> Mumbai, New Delhi
>>>>>>>>>>>> Subject: [Gluster-users] gluster local vs local = gluster 
>>>>>>>>>>>> x4 slower
>>>>>>>>>>>>
>>>>>>>>>>>> This test is pretty easy to replicate anywhere - it only takes one
>>>>>>>>>>>> disk, one machine, one tarball.  Untarring directly to local disk is
>>>>>>>>>>>> about 4.5x faster than untarring through gluster.  At first I thought
>>>>>>>>>>>> this might be due to a slow host (Opteron 2.4 GHz).  But it's not -
>>>>>>>>>>>> the same configuration on a much faster machine (dual 3.33 GHz Xeon)
>>>>>>>>>>>> yields the performance below.
>>>>>>>>>>>>
>>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK THRU GLUSTER####
>>>>>>>>>>>> [root at ac33 jenos]# time tar xzf
>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>
>>>>>>>>>>>> real    0m41.290s
>>>>>>>>>>>> user    0m14.246s
>>>>>>>>>>>> sys     0m2.957s
>>>>>>>>>>>>
>>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK (BYPASS GLUSTER)####
>>>>>>>>>>>> [root at ac33 jenos]# cd /export/jenos/
>>>>>>>>>>>> [root at ac33 jenos]# time tar xzf
>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>
>>>>>>>>>>>> real    0m8.983s
>>>>>>>>>>>> user    0m6.857s
>>>>>>>>>>>> sys     0m1.844s
>>>>>>>>>>>>
>>>>>>>>>>>> ####THESE ARE TEST FILE DETAILS####
>>>>>>>>>>>> [root at ac33 jenos]# tar tzvf
>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz  |wc -l
>>>>>>>>>>>> 109
>>>>>>>>>>>> [root at ac33 jenos]# ls -l
>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>> -rw-r--r-- 1 jenos ac 804385203 2010-02-07 06:32
>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>> [root at ac33 jenos]#
>>>>>>>>>>>>
>>>>>>>>>>>> These are the relevant performance options I'm using in my .vol file:
>>>>>>>>>>>>
>>>>>>>>>>>> #------------Performance Options-------------------
>>>>>>>>>>>>
>>>>>>>>>>>> volume readahead
>>>>>>>>>>>>      type performance/read-ahead
>>>>>>>>>>>>      option page-count 4           # 2 is default option
>>>>>>>>>>>>      option force-atime-update off # default is off
>>>>>>>>>>>>      subvolumes ghome
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume writebehind
>>>>>>>>>>>>      type performance/write-behind
>>>>>>>>>>>>      option cache-size 1MB
>>>>>>>>>>>>      subvolumes readahead
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume cache
>>>>>>>>>>>>      type performance/io-cache
>>>>>>>>>>>>      option cache-size 1GB
>>>>>>>>>>>>      subvolumes writebehind
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> What can I do to improve gluster's performance?
>>>>>>>>>>>>
>>>>>>>>>>>>        Jeremy
>>>>>>>>>>>>