[Gluster-users] gluster local vs local = gluster x4 slower

Jeremy Enos jenos at ncsa.uiuc.edu
Wed Mar 31 09:22:32 UTC 2010


That, too, is what I'm guessing is happening.  Besides official 
confirmation of what's going on, I'm mainly after an answer as to 
whether there is a way to solve it and make a locally mounted 
single-disk Gluster filesystem perform anywhere close to a single 
local disk accessed directly, including for cached transactions.  So 
far, the performance translators have had little impact in making 
small-block I/O performance competitive.
thx-

     Jeremy

On 3/30/2010 11:33 AM, Steven Truelove wrote:
> What you are likely seeing is the OS holding dirty pages in the page 
> cache before writing them out.  If you were untarring a file that was 
> significantly larger than the available memory on the server, the 
> server would be forced to write to disk and you would likely see 
> performance fall more into line with the results you get when you 
> call sync.
>
> Gluster is probably flushing data to disk more aggressively than the 
> OS would on its own.  This may be intentional, to reduce data loss in 
> server-failure scenarios.  Someone on the Gluster team can probably 
> comment on any settings that exist for controlling Gluster's data 
> flushing behaviour.
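>
> For reference, that local write-back behaviour is governed by the 
> usual kernel dirty-page knobs, which you can inspect with, e.g.:
>
> sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs
>
> (Just to see how much dirty data the kernel is willing to hold in RAM 
> before writing it out; these are OS settings, not Gluster settings.)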
>
> Steven Truelove
>
>
> On 29/03/2010 5:09 PM, Jeremy Enos wrote:
>> I've already determined that && sync brings the values at least into 
>> the same order of magnitude (Gluster is about 75% of direct disk 
>> there).  I could accept that for the benefit of having a parallel 
>> filesystem.
>> What I'm actually trying to achieve now is exactly the perceived 
>> performance that leaving out the && sync yields, which translates to 
>> real performance if the user can move on to another task instead of 
>> blocking because Gluster isn't utilizing the cache.  How, with 
>> Gluster, can I achieve the same cache benefit that direct disk gets?  
>> Will a user ever be able to untar a moderately sized (below physical 
>> memory) file onto a Gluster filesystem as fast as onto a single disk, 
>> as I did in my initial comparison?  Is there something fundamentally 
>> preventing that in Gluster's design, or am I misconfiguring it?
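>>
>> (One more thing I still need to rule out, untested so far: whether 
>> the client is running in FUSE direct-I/O mode, which would bypass the 
>> kernel page cache entirely.  If this glusterfs release supports it, 
>> something along the lines of
>>
>> glusterfs -f /etc/glusterfs/ghome.vol --direct-io-mode=disable /ghome
>>
>> should mount with direct I/O disabled; the exact option name and the 
>> /ghome mount point here are my guesses, not something I've verified.)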
>> thx-
>>
>>     Jeremy
>>
>> On 3/29/2010 2:00 PM, Bryan Whitehead wrote:
>>> heh, don't forget the && sync
>>>
>>> :)
>>>
>>> On Mon, Mar 29, 2010 at 11:21 AM, Jeremy Enos <jenos at ncsa.uiuc.edu>
>>> wrote:
>>>> Got a chance to run your suggested test:
>>>>
>>>> ##############GLUSTER SINGLE DISK##############
>>>>
>>>> [root at ac33 gjenos]# dd bs=4096 count=32768 if=/dev/zero 
>>>> of=./filename.test
>>>> 32768+0 records in
>>>> 32768+0 records out
>>>> 134217728 bytes (134 MB) copied, 8.60486 s, 15.6 MB/s
>>>> [root at ac33 gjenos]#
>>>> [root at ac33 gjenos]# cd /export/jenos/
>>>>
>>>> ##############DIRECT SINGLE DISK##############
>>>>
>>>> [root at ac33 jenos]# dd bs=4096 count=32768 if=/dev/zero 
>>>> of=./filename.test
>>>> 32768+0 records in
>>>> 32768+0 records out
>>>> 134217728 bytes (134 MB) copied, 0.21915 s, 612 MB/s
>>>> [root at ac33 jenos]#
>>>>
>>>> For anything that can benefit from the cache, Gluster's performance 
>>>> can't compare.  Is it even using the cache?
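>>>>
>>>> (For a fairer comparison I could force both runs to wait for the 
>>>> data to reach disk, e.g. with GNU dd's conv=fdatasync:
>>>>
>>>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test conv=fdatasync
>>>>
>>>> I haven't rerun it that way yet, but it would take the page cache 
>>>> out of the direct-disk number.)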
>>>>
>>>> This is the client vol file I used for that test:
>>>>
>>>> [root at ac33 jenos]# cat /etc/glusterfs/ghome.vol
>>>> #-----------IB remotes------------------
>>>> volume ghome
>>>>   type protocol/client
>>>>   option transport-type tcp/client
>>>>   option remote-host ac33
>>>>   option remote-subvolume ibstripe
>>>> end-volume
>>>>
>>>> #------------Performance Options-------------------
>>>>
>>>> volume readahead
>>>>   type performance/read-ahead
>>>>   option page-count 4           # 2 is default option
>>>>   option force-atime-update off # default is off
>>>>   subvolumes ghome
>>>> end-volume
>>>>
>>>> volume writebehind
>>>>   type performance/write-behind
>>>>   option cache-size 1MB
>>>>   subvolumes readahead
>>>> end-volume
>>>>
>>>> volume cache
>>>>   type performance/io-cache
>>>>   option cache-size 2GB
>>>>   subvolumes writebehind
>>>> end-volume
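>>>>
>>>> One thing I haven't tried yet, so this is only a guess: write-behind 
>>>> apparently also has a flush-behind option, which lets flush/close 
>>>> return before the data is actually flushed.  For untar-style 
>>>> workloads with many small files, something like this might help:
>>>>
>>>> volume writebehind
>>>>   type performance/write-behind
>>>>   option cache-size 1MB
>>>>   option flush-behind on    # guess: return from flush() without waiting
>>>>   subvolumes readahead
>>>> end-volume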
>>>>
>>>>
>>>> Any suggestions appreciated.  thx-
>>>>
>>>>     Jeremy
>>>>
>>>> On 3/26/2010 6:09 PM, Bryan Whitehead wrote:
>>>>> One more thought: from your emails it looks like you are always 
>>>>> running the gluster test first.  Maybe the tar file is being read 
>>>>> from disk when you do the gluster test, then being read from cache 
>>>>> when you run the direct-disk test.
>>>>>
>>>>> What if you just pull a chunk of 0's off /dev/zero?
>>>>>
>>>>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test
>>>>>
>>>>> or stick the tar in a ramdisk?
>>>>>
>>>>> (or run the benchmark 10 times for each, drop the best and the worse,
>>>>> and average the remaining 8)
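>>>>>
>>>>> Something along these lines would do it (a rough sketch, untested; 
>>>>> adjust the path as needed):
>>>>>
>>>>> for i in $(seq 1 10); do
>>>>>   /usr/bin/time -f %e -a -o times.txt \
>>>>>     dd bs=4096 count=32768 if=/dev/zero of=./filename.test
>>>>> done
>>>>> # drop the best and worst of the 10 runs, average the middle 8
>>>>> sort -n times.txt | head -n 9 | tail -n 8 | \
>>>>>   awk '{ s += $1 } END { print s/NR, "s average" }'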
>>>>>
>>>>> I'd also be curious whether the time would be halved if you added 
>>>>> another node, and halved again if you added two more.  I guess that 
>>>>> depends on whether striping or replication is being used.  
>>>>> (Unfortunately I don't have access to more than one test box right 
>>>>> now.)
>>>>>
>>>>> On Wed, Mar 24, 2010 at 11:06 PM, Jeremy 
>>>>> Enos <jenos at ncsa.uiuc.edu> wrote:
>>>>>
>>>>>> For completeness:
>>>>>>
>>>>>> ##############GLUSTER SINGLE DISK NO PERFORMANCE 
>>>>>> OPTIONS##############
>>>>>> [root at ac33 gjenos]# time (tar xzf
>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>
>>>>>> real    0m41.052s
>>>>>> user    0m7.705s
>>>>>> sys     0m3.122s
>>>>>> ##############DIRECT SINGLE DISK##############
>>>>>> [root at ac33 gjenos]# cd /export/jenos
>>>>>> [root at ac33 jenos]# time (tar xzf
>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>
>>>>>> real    0m22.093s
>>>>>> user    0m6.932s
>>>>>> sys     0m2.459s
>>>>>> [root at ac33 jenos]#
>>>>>>
>>>>>> The performance options don't appear to be the problem.  So the 
>>>>>> question stands: how do I get the disk-cache advantage through the 
>>>>>> Gluster-mounted filesystem?  It seems to be the key to the large 
>>>>>> performance difference.
>>>>>>
>>>>>>     Jeremy
>>>>>>
>>>>>> On 3/24/2010 4:47 PM, Jeremy Enos wrote:
>>>>>>
>>>>>>> Good suggestion- I hadn't tried that yet.  It brings them much 
>>>>>>> closer.
>>>>>>>
>>>>>>> ##############GLUSTER SINGLE DISK##############
>>>>>>> [root at ac33 gjenos]# time (tar xzf
>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>>
>>>>>>> real    0m32.089s
>>>>>>> user    0m6.516s
>>>>>>> sys     0m3.177s
>>>>>>> ##############DIRECT SINGLE DISK##############
>>>>>>> [root at ac33 gjenos]# cd /export/jenos/
>>>>>>> [root at ac33 jenos]# time (tar xzf
>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>>
>>>>>>> real    0m25.089s
>>>>>>> user    0m6.850s
>>>>>>> sys     0m2.058s
>>>>>>> ##############DIRECT SINGLE DISK CACHED##############
>>>>>>> [root at ac33 jenos]# time (tar xzf
>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz )
>>>>>>>
>>>>>>> real    0m8.955s
>>>>>>> user    0m6.785s
>>>>>>> sys     0m1.848s
>>>>>>>
>>>>>>>
>>>>>>> Oddly, I'm also seeing better performance on the gluster 
>>>>>>> filesystem than in previous tests (it used to be ~39 s).  The 
>>>>>>> direct-disk time is obviously benefiting from the cache.  A gap 
>>>>>>> remains, but most of it disappears once the cache advantage is 
>>>>>>> removed.  That said, the relative performance problem still 
>>>>>>> exists with Gluster in the no-sync case.  What can be done to 
>>>>>>> make it benefit from the cache the same way direct disk does?
>>>>>>> thx-
>>>>>>>
>>>>>>>     Jeremy
>>>>>>>
>>>>>>> P.S.
>>>>>>> I'll be posting results w/ performance options completely 
>>>>>>> removed from
>>>>>>> gluster as soon as I get a chance.
>>>>>>>
>>>>>>>     Jeremy
>>>>>>>
>>>>>>> On 3/24/2010 4:23 PM, Bryan Whitehead wrote:
>>>>>>>
>>>>>>>> I'd like to see results with this:
>>>>>>>>
>>>>>>>> time ( tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>>>
>>>>>>>> I've found local filesystems seem to use the cache very heavily. 
>>>>>>>> The untarred files could mostly be sitting in RAM with a local 
>>>>>>>> fs, versus going through FUSE (which might do many more synced 
>>>>>>>> flushes to disk?).
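>>>>>>>>
>>>>>>>> (A quick way to check whether the untarred data is just sitting 
>>>>>>>> in the page cache is to look at the dirty-page counters right 
>>>>>>>> after tar returns:
>>>>>>>>
>>>>>>>> grep -E 'Dirty|Writeback' /proc/meminfo
>>>>>>>>
>>>>>>>> A large Dirty value there means the data hasn't actually reached 
>>>>>>>> the disk yet.)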
>>>>>>>>
>>>>>>>> On Wed, Mar 24, 2010 at 2:25 AM, Jeremy Enos <jenos at ncsa.uiuc.edu>
>>>>>>>>   wrote:
>>>>>>>>
>>>>>>>>> I also neglected to mention that the underlying filesystem is 
>>>>>>>>> ext3.
>>>>>>>>>
>>>>>>>>> On 3/24/2010 3:44 AM, Jeremy Enos wrote:
>>>>>>>>>
>>>>>>>>>> I haven't tried it with all performance options disabled yet; 
>>>>>>>>>> I can try that tomorrow when the resource frees up.  I was 
>>>>>>>>>> actually asking first, before blindly trying different 
>>>>>>>>>> configuration matrices, in case there's a clear direction I 
>>>>>>>>>> should take with it.  I'll let you know.
>>>>>>>>>>
>>>>>>>>>>     Jeremy
>>>>>>>>>>
>>>>>>>>>> On 3/24/2010 2:54 AM, Stephan von Krawczynski wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Jeremy,
>>>>>>>>>>>
>>>>>>>>>>> have you tried to reproduce this with all performance options 
>>>>>>>>>>> disabled?  They may not be a good idea on a local system.
>>>>>>>>>>> Which local filesystem are you using?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 23 Mar 2010 19:11:28 -0500
>>>>>>>>>>> Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Stephan is correct: I primarily did this test to show a 
>>>>>>>>>>>> demonstrable example of the overhead I'm trying to 
>>>>>>>>>>>> eliminate.  It's pronounced enough that it can be seen on a 
>>>>>>>>>>>> single-disk / single-node configuration, which is good in a 
>>>>>>>>>>>> way (anyone can easily reproduce it).
>>>>>>>>>>>>
>>>>>>>>>>>> My distributed/clustered solution would be ideal if it were 
>>>>>>>>>>>> fast enough for small-block I/O as well as large-block.  I 
>>>>>>>>>>>> was hoping that single-node systems would achieve that, 
>>>>>>>>>>>> hence the single-node test.  Because the single-node test 
>>>>>>>>>>>> performed poorly, I eventually reduced down to a single disk 
>>>>>>>>>>>> to see if the overhead could still be seen, and it clearly 
>>>>>>>>>>>> can be.
>>>>>>>>>>>> Perhaps it's something in my configuration?  I've pasted my 
>>>>>>>>>>>> config files below.
>>>>>>>>>>>> thx-
>>>>>>>>>>>>
>>>>>>>>>>>>       Jeremy
>>>>>>>>>>>>
>>>>>>>>>>>> ######################glusterfsd.vol######################
>>>>>>>>>>>> volume posix
>>>>>>>>>>>>     type storage/posix
>>>>>>>>>>>>     option directory /export
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume locks
>>>>>>>>>>>>     type features/locks
>>>>>>>>>>>>     subvolumes posix
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume disk
>>>>>>>>>>>>     type performance/io-threads
>>>>>>>>>>>>     option thread-count 4
>>>>>>>>>>>>     subvolumes locks
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume server-ib
>>>>>>>>>>>>     type protocol/server
>>>>>>>>>>>>     option transport-type ib-verbs/server
>>>>>>>>>>>>     option auth.addr.disk.allow *
>>>>>>>>>>>>     subvolumes disk
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume server-tcp
>>>>>>>>>>>>     type protocol/server
>>>>>>>>>>>>     option transport-type tcp/server
>>>>>>>>>>>>     option auth.addr.disk.allow *
>>>>>>>>>>>>     subvolumes disk
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> ######################ghome.vol######################
>>>>>>>>>>>>
>>>>>>>>>>>> #-----------IB remotes------------------
>>>>>>>>>>>> volume ghome
>>>>>>>>>>>>     type protocol/client
>>>>>>>>>>>>     option transport-type ib-verbs/client
>>>>>>>>>>>> #  option transport-type tcp/client
>>>>>>>>>>>>     option remote-host acfs
>>>>>>>>>>>>     option remote-subvolume raid
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> #------------Performance Options-------------------
>>>>>>>>>>>>
>>>>>>>>>>>> volume readahead
>>>>>>>>>>>>     type performance/read-ahead
>>>>>>>>>>>>     option page-count 4           # 2 is default option
>>>>>>>>>>>>     option force-atime-update off # default is off
>>>>>>>>>>>>     subvolumes ghome
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume writebehind
>>>>>>>>>>>>     type performance/write-behind
>>>>>>>>>>>>     option cache-size 1MB
>>>>>>>>>>>>     subvolumes readahead
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume cache
>>>>>>>>>>>>     type performance/io-cache
>>>>>>>>>>>>     option cache-size 1GB
>>>>>>>>>>>>     subvolumes writebehind
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> ######################END######################
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/23/2010 6:02 AM, Stephan von Krawczynski wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, 23 Mar 2010 02:59:35 -0600 (CST)
>>>>>>>>>>>>> "Tejas N. Bhise" <tejas at gluster.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Out of curiosity, if you only want to do things on one 
>>>>>>>>>>>>>> machine, why do you want to use a distributed, multi-node, 
>>>>>>>>>>>>>> clustered file system?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Because what he is doing is a very good way to show the 
>>>>>>>>>>>>> overhead produced by glusterfs alone and nothing else 
>>>>>>>>>>>>> (i.e. no network involved).
>>>>>>>>>>>>> A pretty relevant test scenario, I would say.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- 
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am I missing something here?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Tejas.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>> From: "Jeremy Enos" <jenos at ncsa.uiuc.edu>
>>>>>>>>>>>>>> To: gluster-users at gluster.org
>>>>>>>>>>>>>> Sent: Tuesday, March 23, 2010 2:07:06 PM GMT +05:30 Chennai,
>>>>>>>>>>>>>> Kolkata,
>>>>>>>>>>>>>> Mumbai, New Delhi
>>>>>>>>>>>>>> Subject: [Gluster-users] gluster local vs local = gluster x4
>>>>>>>>>>>>>> slower
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This test is pretty easy to replicate anywhere; it only 
>>>>>>>>>>>>>> takes one disk, one machine, and one tarball.  Untarring 
>>>>>>>>>>>>>> to the local disk directly is about 4.5x faster than 
>>>>>>>>>>>>>> untarring through gluster.  At first I thought this might 
>>>>>>>>>>>>>> be due to a slow host (2.4 GHz Opteron), but it's not: the 
>>>>>>>>>>>>>> same configuration on a much faster machine (dual 3.33 GHz 
>>>>>>>>>>>>>> Xeon) yields the performance below.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK THRU GLUSTER####
>>>>>>>>>>>>>> [root at ac33 jenos]# time tar xzf
>>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> real    0m41.290s
>>>>>>>>>>>>>> user    0m14.246s
>>>>>>>>>>>>>> sys     0m2.957s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK (BYPASS GLUSTER)####
>>>>>>>>>>>>>> [root at ac33 jenos]# cd /export/jenos/
>>>>>>>>>>>>>> [root at ac33 jenos]# time tar xzf
>>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> real    0m8.983s
>>>>>>>>>>>>>> user    0m6.857s
>>>>>>>>>>>>>> sys     0m1.844s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ####THESE ARE TEST FILE DETAILS####
>>>>>>>>>>>>>> [root at ac33 jenos]# tar tzvf
>>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz  |wc -l
>>>>>>>>>>>>>> 109
>>>>>>>>>>>>>> [root at ac33 jenos]# ls -l
>>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>>> -rw-r--r-- 1 jenos ac 804385203 2010-02-07 06:32
>>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>>> [root at ac33 jenos]#
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> These are the relevant performance options I'm using in 
>>>>>>>>>>>>>> my .vol
>>>>>>>>>>>>>> file:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #------------Performance Options-------------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> volume readahead
>>>>>>>>>>>>>>      type performance/read-ahead
>>>>>>>>>>>>>>      option page-count 4           # 2 is default option
>>>>>>>>>>>>>>      option force-atime-update off # default is off
>>>>>>>>>>>>>>      subvolumes ghome
>>>>>>>>>>>>>> end-volume
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> volume writebehind
>>>>>>>>>>>>>>      type performance/write-behind
>>>>>>>>>>>>>>      option cache-size 1MB
>>>>>>>>>>>>>>      subvolumes readahead
>>>>>>>>>>>>>> end-volume
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> volume cache
>>>>>>>>>>>>>>      type performance/io-cache
>>>>>>>>>>>>>>      option cache-size 1GB
>>>>>>>>>>>>>>      subvolumes writebehind
>>>>>>>>>>>>>> end-volume
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What can I do to improve gluster's performance?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>        Jeremy
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Gluster-users mailing list
>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>>>>>>>
>>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>


