[Gluster-users] gluster local vs local = gluster x4 slower
Jeremy Enos
jenos at ncsa.uiuc.edu
Wed Mar 31 09:22:32 UTC 2010
That, too, is what I'm guessing is happening. Besides official
confirmation of what's going on, I'm mainly after an answer as to
whether there is a way to solve it and make a locally mounted
single-disk Gluster filesystem perform even close to as well as a
single local disk accessed directly, including for cached transactions.
So far, the performance translators have had little impact in making
small-block I/O performance competitive.
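
For reference, the small-block test I have in mind is essentially the dd
from the test quoted below, run once buffered and once with the flush
time included, first from inside the Gluster mount and then again from
/export/jenos (a rough sketch):

dd bs=4096 count=32768 if=/dev/zero of=./filename.test
dd bs=4096 count=32768 if=/dev/zero of=./filename.test conv=fdatasync

The first number shows the perceived (cache-assisted) performance; the
second includes the cost of actually getting the data to disk.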
thx-
Jeremy
On 3/30/2010 11:33 AM, Steven Truelove wrote:
> What you are likely seeing is the OS saving dirty pages in the disk
> cache before writing them. If you were untarring a file that was
> significantly larger than available memory on the server, the server
> would be forced to write to disk and you would likely see performance
> fall more into line with the results you get when you call sync.
>
> Gluster is probably flushing data to disk more aggressively than the
> OS would on its own. This may be intentional, to reduce data loss in
> server failure scenarios. Someone on the Gluster team can probably
> comment on any settings that exist for controlling Gluster's
> data-flushing behaviour.
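>
> A quick way to watch this happening (a sketch, nothing Gluster-specific)
> is to keep an eye on the kernel's dirty-page counters while the untar
> runs:
>
> watch -n1 'grep -E "Dirty|Writeback" /proc/meminfo'
>
> On the direct-disk run you would expect Dirty to climb and then drain
> after tar returns; if Gluster is flushing aggressively, it should stay
> comparatively low for the same workload.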
>
> Steven Truelove
>
>
> On 29/03/2010 5:09 PM, Jeremy Enos wrote:
>> I've already determined that && sync brings the values at least to
>> the same order (gluster is about 75% of direct disk there). I could
>> accept that for the benefit of having a parallel filesystem.
>> What I'm actually trying to achieve now is exactly what leaving out
>> the && sync yields in perceived performance, which becomes real
>> performance if the user can move on to another task instead of
>> blocking because Gluster isn't utilizing the cache. How, with Gluster,
>> can I achieve the same cache benefit that direct disk gets? Will a
>> user ever be able to untar a moderately sized (below physical memory)
>> file onto a Gluster filesystem as fast as onto a single disk, as I
>> did in my initial comparison? Is there something fundamentally
>> preventing that in Gluster's design, or am I misconfiguring it?
>> thx-
>>
>> Jeremy
>>
>> On 3/29/2010 2:00 PM, Bryan Whitehead wrote:
>>> heh, don't forget the && sync
>>>
>>> :)
>>>
>>> On Mon, Mar 29, 2010 at 11:21 AM, Jeremy Enos <jenos at ncsa.uiuc.edu>
>>> wrote:
>>>> Got a chance to run your suggested test:
>>>>
>>>> ##############GLUSTER SINGLE DISK##############
>>>>
>>>> [root at ac33 gjenos]# dd bs=4096 count=32768 if=/dev/zero
>>>> of=./filename.test
>>>> 32768+0 records in
>>>> 32768+0 records out
>>>> 134217728 bytes (134 MB) copied, 8.60486 s, 15.6 MB/s
>>>> [root at ac33 gjenos]#
>>>> [root at ac33 gjenos]# cd /export/jenos/
>>>>
>>>> ##############DIRECT SINGLE DISK##############
>>>>
>>>> [root at ac33 jenos]# dd bs=4096 count=32768 if=/dev/zero
>>>> of=./filename.test
>>>> 32768+0 records in
>>>> 32768+0 records out
>>>> 134217728 bytes (134 MB) copied, 0.21915 s, 612 MB/s
>>>> [root at ac33 jenos]#
>>>>
>>>> For anything that can see a cache benefit, the performance of Gluster
>>>> can't compare. Is it even using the cache?
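>>>>
>>>> (To take the page cache out of the direct-disk number entirely, I could
>>>> presumably rerun that dd with oflag=direct, e.g.
>>>>
>>>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test oflag=direct
>>>>
>>>> though that only makes the comparison fairer; it doesn't give Gluster
>>>> the cache benefit I'm after.)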
>>>>
>>>> This is the client vol file I used for that test:
>>>>
>>>> [root at ac33 jenos]# cat /etc/glusterfs/ghome.vol
>>>> #-----------IB remotes------------------
>>>> volume ghome
>>>> type protocol/client
>>>> option transport-type tcp/client
>>>> option remote-host ac33
>>>> option remote-subvolume ibstripe
>>>> end-volume
>>>>
>>>> #------------Performance Options-------------------
>>>>
>>>> volume readahead
>>>> type performance/read-ahead
>>>> option page-count 4 # 2 is default option
>>>> option force-atime-update off # default is off
>>>> subvolumes ghome
>>>> end-volume
>>>>
>>>> volume writebehind
>>>> type performance/write-behind
>>>> option cache-size 1MB
>>>> subvolumes readahead
>>>> end-volume
>>>>
>>>> volume cache
>>>> type performance/io-cache
>>>> option cache-size 2GB
>>>> subvolumes writebehind
>>>> end-volume
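>>>>
>>>> One thing I still plan to try is a more aggressive write-behind, along
>>>> these lines if my release supports flush-behind (just a sketch; the
>>>> option names may differ by version):
>>>>
>>>> volume writebehind
>>>> type performance/write-behind
>>>> option cache-size 4MB # larger write-behind window
>>>> option flush-behind on # don't block on flush/close
>>>> subvolumes readahead
>>>> end-volume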
>>>>
>>>>
>>>> Any suggestions appreciated. thx-
>>>>
>>>> Jeremy
>>>>
>>>> On 3/26/2010 6:09 PM, Bryan Whitehead wrote:
>>>>> One more thought: it looks like (from your emails) you are always running
>>>>> the gluster test first. Maybe the tar file is being read from disk
>>>>> when you do the gluster test, then read from cache when you run the
>>>>> direct-disk test.
>>>>>
>>>>> What if you just pull a chunk of 0's off /dev/zero?
>>>>>
>>>>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test
>>>>>
>>>>> or stick the tar in a ramdisk?
>>>>>
>>>>> (or run the benchmark 10 times for each, drop the best and the worst,
>>>>> and average the remaining 8)
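>>>>>
>>>>> Something like this (rough sketch) would spit out the ten throughput
>>>>> lines to trim and average:
>>>>>
>>>>> for i in $(seq 1 10); do
>>>>>   dd bs=4096 count=32768 if=/dev/zero of=./filename.$i 2>&1 | tail -1
>>>>> done
>>>>>
>>>>> (add conv=fdatasync to the dd if you want each sample to include the
>>>>> flush)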
>>>>>
>>>>> I'd also be curious whether adding another node would halve the time,
>>>>> and whether adding another two would halve it again. I guess that
>>>>> depends on whether striping or just replication is being used.
>>>>> (Unfortunately I don't have access to more than 1 test box right
>>>>> now.)
>>>>>
>>>>> On Wed, Mar 24, 2010 at 11:06 PM, Jeremy
>>>>> Enos <jenos at ncsa.uiuc.edu> wrote:
>>>>>
>>>>>> For completeness:
>>>>>>
>>>>>> ##############GLUSTER SINGLE DISK NO PERFORMANCE
>>>>>> OPTIONS##############
>>>>>> [root at ac33 gjenos]# time (tar xzf
>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>
>>>>>> real 0m41.052s
>>>>>> user 0m7.705s
>>>>>> sys 0m3.122s
>>>>>> ##############DIRECT SINGLE DISK##############
>>>>>> [root at ac33 gjenos]# cd /export/jenos
>>>>>> [root at ac33 jenos]# time (tar xzf
>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>
>>>>>> real 0m22.093s
>>>>>> user 0m6.932s
>>>>>> sys 0m2.459s
>>>>>> [root at ac33 jenos]#
>>>>>>
>>>>>> The performance options don't appear to be the problem. So the question
>>>>>> stands: how do I get the disk cache advantage through the Gluster-mounted
>>>>>> filesystem? It seems to be the key to the large performance difference.
>>>>>>
>>>>>> Jeremy
>>>>>>
>>>>>> On 3/24/2010 4:47 PM, Jeremy Enos wrote:
>>>>>>
>>>>>>> Good suggestion- I hadn't tried that yet. It brings them much
>>>>>>> closer.
>>>>>>>
>>>>>>> ##############GLUSTER SINGLE DISK##############
>>>>>>> [root at ac33 gjenos]# time (tar xzf
>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>>
>>>>>>> real 0m32.089s
>>>>>>> user 0m6.516s
>>>>>>> sys 0m3.177s
>>>>>>> ##############DIRECT SINGLE DISK##############
>>>>>>> [root at ac33 gjenos]# cd /export/jenos/
>>>>>>> [root at ac33 jenos]# time (tar xzf
>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>>
>>>>>>> real 0m25.089s
>>>>>>> user 0m6.850s
>>>>>>> sys 0m2.058s
>>>>>>> ##############DIRECT SINGLE DISK CACHED##############
>>>>>>> [root at ac33 jenos]# time (tar xzf
>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz )
>>>>>>>
>>>>>>> real 0m8.955s
>>>>>>> user 0m6.785s
>>>>>>> sys 0m1.848s
>>>>>>>
>>>>>>>
>>>>>>> Oddly, I'm also seeing better performance on the gluster system than in
>>>>>>> previous tests (it used to be ~39 s). The direct disk time is obviously
>>>>>>> benefiting from cache. There is still a difference, but most of it
>>>>>>> disappears with the cache advantage removed. That said, the relative
>>>>>>> performance issue still exists with Gluster. What can be done to make
>>>>>>> Gluster benefit from cache the same way direct disk does?
>>>>>>> thx-
>>>>>>>
>>>>>>> Jeremy
>>>>>>>
>>>>>>> P.S.
>>>>>>> I'll be posting results w/ performance options completely
>>>>>>> removed from
>>>>>>> gluster as soon as I get a chance.
>>>>>>>
>>>>>>> Jeremy
>>>>>>>
>>>>>>> On 3/24/2010 4:23 PM, Bryan Whitehead wrote:
>>>>>>>
>>>>>>>> I'd like to see results with this:
>>>>>>>>
>>>>>>>> time ( tar xzf
>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz &&
>>>>>>>> sync )
>>>>>>>>
>>>>>>>> I've found local filesystems seem to use the cache very heavily. The
>>>>>>>> untarred data could mostly be sitting in RAM with a local fs, versus
>>>>>>>> going through fuse (which might do many more sync'ed flushes to disk?).
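>>>>>>>>
>>>>>>>> One way to check (a sketch; assumes the sysstat tools are installed)
>>>>>>>> is to watch the backing disk while the untar runs:
>>>>>>>>
>>>>>>>> iostat -x 1
>>>>>>>>
>>>>>>>> If the gluster run shows writes streaming to the device for the whole
>>>>>>>> run while the direct run shows almost nothing until the sync, that
>>>>>>>> would point at per-write flushing through fuse rather than raw
>>>>>>>> translator overhead.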
>>>>>>>>
>>>>>>>> On Wed, Mar 24, 2010 at 2:25 AM, Jeremy Enos <jenos at ncsa.uiuc.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I also neglected to mention that the underlying filesystem is
>>>>>>>>> ext3.
>>>>>>>>>
>>>>>>>>> On 3/24/2010 3:44 AM, Jeremy Enos wrote:
>>>>>>>>>
>>>>>>>>>> I haven't tried all performance options disabled yet- I can
>>>>>>>>>> try that
>>>>>>>>>> tomorrow when the resource frees up. I was actually asking
>>>>>>>>>> first
>>>>>>>>>> before
>>>>>>>>>> blindly trying different configuration matrices in case
>>>>>>>>>> there's a
>>>>>>>>>> clear
>>>>>>>>>> direction I should take with it. I'll let you know.
>>>>>>>>>>
>>>>>>>>>> Jeremy
>>>>>>>>>>
>>>>>>>>>> On 3/24/2010 2:54 AM, Stephan von Krawczynski wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Jeremy,
>>>>>>>>>>>
>>>>>>>>>>> have you tried to reproduce this with all performance options
>>>>>>>>>>> disabled? They may not be a good idea on a local system.
>>>>>>>>>>> What local fs do you use?
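>>>>>>>>>>>
>>>>>>>>>>> For that test the client vol could be stripped down to just the
>>>>>>>>>>> protocol/client volume (a sketch based on your ghome.vol below,
>>>>>>>>>>> nothing else changed):
>>>>>>>>>>>
>>>>>>>>>>> volume ghome
>>>>>>>>>>> type protocol/client
>>>>>>>>>>> option transport-type ib-verbs/client
>>>>>>>>>>> option remote-host acfs
>>>>>>>>>>> option remote-subvolume raid
>>>>>>>>>>> end-volume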
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 23 Mar 2010 19:11:28 -0500
>>>>>>>>>>> Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Stephan is correct- I primarily did this test to show a
>>>>>>>>>>>> demonstrable
>>>>>>>>>>>> overhead example that I'm trying to eliminate. It's
>>>>>>>>>>>> pronounced
>>>>>>>>>>>> enough
>>>>>>>>>>>> that it can be seen on a single disk / single node
>>>>>>>>>>>> configuration,
>>>>>>>>>>>> which
>>>>>>>>>>>> is good in a way (so anyone can easily repro).
>>>>>>>>>>>>
>>>>>>>>>>>> My distributed/clustered solution would be ideal if it were fast
>>>>>>>>>>>> enough for small-block I/O as well as large-block; I was hoping that
>>>>>>>>>>>> single node systems would achieve that, hence the single node test.
>>>>>>>>>>>> Because the single node test performed poorly, I eventually reduced
>>>>>>>>>>>> it to a single disk to see if the overhead could still be seen, and
>>>>>>>>>>>> it clearly can be.
>>>>>>>>>>>> Perhaps it's something in my configuration? I've pasted my
>>>>>>>>>>>> config
>>>>>>>>>>>> files
>>>>>>>>>>>> below.
>>>>>>>>>>>> thx-
>>>>>>>>>>>>
>>>>>>>>>>>> Jeremy
>>>>>>>>>>>>
>>>>>>>>>>>> ######################glusterfsd.vol######################
>>>>>>>>>>>> volume posix
>>>>>>>>>>>> type storage/posix
>>>>>>>>>>>> option directory /export
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume locks
>>>>>>>>>>>> type features/locks
>>>>>>>>>>>> subvolumes posix
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume disk
>>>>>>>>>>>> type performance/io-threads
>>>>>>>>>>>> option thread-count 4
>>>>>>>>>>>> subvolumes locks
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume server-ib
>>>>>>>>>>>> type protocol/server
>>>>>>>>>>>> option transport-type ib-verbs/server
>>>>>>>>>>>> option auth.addr.disk.allow *
>>>>>>>>>>>> subvolumes disk
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume server-tcp
>>>>>>>>>>>> type protocol/server
>>>>>>>>>>>> option transport-type tcp/server
>>>>>>>>>>>> option auth.addr.disk.allow *
>>>>>>>>>>>> subvolumes disk
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> ######################ghome.vol######################
>>>>>>>>>>>>
>>>>>>>>>>>> #-----------IB remotes------------------
>>>>>>>>>>>> volume ghome
>>>>>>>>>>>> type protocol/client
>>>>>>>>>>>> option transport-type ib-verbs/client
>>>>>>>>>>>> # option transport-type tcp/client
>>>>>>>>>>>> option remote-host acfs
>>>>>>>>>>>> option remote-subvolume raid
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> #------------Performance Options-------------------
>>>>>>>>>>>>
>>>>>>>>>>>> volume readahead
>>>>>>>>>>>> type performance/read-ahead
>>>>>>>>>>>> option page-count 4 # 2 is default option
>>>>>>>>>>>> option force-atime-update off # default is off
>>>>>>>>>>>> subvolumes ghome
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume writebehind
>>>>>>>>>>>> type performance/write-behind
>>>>>>>>>>>> option cache-size 1MB
>>>>>>>>>>>> subvolumes readahead
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> volume cache
>>>>>>>>>>>> type performance/io-cache
>>>>>>>>>>>> option cache-size 1GB
>>>>>>>>>>>> subvolumes writebehind
>>>>>>>>>>>> end-volume
>>>>>>>>>>>>
>>>>>>>>>>>> ######################END######################
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/23/2010 6:02 AM, Stephan von Krawczynski wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, 23 Mar 2010 02:59:35 -0600 (CST)
>>>>>>>>>>>>> "Tejas N. Bhise"<tejas at gluster.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Out of curiosity, if you want to do everything on only one machine,
>>>>>>>>>>>>>> why do you want to use a distributed, multi-node, clustered
>>>>>>>>>>>>>> file system?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Because what he is doing is a very good way to show the overhead
>>>>>>>>>>>>> produced by glusterfs alone and nothing else (i.e. no network
>>>>>>>>>>>>> involved). A pretty relevant test scenario, I would say.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am I missing something here ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Tejas.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>> From: "Jeremy Enos"<jenos at ncsa.uiuc.edu>
>>>>>>>>>>>>>> To: gluster-users at gluster.org
>>>>>>>>>>>>>> Sent: Tuesday, March 23, 2010 2:07:06 PM GMT +05:30 Chennai,
>>>>>>>>>>>>>> Kolkata,
>>>>>>>>>>>>>> Mumbai, New Delhi
>>>>>>>>>>>>>> Subject: [Gluster-users] gluster local vs local = gluster x4
>>>>>>>>>>>>>> slower
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This test is pretty easy to replicate anywhere: it only takes one
>>>>>>>>>>>>>> disk, one machine, and one tarball. Untarring directly to local disk
>>>>>>>>>>>>>> is about 4.5x faster than untarring through gluster. At first I
>>>>>>>>>>>>>> thought this might be due to a slow host (2.4 GHz Opteron). But it's
>>>>>>>>>>>>>> not: the same configuration on a much faster machine (dual 3.33 GHz
>>>>>>>>>>>>>> Xeon) yields the performance below.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK THRU GLUSTER####
>>>>>>>>>>>>>> [root at ac33 jenos]# time tar xzf
>>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> real 0m41.290s
>>>>>>>>>>>>>> user 0m14.246s
>>>>>>>>>>>>>> sys 0m2.957s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK (BYPASS GLUSTER)####
>>>>>>>>>>>>>> [root at ac33 jenos]# cd /export/jenos/
>>>>>>>>>>>>>> [root at ac33 jenos]# time tar xzf
>>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> real 0m8.983s
>>>>>>>>>>>>>> user 0m6.857s
>>>>>>>>>>>>>> sys 0m1.844s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ####THESE ARE TEST FILE DETAILS####
>>>>>>>>>>>>>> [root at ac33 jenos]# tar tzvf
>>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz |wc -l
>>>>>>>>>>>>>> 109
>>>>>>>>>>>>>> [root at ac33 jenos]# ls -l
>>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>>> -rw-r--r-- 1 jenos ac 804385203 2010-02-07 06:32
>>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>>> [root at ac33 jenos]#
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> These are the relevant performance options I'm using in
>>>>>>>>>>>>>> my .vol
>>>>>>>>>>>>>> file:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #------------Performance Options-------------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> volume readahead
>>>>>>>>>>>>>> type performance/read-ahead
>>>>>>>>>>>>>> option page-count 4 # 2 is default option
>>>>>>>>>>>>>> option force-atime-update off # default is off
>>>>>>>>>>>>>> subvolumes ghome
>>>>>>>>>>>>>> end-volume
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> volume writebehind
>>>>>>>>>>>>>> type performance/write-behind
>>>>>>>>>>>>>> option cache-size 1MB
>>>>>>>>>>>>>> subvolumes readahead
>>>>>>>>>>>>>> end-volume
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> volume cache
>>>>>>>>>>>>>> type performance/io-cache
>>>>>>>>>>>>>> option cache-size 1GB
>>>>>>>>>>>>>> subvolumes writebehind
>>>>>>>>>>>>>> end-volume
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What can I do to improve gluster's performance?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jeremy
>>>>>>>>>>>>>>
>>>>>>
>>>>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>