[Gluster-devel] performance issue

Mon Jan 14 05:05:08 UTC 2008

Matt,
We are currently investigating AFR + io-cache performance issues (io-cache
does not make full use of its cache when AFR load-balances reads).
As a temporary workaround, you can override AFR read scheduling by
specifying 'option read-subvolume <first subvolume>' in AFR. Apart from
that, I suggest you mount glusterfs with large -e and -a argument values,
which should improve performance quite a bit in such cases. Do let us know
whether that makes any difference.
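
For illustration, here is a minimal sketch of both suggestions. The volume
and subvolume names (afr0, brick1, brick2), the spec file path, the mount
point, and the timeout values are all placeholders. It also assumes that -e
and -a are the client's entry and attribute timeout options, in seconds;
please check 'glusterfs --help' on your build before relying on it.

    # client spec file: pin AFR reads to the first subvolume
    # (names below are hypothetical)
    volume afr0
      type cluster/afr
      subvolumes brick1 brick2
      option read-subvolume brick1
    end-volume

    # mount with larger entry (-e) and attribute (-a) timeouts
    # (values are examples only, not tuned recommendations)
    glusterfs -f /etc/glusterfs/client.vol -e 30 -a 30 /mnt/glusterfs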

avati

2008/1/14, Matt Drew <matt.drew at gmail.com>:
>
> I've been digging into a seemingly difficult performance issue over
> the last few days.  We're running glusterfs mainline 2.5 patch 628,
> fuse-2.7.2 with Marian's glfs patch, kernel 2.6.23, currently one
> server and two clients (soon to be two and four, respectively).  The
> server is a dual-core Opteron with a SATA2 disk (one, we're planning
> on AFR redundancy), the clients are dual-core Intel machines. The
> network transport is gigabit ethernet.  The server is 32-bit and the
> clients are 64-bit (I can rebuild the server no problem if that is the
> issue).  Throughput is good, and activity by one process seems to work
> fine.
>
> Our issue is with a PHP script running on the client via the glusterfs
> share.  The script has a number of includes, and those files have a
> few more includes.  This means a lot of stats as the webserver checks
> to make sure none of the files have changed.  If we make one call to
> the script, everything is fine - the code completes in 300ms (from
> local disk it completes in 100ms).  Similarly, if you run "ls -l" on
> a large directory (1700 files), everything appears to work fine.
>
> However, if we make two concurrent calls to the PHP script, or run two
> copies of ls -l on the large directory, everything slows down by an
> order of magnitude.  The output of the ls commands appears to stutter
> on each copy - usually one will stop and the other will start, but
> sometimes both will stop for a second or two.  Adding a third process
> makes it worse.  The PHP script takes 2.5 or 3 seconds to complete,
> instead of 300ms, and again more requests make it worse - if you
> request four operations concurrently, the finish time jumps to 7
> seconds.  This issue occurs whether you are on a single client with
> two processes, or if you are on two clients with one process each.
>
> Inserting the trace translator doesn't turn up anything unusual that I
> can see, with the exception that it makes the processes run even
> slower (which is expected, of course).  A tcpdump of the filesystem
> traffic shows inexplicable gaps of 100ms or more with no traffic.  The
> single process "ls -l" test does not show these gaps.
>
> I stripped the server and client to the bare minimum with unify.  This
> didn't seem to make a difference.  I'm currently running this
> server/client stack, also without success:
>
> ns
> brick (x2)
> posix-locks
> io-threads(16, 64MB)
> server (ns, brick1, brick2)
>
> brick1
> brick2
> unify(alu)
> io-threads(16, 64MB)
> io-cache(256MB)
>
> At various times I've tried read-ahead with no discernible difference.
> An strace of the client process doesn't return anything interesting
> except a lot of these:
>
> futex(0x12345678, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
>
> These also appear during a single process test, but they are much more
> prevalent when two processes are running.
>
> What am I doing wrong? :)

-- 
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.