[Gluster-devel] performance issue

Anand Avati avati at zresearch.com
Mon Jan 14 23:41:15 UTC 2008


Matt,
 For the sake of diagnosis, can you specify "option self-heal off" in the
cluster/unify volume and run the same test again?

thanks,
avati

2008/1/14, Matt Drew <matt.drew at gmail.com>:
>
> Avati,
>
> I tried values of 2, 4, 6, 30, 60, and 120 for -e and -a with no
> measurable effect.  We're not using AFR yet so there's no issue there.
>
> On Jan 14, 2008 12:05 AM, Anand Avati <avati at zresearch.com> wrote:
> > Matt,
> >  we are currently investigating AFR + io-cache performance issues (io-cache
> > not really making full use of caching when AFR does load balanced reads).
> > You could override AFR read scheduling by specifying 'option read-subvolume
> > <first subvolume>' in AFR as a temporary workaround. That apart, I suggest
> > you mount glusterfs with large -e and -a argument values which should
> > improve performance in such cases quite a bit. Do let us know if that made
> > any difference.
> >
> > avati
> >
> > 2008/1/14, Matt Drew <matt.drew at gmail.com>:
> > >
> > >
> > >
> > > > I've been digging into a seemingly difficult performance issue over
> > > the last few days.  We're running glusterfs mainline 2.5 patch 628,
> > > fuse-2.7.2 with Marian's glfs patch, kernel 2.6.23, currently one
> > > server and two clients (soon to be two and four, respectively).  The
> > > server is a dual-core Opteron with a SATA2 disk (one, we're planning
> > > on AFR redundancy), the clients are dual-core Intel machines. The
> > > network transport is gigabit ethernet.  The server is 32-bit and the
> > > clients are 64-bit (I can rebuild the server no problem if that is the
> > > issue).  Throughput is good, and activity by one process seems to work
> > > fine.
> > >
> > > Our issue is with a PHP script running on the client via the glusterfs
> > > share.  The script has a number of includes, and those files have a
> > > few more includes.  This means a lot of stats as the webserver checks
> > > to make sure none of the files have changed.  If we make one call to
> > > the script, everything is fine - the code completes in 300ms.
> > > Similarly, if you run "ls -l" on a large directory (1700 files)
> > > everything appears to work fine (from local disk the code completes in
> > > 100ms).
> > >
> > > However, if we make two concurrent calls to the PHP script, or run two
> > > copies of ls -l on the large directory, everything slows down by an
> > > order of magnitude.  The output of the ls commands appears to stutter
> > > on each copy - usually one will stop and the other will start, but
> > > sometimes both will stop for a second or two.  Adding a third process
> > > makes it worse.  The PHP script takes 2.5 or 3 seconds to complete,
> > > > instead of 300ms, and again more requests make it worse - if you
> > > request four operations concurrently, the finish time jumps to 7
> > > seconds.  This issue occurs whether you are on a single client with
> > > two processes, or if you are on two clients with one process each.
> > >
> > > Inserting the trace translator doesn't turn up anything unusual that I
> > > can see, with the exception that it makes the processes run even
> > > slower (which is expected, of course).  A tcpdump of the filesystem
> > > traffic shows inexplicable gaps of 100ms or more with no traffic.  The
> > > single process "ls -l" test does not show these gaps.
> > >
> > > I stripped the server and client to the bare minimum with unify.  This
> > > didn't seem to make a difference.  I'm currently running this
> > > server/client stack, also without success:
> > >
> > > ns
> > > brick (x2)
> > > posix-locks
> > > io-threads(16, 64MB)
> > > server (ns, brick1, brick2)
> > >
> > > brick1
> > > brick2
> > > unify(alu)
> > > io-threads(16, 64MB)
> > > io-cache(256MB)
> > >
> > > > At various times I've tried read-ahead with no discernible difference.
> > > An strace of the client process doesn't return anything interesting
> > > except a lot of these:
> > >
> > > futex(0x12345678, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource
> > > temporarily unavailable)
> > >
> > > These also appear during a single process test, but they are much more
> > > prevalent when two processes are running.
> > >
> > > What am I doing wrong? :)
> > >
> > >
> >
> >
> >
> > --
> > If I traveled to the end of the rainbow
> > As Dame Fortune did intend,
> > Murphy would be there to tell me
> >  The pot's at the other end.
>





