[Gluster-devel] performance improvements

Wed Oct 24 16:30:15 UTC 2007

On 10/24/07, Vincent Régnard <vregnard at tbs-internet.com> wrote:
>
> Daniel van Ham Colchete a écrit :
> > On 10/23/07, Vincent Régnard <vregnard at tbs-internet.com> wrote:
> >> Hi all,
> >>
> >> The afr synchronisation using the "find -mtime -1 -type f -exec head -c1"
> >> trick takes approximately 30 minutes for a 20GB filesystem with 300,000
> >> files, which seems too long to be acceptable for us. I'd like to tune
> >> some parameters to increase performance.
> >>
> >> Vincent.
> >>
> > Vincent,
> >
> > so, in 1800 seconds you lookup(), open(), read() and close() 300,000 times?
> > That's 6ms for each file, and that's really good. Are you sure your
> > interconnect has a 5ms round-trip time? I would bet it is less.
>
> Hi Daniel, and thank you for your clear answer.
>
> Actually, according to the logs I have, only about a hundred files are
> modified every day, so open(), read() and close() only occur for these
> 100 files, not the whole filesystem. So half an hour seems a long time
> for that? I now do this file synchronisation on a per-directory basis
> (only directories where I know changes might occur) to reduce the
> runtime, and it actually runs much faster. The find command really seems
> to spend a huge amount of time going down and up the whole directory tree.
> I tried to do the same operation for all the files (not restricting it to
> recently modified ones), which means opening etc. all the 300,000 files,
> but until now I never managed to get it to finish (I have been trying for
> 2 weeks now)! Either glusterfs crashes first or I have to stop the client
> or server for another reason (software upgrade). I had it run once for
> more than 12 hours, but it did not complete.

Just so we have a basis for comparison: on my setup here (2 load-balanced
web servers using AFR with 29966 files; the servers are also the clients), I
finish your command in 2m48s. That's about 5.6ms per file.
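
As a rough sanity check, the per-file figures in this thread are plain
divisions of total runtime by file count. A minimal sketch (the function and
variable names are made up for illustration; the inputs are the numbers
quoted above):

```python
def ms_per_file(total_seconds: float, n_files: int) -> float:
    """Average wall-clock time per file, in milliseconds."""
    return total_seconds * 1000.0 / n_files


# 30 minutes spread over all 300,000 files on the filesystem.
whole_tree = ms_per_file(30 * 60, 300_000)

# 2m48s over the 29966 files of the comparison setup.
comparison = ms_per_file(2 * 60 + 48, 29_966)

# 30 minutes, if only the ~100 modified files are actually opened and read.
modified_only = ms_per_file(30 * 60, 100)

print(f"{whole_tree:.1f} ms, {comparison:.1f} ms, {modified_only:.0f} ms")
```

The last figure is the worrying one: if only about 100 files are really
read, the run costs on the order of 18 seconds per file.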

18 seconds to read one byte of one file (1800 seconds for your 100 modified
files) is definitely not good. Do you get the same result trying head -c1
on just one file?
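
One way to check a single file is to time each step separately. The sketch
below is only an illustration, not a GlusterFS tool: time_first_byte is a
hypothetical helper that mimics what head -c1 does (stat, open, read one
byte, close) and reports how long each call took, so you can see which one
is slow on the mount.

```python
import os
import sys
import time


def time_first_byte(path):
    """Time stat, open, a 1-byte read and close on one file (seconds each)."""
    t0 = time.perf_counter()
    os.stat(path)                      # lookup
    t1 = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)    # open
    t2 = time.perf_counter()
    os.read(fd, 1)                     # read the first byte, like head -c1
    t3 = time.perf_counter()
    os.close(fd)                       # close
    t4 = time.perf_counter()
    return {"stat": t1 - t0, "open": t2 - t1, "read": t3 - t2, "close": t4 - t3}


if __name__ == "__main__":
    # Usage: python first_byte.py /path/on/the/mount/file1 [file2 ...]
    for path in sys.argv[1:]:
        timings = time_first_byte(path)
        parts = ", ".join(f"{name}={secs * 1000:.2f}ms" for name, secs in timings.items())
        print(f"{path}: {parts}")
```

Running it on the same file twice also shows whether the second access is
faster than the first.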

I don't have the answer for you, but I would try using tcpdump on the client
and the servers, paying close attention to which part of the process is
wasting the most time. It will help you answer a lot of questions: How much
time does it take for the server to respond? Am I losing packets? What
does the GlusterFS TCP protocol look like? Although the last one doesn't
seem very useful, you can actually see a few things happening there and gain
some information out of it :). Something is really wrong with your setup and
a tcpdump will surely help you.

If you don't know how to use it:

# First record everything happening on port 6996
tcpdump -i any -w /tmp/tcp.gluster -s 2048 port 6996

# Now read the file
tcpdump -r /tmp/tcp.gluster -s 2048 -A -n|less

The commands above will work both with servers and clients.
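
To find where the time goes in a long capture, it can help to look for the
largest pauses between consecutive packets. The sketch below assumes you
first dump the capture as text with epoch timestamps, for example
"tcpdump -r /tmp/tcp.gluster -tt -n > /tmp/tcp.txt" (-tt prints each packet's
time as seconds since the epoch). The script name and largest_gaps are
hypothetical, and a big gap only tells you where to look; it does not say
which side was idle.

```python
import re
import sys

# With -tt, every packet line starts with "<seconds>.<microseconds> ".
TS_LINE = re.compile(r"^(\d+\.\d+) ")


def largest_gaps(lines, top=5):
    """Return the `top` largest gaps as (gap_seconds, line_before, line_after)."""
    gaps = []
    prev_ts = None
    prev_line = None
    for line in lines:
        match = TS_LINE.match(line)
        if not match:
            continue  # skip anything that is not a timestamped packet line
        ts = float(match.group(1))
        text = line.rstrip("\n")
        if prev_ts is not None:
            gaps.append((ts - prev_ts, prev_line, text))
        prev_ts, prev_line = ts, text
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python gaps.py /tmp/tcp.txt
    with open(sys.argv[1], errors="replace") as dump:
        for gap, before, after in largest_gaps(dump):
            print(f"{gap:.6f}s between:\n  {before}\n  {after}")
```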

>
> > IMO, you usually shouldn't measure GlusterFS performance with things
> > happening serially. GlusterFS is really good when things are happening
> > in parallel. I prefer to measure a network filesystem's performance not
> > by how much time it takes to do one operation, but by how many operations
> > it can do at the same time. If you had 300 threads trying to read all
> > those files it would be a lot faster. Usually that's the way real
> > utilization happens: if you have a web server and a mail server using
> > your storage, you will have lots of web requests and e-mail sessions
> > reading and writing at the same time.
> >
>
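
A minimal sketch of the "many threads" idea, assuming nothing about
GlusterFS itself: it walks a tree and reads the first byte of every regular
file (the same thing head -c1 does) from a pool of worker threads instead of
one file at a time. The names touch_tree and read_first_byte and the default
of 32 workers are made up for illustration, and unlike the find command it
does not filter on mtime.

```python
import os
import sys
from concurrent.futures import ThreadPoolExecutor


def read_first_byte(path):
    """Open the file, read one byte, close it. Returns True on success."""
    try:
        with open(path, "rb") as handle:
            handle.read(1)
        return True
    except OSError:
        return False


def iter_regular_files(root):
    """Yield every regular file under root (symlinks skipped, like find -type f)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def touch_tree(root, workers=32):
    """Read the first byte of every file under root, `workers` files at a time.

    Returns (files_read, files_failed).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(read_first_byte, iter_regular_files(root)))
    ok = sum(results)
    return ok, len(results) - ok


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python touch_tree.py /path/to/mount
    done, failed = touch_tree(sys.argv[1])
    print(f"read {done} files, {failed} failed")
```

The tree walk itself is still serial here, so this only helps if the time
is going into the per-file open and read rather than into the directory
traversal.
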
> Regarding read/write access, I monitor with 10 clients in parallel (I
> can see the activity of 10 gluster threads). This seems OK to me: I get
> between 2 and 5 MB/s on a 100Mb network. But there is certainly no
> readdir() in that case, at least not in directories with many files. My
> real problem is listing files in a directory, mainly for mail purposes
> (smtp+imap server). I have not performed any tests yet, but transferring
> my maildirs, with about 10,000 files each, to glusterfs really frightens
> me.
>
> I also made some tests restricting the configuration to client and
> servers in the same datacenter (round-trip about 0.1ms), but the result
> seems to be roughly the same regarding read/write performance.
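
Before moving the maildirs, the listing cost could be measured directly on a
test directory of similar size. The sketch below is an assumption about how
one might do that, not a known benchmark: time_listing is a hypothetical
helper that times the bare directory read separately from the per-entry
stat calls that a listing such as ls -l adds on top.

```python
import os
import sys
import time


def time_listing(directory):
    """Return (entry_count, list_seconds, stat_seconds) for one directory."""
    t0 = time.perf_counter()
    names = os.listdir(directory)          # directory read only
    t1 = time.perf_counter()
    for name in names:
        try:
            os.lstat(os.path.join(directory, name))   # one lookup per entry
        except FileNotFoundError:
            pass  # entry removed while we were listing
    t2 = time.perf_counter()
    return len(names), t1 - t0, t2 - t1


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python time_listing.py /path/to/maildir/cur
    count, list_s, stat_s = time_listing(sys.argv[1])
    print(f"{count} entries: listing {list_s:.3f}s, stat of every entry {stat_s:.3f}s")
```

If the stat column dominates, the cost is one round trip per file rather
than the directory read itself.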

When I went to production with my web servers here (the mail servers are not
there yet), I extracted everything outside GlusterFS, directly onto the first
AFR server. The second server started to be populated later, after everything
was already online. I think this would apply to any network filesystem.
But, as you can see, I do not have the slowness problem you're having.

When you find the problem, please tell me. I want to move my mail storage to
GlusterFS soon too. NFS+DRBD+Heartbeat's performance s****.

--
> Vincent Régnard
> vregnard at tbs-internet.com
> TBS-internet.com
> 027 630 5902

Daniel