[Gluster-users] Possible bug and performance of small files (with limited use-case workaround)
John Lauro
john.lauro at covenanteyes.com
Sun Jan 1 23:32:03 UTC 2012
I am testing gluster for possible deployment. The test is over internal
network between virtual machines, but if we go production it would
probably be infiniband.
Just pulled the latest binaries, namely 3.2.5-2.
First, can anything be done to help performance? It's rather slow for
doing a tar extract. Here are the volumes:
Volume Name: v1
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.0.12.141:/data1
Brick2: 10.0.12.142:/data1
Options Reconfigured:
performance.flush-behind: on
performance.write-behind: on
Volume Name: v2
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.0.12.142:/data2
Brick2: 10.0.12.143:/data1
Options Reconfigured:
performance.flush-behind: on
performance.write-behind: on
I added the flush-behind and write-behind options in hopes of them
helping, but they did not. Are there any others worth trying?
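For reference, I set those with the usual volume set commands. A couple of
other translator options might be worth trying as well; I haven't verified
that these help, and the values below are just guesses on my part:

    gluster volume set v1 performance.flush-behind on
    gluster volume set v1 performance.write-behind on
    # untested guesses, option names as I understand them from the docs:
    gluster volume set v1 performance.cache-size 256MB
    gluster volume set v1 performance.io-thread-count 16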
Is Nagle disabled (TCP_NODELAY set) on the sockets? It looks like this
used to be an option; is it always on now?
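In older hand-written volfiles I believe the knob looked something like the
following; the exact option name is from memory, so treat it as an
assumption rather than confirmed 3.2.5 syntax:

    volume v1-client-0
        type protocol/client
        option transport-type tcp
        # from memory, may not match 3.2.5 exactly:
        option transport.socket.nodelay on
    end-volume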
In the time it took to tar xpf a file, I was able to do an scp -r of an
already-extracted copy from the same source disk to both replica bricks,
in different subdirectories, significantly faster. (I probably shouldn't
access the bricks directly like that, but this is just a test.)
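For what it's worth, the comparison was along these lines (paths here are
illustrative, not my exact ones):

    # extract straight to the gluster mount (slow):
    time tar xpf test.tar -C /mnt/gluster/test
    # copy a pre-extracted tree directly onto both bricks (fast):
    time scp -r /src/test 10.0.12.141:/data1/test1
    time scp -r /src/test 10.0.12.142:/data1/test2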
In this exercise I noticed two things...
1. Although the files were mostly identical, the storage used, as reported
by du -k, was over 25% higher on the bricks compared to the files copied
over via scp. Do extended attributes, or something else added to the files,
take up that much space?
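If anyone wants to check this themselves, here is how I'd compare the two
trees and inspect the attributes gluster adds on the bricks (the file name
below is made up):

    du -sk /data1/test1          # extracted tree on a brick
    du -sk /src/test             # the scp'd source tree
    # dump the trusted.* extended attributes gluster stores per file:
    getfattr -m . -d -e hex /data1/test1/somefile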
And, of more concern (is this a bug?):
2. One file didn't extract correctly at first. It came out as 0 bytes
long. On further investigation, and after retrying the tar command over
top of the first attempt (it worked the second time), I noticed it was a
symlink that failed. Perhaps one of the above options caused the problem?
Either way, symlinks seem slightly buggy.
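A quick way to spot this kind of failure is to diff the link targets
between the source tree and the gluster mount, something like this (GNU
find; paths again illustrative):

    (cd /src/test && find . -type l -printf '%p -> %l\n' | sort) > /tmp/src.links
    (cd /mnt/gluster/test && find . -type l -printf '%p -> %l\n' | sort) > /tmp/glu.links
    diff /tmp/src.links /tmp/glu.links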
Here is an interesting alternative/hack to improve performance when
working with lots of small files (and to convince yourself gluster has
acceptable performance for random I/O on larger files). This hack defeats
some of the advantages of gluster, since you have to restrict access to
one client at a time, but you still get some of the benefits, namely the
fault-tolerant distributed network storage. Considering that I am mainly
interested in a distributed backend store for virtual machines, this is
closer to how I would use it anyway. In summary: create a large file
stored on gluster, then format and mount it as a loopback device.
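Concretely, the setup was along these lines (sizes and paths adjusted for
posting):

    # create a large backing file on the gluster mount:
    dd if=/dev/zero of=/mnt/gluster/disk.img bs=1M count=10240
    # format it (-F because the target is a regular file, not a block device):
    mkfs.ext4 -F /mnt/gluster/disk.img
    mkdir -p /mnt/loop
    mount -o loop /mnt/gluster/disk.img /mnt/loop
    # then run the same extract against the loopback fs:
    time tar xpf test.tar -C /mnt/loop
    time sync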
Time to extract a tar file to a gluster mount directly: 5m 47.5s +
0.026s sync.
Time to extract the same tar file to a loopback filesystem that was created
on and mounted from the same gluster mount: 33.8s + 6.9s sync.
That's less than 1/8th the time (roughly 347.5s vs. 40.7s) and much closer
to the expected results.