[Gluster-users] Quick performance check?

Gambit15 dougti+gluster at gmail.com
Fri Feb 3 12:48:47 UTC 2017


Hi Alex,
 I don't use Gluster for storing large numbers of small files, but from
what I've read, that does appear to be its big Achilles heel.
Personally, if you're not looking to scale out to a lot more servers, I'd
go with Ceph or DRBD; Gluster's greatest strengths lie in its scalability.
Also, it's worth pointing out that whichever you choose, you've got to be
careful with two-node configurations, as they're highly vulnerable to
split-brain scenarios.

Given the relatively small size of your data, caching tweaks and an
arbiter may well save you here; however, I don't use enough of Gluster's
caching features to be able to give advice on them.
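
If you do stay with Gluster, note that an arbiter brick only stores
metadata, so the third node can be very small.  A rough sketch of the
create command (hostnames and brick paths are just placeholders):

  gluster volume create myvol replica 3 arbiter 1 \
      server1:/bricks/b1 server2:/bricks/b1 arbiter1:/bricks/b1
  gluster volume start myvol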

D

On 3 February 2017 at 08:28, Alex Sudakar <alex.sudakar at gmail.com> wrote:

> Hi.  I'm looking for a clustered filesystem for a very simple
> scenario.  I've set up Gluster but my tests have shown quite a
> performance penalty when compared to using a local XFS filesystem.
> This no doubt reflects the reality of moving to a proper distributed
> filesystem, but I'd like to quickly check that I haven't missed
> something obvious that might improve performance.
>
> I plan to have two Amazon AWS EC2 instances (virtual machines) both
> accessing the same filesystem for read/writes.  Access will be almost
> entirely reads, with the occasional modification, deletion or creation
> of files.  Ideally I wanted all those reads going straight to the
> local XFS filesystem and just the writes incurring a distributed
> performance penalty.  :-)
>
> So I've set up two VMs with Centos 7.2 and Gluster 3.8.8, each machine
> running as a combined Gluster server and client.  One brick on each
> machine, one volume in a 1 x 2 replica configuration.
>
> Everything works; it's just the performance penalty that's a surprise.
> :-)
>
> My test directory has 9,066 files and directories; 7,987 actual files.
> Total size is 63MB of data, 85MB allocated; an average of 8KB of data
> per file.  The brick's files have a total of 117MB allocated; the extra
> 32MB works out almost exactly to the sum of the extra 4KB extents that
> would have been allocated for the XFS extended attributes on each file.
> The VMs were installed with the default 256-byte inode size for the
> local filesystem, and from what I've read Gluster will force the
> filesystem to allocate an extent for its attributes; 'xfs_bmap' on a
> few files shows this is the case.
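>
> (As an aside, if the bricks were rebuilt, a larger inode size could be
> set at mkfs time so the xattrs stay inline instead of spilling into
> extra extents - something like the following, with an illustrative
> device name:
>
>   mkfs.xfs -f -i size=512 /dev/xvdf
>
> though I haven't tried that yet.)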
>
> A simple 'cat' of every file when laid out in 'native' directories on
> the XFS filesystem takes about 3 seconds.  A cat of all the files in
> the brick's directory on the same filesystem takes about 6.4 seconds,
> which I figure is due to the extra I/O for the inode metadata extents
> (although I'm not certain; the additional extents added about 40% to
> the disk block allocation, so I'm unsure why the time increase was
> 100%).
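>
> (The 'cat everything' test is essentially just the following, pointed
> at whichever directory is being measured; the path is illustrative.)
>
>   time find /mnt/g1 -type f -exec cat {} + > /dev/null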
>
> Doing the same test through the glusterfs mount takes about 25
> seconds; roughly four times longer than reading those same files
> directly from the brick itself.
>
> It had taken 30 seconds before I applied the 'md-cache' settings (for
> those variables that still exist in 3.8.8) mentioned in this very
> helpful article:
>
>   http://blog.gluster.org/category/performance/
>
> So use of the md-cache shaved 5 seconds off a 'cold' run - due to
> common directory LOOKUP operations being cached, I guess.
>
> Output of a 'volume info' is as follows:
>
> Volume Name: g1
> Type: Replicate
> Volume ID: bac6cd70-ca0d-4173-9122-644051444fe5
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: serverA:/data/brick1
> Brick2: serverC:/data/brick1
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
> cluster.self-heal-daemon: enable
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.stat-prefetch: on
> performance.md-cache-timeout: 60
> network.inode-lru-limit: 90000
>
> The article suggests a value of 600 for performance.md-cache-timeout,
> but my Gluster version only permits a maximum value of 60.
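>
> For reference, those cache settings were applied with the usual
> 'gluster volume set' commands, along these lines:
>
>   gluster volume set g1 features.cache-invalidation on
>   gluster volume set g1 features.cache-invalidation-timeout 600
>   gluster volume set g1 performance.stat-prefetch on
>   gluster volume set g1 performance.md-cache-timeout 60
>   gluster volume set g1 network.inode-lru-limit 90000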
>
> Network speed between the two VMs is about 120 MBytes/sec - the two
> VMs inhabit the same Amazon Virtual Private Cloud - so I don't think
> bandwidth is a factor.
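>
> (Anyone wanting to verify the link speed can do a quick iperf3 run
> between the nodes - e.g. 'iperf3 -s' on serverC and then:
>
>   iperf3 -c serverC -t 10
>
> on serverA.)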
>
> The 400% slowdown is no doubt the penalty incurred in moving to a
> proper distributed filesystem.  That article and other web pages I've
> read all say that each open of a file results in synchronous LOOKUP
> operations on all the replicas, so I'm guessing it just takes that
> much time for everything to happen before a file can be opened.
> Gluster profiling shows that there are 11,198 LOOKUP operations on the
> test cat of the 7,987 files.
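>
> (Profiling was done with the standard volume profile commands,
> roughly:
>
>   gluster volume profile g1 start
>   # ... run the cat test on the client ...
>   gluster volume profile g1 info
>
> with the LOOKUP counts taken from the 'info' output.)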
>
> As a Gluster newbie I'd appreciate some quick advice if possible -
>
> 1.  Is this sort of performance hit - on directories of small files -
> typical for such a simple Gluster configuration?
>
> 2.  Is there anything I can do to speed things up?  :-)
>
> 3.  Repeating the 'cat' test immediately after the first test run saw
> the time dive from 25 seconds down to 4 seconds.  Before I'd set those
> md-cache variables it had taken 17 seconds, due, I assume, to the
> actual file data being cached in the Linux buffer cache.  So those
> md-cache settings really did make a change - taking off another 13
> seconds - once everything was cached.
>
> Flushing/invalidating the Linux memory cache made the next test go
> back to 25 seconds.  So it seems the md-cache must hold its contents
> in the Linux buffer cache ... which surprised me, because I thought a
> user-space system like Gluster would keep the cache within the
> daemons, or maybe in a shared memory segment - nothing that would be
> affected by clearing the Linux buffer cache.  I was expecting a run
> after invalidating the Linux cache to take somewhere between 4 and 25
> seconds, with the md-cache still primed but the file data expired.
>
> Just out of curiosity about how the md-cache is implemented ... why
> does clearing the Linux buffers seem to affect it?
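>
> (By 'flushing/invalidating the Linux memory cache' I mean the usual
> drop_caches approach, run as root on the client before re-running the
> test:
>
>   sync; echo 3 > /proc/sys/vm/drop_caches
>
> in case that matters for the answer.)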
>
> 4.  The documentation says that Gluster's geo-replication does
> 'asynchronous replication', which is something that would really help,
> but that it's 'master/slave', so I'm assuming geo-replication won't
> fulfill my requirement that both servers be able to occasionally
> write/modify/delete files?
>
> 5.  In my brick directory I have a '.trashcan' subdirectory - which is
> documented - but also a '.glusterfs' directory, which seems to have
> lots of magical files in some sort of housekeeping structure.
> Surprisingly the total amount of data under .glusterfs is greater than
> the total size of the actual files in my test directory.  I haven't
> seen a description of what .glusterfs is used for ... are they vital
> to the operation of Gluster, or can they be deleted?  Just curious.
> At one stage I had 1.1GB of files in my volume, which expanded to be
> 1.5GB in the brick (due to the metadata extents) and a whopping 1.6GB
> of extra data materialized under the .glusterfs directory!
>
> 6.  Since I'm using Centos I try to stick with things that are
> available through the Red Hat repository channel ... so in my looking
> for distributed filesystems I saw mention of Ceph.  Because I wanted
> only a simple replicated filesystem it seemed to me that Ceph - being
> based/focused on 'object' storage? - wouldn't be as good a fit as
> Gluster.  Evil question to a Gluster mailing list - will Ceph give me
> any significantly better performance in reading small files?
>
> I've tried to investigate and find out what I can but I could be
> missing something really obvious in my ignorance, so I would
> appreciate any quick tips/answers from the experts.  Thanks!