[Gluster-users] Quick performance check?

Fri Feb 3 13:02:59 UTC 2017

> On 3 Feb 2017, at 13:48, Gambit15 <dougti+gluster at gmail.com> wrote:
> 
> Hi Alex,
>  I don't use Gluster for storing large amounts of small files; however, from what I've read, that does appear to be its big Achilles' heel.

I am not an expert, but I agree: due to its distributed nature, the per-file access latency plays a big role when you have to deal with lots of small files. There do seem to be some tuning options available, though; a good place to start could be: https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Small_File_Performance_Enhancements.html

> Personally, if you're not looking to scale out to a lot more servers, I'd go with Ceph or DRBD. Gluster's best features are in its scalability.

AFAIK Ceph needs at least 3 monitors (to form a quorum) to be fully “highly available”, so the entry ticket is pretty high and, from my point of view, overkill for such needs unless you plan to scale out too. DRBD seems a more reasonable approach.
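
The majority arithmetic behind that point, and behind the 2-node split-brain warning quoted below, fits in a few lines. A strict-majority quorum of n voters needs n // 2 + 1 of them alive. Here is a minimal Python sketch; the function names are illustrative, not from any Ceph or Gluster API.

```python
def quorum_size(voters: int) -> int:
    """Smallest strict majority of `voters` members."""
    if voters < 1:
        raise ValueError("need at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many members can fail while a majority still survives."""
    return voters - quorum_size(voters)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} voters: quorum={quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

Two voters survive no failure, while three survive one. That is roughly why a third vote (a monitor for Ceph, an arbiter brick for Gluster) matters.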

Cheers 

> Also, it's worth pointing out that in any setup, you've got to be careful with 2-node configurations, as they're highly vulnerable to split-brain scenarios.
> 
> Given the relatively small size of your data, caching tweaks & an arbiter may well save you here; however, I don't use enough of its caching features to be able to give advice on it.
> 
> D
> 
> On 3 February 2017 at 08:28, Alex Sudakar <alex.sudakar at gmail.com> wrote:
> Hi.  I'm looking for a clustered filesystem for a very simple
> scenario.  I've set up Gluster but my tests have shown quite a
> performance penalty when compared to using a local XFS filesystem.
> This no doubt reflects the reality of moving to a proper distributed
> filesystem, but I'd like to quickly check that I haven't missed
> something obvious that might improve performance.
> 
> I plan to have two Amazon AWS EC2 instances (virtual machines) both
> accessing the same filesystem for read/writes.  Access will be almost
> entirely reads, with the occasional modification, deletion or creation
> of files.  Ideally I wanted all those reads going straight to the
> local XFS filesystem and just the writes incurring a distributed
> performance penalty.  :-)
> 
> So I've set up two VMs with CentOS 7.2 and Gluster 3.8.8, each machine
> running as a combined Gluster server and client.  One brick on each
> machine, one volume in a 1 x 2 replica configuration.
> 
> Everything works, it's just the performance penalty which is a surprise.  :-)
> 
> My test directory has 9,066 files and directories; 7,987 actual files.
> Total size is 63MB data, 85MB allocated; an average size of 8KB data
> per file.  The brick's files have a total of 117MB allocated, with the
> extra 32MB working out pretty much to be exactly the sum of the extra
> 4KB extents that would have been allocated for the XFS attributes per
> file - the VMs were installed with the default 256 byte inode size for
> the local filesystem, and from what I've read Gluster will force the
> filesystem to allocate an extent for its attributes.  'xfs_bmap' on a
> few files shows this is the case.
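
These figures can be cross-checked with a little arithmetic. The sketch below assumes the MB figures are MiB and that each regular file got exactly one extra 4 KiB XFS block for its extended attributes. It is a back-of-the-envelope check, not a measurement.

```python
KIB = 1024
MIB = 1024 * KIB

files = 7_987            # regular files in the test directory
data_bytes = 63 * MIB    # apparent data size
alloc_local = 85 * MIB   # allocated for the plain XFS copy
alloc_brick = 117 * MIB  # allocated under the brick

avg_file_kib = data_bytes / files / KIB
extra_mib = (alloc_brick - alloc_local) / MIB
# Assumption: one extra 4 KiB attribute extent per regular file.
predicted_extra_mib = files * 4 * KIB / MIB

print(f"average file size: {avg_file_kib:.1f} KiB")
print(f"observed extra allocation: {extra_mib:.1f} MiB")
print(f"predicted extra (4 KiB x files): {predicted_extra_mib:.1f} MiB")
print(f"relative increase: {extra_mib / (alloc_local / MIB):.0%}")
```

On these numbers the average is about 8.1 KiB per file. The predicted extra is about 31.2 MiB against 32 MiB observed, roughly 38% more allocation than the plain copy.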
> 
> A simple 'cat' of every file when laid out in 'native' directories on
> the XFS filesystem takes about 3 seconds.  A cat of all the files in
> the brick's directory on the same filesystem takes about 6.4 seconds,
> which I figure is due to the extra I/O for the inode metadata extents
> (although not quite certain; the additional extents added about 40%
> extra to the disk block allocation, so I'm unsure as to why the time
> more than doubled).
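
A rough Python stand-in for the "cat every file" test may help anyone repeating the comparison. The exact command used originally was not given, so it is an assumption that this matches it. The function walks a tree, reads every regular file in full and reports the wall-clock time. It can be pointed at the native copy, the brick directory and the glusterfs mount in turn. It skips `.glusterfs` and `.trashcan`, on the assumption that only user data should be read when run on a brick.

```python
import os
import sys
import time


def read_tree(root: str, chunk: int = 1 << 20) -> tuple[int, int, float]:
    """Read every regular file under `root`; return (files, bytes, seconds)."""
    n_files = n_bytes = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Gluster's housekeeping dirs when run on a brick.
        dirnames[:] = [d for d in dirnames if d not in (".glusterfs", ".trashcan")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            with open(path, "rb") as f:
                while block := f.read(chunk):
                    n_bytes += len(block)
            n_files += 1
    return n_files, n_bytes, time.perf_counter() - start


if __name__ == "__main__":
    files, size, secs = read_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{files} files, {size} bytes in {secs:.2f} s")
```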
> 
> Doing the same test through the glusterfs mount takes about 25
> seconds; roughly four times longer than reading those same files
> directly from the brick itself.
> 
> It took 30 seconds before I applied the 'md-cache' settings (for those
> variables that exist in 3.8.8) mentioned in this very helpful
> article:
> 
>   http://blog.gluster.org/category/performance/
> 
> So use of the md-cache in a 'cold run' shaved off 5 seconds - due, I
> guess, to common directory LOOKUP operations being cached.
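
Such settings are applied with `gluster volume set <volume> <key> <value>`. The Python sketch below only builds those command lines and prints them by default, as a dry run. The volume name `g1` and the values are taken from the volume info shown further down; they describe this setup and are not a recommendation. Actually running the commands needs root on one of the servers.

```python
import shlex
import subprocess

# md-cache related options as they appear in this volume's 'Options Reconfigured'.
MD_CACHE_OPTIONS = {
    "features.cache-invalidation": "on",
    "features.cache-invalidation-timeout": "600",
    "performance.stat-prefetch": "on",
    "performance.md-cache-timeout": "60",
    "network.inode-lru-limit": "90000",
}


def volume_set_commands(volume: str, options: dict[str, str]) -> list[list[str]]:
    """Build one 'gluster volume set' argv per option."""
    return [["gluster", "volume", "set", volume, key, value]
            for key, value in options.items()]


def apply(volume: str, options: dict[str, str], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first error."""
    for argv in volume_set_commands(volume, options):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    apply("g1", MD_CACHE_OPTIONS)
```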
> 
> Output of a 'volume info' is as follows:
> 
> Volume Name: g1
> Type: Replicate
> Volume ID: bac6cd70-ca0d-4173-9122-644051444fe5
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: serverA:/data/brick1
> Brick2: serverC:/data/brick1
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
> cluster.self-heal-daemon: enable
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.stat-prefetch: on
> performance.md-cache-timeout: 60
> network.inode-lru-limit: 90000
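
Pulling the "Options Reconfigured" section out of that output programmatically can help when comparing settings between servers or versions. Here is a minimal sketch. It assumes the plain-text `gluster volume info` layout shown above for a single volume, with `key: value` lines after the `Options Reconfigured:` header. `parse_volume_info` is a hypothetical helper; the CLI's `--xml` output would probably be the more robust source.

```python
SAMPLE = """\
Volume Name: g1
Type: Replicate
Number of Bricks: 1 x 2 = 2
Bricks:
Brick1: serverA:/data/brick1
Brick2: serverC:/data/brick1
Options Reconfigured:
features.cache-invalidation-timeout: 600
performance.md-cache-timeout: 60
network.inode-lru-limit: 90000
"""


def parse_volume_info(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split plain `gluster volume info` text into (header, reconfigured options)."""
    header: dict[str, str] = {}
    options: dict[str, str] = {}
    in_options = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "Options Reconfigured:":
            in_options = True
            continue
        # Split on the first colon only, so "serverA:/data/brick1" stays intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that contain no colon at all
        (options if in_options else header)[key.strip()] = value.strip()
    return header, options


if __name__ == "__main__":
    header, options = parse_volume_info(SAMPLE)
    print(header["Volume Name"], header["Type"])
    for key, value in sorted(options.items()):
        print(f"{key} = {value}")
```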
> 
> The article suggests a value of 600 for
> performance.md-cache-timeout but my Gluster version only
> permits a maximum value of 60.
> 
> Network speed between the two VMs is about 120 MBytes/sec - the two
> VMs inhabit the same Amazon Virtual Private Cloud - so I don't think
> bandwidth is a factor.
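
A quick calculation supports that. At the measured ~120 MB/s, the whole 63 MB data set needs only about half a second of transfer time. Treating "MB" the same way on both sides is an assumption, but the conclusion holds either way.

```python
data_mb = 63.0          # total file data in the test directory
link_mb_per_s = 120.0   # measured VM-to-VM throughput
gluster_run_s = 25.0    # cold 'cat' through the glusterfs mount

transfer_s = data_mb / link_mb_per_s
print(f"pure transfer time: {transfer_s:.2f} s "
      f"({transfer_s / gluster_run_s:.0%} of the observed run)")
```

That is about 2% of the 25-second run, so the remaining time has to come from per-file round trips rather than bandwidth.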
> 
> The roughly fourfold slowdown is no doubt the penalty incurred in moving to a
> proper distributed filesystem.  That article and other web pages I've
> read all say that each open of a file results in synchronous LOOKUP
> operations on all the replicas, so I'm guessing it just takes that
> much time for everything to happen before a file can be opened.
> Gluster profiling shows that there are 11,198 LOOKUP operations on the
> test cat of the 7,987 files.
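
Dividing the extra time by the LOOKUP count gives a rough per-operation cost. This is a crude serial model. It assumes the cat test handles one file at a time and that the whole gap between reading the brick and reading the mount is LOOKUP round trips. That ignores OPEN, READ and release overheads, so treat the result as an upper bound on LOOKUP latency, not a measurement.

```python
lookups = 11_198
files = 7_987
brick_s = 6.4    # cat of the files directly on the brick
mount_s = 25.0   # same cat through the glusterfs mount

overhead_s = mount_s - brick_s
print(f"LOOKUPs per file: {lookups / files:.2f}")
print(f"overhead per file: {overhead_s / files * 1e3:.2f} ms")
print(f"overhead per LOOKUP (upper bound): {overhead_s / lookups * 1e3:.2f} ms")
```

On these figures that is about 1.4 LOOKUPs and 2.3 ms of extra time per file, or at most about 1.7 ms per LOOKUP.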
> 
> As a Gluster newbie I'd appreciate some quick advice if possible -
> 
> 1.  Is this sort of performance hit - on directories of small files -
> typical for such a simple Gluster configuration?
> 
> 2.  Is there anything I can do to speed things up?  :-)
> 
> 3.  Repeating the 'cat' test immediately after the first test run saw
> the time dive from 25 seconds down to 4 seconds.  Before I'd set those
> md-cache variables it had taken 17 seconds, due, I assume, to the
> actual file data being cached in the Linux buffer cache.  So those
> md-cache settings really did make a change - taking off another 13
> seconds - once everything was cached.
> 
> Flushing/invalidating the Linux memory cache made the next test go
> back to the 25 seconds.  So it seems to me that the md-cache must hold
> its contents in the Linux memory buffers cache ... which surprised me,
> because I thought a user-space system like Gluster would have the
> cache within the daemons or maybe a shared memory segment, nothing
> that would be affected by clearing the Linux buffer cache.  I was
> expecting that a run after invalidating the Linux cache would take
> something between 4 seconds and 25 seconds, with the md-cache still
> primed but the file data expired.
> 
> Just out of curiosity in how the md-cache is implemented ... why does
> clearing the Linux buffers seem to affect it?
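
For reproducible cold runs, the usual Linux approach is to sync and then write 3 to /proc/sys/vm/drop_caches as root. That drops the page cache plus dentries and inodes. Below is a small helper. Making the target path a parameter is an illustrative choice so the logic can be exercised without root.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_caches(level: int = 3, path: str = DROP_CACHES) -> None:
    """Write back dirty data, then ask the kernel to drop clean caches.

    level 1 = page cache, 2 = dentries and inodes, 3 = both.
    Needs root when `path` is the real /proc file.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # drop_caches only discards clean objects, so flush first
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

Calling `drop_caches()` on the client before each timed run makes cold-run numbers comparable. It only affects the machine it runs on; the other server's kernel caches are left alone.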
> 
> 4.  The documentation says that Gluster geo-replication does 'asynchronous
> replication', which is something that would really help, but that it's
> 'master/slave', so I'm assuming that geo-replication won't fulfill my
> requirements of both servers being able to occasionally
> write/modify/delete files?
> 
> 5.  In my brick directory I have a '.trashcan' subdirectory - which is
> documented - but also a '.glusterfs' directory, which seems to have
> lots of magical files in some sort of housekeeping structure.
> Surprisingly the total amount of data under .glusterfs is greater than
> the total size of the actual files in my test directory.  I haven't
> seen a description of what .glusterfs is used for ... are they vital
> to the operation of Gluster, or can they be deleted?  Just curious.
> At one stage I had 1.1 GB of files in my volume, which expanded to be
> 1.5GB in the brick (due to the metadata extents) and a whopping 1.6GB
> of extra data materialized under the .glusterfs directory!
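
As far as I know, each file on a brick carries its GFID in the `trusted.gfid` extended attribute. For a regular file, `.glusterfs/<first two hex digits>/<next two>/<full GFID>` is a hard link to the same inode; directories get a symlink instead. If that holds, much of the space under .glusterfs is shared with the real files, and a `du` of .glusterfs on its own counts that data a second time. The sketch below computes the expected GFID path for a file and checks whether it is the same inode. The function names are illustrative, and reading `trusted.*` attributes requires root on the brick server.

```python
import os
import uuid


def gfid_path(brick: str, gfid: bytes) -> str:
    """Map a 16-byte GFID to its expected location under <brick>/.glusterfs."""
    g = str(uuid.UUID(bytes=gfid))
    return os.path.join(brick, ".glusterfs", g[0:2], g[2:4], g)


def check_file(brick: str, path: str) -> tuple[str, bool]:
    """Return (GFID link path, whether it is a hard link to `path`).

    Reading trusted.* xattrs requires root on Linux.
    """
    gfid = os.getxattr(path, "trusted.gfid")
    link = gfid_path(brick, gfid)
    try:
        same = os.path.samefile(path, link)
    except FileNotFoundError:
        same = False
    return link, same
```

For example, `check_file("/data/brick1", "/data/brick1/some/file")` would be run on serverA. The file path there is hypothetical.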
> 
> 6.  Since I'm using CentOS I try to stick with things that are
> available through the Red Hat repository channel ... so while looking
> for distributed filesystems I saw mention of Ceph.  Because I wanted
> only a simple replicated filesystem it seemed to me that Ceph - being
> based/focused on 'object' storage? - wouldn't be as good a fit as
> Gluster.  Evil question to a Gluster mailing list - will Ceph give me
> any significantly better performance in reading small files?
> 
> I've tried to investigate and find out what I can but I could be
> missing something really obvious in my ignorance, so I would
> appreciate any quick tips/answers from the experts.  Thanks!
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
