[Gluster-users] Quick performance check?
Cedric Lemarchand
yipikai7 at gmail.com
Fri Feb 3 13:02:59 UTC 2017
> On 3 Feb 2017, at 13:48, Gambit15 <dougti+gluster at gmail.com> wrote:
>
> Hi Alex,
> I don't use Gluster for storing large numbers of small files, but from what I've read, that does appear to be its big Achilles heel.
I am not an expert, but I agree: due to its distributed nature, the per-file access latency it introduces plays a big role when you have to deal with a lot of small files. That said, there are some tuning options available; a good place to start could be: https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Small_File_Performance_Enhancements.html
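For example, a couple of the knobs that usually come up for small-file workloads (untested on my side, and the exact option names and limits depend on your version, see "gluster volume set help"):

    gluster volume set <volname> cluster.lookup-optimize on
    gluster volume set <volname> client.event-threads 4
    gluster volume set <volname> server.event-threads 4
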
> Personally, if you're not looking to scale out to a lot more servers, I'd go with Ceph or DRBD. Gluster's best features are in its scalability.
AFAIK Ceph needs at least 3 monitors (i.e. a quorum) to be fully highly available, so the entry ticket is pretty high and, from my point of view, overkill for such needs unless you plan to scale out too. DRBD seems a more reasonable approach.
Cheers
> Also, it's worth pointing out that in any setup, you've got to be careful with 2 node configurations as they're highly vulnerable to split-brain scenarios.
>
> Given the relatively small size of your data, caching tweaks & an arbiter may well save you here; however, I don't use its caching features enough to be able to give advice on them.
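> To give an idea, an arbiter volume is created along these lines (hostnames hypothetical; the third brick only holds metadata, so it stays small):
>
>   gluster volume create myvol replica 3 arbiter 1 server1:/data/brick server2:/data/brick server3:/data/arbiter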
>
> D
>
> On 3 February 2017 at 08:28, Alex Sudakar <alex.sudakar at gmail.com> wrote:
> Hi. I'm looking for a clustered filesystem for a very simple
> scenario. I've set up Gluster but my tests have shown quite a
> performance penalty when compared to using a local XFS filesystem.
> This no doubt reflects the reality of moving to a proper distributed
> filesystem, but I'd like to quickly check that I haven't missed
> something obvious that might improve performance.
>
> I plan to have two Amazon AWS EC2 instances (virtual machines) both
> accessing the same filesystem for read/writes. Access will be almost
> entirely reads, with the occasional modification, deletion or creation
> of files. Ideally I wanted all those reads going straight to the
> local XFS filesystem and just the writes incurring a distributed
> performance penalty. :-)
>
> So I've set up two VMs with Centos 7.2 and Gluster 3.8.8, each machine
> running as a combined Gluster server and client. One brick on each
> machine, one volume in a 1 x 2 replica configuration.
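> For context, a setup like that boils down to something like the following (mount point name just illustrative):
>
>   gluster volume create g1 replica 2 serverA:/data/brick1 serverC:/data/brick1
>   gluster volume start g1
>   mount -t glusterfs localhost:/g1 /mnt/gluster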
>
> Everything works, it's just the performance penalty which is a surprise. :-)
>
> My test directory has 9,066 files and directories; 7,987 actual files.
> Total size is 63MB data, 85MB allocated; an average size of 8KB data
> per file. The brick's files have a total of 117MB allocated, with the
> extra 32MB working out pretty much to be exactly the sum of the extra
> 4KB extents that would have been allocated for the XFS attributes per
> file - the VMs were installed with the default 256 byte inode size for
> the local filesystem, and from what I've read Gluster will force the
> filesystem to allocate an extent for its attributes. 'xfs_bmap' on a
> few files shows this is the case.
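> As an aside, the inode size is easy to check, and on a freshly made brick it can be bumped so that the attributes stay inline (mount point and device name assumed here):
>
>   xfs_info /data | grep isize      # isize=256 on these VMs
>   mkfs.xfs -i size=512 /dev/xvdf   # 512-byte inodes are commonly recommended for Gluster bricks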
>
> A simple 'cat' of every file when laid out in 'native' directories on
> the XFS filesystem takes about 3 seconds. A cat of all the files in
> the brick's directory on the same filesystem takes about 6.4 seconds,
> which I figure is due to the extra I/O for the inode metadata extents
> (although not quite certain; the additional extents added about 40%
> extra to the disk block allocation, so I'm unsure as to why the time
> increase was 100%).
>
> Doing the same test through the glusterfs mount takes about 25
> seconds; roughly four times longer than reading those same files
> directly from the brick itself.
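> For anyone reproducing this, the test amounts to timing a cat of every file, e.g. something like the following (paths illustrative):
>
>   cd /mnt/gluster/test && time find . -type f -exec cat {} + > /dev/null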
>
> It took 30 seconds before I applied the 'md-cache' settings (for those
> variables that still exist in 3.8.8) mentioned in this very helpful
> article:
>
> http://blog.gluster.org/category/performance/
>
> So use of the md-cache in a 'cold run' shaved off 5 seconds - due to
> common directory LOOKUP operations being cached I guess.
>
> Output of a 'volume info' is as follows:
>
> Volume Name: g1
> Type: Replicate
> Volume ID: bac6cd70-ca0d-4173-9122-644051444fe5
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: serverA:/data/brick1
> Brick2: serverC:/data/brick1
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
> cluster.self-heal-daemon: enable
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.stat-prefetch: on
> performance.md-cache-timeout: 60
> network.inode-lru-limit: 90000
>
> The article suggests a value of 600 for
> features.cache-invalidation-timeout but my Gluster version only
> permits a maximum value of 60.
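> For completeness, the reconfigured options listed above were applied with the usual 'gluster volume set' commands, e.g.:
>
>   gluster volume set g1 features.cache-invalidation on
>   gluster volume set g1 performance.stat-prefetch on
>   gluster volume set g1 performance.md-cache-timeout 60
>   gluster volume set g1 network.inode-lru-limit 90000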
>
> Network speed between the two VMs is about 120 MBytes/sec - the two
> VMs inhabit the same Amazon Virtual Private Cloud - so I don't think
> bandwidth is a factor.
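> That's roughly 1 Gbit/s, which is easy to confirm with a plain throughput test such as iperf3:
>
>   iperf3 -s           # on serverC
>   iperf3 -c serverC   # on serverA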
>
> The 4x slowdown is no doubt the penalty incurred in moving to a
> proper distributed filesystem. That article and other web pages I've
> read all say that each open of a file results in synchronous LOOKUP
> operations on all the replicas, so I'm guessing it just takes that
> much time for everything to happen before a file can be opened.
> Gluster profiling shows that there are 11,198 LOOKUP operations on the
> test cat of the 7,987 files.
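> The LOOKUP counts above come from Gluster's volume profiling, along the lines of:
>
>   gluster volume profile g1 start
>   # ... run the cat test ...
>   gluster volume profile g1 info
>   gluster volume profile g1 stop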
>
> As a Gluster newbie I'd appreciate some quick advice if possible -
>
> 1. Is this sort of performance hit - on directories of small files -
> typical for such a simple Gluster configuration?
>
> 2. Is there anything I can do to speed things up? :-)
>
> 3. Repeating the 'cat' test immediately after the first test run saw
> the time dive from 25 seconds down to 4 seconds. Before I'd set those
> md-cache variables it had taken 17 seconds, due, I assume, to the
> actual file data being cached in the Linux buffer cache. So those
> md-cache settings really did make a change - taking off another 13
> seconds - once everything was cached.
>
> Flushing/invalidating the Linux memory cache made the next test go
> back to the 25 seconds. So it seems to me that the md-cache must hold
> its contents in the Linux memory buffers cache ... which surprised me,
> because I thought a user-space system like Gluster would have the
> cache within the daemons or maybe a shared memory segment, nothing
> that would be affected by clearing the Linux buffer cache. I was
> expecting that a run after invalidating the Linux cache would take
> somewhere between 4 and 25 seconds, with the md-cache still
> primed but the file data expired.
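> The usual way to do that flush, for anyone reproducing this, is the kernel's drop_caches knob:
>
>   sync; echo 3 > /proc/sys/vm/drop_caches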
>
> Just out of curiosity in how the md-cache is implemented ... why does
> clearing the Linux buffers seem to affect it?
>
> 4. The documentation says that Gluster geo-replication does
> 'asynchronous replication', which is something that would really help,
> but that it's 'master/slave', so I'm assuming geo-replication won't
> fulfill my requirement that both servers be able to occasionally
> write/modify/delete files?
>
> 5. In my brick directory I have a '.trashcan' subdirectory - which is
> documented - but also a '.glusterfs' directory, which seems to have
> lots of magical files in some sort of housekeeping structure.
> Surprisingly the total amount of data under .glusterfs is greater than
> the total size of the actual files in my test directory. I haven't
> seen a description of what .glusterfs is used for ... is it vital
> to the operation of Gluster, or can its contents be deleted? Just curious.
> At one stage I had 1.1 GB of files in my volume, which expanded to be
> 1.5GB in the brick (due to the metadata extents) and a whopping 1.6GB
> of extra data materialized under the .glusterfs directory!
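> For anyone wanting to check the same split, a plain du per directory shows it, e.g. (mount path illustrative):
>
>   du -sh /mnt/gluster                        # ~1.1 GB of actual files
>   du -sh --exclude=.glusterfs /data/brick1   # ~1.5 GB once the metadata extents are counted
>   du -sh /data/brick1/.glusterfs             # ~1.6 GB of housekeeping data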
>
> 6. Since I'm using Centos I try to stick with things that are
> available through the Red Hat repository channel ... so in my looking
> for distributed filesystems I saw mention of Ceph. Because I wanted
> only a simple replicated filesystem it seemed to me that Ceph - being
> based/focused on 'object' storage? - wouldn't be as good a fit as
> Gluster. Evil question to a Gluster mailing list - will Ceph give me
> any significantly better performance in reading small files?
>
> I've tried to investigate and find out what I can but I could be
> missing something really obvious in my ignorance, so I would
> appreciate any quick tips/answers from the experts. Thanks!
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users