<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 3 Feb 2017, at 13:48, Gambit15 &lt;<a href="mailto:dougti+gluster@gmail.com" class="">dougti+gluster@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><div class="">Hi Alex,<br class=""></div>&nbsp;I don't use Gluster for storing large amounts of small files; however, from what I've read, that does appear to be its big Achilles heel.<br class=""></div></div></div></div></div></blockquote><div><br class=""></div><div>I am not an expert, but I agree: due to its distributed nature, the induced per-file access latency plays a big role when you have to deal with lots of small files. There are some tuning options available, though; a good place to start could be: <a href="https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Small_File_Performance_Enhancements.html" class="">https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/Small_File_Performance_Enhancements.html</a></div><div><br class=""></div><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class="">Personally, if you're not looking to scale out to a lot more servers, I'd go with Ceph or DRBD. Gluster's best features are in its scalability.<br class=""></div></div></div></div></blockquote><div><br class=""></div><div>AFAIK Ceph needs at least 3 monitors (to form a quorum) to be fully “highly available”, so the entry ticket is pretty high and, from my point of view, overkill for such needs, unless you plan to scale out too. 
DRBD seems a more reasonable approach.</div><div><br class=""></div><div>Cheers&nbsp;</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="">Also, it's worth pointing out that in any setup, you've got to be careful with two-node configurations, as they're highly vulnerable to split-brain scenarios.<br class=""><br class=""></div><div class="">Given the relatively small size of your data, caching tweaks &amp; an arbiter may well save you here; however, I don't use enough of its caching features to be able to give advice on it.<br class=""></div><div class=""><br class=""></div>D<br class=""></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On 3 February 2017 at 08:28, Alex Sudakar <span dir="ltr" class="">&lt;<a href="mailto:alex.sudakar@gmail.com" target="_blank" class="">alex.sudakar@gmail.com</a>&gt;</span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi.&nbsp; I'm looking for a clustered filesystem for a very simple<br class="">

scenario.&nbsp; I've set up Gluster but my tests have shown quite a<br class="">

performance penalty when compared to using a local XFS filesystem.<br class="">

This no doubt reflects the reality of moving to a proper distributed<br class="">

filesystem, but I'd like to quickly check that I haven't missed<br class="">

something obvious that might improve performance.<br class="">

<br class="">

I plan to have two Amazon AWS EC2 instances (virtual machines) both<br class="">

accessing the same filesystem for read/writes.&nbsp; Access will be almost<br class="">

entirely reads, with the occasional modification, deletion or creation<br class="">

of files.&nbsp; Ideally I wanted all those reads going straight to the<br class="">

local XFS filesystem and just the writes incurring a distributed<br class="">

performance penalty.&nbsp; :-)<br class="">

<br class="">

So I've set up two VMs with CentOS 7.2 and Gluster 3.8.8, each machine<br class="">

running as a combined Gluster server and client.&nbsp; One brick on each<br class="">

machine, one volume in a 1 x 2 replica configuration.<br class="">

<br class="">

Everything works, it's just the performance penalty which is a surprise.&nbsp; :-)<br class="">

<br class="">

My test directory has 9,066 files and directories; 7,987 actual files.<br class="">

Total size is 63MB data, 85MB allocated; an average size of 8KB data<br class="">

per file.&nbsp; The brick's files have a total of 117MB allocated, with the<br class="">

extra 32MB working out pretty much to be exactly the sum of the extra<br class="">

4KB extents that would have been allocated for the XFS attributes per<br class="">

file - the VMs were installed with the default 256-byte inode size for<br class="">

the local filesystem, and from what I've read Gluster's extended attributes don't fit<br class="">

in an inode that small, so XFS has to allocate an extra extent for them.&nbsp; 'xfs_bmap' on a<br class="">

few files shows this is the case.<br class="">
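<br class="">
A quick back-of-envelope check of these figures, as a Python sketch. It assumes the MB/KB figures are binary units and counts exactly one extra 4 KiB extent per regular file, ignoring directories. It gives about 8.1 KiB of data per file and about 31.2 MiB of extra extents, close to the observed 32 MB difference. For reference, the commonly recommended brick format is XFS with 512-byte inodes (mkfs.xfs -i size=512), which usually leaves room for Gluster's extended attributes inside the inode itself.<br class="">
<pre class="">
```python
# Back-of-envelope check of the brick-size figures quoted above.
# Assumptions: "MB"/"KB" are binary units (MiB/KiB), and every regular
# file gets exactly one extra 4 KiB extent for its extended attributes.
KIB = 1024
MIB = 1024 * KIB


def avg_file_size_kib(total_mib: float, n_files: int) -> float:
    """Average data size per file, in KiB."""
    return total_mib * MIB / n_files / KIB


def extra_extent_mib(n_files: int, extent_kib: int = 4) -> float:
    """Space added if each file needs one extra extent, in MiB."""
    return n_files * extent_kib * KIB / MIB


if __name__ == "__main__":
    n_files = 7987
    print(f"average data per file: {avg_file_size_kib(63, n_files):.1f} KiB")
    print(f"one 4 KiB extent per file: {extra_extent_mib(n_files):.1f} MiB")
    print("observed difference: 117 - 85 = 32 MB")
```
</pre>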

<br class="">

A simple 'cat' of every file when laid out in 'native' directories on<br class="">

the XFS filesystem takes about 3 seconds.&nbsp; A cat of all the files in<br class="">

the brick's directory on the same filesystem takes about 6.4 seconds,<br class="">

which I figure is due to the extra I/O for the inode metadata extents<br class="">

(although I'm not certain: the additional extents added only about 40%<br class="">

to the disk block allocation, so I'm unsure why the time<br class="">

more than doubled).<br class="">

<br class="">

Doing the same test through the glusterfs mount takes about 25<br class="">

seconds; roughly four times longer than reading those same files<br class="">

directly from the brick itself.<br class="">
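<br class="">
Putting the three cold-run timings side by side, as a small Python sketch that uses only the numbers quoted above: reading from the brick directory takes about 113% longer than native XFS while the allocation grew only about 38%, and the glusterfs mount takes about 3.9 times the brick time and 8.3 times the native time. One plausible reading, not verified here, is that the extra attribute extent costs roughly one additional small read per file, so the slowdown tracks the number of I/O operations rather than the number of bytes.<br class="">
<pre class="">
```python
# Compare the cold-run 'cat' timings and allocations quoted above.
NATIVE_XFS_S = 3.0    # 'native' directories on XFS
BRICK_DIR_S = 6.4     # same files read directly from the brick
FUSE_MOUNT_S = 25.0   # same files read through the glusterfs mount
ALLOC_NATIVE_MB = 85
ALLOC_BRICK_MB = 117


def pct_increase(new: float, old: float) -> float:
    """Percentage by which new exceeds old."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    print(f"brick vs native, time:       +{pct_increase(BRICK_DIR_S, NATIVE_XFS_S):.0f}%")
    print(f"brick vs native, allocation: +{pct_increase(ALLOC_BRICK_MB, ALLOC_NATIVE_MB):.0f}%")
    print(f"mount vs brick, time:        {FUSE_MOUNT_S / BRICK_DIR_S:.1f}x")
    print(f"mount vs native, time:       {FUSE_MOUNT_S / NATIVE_XFS_S:.1f}x")
```
</pre>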

<br class="">

It took 30 seconds until I applied the 'md-cache' settings (for those<br class="">

variables that exist in 3.8.8) mentioned in this very helpful<br class="">

article:<br class="">

<br class="">

&nbsp; <a href="http://blog.gluster.org/category/performance/" rel="noreferrer" target="_blank" class="">http://blog.gluster.org/<wbr class="">category/performance/</a><br class="">

<br class="">

So using the md-cache shaved 5 seconds off a 'cold run', presumably because<br class="">

common directory LOOKUP operations are being cached.<br class="">
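<br class="">
The md-cache settings can be applied as one 'gluster volume set' call per option. Here is a minimal Python sketch, with the option names and values copied from the 'Options Reconfigured' list in the volume info output further down (volume name g1). It only prints the commands unless dry_run=False is passed, which assumes the gluster CLI is on the PATH and the caller is allowed to change the volume.<br class="">
<pre class="">
```python
import shlex
import subprocess

VOLUME = "g1"  # volume name from the 'volume info' output in this thread

# Values as listed under "Options Reconfigured" in that output.
MD_CACHE_OPTIONS = {
    "features.cache-invalidation": "on",
    "features.cache-invalidation-timeout": "600",
    "performance.stat-prefetch": "on",
    "performance.md-cache-timeout": "60",
    "network.inode-lru-limit": "90000",
}


def volume_set_commands(volume: str, options: dict) -> list:
    """Build one 'gluster volume set VOLUME KEY VALUE' argv per option."""
    return [["gluster", "volume", "set", volume, key, value]
            for key, value in options.items()]


def apply_options(volume: str, options: dict, dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False."""
    for argv in volume_set_commands(volume, options):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # needs the gluster CLI


if __name__ == "__main__":
    apply_options(VOLUME, MD_CACHE_OPTIONS)  # dry run: prints only
```
</pre>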

<br class="">

Output of a 'volume info' is as follows:<br class="">

<br class="">

Volume Name: g1<br class="">

Type: Replicate<br class="">

Volume ID: bac6cd70-ca0d-4173-9122-<wbr class="">644051444fe5<br class="">

Status: Started<br class="">

Snapshot Count: 0<br class="">

Number of Bricks: 1 x 2 = 2<br class="">

Transport-type: tcp<br class="">

Bricks:<br class="">

Brick1: serverA:/data/brick1<br class="">

Brick2: serverC:/data/brick1<br class="">

Options Reconfigured:<br class="">

transport.address-family: inet<br class="">

performance.readdir-ahead: on<br class="">

nfs.disable: on<br class="">

cluster.self-heal-daemon: enable<br class="">

features.cache-invalidation: on<br class="">

features.cache-invalidation-<wbr class="">timeout: 600<br class="">

performance.stat-prefetch: on<br class="">

performance.md-cache-timeout: 60<br class="">

network.inode-lru-limit: 90000<br class="">
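<br class="">
To compare option sets between servers or between test runs, the plain-text output above can be parsed mechanically. Here is a minimal Python sketch that assumes the 'Key: value' layout of this single 3.8.8 sample holds in general. For scripting, 'gluster volume info' also offers an --xml mode, which is likely the more robust choice.<br class="">
<pre class="">
```python
SAMPLE = """\
Volume Name: g1
Type: Replicate
Volume ID: bac6cd70-ca0d-4173-9122-644051444fe5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: serverA:/data/brick1
Brick2: serverC:/data/brick1
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.self-heal-daemon: enable
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.md-cache-timeout: 60
network.inode-lru-limit: 90000
"""


def parse_volume_info(text: str) -> dict:
    """Parse the text of one volume's 'gluster volume info' output.

    Header 'Key: value' lines become top-level entries, 'BrickN:' lines
    go into info['bricks'], and everything after 'Options Reconfigured:'
    goes into info['options'].
    """
    info = {"bricks": [], "options": {}}
    in_options = False
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank or unrecognised line
        key, value = key.strip(), value.strip()
        if key == "Options Reconfigured":
            in_options = True
        elif in_options:
            info["options"][key] = value
        elif key.startswith("Brick") and key[5:].isdigit():
            info["bricks"].append(value)  # value keeps 'host:/path'
        elif value:
            info[key] = value  # skips the bare 'Bricks:' header
    return info


if __name__ == "__main__":
    info = parse_volume_info(SAMPLE)
    print(info["Type"], info["bricks"])
    for key, value in sorted(info["options"].items()):
        print(f"{key} = {value}")
```
</pre>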

<br class="">

The article suggests a value of 600 for<br class="">

performance.md-cache-timeout, but my Gluster version only<br class="">

permits a maximum value of 60.<br class="">

<br class="">

Network speed between the two VMs is about 120 MBytes/sec - the two<br class="">

VMs inhabit the same Amazon Virtual Private Cloud - so I don't think<br class="">

bandwidth is a factor.<br class="">

<br class="">

The fourfold slowdown is no doubt the penalty incurred in moving to a<br class="">

proper distributed filesystem.&nbsp; That article and other web pages I've<br class="">

read all say that each open of a file results in synchronous LOOKUP<br class="">

operations on all the replicas, so I'm guessing it just takes that<br class="">

much time for everything to happen before a file can be opened.<br class="">

Gluster profiling shows 11,198 LOOKUP operations during the<br class="">

test cat of the 7,987 files.<br class="">
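<br class="">
The bandwidth and LOOKUP figures can be combined into a rough latency budget. Here is a Python sketch under a simplifying assumption (not a measurement) that everything beyond bulk data transfer is a fixed per-file cost. Moving 63 MB at 120 MB/s takes only about half a second. That leaves roughly 3 ms per file, or about 2 ms per LOOKUP, for the remaining 24.5 seconds. This pattern points to per-operation round trips, rather than throughput, as the main cost.<br class="">
<pre class="">
```python
def latency_budget(total_s: float, n_files: int, n_lookups: int,
                   data_mb: float, bandwidth_mb_s: float) -> dict:
    """Split a read-everything run into bulk transfer time and per-op overhead.

    Crude model: whatever is left after moving data_mb at bandwidth_mb_s
    is charged evenly to each file (or each LOOKUP).
    """
    transfer_s = data_mb / bandwidth_mb_s
    overhead_s = total_s - transfer_s
    return {
        "transfer_s": transfer_s,
        "per_file_ms": overhead_s / n_files * 1000.0,
        "per_lookup_ms": overhead_s / n_lookups * 1000.0,
    }


if __name__ == "__main__":
    budget = latency_budget(total_s=25.0, n_files=7987, n_lookups=11198,
                            data_mb=63, bandwidth_mb_s=120)
    for name, value in budget.items():
        print(f"{name}: {value:.2f}")
```
</pre>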

<br class="">

As a Gluster newbie I'd appreciate some quick advice if possible -<br class="">

<br class="">

1.&nbsp; Is this sort of performance hit - on directories of small files -<br class="">

typical for such a simple Gluster configuration?<br class="">

<br class="">

2.&nbsp; Is there anything I can do to speed things up?&nbsp; :-)<br class="">

<br class="">

3.&nbsp; Repeating the 'cat' test immediately after the first test run saw<br class="">

the time dive from 25 seconds down to 4 seconds.&nbsp; Before I'd set those<br class="">

md-cache variables it had taken 17 seconds, due, I assume, to the<br class="">

actual file data being cached in the Linux buffer cache.&nbsp; So those<br class="">

md-cache settings really did make a change - taking off another 13<br class="">

seconds - once everything was cached.<br class="">

<br class="">

Flushing/invalidating the Linux memory cache made the next test go<br class="">

back to the 25 seconds.&nbsp; So it seems to me that the md-cache must hold<br class="">

its contents in the Linux buffer cache ... which surprised me,<br class="">

because I thought a user-space system like Gluster would keep the<br class="">

cache within the daemons or maybe a shared memory segment, nothing<br class="">

that would be affected by clearing the Linux buffer cache.&nbsp; I was<br class="">

expecting a run after invalidating the Linux cache to take<br class="">

somewhere between 4 and 25 seconds, with the md-cache still<br class="">

primed but the file data expired.<br class="">

<br class="">

Just out of curiosity about how the md-cache is implemented ... why does<br class="">

clearing the Linux buffers seem to affect it?<br class="">
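<br class="">
One way to narrow down question 3 is to drop only part of the kernel cache before the timed run. Writing 1 to /proc/sys/vm/drop_caches frees only the page cache (file data), 2 frees dentries and inodes, and 3 frees both. A plausible explanation, not confirmed in this thread, is that when the kernel evicts the FUSE inodes it sends FORGET requests to the glusterfs client, which then discards its per-inode state, md-cache entries included. If so, a run after level 1 should land between 4 and 25 seconds, while levels 2 and 3 should go back to roughly 25 seconds. The Python sketch below runs that experiment. It must run as root, and it shells out to find and cat to mimic the original test.<br class="">
<pre class="">
```python
import os
import subprocess
import sys
import time

DROP_CACHES = "/proc/sys/vm/drop_caches"

# Values accepted by /proc/sys/vm/drop_caches.
LEVELS = {1: "page cache", 2: "dentries and inodes", 3: "page cache, dentries and inodes"}


def drop_caches(level: int) -> None:
    """Drop clean kernel caches at the given level (requires root)."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {sorted(LEVELS)}")
    os.sync()  # write back dirty data first so more of the cache is droppable
    with open(DROP_CACHES, "w") as f:
        f.write(f"{level}\n")


def cat_all(root: str) -> float:
    """Time 'find ROOT -type f -exec cat {} +', discarding the output."""
    start = time.perf_counter()
    subprocess.run(["find", root, "-type", "f", "-exec", "cat", "{}", "+"],
                   stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


if __name__ == "__main__" and sys.argv[1:]:
    mount = sys.argv[1]  # e.g. the test directory on the glusterfs mount
    for level in LEVELS:
        cat_all(mount)  # warm run: primes md-cache and the page cache
        drop_caches(level)
        print(f"after dropping {LEVELS[level]}: {cat_all(mount):.1f} s")
```
</pre>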

<br class="">

4.&nbsp; The documentation says that Gluster geo-replication does 'asynchronous<br class="">

replication', which is something that would really help, but that it's<br class="">

'master/slave', so I'm assuming that geo-replication won't meet my<br class="">

requirement that both servers be able to occasionally<br class="">

write/modify/delete files?<br class="">

<br class="">

5.&nbsp; In my brick directory I have a '.trashcan' subdirectory - which is<br class="">

documented - but also a '.glusterfs' directory, which seems to have<br class="">

lots of magical files in some sort of housekeeping structure.<br class="">

Surprisingly the total amount of data under .glusterfs is greater than<br class="">

the total size of the actual files in my test directory.&nbsp; I haven't<br class="">

seen a description of what .glusterfs is used for ... is it vital<br class="">

to the operation of Gluster, or can its contents be deleted?&nbsp; Just curious.<br class="">

At one stage I had 1.1 GB of files in my volume, which expanded to<br class="">

1.5 GB in the brick (due to the metadata extents), and a whopping 1.6 GB<br class="">

of extra data materialized under the .glusterfs directory!<br class="">
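<br class="">
The .glusterfs figures are easier to interpret with hard links in mind. On a brick, each regular file normally has a second hard link under .glusterfs. The link is named after the file's GFID (its trusted.gfid extended attribute) and stored under two levels of directories taken from the GFID's first four hex digits. These links share the file's data blocks, so measuring .glusterfs on its own counts the same data a second time. As far as I know, Gluster relies on these entries (for example, for GFID-based lookups and self-heal), so deleting them by hand is not safe. Here is a Python sketch that assumes that layout. It shows where a given GFID's link would live, and it compares a total that counts every link with one that counts each inode once, as a single du run over the whole brick would. The GFID in any example call is an arbitrary illustrative value.<br class="">
<pre class="">
```python
import os
import stat
import sys
import uuid


def gfid_link_path(brick: str, gfid: str) -> str:
    """Expected path of a regular file's GFID hard link on a brick.

    Assumed layout: BRICK/.glusterfs/AA/BB/GFID, where AA and BB are the
    first and second pairs of hex digits of the GFID.
    """
    g = str(uuid.UUID(gfid))  # validates, and normalises to lower-case dashed form
    return os.path.join(brick, ".glusterfs", g[0:2], g[2:4], g)


def allocated_bytes(root: str, count_each_inode_once: bool = True) -> int:
    """Allocated space (st_blocks * 512) of the regular files under root.

    With count_each_inode_once=True, hard links sharing one inode are
    counted a single time, as one 'du' run over the whole brick would.
    """
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets and other non-regular entries
            key = (st.st_dev, st.st_ino)
            if count_each_inode_once and key in seen:
                continue
            seen.add(key)
            total += st.st_blocks * 512
    return total


if __name__ == "__main__" and sys.argv[1:]:
    brick = sys.argv[1]  # e.g. /data/brick1
    mib = 1024 * 1024
    print(f"every link counted: {allocated_bytes(brick, False) / mib:.1f} MiB")
    print(f"each inode once:    {allocated_bytes(brick, True) / mib:.1f} MiB")
```
</pre>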

<br class="">

6.&nbsp; Since I'm using CentOS I try to stick with things that are<br class="">

available through the Red Hat repository channel ... so while looking<br class="">

for distributed filesystems I saw mention of Ceph.&nbsp; Because I wanted<br class="">

only a simple replicated filesystem it seemed to me that Ceph - being<br class="">

based/focused on 'object' storage? - wouldn't be as good a fit as<br class="">

Gluster.&nbsp; Evil question to a Gluster mailing list - will Ceph give me<br class="">

any significantly better performance in reading small files?<br class="">

<br class="">

I've tried to investigate and find out what I can but I could be<br class="">

missing something really obvious in my ignorance, so I would<br class="">

appreciate any quick tips/answers from the experts.&nbsp; Thanks!<br class="">

______________________________<wbr class="">_________________<br class="">

Gluster-users mailing list<br class="">

<a href="mailto:Gluster-users@gluster.org" class="">Gluster-users@gluster.org</a><br class="">

<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank" class="">http://lists.gluster.org/<wbr class="">mailman/listinfo/gluster-users</a><br class="">

</blockquote></div><br class=""></div>

</div></blockquote></div><br class=""></body></html>