[Gluster-users] Couple of questions

Amar Tumballi atumball at redhat.com
Mon May 22 07:07:06 UTC 2017


Hi Chris,

Some answers inline.

On Sun, May 21, 2017 at 2:17 AM, Chris Knipe <savage at savage.za.org> wrote:

> Hi All,
>
>
>
> I have a couple of questions which I hope that there’s someone to just
> shed some light on for me please.  I don’t think it’s anything serious,
> other than really just trying to better understand how the underlying
> GlusterFS works.
>
>
>
> Firstly, I plan to build a GlusterFS using SM 5018D8-* boxes.  Essentially,
> 12 x 10TB disks, 128GB Ram, and a Xeon D-1537 CPU.  It does have an
> on-board LSI RAID controller, but there’s not a lot of detail forthcoming
> in terms of RAID configurations (caches for example).
>
>
>
> Firstly, in terms of the nodes, I don’t care TOO much for data integrity
> (i.e. it is OK to lose SOME of data, but availability in terms of
> underlying hardware is more important).  Secondly, it may not be the
> perfect scenario for GlusterFS (although it works perfectly fine currently
> through NFS on standard servers), but we are talking about millions of >
> 500K < 1M files.  Files are stored in a very specific structure, so each
> file is read/written precisely to a unique directory.  There’s no expensive
> scanning of directories (i.e. ls) happening or anything like that.  It’s a
> simple and very static read/write operation for each file on the system.
>
>
>
> Currently we store articles using a MD5 hash algorithm for a file name,
> and use 5 directory levels, so, /a/a0/a02/a02b/a02ba/
> a02ba1234567813dfa23bd2348901d33 Again everything works fine using
> multiple servers and standard ext4 / nfs exports.  We host /a on one
> server, /b on another server, etc.  So whilst the directories (and IO load)
> is split to address load issues, we are a bit limited in terms of how and
> how much we can expand.  I’m hoping to move all of this to GlusterFS.  The
> applications are very random IO intensive, and whilst we are nowhere CLOSE
> to the capabilities of the hardware, it is actually the seek times that are
> our limiting factor and the biggest bottleneck.  Therefore, I am fairly
> certain that growing through NFS, or, GlusterFS should be suitable and
> workable for us.
>
>
>
> My main reason for wanting to go GlusterFS is mostly related to better and
> easier expansion of storage.  It seems that it is easier to manage, whilst
> also providing some degree of redundancy (even if only partially in the
> case of a Distributed volume, which I believe would be adequate for us).
> All drives are hot swappable, and we will more than likely either look at a
> Distributed, or Striped volume.  In the case of a Distributed system, we
> can still live with the fact that the majority of files remain available,
> whilst a certain amount of files becomes unavailable should a node or brick
> fail, so Distributed will more than likely be adequate for our needs.
> Striped would be nice to have, but I think it would have some complexities
> given our specific use case.  We are also talking high concurrency (we do
> about 6K read/writes per second over NFS currently, per NFS server)
>
>
>
> 1 On the client(s), mounting the GlusterFS the documentation is clear in
> that it will only fetch the GlusterFS configuration, whilst there after
> reading/writing directly to the GlusterFS nodes.  How non-stop is this?  If
> there is already a mount made and additional nodes are added / removed from
> the GlusterFS, does the client(s) get informed of this without the need to
> re-mount the file system?  What about the capacity on the mount (at the
> client) when a node is added?  Basically, how non-stop is this operation?
> Can I assume that (in a perfect world) the client would never need to
> re-mount the file system? Are there any operations in GlusterFS that would
> require a client to re-mount?
>

No need to re-mount. The GlusterFS client fetches volume configuration
changes from the server it mounted from, and presents the scaled-out
storage layout to the applications using the mount.
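
One way to watch this from a client is to poll the size the mount reports
before and after a brick is added. A minimal sketch, assuming a Linux client;
the mount point /mnt/glustervol is a hypothetical placeholder, and the
expectation that the total grows without a re-mount follows from the answer
above rather than from anything measured here:

```python
import os


def mount_capacity_bytes(path):
    """Return (total, available) bytes for the filesystem containing path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize      # size of the whole filesystem
    available = st.f_bavail * st.f_frsize  # space usable by unprivileged users
    return total, available


# Hypothetical mount point; replace with wherever the volume is mounted.
# total, available = mount_capacity_bytes("/mnt/glustervol")
```

Calling it once before and once after an add-brick, from the same mounted
client, should show the difference in total capacity.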


>
>
> 2 Given the Distributed nature of GlusterFS and how files are written to
> nodes, would it be safe to assume that the more nodes there are in the
> GlusterFS, the better the IO performance would be?  Surely, the IO load is
> distributed between the nodes, together with the individual files, right?
> What kind of IO could (or should) reasonably be expected given the hardware
> mentioned above (yes, I know this is a how long is a piece of string
> question)?
>

Here, you see the performance improvement when the number of clients using
the volume grows as you add servers. In most cases, if you keep the number
of clients the same, the client network becomes the bottleneck and you see
no performance gain. But if you add clients along with servers, we
generally see a linear improvement in file I/O performance.
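
The reasoning above can be put into a deliberately simplified model: total
throughput is capped by whichever side, clients or servers, has less
aggregate bandwidth. The function name and the bandwidth figures are
illustrative assumptions, not measurements, and the model ignores seek
times, latency, and replication:

```python
def aggregate_throughput(num_clients, num_servers, client_mbps, server_mbps):
    """Upper bound on total throughput: the smaller of the two aggregate sides."""
    client_side = num_clients * client_mbps
    server_side = num_servers * server_mbps
    return min(client_side, server_side)


# Assumed figures: every client and every server can move 1000 MB/s.
baseline = aggregate_throughput(4, 4, 1000, 1000)      # 4 clients, 4 servers
more_servers = aggregate_throughput(4, 8, 1000, 1000)  # servers doubled only
both_scaled = aggregate_throughput(8, 8, 1000, 1000)   # clients doubled too
print(baseline, more_servers, both_scaled)
```

With these assumed figures, doubling only the servers leaves the bound
unchanged, while doubling clients and servers together doubles it.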


>
>
> 3 When bricks are added / removed / expanded / rebalanced / etc… What does
> GlusterFS actually do?  What would happen for instance, if I have a 250TB
> volume, with 10M files on it, and I add another node with ~50TB?  What is
> the impact on performance whilst these expensive operations are run?
> Again, how non-stop is this in terms of the clients reading/writing a few
> thousand files per second?   If running a **purely** Distributed volume,
> would a rebalance still be required when adding a new node?  What impact
> does add/remove/rebalance have on large GlusterFS systems?  Especially a
> rebalance, I would expect the operation to become more and more expensive
> as more and more bricks are added?  Given the large amount of files I
> intend to have on GlusterFS, I am concerned about directory scans (for
> example) happening internally in GlusterFS…
>
>
>

This is where a *lot* of planning is required. Gluster provides easy CLI
options to manage scale-out and shrink operations on a volume, but we see a
lot of complaints from users about *performance* while these operations are
running.

Hence, if you are starting fresh, our recommendation is to start with more
bricks than nodes (say, if you have 16 nodes, start with 48 or 64 bricks).
This way, when you add nodes, you can just do an 'add-brick' and then a
'remove-brick' to migrate a subset of the data, which reduces the number of
extra migrations and will work optimally for you. Again, all these
operations work while the volume is online, so clients won't see any
downtime. Of course there will be some hit in performance, as the server
nodes will be busy rebalancing data between themselves.
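
As a rough sketch of that planning, the script below works out how many
bricks a new node should take over and prints the matching 'add-brick' and
'remove-brick' commands for a plain Distributed volume. It only builds
command strings and runs nothing. The host names, the /bricks/bN paths, and
the volume name are hypothetical, and picking the most loaded node as the
donor is my assumption, not a Gluster rule. Each 'remove-brick ... start'
still has to be followed by 'status' and, once migration has finished,
'commit'.

```python
def initial_layout(nodes, bricks_per_node):
    """Map each node to its brick paths (hypothetical /bricks/bN naming)."""
    return {n: [f"{n}:/bricks/b{i}" for i in range(bricks_per_node)]
            for n in nodes}


def plan_add_node(layout, new_node, volume):
    """Return (new_layout, commands) giving new_node a fair share of bricks."""
    total = sum(len(bricks) for bricks in layout.values())
    share = total // (len(layout) + 1)  # bricks the new node should end up with
    new_layout = {n: list(bricks) for n, bricks in layout.items()}
    new_layout[new_node] = []
    commands = []
    for i in range(share):
        # Retire one brick from whichever existing node holds the most.
        donor = max((n for n in new_layout if n != new_node),
                    key=lambda n: len(new_layout[n]))
        old_brick = new_layout[donor].pop()
        new_brick = f"{new_node}:/bricks/b{i}"
        new_layout[new_node].append(new_brick)
        commands.append(f"gluster volume add-brick {volume} {new_brick}")
        commands.append(
            f"gluster volume remove-brick {volume} {old_brick} start")
    return new_layout, commands


layout = initial_layout(["node1", "node2", "node3", "node4"], 4)
new_layout, cmds = plan_add_node(layout, "node5", "myvol")
for cmd in cmds:
    print(cmd)
```

The brick count of the volume stays the same; only the data on the retired
bricks has to move, instead of a rebalance across the whole volume.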



> Hopefully I wasn’t too vague in my questions, but let’s see if some
> questions at least could be dealt with :-)
>
>
I think I understood the questions, and have answered to the best of my
limited knowledge.

-Amar


>
>
> Thanks,
>
> Chris.



-- 
Amar Tumballi (amarts)