[Gluster-users] gluster for home directories?

Manoj Pillai mpillai at redhat.com
Thu Mar 8 09:52:26 UTC 2018


Hi Rik,

Nice clarity and detail in the description. Thanks!

inline...

On Wed, Mar 7, 2018 at 8:29 PM, Rik Theys <Rik.Theys at esat.kuleuven.be>
wrote:

> Hi,
>
> We are looking into replacing our current storage solution and are
> evaluating gluster for this purpose. Our current solution uses a SAN
> with two servers attached that serve Samba and NFSv4. Clients connect to
> those servers using NFS or SMB. All users' home directories live on this
> storage.
>
> I would like some insight into who else is using gluster to host home
> directories for about 500 users and what performance they get out of the
> solution. Which connectivity method are you using on the clients
> (gluster native, NFS, SMB)? Which volume options do you have configured
> for your gluster volume? What hardware are you using? Are you using
> snapshots and/or quota? If so, any numbers on the performance impact?
>
> The solution I had in mind for our setup is multiple servers/bricks with
> a replica 3 arbiter 1 volume, where each server is also running nfs-ganesha
> and samba in HA. Clients would connect to one of the NFS servers
> (DNS round robin). In this case the NFS servers would be the gluster
> clients. Gluster traffic would go over a dedicated network with 10G and
> jumbo frames.
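
For readers who haven't used arbiter volumes, a minimal sketch of how a
2x(2+1) volume like this is created (hostnames, brick paths and the volume
name below are placeholders, not taken from your setup):

    # two subvolumes of (2 data + 1 arbiter) bricks; every third brick in
    # the list becomes the arbiter of its replica set
    gluster volume create homes replica 3 arbiter 1 \
        srv1:/data/brick1/homes srv2:/data/brick1/homes srv3:/arb/homes1 \
        srv3:/data/brick2/homes srv4:/data/brick2/homes srv1:/arb/homes2
    gluster volume start homes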
>
> I'm currently testing gluster (3.12, now 3.13) on older machines[1] and
> have created a replica 3 arbiter 1 volume 2x(2+1). I seem to run into all
> sorts of (performance) problems. I must be doing something wrong, but
> I've tried all sorts of benchmarks and nothing seems to make my setup
> live up to what I would expect from this hardware.
>
> * I understand that gluster only starts to work well when multiple
> clients are connecting in parallel, but I did expect the single-client
> performance to be better.
>
> * Unpacking the linux-4.15.7.tar.xz file on the brick XFS filesystem
> followed by a sync takes about 1 minute. Doing the same on the gluster
> volume using the fuse client (the client is one of the brick servers)
> takes over 9 minutes, and neither disk, CPU nor network is anywhere near
> its limit. Doing the same over NFS-ganesha (the client is a workstation
> connected through gbit) takes even longer (more than 30 min!?).
>
> I understand that unpacking a lot of small files may be the worst
> workload for a distributed filesystem, but when I look at the file sizes
> in our users' home directories, more than 90% of the files are smaller
> than 1MB.
>
> * A file copy of a 300GB file over NFSv4 (nfs-ganesha) starts fast
> (90MB/s) and then drops to 20MB/s. When I look at the servers during the
> copy, I don't see where the bottleneck is, as CPU, disk and network are
> not maxing out on any of the bricks. When the same client copies the
> file to our current NFS storage, it is limited by the gbit network
> connection of the client.
>

Both untar and cp are single-threaded, which means throughput is mostly
dictated by latency. Latency is generally higher in a distributed FS;
nfs-ganesha has an extra hop to the backend, and hence higher latency for
most operations compared to glusterfs-fuse.
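
As a rough back-of-the-envelope using your ioping numbers from [2], and
assuming (purely for illustration) ~60k files in the kernel tree and about
two synchronous round trips per file:

    # illustrative only; a real untar does more ops per file (create,
    # write, setattr, ...), so these are optimistic lower bounds
    echo "60000 * 2 * 0.8 / 1000" | bc -l   # ~96 s at the ~0.8 ms fuse latency
    echo "60000 * 2 * 2.2 / 1000" | bc -l   # ~264 s at the ~2.2 ms nfs latency

That is already minutes of pure waiting before any data is written, which
is how a job that takes ~1 minute directly on XFS stretches to many minutes
over fuse and nfs-ganesha.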

You don't necessarily need multiple clients for good performance with
gluster. Many multi-threaded benchmarks give good performance from a single
client. Here, for example, if you run multiple copy commands in parallel
from the same client, I'd expect your aggregate transfer rate to improve.
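
A quick way to check that from a single client -- just a sketch, with
placeholder paths:

    # start 4 copies in the background and time the aggregate
    time (
        for i in 1 2 3 4; do
            cp /data/src/bigfile.$i /mnt/glustervol/dest/ &
        done
        wait
    )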

Been a long while since I looked at nfs-ganesha. But in terms of upper
bounds for throughput tests: data needs to flow over the client->nfs-server
link, and then, depending on which servers the file is located on, either
1x (if the nfs-ganesha node is also hosting one copy of the file, and
neglecting arbiter) or 2x over the server-to-server (s2s) link. With 1Gbps
links, that means an upper bound between 125 MB/s and 62.5 MB/s in the
steady state, unless I miscalculated.
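
Spelling out the arithmetic behind those two bounds:

    # 1 Gbps of payload is roughly 125 MB/s (ignoring protocol overhead)
    echo "scale=1; 1000/8" | bc       # 125.0 MB/s: client->nfs-server ceiling
    # if both data copies sit on remote bricks, each byte the client writes
    # crosses the shared s2s link twice, halving the achievable rate
    echo "scale=1; 1000/8/2" | bc     # 62.5 MB/s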

-- Manoj


>
> * I had the 'cluster.lookup-optimize' option enabled but ran into all
> sorts of issues where ls showed either the wrong files (the content of a
> different directory), or claimed a directory did not exist when mkdir
> said it already existed... I currently have the following options set:
>
> server.outstanding-rpc-limit: 256
> client.event-threads: 4
> performance.io-thread-count: 16
> performance.parallel-readdir: on
> server.event-threads: 4
> performance.cache-size: 2GB
> performance.rda-cache-limit: 128MB
> performance.write-behind-window-size: 8MB
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> network.inode-lru-limit: 500000
> performance.nl-cache-timeout: 600
> performance.nl-cache: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> transport.address-family: inet
> nfs.disable: on
> cluster.enable-shared-storage: enable
>
> The brick servers have 2 dual-core CPUs, so I've set the client and
> server event threads to 4.
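
As an aside for anyone reading along: options like these are set per
volume; a hypothetical example, with "homes" as a placeholder volume name:

    gluster volume set homes client.event-threads 4
    gluster volume set homes server.event-threads 4
    # show the effective value of every option on the volume
    gluster volume get homes all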
>
> * When using nfs-ganesha I run into bugs that make me wonder who is
> using nfs-ganesha with gluster and why they are not hitting these bugs:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1543996
> https://bugzilla.redhat.com/show_bug.cgi?id=1405147
>
> * nfs-ganesha does not have the 'async' option that kernel nfs has. I
> can understand why they don't want to implement this feature, but I do
> wonder how others are increasing their nfs-ganesha performance. I've put
> some SSDs in each brick server and configured them as lvmcache for the
> bricks. This setup only increases throughput once the data is on the SSD,
> not for just-written data.
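
On the lvmcache point: what you describe matches the default writethrough
cache mode, where only already-cached data is accelerated. A rough sketch
of the setup and of switching to writeback, which also absorbs new writes
on the SSD at the cost of the usual writeback safety trade-offs (VG/LV
names, sizes and devices below are made up):

    # attach an SSD-backed cache pool to an existing brick LV
    lvcreate --type cache-pool -L 200G -n brick1_cache vg_bricks /dev/sdX
    lvconvert --type cache --cachepool vg_bricks/brick1_cache vg_bricks/brick1
    # optional: cache new writes too (writethrough is the default mode)
    lvchange --cachemode writeback vg_bricks/brick1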
>
> Regards,
>
> Rik
>
> [1] 4 servers with two 1Gbit NICs each (one for the client traffic, one
> for s2s traffic with jumbo frames enabled). Each server has two disks
> (bricks).
>
> [2] ioping from the nfs client shows the following latencies:
> min/avg/max/mdev = 695.2 us / 2.17 ms / 7.05 ms / 1.92 ms
>
> ping rtt from client to nfs-ganesha server:
> rtt min/avg/max/mdev = 0.106/1.551/6.195/2.098 ms
>
> ioping on the volume fuse mounted from a brick:
> min/avg/max/mdev = 557.0 us / 824.4 us / 2.68 ms / 421.9 us
>
> ioping on the brick xfs filesystem:
> min/avg/max/mdev = 275.2 us / 515.2 us / 12.4 ms / 1.21 ms
>
> Are these normal numbers?
>