<div dir="ltr">Hi Rik,<div><br></div><div>Nice clarity and detail in the description. Thanks!</div><div><br></div><div>inline...<br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 7, 2018 at 8:29 PM, Rik Theys <span dir="ltr"><<a href="mailto:Rik.Theys@esat.kuleuven.be" target="_blank">Rik.Theys@esat.kuleuven.be</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
>
> We are looking into replacing our current storage solution and are
> evaluating Gluster for this purpose. Our current setup uses a SAN with
> two servers attached that serve Samba and NFSv4; clients connect to
> those servers over NFS or SMB. All users' home directories live on
> this storage.
>
> I would like some insight into who else is using Gluster for home
> directories for about 500 users and what performance they get out of
> the solution. Which connectivity method are you using on the clients
> (gluster native, NFS, SMB)? Which volume options do you have configured
> for your gluster volume? What hardware are you using? Are you using
> snapshots and/or quota? If so, any numbers on the performance impact?
>
> The solution I had in mind for our setup is multiple servers/bricks in
> a replica 3 arbiter 1 volume, where each server also runs nfs-ganesha
> and Samba in HA. Clients would connect to one of the NFS servers (DNS
> round robin), so the NFS servers themselves would be the gluster
> clients. Gluster traffic would go over a dedicated 10G network with
> jumbo frames.
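>
> For concreteness, the volume I have in mind would be created roughly
> like this (hostnames and brick paths below are placeholders, not our
> real layout):
>
>   gluster volume create homes replica 3 arbiter 1 \
>       srv1:/bricks/b1 srv2:/bricks/b1 arb:/bricks/b1 \
>       srv1:/bricks/b2 srv2:/bricks/b2 arb:/bricks/b2
>   gluster volume start homes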
>
> I'm currently testing gluster (3.12, now 3.13) on older machines [1]
> and have created a replica 3 arbiter 1 volume, 2x(2+1). I seem to run
> into all sorts of (performance) problems. I must be doing something
> wrong, but I've tried all sorts of benchmarks and nothing seems to
> make my setup live up to what I would expect from this hardware.
>
> * I understand that gluster only starts to work well when multiple
> clients connect in parallel, but I did expect the single-client
> performance to be better.
>
> * Unpacking the linux-4.15.7.tar.xz file on the brick XFS filesystem,
> followed by a sync, takes about 1 minute. Doing the same on the
> gluster volume using the FUSE client (the client is one of the brick
> servers) takes over 9 minutes, and neither disk, CPU, nor network
> comes close to saturation. Doing the same over NFS-ganesha (the
> client is a workstation connected through gigabit) takes even longer
> (more than 30 min!?).
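>
> For reference, the test is essentially the following (mount point is
> illustrative):
>
>   # time the unpack plus a final sync, on each filesystem in turn
>   cd /mnt/test
>   time sh -c 'tar -xJf /tmp/linux-4.15.7.tar.xz && sync'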
>
> I understand that unpacking a lot of small files may be the worst
> workload for a distributed filesystem, but when I look at the file
> sizes in our users' home directories, more than 90% of the files are
> smaller than 1 MB.
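>
> (That figure came from something along these lines; GNU find assumed:)
>
>   # fraction of regular files under 1 MB
>   find /home -type f -printf '%s\n' |
>       awk '{ n++; if ($1 < 1048576) small++ }
>            END { printf "%.1f%% under 1MB\n", 100*small/n }'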
>
> * A file copy of a 300 GB file over NFSv4 (nfs-ganesha) starts fast
> (90 MB/s) and then drops to 20 MB/s. When I look at the servers during
> the copy, I don't see where the bottleneck is, as the CPU, disk and
> network are not maxed out on any of the bricks. When the same client
> copies the file to our current NFS storage, it is limited by the
> gigabit network connection of the client.

Both untar and cp are single-threaded, which means throughput is mostly
dictated by latency. Latency is generally higher in a distributed FS;
nfs-ganesha adds an extra hop to the backend, and hence higher latency
for most operations compared to glusterfs-fuse.

You don't necessarily need multiple clients for good performance with
gluster; many multi-threaded benchmarks do well from a single client.
For example, if you run multiple copy commands in parallel from the
same client, I'd expect your aggregate transfer rate to improve.

It's been a long while since I looked at nfs-ganesha. But in terms of
upper bounds for throughput tests: data needs to flow over the
client->nfs-server link, and then, depending on which servers host the
file, it crosses the s2s link either once (if the nfs-ganesha node
itself holds one copy of the file; neglecting the arbiter) or twice.
With 1 Gbps links, that puts the steady-state upper bound between
125 MB/s and 62.5 MB/s, unless I miscalculated.

-- Manoj
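
P.S. A quick way to try the parallel-copy experiment (file and mount
names below are made up):

    # run four copies concurrently and compare the aggregate rate
    # against the single-stream 20 MB/s you are seeing now
    for i in 1 2 3 4; do
        cp /data/big.$i /mnt/gluster/scratch/ &
    done
    wait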
>
> * I had the 'cluster.optimize-lookup' option enabled but ran into all
> sorts of issues where ls showed either the wrong files (the content of
> a different directory) or claimed a directory did not exist when mkdir
> said it already existed... I currently have the following options set:
>
> server.outstanding-rpc-limit: 256
> client.event-threads: 4
> performance.io-thread-count: 16
> performance.parallel-readdir: on
> server.event-threads: 4
> performance.cache-size: 2GB
> performance.rda-cache-limit: 128MB
> performance.write-behind-window-size: 8MB
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> network.inode-lru-limit: 500000
> performance.nl-cache-timeout: 600
> performance.nl-cache: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> transport.address-family: inet
> nfs.disable: on
> cluster.enable-shared-storage: enable
>
> The brick servers have two dual-core CPUs, so I've set the client and
> server event threads to 4.
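>
> (Applied with the usual commands, e.g. -- 'homes' standing in for the
> real volume name:)
>
>   gluster volume set homes client.event-threads 4
>   gluster volume set homes server.event-threads 4
>   # and the problematic option above was cleared with:
>   gluster volume reset homes cluster.optimize-lookup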
>
> * When using nfs-ganesha I run into bugs that make me wonder who else
> is using nfs-ganesha with gluster and why they are not hitting these
> bugs:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1543996
> https://bugzilla.redhat.com/show_bug.cgi?id=1405147
>
> * nfs-ganesha does not have the 'async' option that kernel NFS has. I
> can understand why they don't want to implement this feature, but I do
> wonder how others are increasing their nfs-ganesha performance. I've
> put some SSDs in each brick server and configured them as lvmcache for
> the bricks. This setup only increases throughput once the data is on
> the SSD, not for freshly written data.
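>
> The cache was attached per brick roughly as follows (VG/LV names and
> the SSD device are illustrative):
>
>   # carve a cache pool out of the SSD and bind it to the brick LV
>   lvcreate --type cache-pool -L 200G -n brick1cache vg_bricks /dev/sdc
>   lvconvert --type cache --cachepool vg_bricks/brick1cache vg_bricks/brick1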
>
> Regards,
>
> Rik
>
> [1] 4 servers, each with two 1 Gbit NICs (one for client traffic, one
> for s2s traffic with jumbo frames enabled). Each server has two disks
> (bricks).
>
> [2] ioping from the NFS client shows the following latencies:
> min/avg/max/mdev = 695.2 us / 2.17 ms / 7.05 ms / 1.92 ms
>
> ping rtt from the client to the nfs-ganesha server:
> rtt min/avg/max/mdev = 0.106/1.551/6.195/2.098 ms
>
> ioping on the volume, fuse-mounted from a brick server:
> min/avg/max/mdev = 557.0 us / 824.4 us / 2.68 ms / 421.9 us
>
> ioping on the brick XFS filesystem:
> min/avg/max/mdev = 275.2 us / 515.2 us / 12.4 ms / 1.21 ms
>
> Are these normal numbers?
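>
> (Each ioping line above is from a plain run such as:)
>
>   ioping -c 20 /mnt/test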