<div dir="ltr">Hi Rik,<div><br></div><div>Nice clarity and detail in the description. Thanks!</div><div><br></div><div>inline...<br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 7, 2018 at 8:29 PM, Rik Theys <span dir="ltr">&lt;<a href="mailto:Rik.Theys@esat.kuleuven.be" target="_blank">Rik.Theys@esat.kuleuven.be</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

<br>

We are looking into replacing our current storage solution and are<br>

evaluating gluster for this purpose. Our current solution uses a SAN<br>

with two servers attached that serve samba and NFS 4. Clients connect to<br>

those servers using NFS or SMB. All users&#39; home directories live on this<br>

server.<br>

<br>

I would like to have some insight into who else is using gluster for home<br>

directories for about 500 users and what performance they get out of the<br>

solution. Which connectivity method are you using on the clients<br>

(gluster native, nfs, smb)? Which volume options do you have configured<br>

for your gluster volume? What hardware are you using? Are you using<br>

snapshots and/or quota? If so, do you have any numbers on the performance impact?<br>

<br>

The solution I had in mind for our setup is multiple servers/bricks with<br>

replica 3 arbiter 1 volume where each server is also running nfs-ganesha<br>

and samba in HA. Clients would be connecting to one of the nfs servers<br>

(dns round robin). In this case the nfs servers would be the gluster<br>

clients. Gluster traffic would go over a dedicated network with 10G and<br>

jumbo frames.<br>

<br>

I&#39;m currently testing gluster (3.12, now 3.13) on older machines[1] and<br>

have created a replica 3 arbiter 1 volume 2x(2+1). I seem to run into all<br>

sorts of (performance) problems. I must be doing something wrong but<br>

I&#39;ve tried all sorts of benchmarks and nothing seems to make my setup<br>

live up to what I would expect from this hardware.<br>

<br>

* I understand that gluster only starts to work well when multiple<br>

clients are connecting in parallel, but I did expect the single client<br>

performance to be better.<br>

<br>

* Unpacking the linux-4.15.7.tar.xz file on the brick XFS filesystem<br>

followed by a sync takes about 1 minute. Doing the same on the gluster<br>

volume using the fuse client (client is one of the brick servers) takes<br>

over 9 minutes and neither disk nor cpu nor network are reaching their<br>

bottleneck. Doing the same over NFS-ganesha (client is a workstation<br>

connected through gbit) takes even longer (more than 30min!?).<br>

<br>

I understand that unpacking a lot of small files may be the worst<br>

workload for a distributed filesystem, but when I look at the file sizes<br>

of the files in our users&#39; home directories, more than 90% is smaller<br>

than 1MB.<br>

<br>

* A file copy of a 300GB file over NFS 4 (nfs-ganesha) starts fast<br>

(90MB/s) and then drops to 20MB/s. When I look at the servers during the<br>

copy, I don&#39;t see where the bottleneck is as the cpu, disk and network<br>

are not maxing out (on any of the bricks). When the same client copies<br>

the file to our current NFS storage it is limited by the gbit network<br>

connection of the client.<br></blockquote><div><br></div><div>Both untar and cp are single-threaded, which means throughput is mostly dictated by per-operation latency. Latency is generally higher in a distributed FS; nfs-ganesha adds an extra hop to the backend, and hence has higher latency for most operations compared to glusterfs-fuse.</div><div><br></div><div>You don&#39;t necessarily need multiple clients for good performance with gluster. Many multi-threaded benchmarks give good performance from a single client. For example, if you run multiple copy commands in parallel from the same client, I&#39;d expect your aggregate transfer rate to improve.</div><div><br></div><div>It has been a long while since I looked at nfs-ganesha. But in terms of upper bounds for throughput tests: data needs to flow over the client-&gt;nfs-server link, and then, depending on which servers the file is located on, it crosses the s2s link either once (if the nfs-ganesha node is also hosting one copy of the file, and neglecting the arbiter) or twice. With 1Gbps links, that means a steady-state upper bound between 62.5 MB/s and 125 MB/s, unless I miscalculated.</div><div><br></div><div>-- Manoj</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

* I had the &#39;cluster.optimize-lookup&#39; option enabled but ran into all<br>

sorts of issues where ls is showing either the wrong files (content of a<br>

different directory), or claiming a directory does not exist when mkdir<br>

says it already exists... I currently have the following options set:<br>

<br>

server.outstanding-rpc-limit: 256<br>

client.event-threads: 4<br>

performance.io-thread-count: 16<br>

performance.parallel-readdir: on<br>

server.event-threads: 4<br>

performance.cache-size: 2GB<br>

performance.rda-cache-limit: 128MB<br>

performance.write-behind-window-size: 8MB<br>

performance.md-cache-timeout: 600<br>

performance.cache-invalidation: on<br>

performance.stat-prefetch: on<br>

network.inode-lru-limit: 500000<br>

performance.nl-cache-timeout: 600<br>

performance.nl-cache: on<br>

features.cache-invalidation-timeout: 600<br>

features.cache-invalidation: on<br>

transport.address-family: inet<br>

nfs.disable: on<br>

cluster.enable-shared-storage: enable<br>

<br>

The brick servers have 2 dual-core cpu&#39;s so I&#39;ve set the client and<br>

server event threads to 4.<br>

<br>

* When using nfs-ganesha I run into bugs that make me wonder who is<br>

using nfs-ganesha with gluster and why are they not hitting these bugs:<br>

<br>

<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1543996" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1543996</a><br>

<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1405147" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1405147</a><br>

<br>

* nfs-ganesha does not have the &#39;async&#39; option that kernel nfs has. I<br>

can understand why they don&#39;t want to implement this feature, but do<br>

wonder how others are increasing their nfs-ganesha performance. I&#39;ve put<br>

some SSD&#39;s in each brick and have them configured as lvmcache to the<br>

bricks. This setup only increases throughput once the data is on the ssd<br>

and not for just-written data.<br>

<br>

Regards,<br>

<br>

Rik<br>

<br>

[1] 4 servers with 2 1Gbit nics (one for the client traffic, one for s2s<br>

traffic with jumbo frames enabled). Each server has two disks (bricks).<br>

<br>

[2] ioping from the nfs client shows the following latencies:<br>

min/avg/max/mdev = 695.2 us / 2.17 ms / 7.05 ms / 1.92 ms<br>

<br>

ping rtt from client to nfs-ganesha server:<br>

rtt min/avg/max/mdev = 0.106/1.551/6.195/2.098 ms<br>

<br>

ioping on the volume fuse mounted from a brick:<br>

min/avg/max/mdev = 557.0 us / 824.4 us / 2.68 ms / 421.9 us<br>

<br>

ioping on the brick xfs filesystem:<br>

min/avg/max/mdev = 275.2 us / 515.2 us / 12.4 ms / 1.21 ms<br>

<br>

Are these normal numbers?<br>

<br>

<br>

______________________________<wbr>_________________<br>

Gluster-users mailing list<br>

<a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>

<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br>

</blockquote></div><br></div></div></div>
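<div dir="ltr"><div>P.S. To make the upper-bound estimate in the reply above concrete, here is a minimal Python sketch of the arithmetic. It is only a back-of-the-envelope model: it assumes each link delivers its full nominal rate, that arbiter traffic is negligible, and that the links are the only limit. The function names are made up for illustration and are not part of any gluster tooling.</div>
<pre>
```python
# Rough upper bound on steady-state write throughput through an
# nfs-ganesha node in front of a replica 3 arbiter 1 volume.
# Assumptions (not measured): links run at their nominal rate,
# arbiter traffic is ignored, and the links are the only bottleneck.

def link_mb_per_s(gbps):
    """Convert a nominal link rate in Gbit/s to MB/s (1 Gbit/s = 125 MB/s)."""
    return gbps * 1000 / 8


def write_upper_bound(client_gbps, s2s_gbps, remote_copies):
    """Upper bound in MB/s when remote_copies data copies cross the s2s link."""
    client_limit = link_mb_per_s(client_gbps)
    if remote_copies == 0:
        return client_limit
    # The s2s link is shared by every copy that has to leave the ganesha node.
    s2s_limit = link_mb_per_s(s2s_gbps) / remote_copies
    return min(client_limit, s2s_limit)


# The ganesha node hosts one data copy: one copy crosses the s2s link.
best_case = write_upper_bound(1, 1, remote_copies=1)
# The ganesha node hosts no data copy: both copies cross the s2s link.
worst_case = write_upper_bound(1, 1, remote_copies=2)
print(best_case, worst_case)
```
</pre>
<div>Under the same assumptions, calling write_upper_bound(1, 10, remote_copies=2) models a 1Gbps client link with a 10G s2s network; in that case the client link becomes the limiting hop in both placements.</div></div>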