[Gluster-users] Very poor GlusterFS Volume performance (glusterfs 8.2)

Marc Jakobs marc at knusperfisch.de
Thu Nov 12 19:13:39 UTC 2020


Hello,

thanks for your answer! Yes, I am using sharding. That the 1 Gigabit NIC
is the limit was my second thought as well, because when I run this
"performance" test on all three nodes at the same time, I get roughly
80 MB/s, which is closer to the theoretical maximum throughput of the
NICs.
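
For reference, a sequential-write test of this kind can be reproduced
with a plain dd on the mounted volume (file name, block size and count
below are only placeholders, not necessarily the exact parameters I
used):

  dd if=/dev/zero of=/mnt/test/ddtest.bin bs=1M count=2048 \
      conv=fsync status=progress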

I was hoping there was a way to relax the strict synchronous approach of
writing files to the other nodes in order to gain write performance (and
lose some safety), so that files are written faster locally. But it seems
there is no way around this (other than the way you mentioned with the
new Thin Arbiter), and if the files were written asynchronously, that
would create other problems (e.g. the destination would have to poll
until a file appears).
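
The closest knobs I came across are the client-side write-behind
options. As far as I understand they only buffer writes on the client
and do not change the replication semantics, so treat the following as
a sketch only (the values are placeholders):

  gluster volume set data performance.write-behind on
  gluster volume set data performance.write-behind-window-size 4MB
  gluster volume set data performance.flush-behind on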

So, the way to go for me would be to switch to 10 GBit/s NICs (maybe
even with bonding) to get write performance that is closer to the speed
of the NVMe disks. In the meantime, I have changed the software
architecture a bit: the application that does the writes now writes
directly to the local NVMe disk, and the file(s) are then asynchronously
moved from this "buffer" to the gluster volume. This helps to prevent
the application process from being blocked for too long.
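
Roughly, the mover is just a loop like the one below (the paths and the
inotify-based trigger are only an illustration of the idea, not my exact
implementation):

  # watch the local NVMe "buffer" directory and move finished files
  # to the gluster mount; /nvme/buffer and /mnt/data are placeholders
  inotifywait -m -e close_write --format '%w%f' /nvme/buffer |
  while read -r file; do
      mv "$file" /mnt/data/
  done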

Thanks for your help and explanation! Marc

> On 11/9/2020 12:59 PM, Marc Jakobs wrote:
>> I have a GlusterFS Volume on three Linux Servers (Ubuntu 20.04LTS) which
>> are connected via 1GBit/sec NIC with each other over a dedicated switch.
>>
>> Every server has a NVMe disk which is used for the GlusterFS Volume
>> called "data".
> 
> So I assume you have a simple replica 3 setup.
> 
> Are you using sharding?
> 
> 
>> I have mounted the Volume like this
>>
>> mount -t glusterfs -o direct-io-mode=disable 127.0.0.1:/data /mnt/test/
>>
>> so it does not even go over the local NIC but instead over the loopback
>> device.
> 
> You are network constrained.
> 
> Your mount is direct, but if you have replica 3, the data still has to
> travel to the other two gluster bricks, and that happens over a single
> 1 Gbit/s Ethernet port, which has a maximum throughput of 125 MB/s.
> 
> Since you have two streams going out, that is roughly 62 MB/s each,
> assuming full replica 3.
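> 
> In numbers:
> 
>   1 Gbit/s / 8 = 125 MB/s
>   125 MB/s / 2 outgoing replica streams ~= 62 MB/s per stream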
> 
> My understanding is that gluster doesn't acknowledge a write until it
> has been written to at least one of the replicas (I am sure others
> will jump in and correct me). So 60 MB/s under those circumstances is
> what I would expect to see.
> 
> You can improve things by using an arbiter, and supposedly the new
> Thin Arbiter is even faster (though I haven't tried it), but you lose
> a little safety. The arbiter node only receives the metadata so it can
> referee split-brain decisions, freeing up more bandwidth for the
> actual data replica nodes.
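> 
> For example (just a sketch; the hostnames and brick paths are
> placeholders), a replica volume with an arbiter is created with
> something like:
> 
>   gluster volume create data replica 3 arbiter 1 \
>       node1:/bricks/data node2:/bricks/data node3:/bricks/arbiter
> 
> where the third brick holds only metadata.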
> 
> A huge improvement would be to bond two or more Gbit/s ports.
> Round-robin teamd is really easy to set up, or use traditional bonding
> in its various flavors. You probably have some spare NIC cards lying
> around, so it's usually a 'freebie'.
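> 
> Just as an illustration (not persistent across reboots; interface
> names and the address are placeholders), a round-robin bond with
> plain iproute2 looks roughly like this:
> 
>   ip link add bond0 type bond mode balance-rr
>   ip link set eno1 down && ip link set eno1 master bond0
>   ip link set eno2 down && ip link set eno2 master bond0
>   ip link set bond0 up
>   ip addr add 192.168.10.11/24 dev bond0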
> 
> Of course, the best case would be to make the jump to 10 Gb/s kit.
> 
> -wk
>

