[Gluster-users] Gluster Performance - 12 Gbps SSDs and 10 Gbps NIC

Tue Dec 12 20:08:33 UTC 2023

Wow, HUGE improvement with NFS-Ganesha!

sudo dnf -y install glusterfs-ganesha
sudo vim /etc/ganesha/ganesha.conf

NFS_CORE_PARAM {
    mount_path_pseudo = true;
    Protocols = 3,4;
}
EXPORT_DEFAULTS {
    Access_Type = RW;
}

LOG {
    Default_Log_Level = WARN;
}

EXPORT{
    Export_Id = 1 ;     # Export ID unique to each export
    Path = "/data";     # Path of the volume to be exported

    FSAL {
        name = GLUSTER;
        hostname = "localhost"; # IP of one of the nodes in the trusted pool
        volume = "data";        # Volume name. Eg: "test_volume"
    }

    Access_type = RW;           # Access permissions
    Squash = No_root_squash;    # To enable/disable root squashing
    Disable_ACL = TRUE;         # To enable/disable ACL
    Pseudo = "/data";           # NFSv4 pseudo path for this export
    Protocols = "3","4" ;       # NFS protocols supported
    Transports = "UDP","TCP" ;  # Transport protocols supported
    SecType = "sys";            # Security flavors supported
}

sudo systemctl enable --now nfs-ganesha
sudo vim /etc/fstab

localhost:/data             /data                 nfs    defaults,_netdev          0 0

sudo systemctl daemon-reload
sudo mount -a
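To confirm that /data is now served over NFS rather than the native FUSE client, you can check the filesystem type of the mount. This is a minimal sketch that assumes util-linux `findmnt` is available. The helper name `classify_fstype` is hypothetical. The native Gluster client normally shows up as `fuse.glusterfs`, and a kernel NFS mount shows up as `nfs` or `nfs4`.

```shell
#!/usr/bin/env bash
# classify_fstype: hypothetical helper that maps a filesystem type string
# (as printed by findmnt or listed in /proc/mounts) to the client in use.
classify_fstype() {
    case "$1" in
        nfs|nfs4)       echo "nfs" ;;    # kernel NFS client (e.g. an NFS-Ganesha export)
        fuse.glusterfs) echo "fuse" ;;   # native Gluster FUSE client
        *)              echo "other" ;;
    esac
}

# Usage on the node itself:
#   classify_fstype "$(findmnt -n -o FSTYPE --target /data)"
```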

fio --name=test --filename=/data/wow --size=1G --readwrite=write

Run status group 0 (all jobs):
  WRITE: bw=2246MiB/s (2355MB/s), 2246MiB/s-2246MiB/s (2355MB/s-2355MB/s), io=1024MiB (1074MB), run=456-456msec

Yeah, 2355MB/s is much better than the original 115MB/s.
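As a sanity check, fio's bandwidth figures follow directly from its `io=` and `run=` fields: bytes written divided by elapsed time. The following small sketch uses a hypothetical function name, `fio_bw`. It reproduces the MB/s (10^6 bytes) and MiB/s (2^20 bytes) values from the fio summaries quoted in this thread.

```shell
#!/usr/bin/env bash
# fio_bw: hypothetical helper. Arguments: bytes written, runtime in msec.
# Prints "<MB/s> <MiB/s>" rounded to whole numbers, like fio's summary line.
fio_bw() {
    awk -v b="$1" -v ms="$2" 'BEGIN {
        s = ms / 1000                                  # runtime in seconds
        printf "%.0f %.0f\n", b / s / 1e6, b / s / 1048576
    }'
}

# 1 GiB = 1073741824 bytes
# fio_bw 1073741824 456    # NFS-Ganesha mount  -> 2355 2246
# fio_bw 1073741824 9366   # FUSE mount         -> 115 109
```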

So in the end, I guess FUSE isn't the best choice.

On Tue, Dec 12, 2023 at 3:00 PM Gilberto Ferreira <
gilberto.nunes32 at gmail.com> wrote:

> FUSE has some overhead.
> Take a look at libgfapi:
>
> https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/libgfapi/
>
> I know this doc is somewhat out of date, but it could be a hint.
>
>
> ---
> Gilberto Nunes Ferreira
> (47) 99676-7530 - Whatsapp / Telegram
>
> On Tue, Dec 12, 2023 at 16:29, Danny <dbray925+gluster at gmail.com>
> wrote:
>
>> Nope, not a caching thing. I've tried multiple different types of fio
>> tests, and all produce the same results: Gbps when hitting the disks locally,
>> slow MB/s when hitting the Gluster FUSE mount.
>>
>> I've been reading up on gluster-ganesha, and will give that a try.
>>
>> On Tue, Dec 12, 2023 at 1:58 PM Ramon Selga <ramon.selga at gmail.com>
>> wrote:
>>
>>> Disregard my first question: you have 12Gbps SAS SSDs. Sorry!
>>>
>>> On 12/12/23 at 19:52, Ramon Selga wrote:
>>>
>>> May I ask what kind of disks you have in this setup? Rotational, SSD
>>> SAS/SATA, NVMe?
>>>
>>> Is there a RAID controller with writeback caching?
>>>
>>> It seems to me that your fio test on the local brick gives an unreliable
>>> result due to some caching.
>>>
>>> Try something like this (you may want to increase the test file size
>>> depending on how much memory is available for caching):
>>>
>>> fio --size=16G --name=test --filename=/gluster/data/brick/wow --bs=1M \
>>>     --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers \
>>>     --end_fsync=1 --iodepth=200 --ioengine=libaio
>>>
>>> Also remember that a replica 3 arbiter 1 volume writes synchronously to two
>>> data bricks, halving the usable throughput of your network backend.
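Here is rough arithmetic for the ceiling Ramon describes. Suppose a client has to push every byte to two data bricks over a single link, and the arbiter receives only metadata. Then its usable write bandwidth is roughly the link rate divided by two. This sketch works under those assumptions, ignores protocol overhead, and uses a hypothetical helper name. When the client runs on one of the data-brick nodes, as with the localhost mounts in this thread, one copy may stay on the local host. In that case, treat the figure as a pessimistic bound.

```shell
#!/usr/bin/env bash
# replica_write_ceiling: hypothetical helper.
# $1 = link speed in Gbit/s, $2 = number of data bricks receiving a full copy.
# Prints the approximate per-client write ceiling in MB/s (decimal units).
replica_write_ceiling() {
    awk -v g="$1" -v n="$2" 'BEGIN { printf "%.0f\n", g * 1000 / 8 / n }'
}

# replica_write_ceiling 10 2     # nominal 10 GbE    -> 625
# replica_write_ceiling 9.90 2   # iperf measurement -> 619
```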
>>>
>>> Try a similar fio run on the Gluster mount, but I rarely see more than 300MB/s
>>> of sequential writes through a single FUSE mount, even with an NVMe backend.
>>> On the other hand, with 4 to 6 clients you can easily reach 1.5GB/s of
>>> aggregate throughput.
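Spreading that aggregate across the clients shows that it is consistent with the single-mount figure. Dividing 1.5 GB/s over 4 to 6 clients gives 250 to 375 MB/s per client. This is a trivial sketch with a hypothetical helper name.

```shell
#!/usr/bin/env bash
# per_client_mbs: hypothetical helper.
# $1 = aggregate throughput in MB/s, $2 = number of clients (integer division).
per_client_mbs() {
    echo $(( $1 / $2 ))
}

# per_client_mbs 1500 4   # -> 375
# per_client_mbs 1500 6   # -> 250
```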
>>>
>>> To start, I think it is better to try the default parameters for your
>>> replica volume.
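One way to act on this advice is to list the options that differ from the defaults and then reset them. Those options appear in the "Options Reconfigured" section of `gluster volume info`, as shown for this volume further down the thread.

This minimal sketch uses a hypothetical helper name, `reconfigured_options`. It only prints commands for review and does not run them. `gluster volume reset VOLNAME [OPTION]` is the standard CLI form, and running it without an option resets all reconfigured options at once. Some entries, such as `transport.address-family` or `nfs.disable`, are typically set automatically at volume creation, so review the list before resetting everything.

```shell
#!/usr/bin/env bash
# reconfigured_options: hypothetical helper. Reads `gluster volume info` output
# on stdin and prints one option name per line from the
# "Options Reconfigured:" section(s).
reconfigured_options() {
    awk '
        /^Options Reconfigured:/ { in_opts = 1; next }
        in_opts && NF == 0       { in_opts = 0 }            # blank line ends the section
        in_opts && /:/           { sub(/:.*/, ""); print }  # keep the option name only
    '
}

# Usage (prints, does not execute, one reset command per option):
#   sudo gluster volume info data | reconfigured_options |
#       while read -r opt; do echo "gluster volume reset data $opt"; done
```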
>>>
>>> Best regards!
>>>
>>> Ramon
>>>
>>>
>>> On 12/12/23 at 19:10, Danny wrote:
>>>
>>> Sorry, I noticed that too after I posted, so I immediately upgraded to 10.
>>> The issue remains.
>>>
>>> On Tue, Dec 12, 2023 at 1:09 PM Gilberto Ferreira <
>>> gilberto.nunes32 at gmail.com> wrote:
>>>
>>>> I strongly suggest you update to version 10 or higher.
>>>> It comes with significant performance improvements.
>>>>
>>>> On Tue, Dec 12, 2023 at 13:03, Danny <dbray925+gluster at gmail.com>
>>>> wrote:
>>>>
>>>>> MTU is already 9000, and as you can see from the IPERF results, I've
>>>>> got a nice, fast connection between the nodes.
>>>>>
>>>>> On Tue, Dec 12, 2023 at 9:49 AM Strahil Nikolov <hunter86_bg at yahoo.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Let’s try the simple things:
>>>>>>
>>>>>> Check whether you can use MTU 9000 and, if possible, set it on the bond
>>>>>> slaves and the bond devices:
>>>>>>  ping GLUSTER_PEER -c 10 -M do -s 8972
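The 8972 in that ping command is the largest ICMP echo payload that fits in a 9000-byte MTU. It is 9000 minus the 20-byte IPv4 header and the 8-byte ICMP header, assuming IPv4 without IP options. `-M do` forbids fragmentation, so the ping only succeeds if jumbo frames work end to end. Here is a sketch of the arithmetic, with a hypothetical helper name.

```shell
#!/usr/bin/env bash
# icmp_payload_for_mtu: hypothetical helper.
# $1 = interface MTU in bytes; IPv4 header (20) + ICMP header (8) = 28 bytes.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# icmp_payload_for_mtu 9000   # -> 8972
# icmp_payload_for_mtu 1500   # -> 1472
# ping GLUSTER_PEER -c 10 -M do -s "$(icmp_payload_for_mtu 9000)"
```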
>>>>>>
>>>>>> Then try following the recommendations from
>>>>>> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/chap-configuring_red_hat_storage_for_enhancing_performance
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Strahil Nikolov
>>>>>>
>>>>>> On Monday, December 11, 2023, 3:32 PM, Danny <
>>>>>> dbray925+gluster at gmail.com> wrote:
>>>>>>
>>>>>> Hello list, I'm hoping someone can let me know what setting I missed.
>>>>>>
>>>>>> Hardware:
>>>>>> Dell R650 servers, Dual 24 Core Xeon 2.8 GHz, 1 TB RAM
>>>>>> 8x SSDs, negotiated speed 12 Gbps
>>>>>> PERC H755 Controller - RAID 6
>>>>>> Created a virtual "data" disk from the above 8 SSDs, giving a ~20 TB
>>>>>> /dev/sdb
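With RAID 6, two drives' worth of each stripe hold parity, so an 8-drive virtual disk has 6 data drives. XFS `su`/`sw` and LVM alignment are usually matched to the full-stripe width. That width is the controller's strip size times the number of data drives.

This sketch assumes a 128 KiB strip size, the value implied by the `su=128k` used in the commands below; check the actual PERC setting on the controller. For comparison, the `sw=10` and `--chunksize 1280K` values used below correspond to 10 data drives (128 KiB x 10 = 1280 KiB). Under the same assumption, 8 drives in RAID 6 would give sw=6 and a 768 KiB full stripe. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# raid6_stripe: hypothetical helper.
# $1 = total drives in the RAID 6 set, $2 = strip (stripe unit) size in KiB.
# Prints "<data drives> <full stripe in KiB>".
raid6_stripe() {
    local data=$(( $1 - 2 ))   # RAID 6 reserves two drives' worth for parity
    echo "$data $(( data * $2 ))"
}

# raid6_stripe 8 128    # this server, assuming 128 KiB strips -> 6 768
# raid6_stripe 12 128   # layout matching sw=10 / 1280K        -> 10 1280
```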
>>>>>>
>>>>>> OS:
>>>>>> CentOS Stream
>>>>>> kernel-4.18.0-526.el8.x86_64
>>>>>> glusterfs-7.9-1.el8.x86_64
>>>>>>
>>>>>> IPERF Test between nodes:
>>>>>> [ ID] Interval           Transfer     Bitrate         Retr
>>>>>> [  5]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec    0             sender
>>>>>> [  5]   0.00-10.04  sec  11.5 GBytes  9.86 Gbits/sec                  receiver
>>>>>>
>>>>>> All good there. ~10 Gbps, as expected.
>>>>>>
>>>>>> LVM Install:
>>>>>> export DISK="/dev/sdb"
>>>>>> sudo parted --script $DISK "mklabel gpt"
>>>>>> sudo parted --script $DISK "mkpart primary 0% 100%"
>>>>>> sudo parted --script $DISK "set 1 lvm on"
>>>>>> sudo pvcreate --dataalignment 128K /dev/sdb1
>>>>>> sudo vgcreate --physicalextentsize 128K gfs_vg /dev/sdb1
>>>>>> sudo lvcreate -L 16G -n gfs_pool_meta gfs_vg
>>>>>> sudo lvcreate -l 95%FREE -n gfs_pool gfs_vg
>>>>>> sudo lvconvert --chunksize 1280K --thinpool gfs_vg/gfs_pool \
>>>>>>     --poolmetadata gfs_vg/gfs_pool_meta
>>>>>> sudo lvchange --zero n gfs_vg/gfs_pool
>>>>>> sudo lvcreate -V 19.5TiB --thinpool gfs_vg/gfs_pool -n gfs_lv
>>>>>> sudo mkfs.xfs -f -i size=512 -n size=8192 -d su=128k,sw=10 \
>>>>>>     /dev/mapper/gfs_vg-gfs_lv
>>>>>> sudo vim /etc/fstab
>>>>>> /dev/mapper/gfs_vg-gfs_lv   /gluster/data/brick   xfs   rw,inode64,noatime,nouuid 0 0
>>>>>>
>>>>>> sudo systemctl daemon-reload && sudo mount -a
>>>>>> fio --name=test --filename=/gluster/data/brick/wow --size=1G --readwrite=write
>>>>>>
>>>>>> Run status group 0 (all jobs):
>>>>>>   WRITE: bw=2081MiB/s (2182MB/s), 2081MiB/s-2081MiB/s (2182MB/s-2182MB/s), io=1024MiB (1074MB), run=492-492msec
>>>>>>
>>>>>> All good there. 2182MB/s =~ 17.5 Gbps. Nice!
>>>>>>
>>>>>>
>>>>>> Gluster install:
>>>>>> export NODE1='10.54.95.123'
>>>>>> export NODE2='10.54.95.124'
>>>>>> export NODE3='10.54.95.125'
>>>>>> sudo gluster peer probe $NODE2
>>>>>> sudo gluster peer probe $NODE3
>>>>>> sudo gluster volume create data replica 3 arbiter 1 \
>>>>>>     $NODE1:/gluster/data/brick $NODE2:/gluster/data/brick \
>>>>>>     $NODE3:/gluster/data/brick force
>>>>>> sudo gluster volume set data network.ping-timeout 5
>>>>>> sudo gluster volume set data performance.client-io-threads on
>>>>>> sudo gluster volume set data group metadata-cache
>>>>>> sudo gluster volume start data
>>>>>> sudo gluster volume info all
>>>>>>
>>>>>> Volume Name: data
>>>>>> Type: Replicate
>>>>>> Volume ID: b52b5212-82c8-4b1a-8db3-52468bc0226e
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: 10.54.95.123:/gluster/data/brick
>>>>>> Brick2: 10.54.95.124:/gluster/data/brick
>>>>>> Brick3: 10.54.95.125:/gluster/data/brick (arbiter)
>>>>>> Options Reconfigured:
>>>>>> network.inode-lru-limit: 200000
>>>>>> performance.md-cache-timeout: 600
>>>>>> performance.cache-invalidation: on
>>>>>> performance.stat-prefetch: on
>>>>>> features.cache-invalidation-timeout: 600
>>>>>> features.cache-invalidation: on
>>>>>> network.ping-timeout: 5
>>>>>> transport.address-family: inet
>>>>>> storage.fips-mode-rchecksum: on
>>>>>> nfs.disable: on
>>>>>> performance.client-io-threads: on
>>>>>>
>>>>>> sudo vim /etc/fstab
>>>>>> localhost:/data             /data                 glusterfs   defaults,_netdev      0 0
>>>>>>
>>>>>> sudo systemctl daemon-reload && sudo mount -a
>>>>>> fio --name=test --filename=/data/wow --size=1G --readwrite=write
>>>>>>
>>>>>> Run status group 0 (all jobs):
>>>>>>   WRITE: bw=109MiB/s (115MB/s), 109MiB/s-109MiB/s (115MB/s-115MB/s), io=1024MiB (1074MB), run=9366-9366msec
>>>>>>
>>>>>> Oh no, what's wrong? From 2182MB/s down to only 115MB/s? What am I
>>>>>> missing? I'm not expecting the above ~17 Gbps, but I'm thinking it should
>>>>>> at least be close(r) to ~10 Gbps.
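Comparing the FUSE result against the measured link speed makes the gap concrete. 115 MB/s is about 0.92 Gbit/s, roughly 9% of the 9.90 Gbit/s that iperf reported. This sketch of the comparison uses a hypothetical helper name, decimal units, and ignores protocol overhead.

```shell
#!/usr/bin/env bash
# link_utilization_pct: hypothetical helper.
# $1 = observed throughput in MB/s (10^6 bytes/s), $2 = link rate in Gbit/s.
# Prints the percentage of the link rate used, to one decimal place.
link_utilization_pct() {
    awk -v m="$1" -v g="$2" 'BEGIN { printf "%.1f\n", (m * 8 / 1000) / g * 100 }'
}

# link_utilization_pct 115 9.90   # -> 9.3
```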
>>>>>>
>>>>>> Any suggestions?
>>>>>> ________
>>>>>>
>>>>>>
>>>>>>
>>>>>> Community Meeting Calendar:
>>>>>>
>>>>>> Schedule -
>>>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>>>> Bridge: https://meet.google.com/cpu-eiue-hvk
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>