[Gluster-users] Gluster Performance - 12 Gbps SSDs and 10 Gbps NIC

Gilberto Ferreira gilberto.nunes32 at gmail.com
Thu Dec 14 12:59:25 UTC 2023


Thanks for the advice.
---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram






On Thu, Dec 14, 2023 at 09:54, Strahil Nikolov <hunter86_bg at yahoo.com>
wrote:

> Hi Gilberto,
>
>
> Have you checked
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/chap-configuring_red_hat_storage_for_enhancing_performance
>  ?
>
> I think that you will need to test the virt profile, as its settings
> prevent some bad situations - especially during VM live migration.
> You should also consider sharding, which can reduce healing time but also
> makes your life more difficult if you need to access the disks of the VMs
> directly.
>
> I think that client.event-threads, server.event-threads and
> performance.io-thread-count can be tuned in your case. Consider setting up
> a VM that uses the gluster volume as backing store and run the tests inside
> the VM to simulate a real workload (best is to run a DB, webserver, etc.
> inside the VM).
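>
> As a minimal sketch of the above (the volume name "VMS" is taken from this
> thread, and the thread counts are illustrative values, not recommendations):
>
> # Apply the virt group profile that ships with glusterfs:
> gluster volume set VMS group virt
> # Tune the thread options mentioned above:
> gluster volume set VMS client.event-threads 4
> gluster volume set VMS server.event-threads 4
> gluster volume set VMS performance.io-thread-count 32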
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
>
>
> On Wednesday, December 13, 2023, 2:34 PM, Gilberto Ferreira <
> gilberto.nunes32 at gmail.com> wrote:
>
> Hi all
> Aravinda, usually I set these in a two-server environment and never get
> split-brain:
> gluster vol set VMS cluster.heal-timeout 5
> gluster vol heal VMS enable
> gluster vol set VMS cluster.quorum-reads false
> gluster vol set VMS cluster.quorum-count 1
> gluster vol set VMS network.ping-timeout 2
> gluster vol set VMS cluster.favorite-child-policy mtime
> gluster vol heal VMS granular-entry-heal enable
> gluster vol set VMS cluster.data-self-heal-algorithm full
> gluster vol set VMS features.shard on
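>
> A quick way to confirm which of these took effect (a sketch; the grep
> pattern is just illustrative):
>
> gluster volume get VMS all | grep -E 'quorum|shard|heal|ping-timeout'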
>
> Strahil, in general I get about 0.06 ms latency between the nodes over a
> dedicated 1G NIC.
> My environment is very simple: Proxmox + QEMU/KVM, with 3 to 5 VMs.
>
>
> ---
> Gilberto Nunes Ferreira
> (47) 99676-7530 - Whatsapp / Telegram
>
>
>
>
>
>
> On Wed, Dec 13, 2023 at 06:08, Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
>
> Hi Aravinda,
>
> Based on the output it’s a ‘replica 3 arbiter 1’ type.
>
> Gilberto,
> What's the latency between the nodes?
>
> Best Regards,
> Strahil Nikolov
>
>
>
> On Wednesday, December 13, 2023, 7:36 AM, Aravinda <aravinda at kadalu.tech>
> wrote:
>
> Only Replica 2 or Distributed Gluster volumes can be created with two
> servers. There is a higher chance of split-brain with Replica 2 than with a
> Replica 3 volume.
>
> For NFS Ganesha, there is no issue exporting the volume even if only one
> server is available. Run the NFS Ganesha servers on the Gluster server
> nodes, and NFS clients on the network can connect to any NFS Ganesha server.
>
> You can use HAProxy + Keepalived (or any other load balancer) if high
> availability is required for the NFS Ganesha connections (e.g. if a server
> node goes down, NFS clients can connect to another NFS Ganesha server
> node), as sketched below.
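>
> A minimal keepalived sketch for the failover part (the VIP 10.54.95.200 and
> the interface name eth0 are assumptions; run an instance like this on each
> Ganesha node and point the NFS clients at the VIP):
>
> vrrp_instance ganesha_vip {
>     state BACKUP
>     interface eth0
>     virtual_router_id 51
>     priority 100
>     virtual_ipaddress {
>         10.54.95.200/24
>     }
> }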
>
> --
> Aravinda
> Kadalu Technologies
>
>
>
> ---- On Wed, 13 Dec 2023 01:42:11 +0530 Gilberto Ferreira
> <gilberto.nunes32 at gmail.com> wrote ---
>
> Ah, that's nice.
> Does anybody know if this can be achieved with two servers?
>
> ---
> Gilberto Nunes Ferreira
> (47) 99676-7530 - Whatsapp / Telegram
>
>
>
>
>
>
>
> On Tue, Dec 12, 2023 at 17:08, Danny <dbray925+gluster at gmail.com>
> wrote:
>
>
> Wow, HUGE improvement with NFS-Ganesha!
>
>
> sudo dnf -y install glusterfs-ganesha
> sudo vim /etc/ganesha/ganesha.conf
>
> NFS_CORE_PARAM {
>     mount_path_pseudo = true;
>     Protocols = 3,4;
> }
> EXPORT_DEFAULTS {
>     Access_Type = RW;
> }
>
> LOG {
>     Default_Log_Level = WARN;
> }
>
> EXPORT{
>     Export_Id = 1 ;     # Export ID unique to each export
>     Path = "/data";     # Path of the volume to be exported
>
>     FSAL {
>         name = GLUSTER;
>         hostname = "localhost"; # IP of one of the nodes in the trusted
> pool
>         volume = "data";        # Volume name. Eg: "test_volume"
>     }
>
>     Access_type = RW;           # Access permissions
>     Squash = No_root_squash;    # To enable/disable root squashing
>     Disable_ACL = TRUE;         # To enable/disable ACL
>     Pseudo = "/data";           # NFSv4 pseudo path for this export
>     Protocols = "3","4" ;       # NFS protocols supported
>     Transports = "UDP","TCP" ;  # Transport protocols supported
>     SecType = "sys";            # Security flavors supported
> }
>
>
> sudo systemctl enable --now nfs-ganesha
> sudo vim /etc/fstab
>
> localhost:/data   /data   nfs   defaults,_netdev   0 0
>
>
> sudo systemctl daemon-reload
> sudo mount -a
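>
> As a quick sanity check (a sketch, not part of the original steps), the
> export should now be visible via the mount protocol:
>
> showmount -e localhost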
>
> fio --name=test --filename=/data/wow --size=1G --readwrite=write
>
> Run status group 0 (all jobs):
>   WRITE: bw=2246MiB/s (2355MB/s), 2246MiB/s-2246MiB/s (2355MB/s-2355MB/s),
> io=1024MiB (1074MB), run=456-456msec
>
> Yeah, 2355 MB/s is much better than the original 115 MB/s.
>
> So in the end, I guess FUSE isn't the best choice.
>
> On Tue, Dec 12, 2023 at 3:00 PM Gilberto Ferreira <
> gilberto.nunes32 at gmail.com> wrote:
>
> FUSE adds some overhead.
> Take a look at libgfapi:
>
> https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/libgfapi/
>
> I know this doc is somewhat out of date, but it could be a hint.
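>
> A hedged illustration of what bypassing FUSE can look like (the volume name
> "data" and host 10.54.95.123 are taken from this thread; the image name is
> hypothetical, and this assumes qemu-img was built with gfapi support):
>
> # Create a VM disk image directly on the Gluster volume over libgfapi,
> # without going through a FUSE mount:
> qemu-img create -f qcow2 gluster://10.54.95.123/data/vm-100-disk-0.qcow2 32G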
>
>
> ---
> Gilberto Nunes Ferreira
> (47) 99676-7530 - Whatsapp / Telegram
>
>
>
>
>
>
>
> On Tue, Dec 12, 2023 at 16:29, Danny <dbray925+gluster at gmail.com>
> wrote:
>
> Nope, not a caching thing. I've tried multiple different types of fio
> tests, and all produce the same results: Gbps when hitting the disks
> locally, slow MB/s when hitting the Gluster FUSE mount.
>
> I've been reading up on glusterfs-ganesha, and will give that a try.
>
> On Tue, Dec 12, 2023 at 1:58 PM Ramon Selga <ramon.selga at gmail.com> wrote:
>
> Disregard my first question: you have 12 Gbps SAS SSDs. Sorry!
>
> On 12/12/23 at 19:52, Ramon Selga wrote:
>
> May I ask which kind of disks you have in this setup? Rotational, SSD
> (SAS/SATA), or NVMe?
>
> Is there a RAID controller with write-back caching?
>
> It seems to me your fio test on the local brick has an unclear result due
> to some caching.
>
> Try something like this (consider increasing the test file size depending
> on the amount of cache memory):
>
> fio --size=16G --name=test --filename=/gluster/data/brick/wow --bs=1M
> --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers
> --end_fsync=1 --iodepth=200 --ioengine=libaio
>
> Also remember that a replica 3 arbiter 1 volume writes synchronously to two
> data bricks, roughly halving the usable throughput of your network backend
> (on a 10 Gbps link, about 1.25 GB/s on the wire, that leaves a ceiling of
> roughly 600 MB/s per client before any FUSE or protocol overhead).
>
> Try a similar fio run on the Gluster mount, but I rarely see more than
> 300 MB/s writing sequentially on a single FUSE mount, even with an NVMe
> backend. On the other hand, with 4 to 6 clients you can easily reach
> 1.5 GB/s of aggregate throughput.
>
> To start, I think it is better to try with the default parameters for your
> replica volume.
>
> Best regards!
>
> Ramon
>
>
> On 12/12/23 at 19:10, Danny wrote:
>
> Sorry, I noticed that too after I posted, so I instantly upgraded to 10.
> Issue remains.
>
> On Tue, Dec 12, 2023 at 1:09 PM Gilberto Ferreira <
> gilberto.nunes32 at gmail.com> wrote:
>
> I strongly suggest you update to version 10 or higher.
> It comes with significant performance improvements.
> ---
> Gilberto Nunes Ferreira
> (47) 99676-7530 - Whatsapp / Telegram
>
>
>
>
>
>
> On Tue, Dec 12, 2023 at 13:03, Danny <dbray925+gluster at gmail.com>
> wrote:
>
> MTU is already 9000, and as you can see from the IPERF results, I've got a
> nice, fast connection between the nodes.
>
> On Tue, Dec 12, 2023 at 9:49 AM Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
>
> Hi,
>
> Let’s try the simple things:
>
> Check if you can use MTU 9000 and, if it's possible, set it on the bond
> slaves and the bond devices:
>  ping GLUSTER_PEER -c 10 -M do -s 8972
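>
> As a rough sketch (the bond device name bond0 is an assumption), a quick,
> non-persistent way to apply it for testing:
>
> ip link set dev bond0 mtu 9000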
>
> Then try to follow the recommendations from
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/chap-configuring_red_hat_storage_for_enhancing_performance
>
>
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, December 11, 2023, 3:32 PM, Danny <dbray925+gluster at gmail.com>
> wrote:
>
> Hello list, I'm hoping someone can let me know what setting I missed.
>
> Hardware:
> Dell R650 servers, Dual 24 Core Xeon 2.8 GHz, 1 TB RAM
> 8x SSDs, negotiated speed 12 Gbps
> PERC H755 Controller - RAID 6
> Created virtual "data" disk from the above 8 SSD drives, for a ~20 TB
> /dev/sdb
>
> OS:
> CentOS Stream
> kernel-4.18.0-526.el8.x86_64
> glusterfs-7.9-1.el8.x86_64
>
> IPERF Test between nodes:
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec    0
> sender
> [  5]   0.00-10.04  sec  11.5 GBytes  9.86 Gbits/sec
>  receiver
>
> All good there. ~10 Gbps, as expected.
>
> LVM Install:
> export DISK="/dev/sdb"
> sudo parted --script $DISK "mklabel gpt"
> sudo parted --script $DISK "mkpart primary 0% 100%"
> sudo parted --script $DISK "set 1 lvm on"
> sudo pvcreate --dataalignment 128K /dev/sdb1
> sudo vgcreate --physicalextentsize 128K gfs_vg /dev/sdb1
> sudo lvcreate -L 16G -n gfs_pool_meta gfs_vg
> sudo lvcreate -l 95%FREE -n gfs_pool gfs_vg
> sudo lvconvert --chunksize 1280K --thinpool gfs_vg/gfs_pool --poolmetadata
> gfs_vg/gfs_pool_meta
> sudo lvchange --zero n gfs_vg/gfs_pool
> sudo lvcreate -V 19.5TiB --thinpool gfs_vg/gfs_pool -n gfs_lv
> sudo mkfs.xfs -f -i size=512 -n size=8192 -d su=128k,sw=10
> /dev/mapper/gfs_vg-gfs_lv
> sudo vim /etc/fstab
> /dev/mapper/gfs_vg-gfs_lv   /gluster/data/brick   xfs   rw,inode64,noatime,nouuid 0 0
>
> sudo systemctl daemon-reload && sudo mount -a
> fio --name=test --filename=/gluster/data/brick/wow --size=1G
> --readwrite=write
>
> Run status group 0 (all jobs):
>   WRITE: bw=2081MiB/s (2182MB/s), 2081MiB/s-2081MiB/s (2182MB/s-2182MB/s),
> io=1024MiB (1074MB), run=492-492msec
>
> All good there. 2182MB/s =~ 17.5 Gbps. Nice!
>
>
> Gluster install:
> export NODE1='10.54.95.123'
> export NODE2='10.54.95.124'
> export NODE3='10.54.95.125'
> sudo gluster peer probe $NODE2
> sudo gluster peer probe $NODE3
> sudo gluster volume create data replica 3 arbiter 1
> $NODE1:/gluster/data/brick $NODE2:/gluster/data/brick
> $NODE3:/gluster/data/brick force
> sudo gluster volume set data network.ping-timeout 5
> sudo gluster volume set data performance.client-io-threads on
> sudo gluster volume set data group metadata-cache
> sudo gluster volume start data
> sudo gluster volume info all
>
> Volume Name: data
> Type: Replicate
> Volume ID: b52b5212-82c8-4b1a-8db3-52468bc0226e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.54.95.123:/gluster/data/brick
> Brick2: 10.54.95.124:/gluster/data/brick
> Brick3: 10.54.95.125:/gluster/data/brick (arbiter)
> Options Reconfigured:
> network.inode-lru-limit: 200000
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> network.ping-timeout: 5
> transport.address-family: inet
> storage.fips-mode-rchecksum: on
> nfs.disable: on
> performance.client-io-threads: on
>
> sudo vim /etc/fstab
> localhost:/data   /data   glusterfs   defaults,_netdev   0 0
>
> sudo systemctl daemon-reload && sudo mount -a
> fio --name=test --filename=/data/wow --size=1G --readwrite=write
>
> Run status group 0 (all jobs):
>   WRITE: bw=109MiB/s (115MB/s), 109MiB/s-109MiB/s (115MB/s-115MB/s),
> io=1024MiB (1074MB), run=9366-9366msec
>
> Oh no, what's wrong? From 2182MB/s down to only 115MB/s? What am I
> missing? I'm not expecting the above ~17 Gbps, but I'm thinking it should
> at least be close(r) to ~10 Gbps.
>
> Any suggestions?
>
>
>