[Gluster-users] Configuring legacy Gluster NFS
Strahil Nikolov
hunter86_bg at yahoo.com
Mon May 25 14:28:10 UTC 2020
I forgot to mention that you need to verify/set the VMware machines for a high-performance/low-latency workload.
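For reference, on vSphere that usually means setting the VM's Latency Sensitivity to High and reserving CPU/memory for it. In the .vmx that corresponds to something like the line below (this is an assumption based on VMware's latency-sensitivity tuning guidance - verify against your vSphere version before relying on it):

sched.cpu.latencySensitivity = "high"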
On 25 May 2020 17:13:52 GMT+03:00, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
>
>
>On 25 May 2020 5:49:00 GMT+03:00, Olivier
><Olivier.Nicole at cs.ait.ac.th> wrote:
>>Strahil Nikolov <hunter86_bg at yahoo.com> writes:
>>
>>> On May 23, 2020 7:29:23 AM GMT+03:00, Olivier
>><Olivier.Nicole at cs.ait.ac.th> wrote:
>>>>Hi,
>>>>
>>>>I have been struggling with NFS Ganesha: one gluster node with ganesha
>>>>serving only one client could not handle the load when dealing with
>>>>thousands of small files. Legacy gluster NFS works flawlessly with 5 or 6
>>>>clients.
>>>>
>>>>But the documentation for gNFS is scarce; I could not find where to
>>>>configure the various authorizations, so any pointer is greatly
>>>>welcome.
>>>>
>>>>Best regards,
>>>>
>>>>Olivier
>>>
>>> Hi Oliver,
>>>
>>> Can you give me a hint as to why you are using gluster with a single
>>> node in the TSP serving only 1 client?
>>> Usually, this is not a typical gluster workload.
>>
>>Hi Strahil,
>>
>>Of course I have more than one node; the other nodes are supporting the
>>bricks and the data. I am using a node with no data to solve this issue
>>with NFS. But in my comparison between gNFS and Ganesha, I was using the
>>same configuration, with one node with no brick accessing the other
>>nodes for the data. So the only change between what is working and what
>>was not is the NFS server. Besides, I have been using NFS for over 15
>>years and know that, given my data and type of activity, one single NFS
>>server should be able to serve 5 to 10 clients without a problem; that
>>is why I suspected Ganesha from the beginning.
>
>You are not comparing apples to apples. Pure NFS has been used in UNIX
>systems since before modern OSes. Linux has long been using pure NFS and
>the kernel has been optimized for it, while Ganesha is newer tech and
>requires some tuning.
>
>You haven't mentioned what kind of issues you see - searching
>a directory, reading a lot of files, writing a lot of
>small files, etc.
>
>Usually a negative lookup (searching for/accessing a non-existent object -
>file, dir, etc.) causes serious performance degradation.
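>For example, you can check whether LOOKUP calls dominate the workload
>with the volume profiler (a quick sketch - run it only for a short
>window, since profiling adds some overhead):
>
>gluster volume profile gv0 start
>gluster volume profile gv0 info
>gluster volume profile gv0 stop
>
>If LOOKUP sits at the top of the fop statistics, negative-lookup caching
>(mentioned further below) is worth testing.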
>
>>If I cannot configure gNFS, I think I could mount the volume with the
>>GlusterFS client and use the native NFS server of Linux, but that would
>>add overhead and leave some features behind; that is why my focus is
>>primarily on configuring gNFS.
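>>Roughly, that fallback would look like the sketch below (mount point and
>>subnet are only placeholders; re-exporting a FUSE mount through the
>>kernel NFS server needs an explicit fsid and adds an extra hop):
>>
>>mount -t glusterfs gluster3000:/gv0 /mnt/gv0
>># in /etc/exports:
>>/mnt/gv0  192.168.1.0/24(rw,no_subtree_check,fsid=1001)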
>>
>>>
>>> Also can you specify:
>>> - Brick block device type and details (raid type, lvm, vdo, etc )
>>
>>All nodes are VMware virtual machines, the RAID being at VMware level
>
>Yeah, that's not very descriptive.
>For write-intensive and small-file workloads the optimal RAID mode
>is RAID10 with at least 12 disks per node.
>What is the I/O scheduler? Are you using thin or thick LVM? How many
>snapshots do you have?
>Are you using striping at the LVM level (if you use local storage,
>then most probably no striping)?
>What is the PE size of the VG?
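>A few commands can collect most of that (the brick device and VG names
>below are only placeholders - adjust them to your layout):
>
>cat /sys/block/sdb/queue/scheduler      # current I/O scheduler
>lvs -a -o +lv_layout,stripes            # thin/thick pools and striping
>lvs --segments gluster_vg               # segment/stripe layout per LV
>vgdisplay gluster_vg | grep 'PE Size'   # physical extent size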
>
>>> - xfs_info of the brick
>
>What kind of FS are you using? You need to be sure that the
>inode size is at least 512 bytes (1024 for Swift) in order to be
>supported.
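>For example (use whatever filesystem actually holds the brick; the mkfs
>line is only for when a new brick device is being formatted):
>
>xfs_info /gluster1 | grep isize      # should show isize=512 or larger
>mkfs.xfs -f -i size=512 /dev/sdX     # when creating a new brick filesystem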
>
>>> - mount options for the brick
>>
>>Bricks are not mounted
>
>It is not good to share the same VMDK between the OS and the Gluster
>bricks. You can benefit from mount options like
>'noatime,nodiratime,nobarrier,inode64'. Note that 'nobarrier' requires
>storage with a battery-backed write cache.
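>A minimal /etc/fstab sketch for a dedicated XFS brick (the device name is
>a placeholder; recent kernels have deprecated/removed the XFS 'nobarrier'
>option, so drop it there):
>
>/dev/sdb1  /gluster1  xfs  noatime,nodiratime,nobarrier,inode64  0 0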
>
>>> - SELINUX/APPARMOR status
>>> - sysctl tunables (including tuned profile)
>>
>>All systems are vanilla Ubuntu with no tuning.
>
>I have done some tests and you can benefit from the rhgs random-IO
>tuned profile. The latest source RPM can be found at:
>ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-1.el7rhgs.src.rpm
>
>On top of that, you need to modify it to disable LRO, as it is
>automatically enabled for VMXNET NICs. LRO increases bandwidth but at
>the cost of latency, and low latency is crucial when looking up
>thousands of files/directories.
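>Assuming the profile from that SRPM is installed and the storage NIC is
>ens192 (placeholder name; the profile name can be confirmed with
>'tuned-adm list'), that boils down to:
>
>tuned-adm profile rhgs-random-io
>ethtool -K ens192 lro off
>ethtool -k ens192 | grep large-receive-offload   # verify LRO is off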
>
>>> - gluster volume information and status
>>
>>sudo gluster volume info gv0
>>
>>Volume Name: gv0
>>Type: Distributed-Replicate
>>Volume ID: cc664830-1dd0-4dd4-9f1c-493578297e79
>>Status: Started
>>Snapshot Count: 0
>>Number of Bricks: 2 x 2 = 4
>>Transport-type: tcp
>>Bricks:
>>Brick1: gluster3000:/gluster1/br
>>Brick2: gluster5000:/gluster/br
>>Brick3: gluster3000:/gluster2/br
>>Brick4: gluster2000:/gluster/br
>>Options Reconfigured:
>>features.quota-deem-statfs: on
>>features.inode-quota: on
>>features.quota: on
>>transport.address-family: inet
>>nfs.disable: off
>>features.cache-invalidation: on
>>on@gluster3:~$ sudo gluster volume status gv0
>>Status of volume: gv0
>>Gluster process                               TCP Port  RDMA Port  Online  Pid
>>------------------------------------------------------------------------------
>>Brick gluster3000:/gluster1/br                49152     0          Y       1473
>>Brick gluster5000:/gluster/br                 49152     0          Y       724
>>Brick gluster3000:/gluster2/br                49153     0          Y       1549
>>Brick gluster2000:/gluster/br                 49152     0          Y       723
>>Self-heal Daemon on localhost                 N/A       N/A        Y       1571
>>NFS Server on localhost                       N/A       N/A        N       N/A
>>Quota Daemon on localhost                     N/A       N/A        Y       1560
>>Self-heal Daemon on gluster2000.cs.ait.ac.th  N/A       N/A        Y       835
>>NFS Server on gluster2000.cs.ait.ac.th        N/A       N/A        N       N/A
>>Quota Daemon on gluster2000.cs.ait.ac.th      N/A       N/A        Y       735
>>Self-heal Daemon on gluster5000.cs.ait.ac.th  N/A       N/A        Y       829
>>NFS Server on gluster5000.cs.ait.ac.th        N/A       N/A        N       N/A
>>Quota Daemon on gluster5000.cs.ait.ac.th      N/A       N/A        Y       736
>>Self-heal Daemon on fbsd3500                  N/A       N/A        Y       2584
>>NFS Server on fbsd3500                        2049      0          Y       2671
>>Quota Daemon on fbsd3500                      N/A       N/A        Y       2571
>>
>>Task Status of Volume gv0
>>------------------------------------------------------------------------------
>>Task : Rebalance
>>ID : 53e7c649-27f0-4da0-90dc-af59f937d01f
>>Status : completed
>
>
>You don't have any tuning applied to the volume, even though there are
>predefined option groups in /var/lib/glusterd/groups.
>Both metadata-cache and nl-cache bring some performance gains for a
>small-file workload. You have to try them and check the results; see the
>sketch below. Use a real-world workload for testing, as synthetic
>benchmarks do not always reflect reality.
>In order to reset (revert) a setting you can use 'gluster volume reset
>gv0 <setting>'.
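>A minimal sketch of applying those predefined groups and reverting one
>of the options afterwards:
>
>gluster volume set gv0 group metadata-cache
>gluster volume set gv0 group nl-cache
>gluster volume reset gv0 performance.nl-cache   # revert a single option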
>
>
>>> - ganesha settings
>>
>>MDCACHE
>>{
>>Attr_Expiration_Time = 600;
>>Entries_HWMark = 50000;
>>LRU_Run_Interval = 90;
>>FD_HWMark_Percent = 60;
>>FD_LWMark_Percent = 20;
>>FD_Limit_Percent = 90;
>>}
>>EXPORT
>>{
>> Export_Id = 2;
>> etc.
>>}
>>
>>> - Network settings + MTU
>>
>>MTU 1500 (I think it is my switch that never worked with jumbo
>>frames). I have a dedicated VLAN for NFS and gluster and a VLAN for
>>user connections.
>
>Verify that there is no fragmentation between the TSP nodes and
>between the NFS server (Ganesha) and the cluster.
>For example, if the MTU is 1500, then use a payload size of 1500 - 28
>(ICMP + IP headers) = 1472:
>ping -M do -s 1472 -c 4 -I <interface> <other gluster node>
>
>Even the dumbest gigabit switches support Jumbo frames of 9000 bytes
>(anything above that requires expensive hardware), so I would
>recommend verifying whether Jumbo frames are possible at least
>between the TSP nodes and maybe to the NFS server.
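>If the switches cooperate, a quick Jumbo-frame test would be (interface
>name is a placeholder; 9000 - 28 bytes of headers = 8972):
>
>ip link set ens192 mtu 9000
>ping -M do -s 8972 -c 4 -I ens192 <other gluster node>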
>
>>I hope that helps.
>>
>>Best regards,
>>
>>Olivier
>>
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>
>
>As you can see, you are getting deeper and deeper into this, and we
>haven't even covered the storage stack yet, nor any Ganesha settings :)
>
>Good luck!
>
>Best Regards,
>Strahil Nikolov