[Gluster-users] Configuring legacy Gluster NFS
Strahil Nikolov
hunter86_bg at yahoo.com
Mon May 25 14:28:10 UTC 2020
I forgot to mention that you need to verify/set the VMware machines for a high-performance/low-latency workload.
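For reference, on vSphere that usually means setting the VM's Latency Sensitivity to High and reserving CPU/memory for it. In the .vmx that corresponds to something like the line below (this is an assumption based on VMware's latency-sensitivity tuning guidance - verify against your vSphere version before relying on it):

sched.cpu.latencySensitivity = "high"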
On 25 May 2020 17:13:52 GMT+03:00, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
>
>
>On 25 May 2020 5:49:00 GMT+03:00, Olivier
><Olivier.Nicole at cs.ait.ac.th> wrote:
>>Strahil Nikolov <hunter86_bg at yahoo.com> writes:
>>
>>> On May 23, 2020 7:29:23 AM GMT+03:00, Olivier
>><Olivier.Nicole at cs.ait.ac.th> wrote:
>>>>Hi,
>>>>
>>>>I have been struggling with NFS Ganesha: one gluster node with ganesha
>>>>serving only one client could not handle the load when dealing with
>>>>thousands of small files. Legacy gluster NFS works flawlessly with 5 or 6
>>>>clients.
>>>>
>>>>But the documentation for gNFS is scarce; I could not find where to
>>>>configure the various authorizations, so any pointer is greatly
>>>>welcome.
>>>>
>>>>Best regards,
>>>>
>>>>Olivier
>>>
>>> Hi Oliver,
>>>
>>> Can you give me a hint as to why you are using gluster with a single
>>> node in the TSP serving only 1 client?
>>> Usually, this is not a typical gluster workload.
>>
>>Hi Strahil,
>>
>>Of course I have more than one node; the other nodes are supporting the
>>bricks and the data. I am using a node with no data to solve this issue
>>with NFS. But in my comparison between gNFS and Ganesha, I was using the
>>same configuration, with one node with no brick accessing the other
>>nodes for the data. So the only change between what is working and what
>>was not is the NFS server. Besides, I have been using NFS for over 15
>>years and know that, given my data and type of activity, one single NFS
>>server should be able to serve 5 to 10 clients without a problem; that
>>is why I suspected Ganesha from the beginning.
>
>You are not comparing apples to apples. Pure NFS has been used in UNIX
>systems since before modern OSes. Linux has long been using pure NFS and
>the kernel has been optimized for it, while Ganesha is newer tech and
>requires some tuning.
>
>You haven't mentioned what kind of issues you see - searching
>a directory, reading a lot of files, writing a lot of
>small files, etc.
>
>Usually a negative lookup (searching for/accessing a non-existent object -
>file, dir, etc.) causes serious performance degradation.
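>For example, you can check whether LOOKUP calls dominate the workload
>with the volume profiler (a quick sketch - run it only for a short
>window, since profiling adds some overhead):
>
>gluster volume profile gv0 start
>gluster volume profile gv0 info
>gluster volume profile gv0 stop
>
>If LOOKUP sits at the top of the fop statistics, negative-lookup caching
>(mentioned further below) is worth testing.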
>
>>If I cannot configure gNFS, I think I could mount the volume with the
>>GlusterFS client and use the native NFS server of Linux, but that would
>>add overhead and leave some features behind; that is why my focus is
>>primarily on configuring gNFS.
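>>Roughly, that fallback would look like the sketch below (mount point and
>>subnet are only placeholders; re-exporting a FUSE mount through the
>>kernel NFS server needs an explicit fsid and adds an extra hop):
>>
>>mount -t glusterfs gluster3000:/gv0 /mnt/gv0
>># in /etc/exports:
>>/mnt/gv0  192.168.1.0/24(rw,no_subtree_check,fsid=1001)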
>>
>>>
>>> Also can you specify:
>>> - Brick block device type and details (raid type, lvm, vdo, etc )
>>
>>All nodes are VMware virtual machines, the RAID being at VMware level
>
>Yeah, that's not very descriptive.
>For write-intensive and small-file workloads the optimal RAID mode
>is RAID10 with at least 12 disks per node.
>What is the I/O scheduler? Are you using thin or thick LVM? How many
>snapshots do you have?
>Are you using striping at the LVM level (if you use local storage,
>then most probably no striping)?
>What is the PE size of the VG?
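>A few commands can collect most of that (the brick device and VG names
>below are only placeholders - adjust them to your layout):
>
>cat /sys/block/sdb/queue/scheduler      # current I/O scheduler
>lvs -a -o +lv_layout,stripes            # thin/thick pools and striping
>lvs --segments gluster_vg               # segment/stripe layout per LV
>vgdisplay gluster_vg | grep 'PE Size'   # physical extent size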
>
>>> - xfs_info of the brick
>
>What kind of FS are you using? You need to be sure that the
>inode size is at least 512 bytes (1024 for Swift) in order to be
>supported.
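>For example (use whatever filesystem actually holds the brick; the mkfs
>line is only for when a new brick device is being formatted):
>
>xfs_info /gluster1 | grep isize      # should show isize=512 or larger
>mkfs.xfs -f -i size=512 /dev/sdX     # when creating a new brick filesystem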
>
>>> - mount options for the brick
>>
>>Bricks are not mounted
>
>It is not good to share the same VMDK between the OS and the Gluster
>bricks. You can benefit from mount options like
>'noatime,nodiratime,nobarrier,inode64'. Note that 'nobarrier' requires
>storage with a battery-backed write cache.
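>A minimal /etc/fstab sketch for a dedicated XFS brick (the device name is
>a placeholder; recent kernels have deprecated/removed the XFS 'nobarrier'
>option, so drop it there):
>
>/dev/sdb1  /gluster1  xfs  noatime,nodiratime,nobarrier,inode64  0 0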
>
>>> - SELINUX/APPARMOR status
>>> - sysctl tunables (including tuned profile)
>>
>>All systems are vanilla Ubuntu with no tuning.
>
>I have done some tests and you can benefit from the rhgs random-IO
>tuned profile. The latest source RPM can be found at:
>ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-1.el7rhgs.src.rpm
>
>On top of that, you need to modify it to disable LRO, as it is
>automatically enabled for VMXNET NICs. LRO increases bandwidth but at
>the cost of latency, and low latency is crucial when looking up
>thousands of files/directories.
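>Assuming the profile from that SRPM is installed and the storage NIC is
>ens192 (placeholder name; the profile name can be confirmed with
>'tuned-adm list'), that boils down to:
>
>tuned-adm profile rhgs-random-io
>ethtool -K ens192 lro off
>ethtool -k ens192 | grep large-receive-offload   # verify LRO is off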
>
>>> - gluster volume information and status
>>
>>sudo gluster volume info gv0
>>
>>Volume Name: gv0
>>Type: Distributed-Replicate
>>Volume ID: cc664830-1dd0-4dd4-9f1c-493578297e79
>>Status: Started
>>Snapshot Count: 0
>>Number of Bricks: 2 x 2 = 4
>>Transport-type: tcp
>>Bricks:
>>Brick1: gluster3000:/gluster1/br
>>Brick2: gluster5000:/gluster/br
>>Brick3: gluster3000:/gluster2/br
>>Brick4: gluster2000:/gluster/br
>>Options Reconfigured:
>>features.quota-deem-statfs: on
>>features.inode-quota: on
>>features.quota: on
>>transport.address-family: inet
>>nfs.disable: off
>>features.cache-invalidation: on
>>on@gluster3:~$ sudo gluster volume status gv0
>>Status of volume: gv0
>>Gluster process                               TCP Port  RDMA Port  Online  Pid
>>------------------------------------------------------------------------------
>>Brick gluster3000:/gluster1/br                49152     0          Y       1473
>>Brick gluster5000:/gluster/br                 49152     0          Y       724
>>Brick gluster3000:/gluster2/br                49153     0          Y       1549
>>Brick gluster2000:/gluster/br                 49152     0          Y       723
>>Self-heal Daemon on localhost                 N/A       N/A        Y       1571
>>NFS Server on localhost                       N/A       N/A        N       N/A
>>Quota Daemon on localhost                     N/A       N/A        Y       1560
>>Self-heal Daemon on gluster2000.cs.ait.ac.th  N/A       N/A        Y       835
>>NFS Server on gluster2000.cs.ait.ac.th        N/A       N/A        N       N/A
>>Quota Daemon on gluster2000.cs.ait.ac.th      N/A       N/A        Y       735
>>Self-heal Daemon on gluster5000.cs.ait.ac.th  N/A       N/A        Y       829
>>NFS Server on gluster5000.cs.ait.ac.th        N/A       N/A        N       N/A
>>Quota Daemon on gluster5000.cs.ait.ac.th      N/A       N/A        Y       736
>>Self-heal Daemon on fbsd3500                  N/A       N/A        Y       2584
>>NFS Server on fbsd3500                        2049      0          Y       2671
>>Quota Daemon on fbsd3500                      N/A       N/A        Y       2571
>>
>>Task Status of Volume gv0
>>------------------------------------------------------------------------------
>>Task : Rebalance
>>ID : 53e7c649-27f0-4da0-90dc-af59f937d01f
>>Status : completed
>
>
>You don't have any tuning applied to the volume, even though there are
>predefined option groups in /var/lib/glusterd/groups.
>Both metadata-cache and nl-cache bring some performance gains for a
>small-file workload. You have to try them and check the results; see the
>sketch below. Use a real-world workload for testing, as synthetic
>benchmarks do not always reflect reality.
>In order to reset (revert) a setting you can use 'gluster volume reset
>gv0 <setting>'.
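>A minimal sketch of applying those predefined groups and reverting one
>of the options afterwards:
>
>gluster volume set gv0 group metadata-cache
>gluster volume set gv0 group nl-cache
>gluster volume reset gv0 performance.nl-cache   # revert a single option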
>
>
>>> - ganesha settings
>>
>>MDCACHE
>>{
>>Attr_Expiration_Time = 600;
>>Entries_HWMark = 50000;
>>LRU_Run_Interval = 90;
>>FD_HWMark_Percent = 60;
>>FD_LWMark_Percent = 20;
>>FD_Limit_Percent = 90;
>>}
>>EXPORT
>>{
>> Export_Id = 2;
>> etc.
>>}
>>
>>> - Network settings + MTU
>>
>>MTU 1500 (I think it is my switch that never worked with jumbo
>>frames). I have a dedicated VLAN for NFS and gluster and a VLAN for
>>user connections.
>
>Verify that there is no fragmentation between the TSP nodes and
>between the NFS server (Ganesha) and the cluster.
>For example, if the MTU is 1500, then use a payload size of 1500 - 28
>(ICMP + IP headers) = 1472:
>ping -M do -s 1472 -c 4 -I <interface> <other gluster node>
>
>Even the dumbest gigabit switches support Jumbo frames of 9000 bytes
>(anything above that requires expensive hardware), so I would
>recommend verifying whether Jumbo frames are possible at least
>between the TSP nodes and maybe to the NFS server.
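>If the switches cooperate, a quick Jumbo-frame test would be (interface
>name is a placeholder; 9000 - 28 bytes of headers = 8972):
>
>ip link set ens192 mtu 9000
>ping -M do -s 8972 -c 4 -I ens192 <other gluster node>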
>
>>I hope that helps.
>>
>>Best regards,
>>
>>Olivier
>>
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>
>
>As you can see, you are getting deeper and deeper into this, and we
>haven't even covered the storage stack yet, nor any Ganesha settings :)
>
>Good luck!
>
>Best Regards,
>Strahil Nikolov