[Gluster-users] Understanding gluster performance
Gionatan Danti
g.danti at assyoma.it
Sat Jan 25 11:21:35 UTC 2020
On 22-01-2020 at 14:02, Barak Sason Rofman wrote:
> Hello Gionatan,
>
> Some time ago we've looked into the effect of logging in gluster on
> performance.
>
> If it's possible for you, you could try doing a performance test by
> disabling logging.
>
> In order to fully disable logging, some code needs to be modified.
>
> You can locate the method called 'skip_logging' in 'logging.c' and
> modify it so that it always returns '_gf_true' (return _gf_true;).
>
> Recompiling and installing is required as well.
>
> I would be interested in knowing to what extent (if any) this had an
> effect.
>
> You can contact me if you have any doubts.
Hi Barak, thanks for your reply and sorry for the delay. As I am using
the binary RPMs from the gluster repository, I cannot immediately change
the source code. I think it should be possible by rebuilding from the
SRPM, right?
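Just to fix ideas, I suppose the rebuild would go more or less like this
(a rough sketch based on the standard RPM tooling, which I have not run
yet; the spec file name and source paths may differ):

[root at singularity ~]# yumdownloader --source glusterfs     # fetch the SRPM from the repo
[root at singularity ~]# rpm -ivh glusterfs-*.src.rpm          # unpack it into ~/rpmbuild
# edit logging.c (under libglusterfs/src/, if I am not mistaken) so that
# skip_logging() always returns _gf_true, carry the change as a patch in
# the spec file, then rebuild the binary RPMs:
[root at singularity ~]# rpmbuild -ba ~/rpmbuild/SPECS/glusterfs.spec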
That said, I collected some new stats which, to me, seem to exclude any
logging problem; rather, it seems that a specific call (FXATTROP) takes a
very long time.
Here you can find my test volume tv0 configuration:
[root at singularity gluster]# gluster volume status tv0 detail
Status of volume: tv0
------------------------------------------------------------------------------
Brick : Brick 172.31.255.1:/dev/shm/brick1
TCP Port : 49153
RDMA Port : 0
Online : Y
Pid : 9243
File System : tmpfs
Device : tmpfs
Mount Options : rw,seclabel,nosuid,nodev
Inode Size : N/A
Disk Space Free : 1.4GB
Total Disk Space : 3.8GB
Inode Count : 998691
Free Inodes : 998520
------------------------------------------------------------------------------
Brick : Brick 172.31.255.1:/dev/shm/brick2
TCP Port : 49154
RDMA Port : 0
Online : Y
Pid : 9263
File System : tmpfs
Device : tmpfs
Mount Options : rw,seclabel,nosuid,nodev
Inode Size : N/A
Disk Space Free : 1.4GB
Total Disk Space : 3.8GB
Inode Count : 998691
Free Inodes : 998520
As you can see, it is a very simple replica 2 volume backed by two
different /dev/shm (memory/tmpfs) directories. I am using /dev/shm to
take any physical fsync latency out of the picture. I mounted the volume
via the gluster native (FUSE) client.
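For completeness, the setup boils down to something like this (exact
options reconstructed from memory, so take them with a grain of salt):

[root at singularity ~]# mkdir -p /dev/shm/brick1 /dev/shm/brick2
[root at singularity ~]# gluster volume create tv0 replica 2 \
    172.31.255.1:/dev/shm/brick1 172.31.255.1:/dev/shm/brick2 force
[root at singularity ~]# gluster volume start tv0
[root at singularity ~]# mount -t glusterfs 172.31.255.1:/tv0 /mnt/fuse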
Running fio produces the following output:
[root at singularity tmp]# fio --name=test --filename=/mnt/fuse/test.img
--size=256M --rw=randwrite --bs=4k --fsync=1
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=2164KiB/s][r=0,w=541 IOPS][eta
00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=10612: Sat Jan 25 12:03:14 2020
write: IOPS=540, BW=2163KiB/s (2215kB/s)(256MiB/121181msec)
So, about 550 IOPS. For an all-in-memory volume, this seems very slow.
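For anyone wanting to reproduce, stats like the ones below can be
collected with gluster's io-stats facilities, roughly as follows (the
client-side dump file name is just an example):

[root at singularity ~]# gluster volume profile tv0 start
[root at singularity ~]# gluster volume profile tv0 info       # brick-side stats (shown further down)
[root at singularity ~]# setfattr -n trusted.io-stats-dump \
    -v /tmp/client_profile.txt /mnt/fuse                     # client-side stats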
Now the interesting part. First, the gluster client profile:
=== Interval 4 stats ===
Duration : 19 secs
BytesRead : 0
BytesWritten : 41852928
Block Size : 4096B+
Read Count : 0
Write Count : 10218
Fop Call Count Avg-Latency Min-Latency Max-Latency
--- ---------- ----------- ----------- -----------
WRITE 9234 26.97 us 17.27 us 153.22 us
STATFS 1 306.35 us 306.35 us 306.35 us
FSYNC 9233 1638.45 us 810.85 us 3343.30 us
SETXATTR 1 559.61 us 559.61 us 559.61 us
LOOKUP 1 445.29 us 445.29 us 445.29 us
As you can see, the client sees many slow FSYNCs: at ~1638 us average per
FSYNC plus ~27 us per WRITE, fsync latency alone roughly accounts for the
~550 IOPS measured by fio (1 s / 1665 us ≈ 600 ops/s). Now, the server
side (please note that the client and server profiles were taken at
different points in time, but while running exactly the same fio workload
at the same performance):
Brick: 172.31.255.1:/dev/shm/brick1
-----------------------------------
Cumulative Stats:
Block Size: 4096b+
No. of Reads: 0
No. of Writes: 102916
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              2     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
     10.94      46.20 us      23.20 us     222.63 us           6969       FSYNC
     17.35      36.66 us      15.67 us     221.59 us          13937    FINODELK
     18.51      78.22 us      44.56 us     284.91 us           6968       WRITE
     53.20     112.38 us      70.46 us     355.08 us          13937    FXATTROP
Duration: 246 seconds
Data Read: 0 bytes
Data Written: 421543936 bytes
Interval 1 Stats:
Block Size: 4096b+
No. of Reads: 0
No. of Writes: 3497
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
     11.01      46.42 us      23.20 us     222.63 us           3498       FSYNC
     17.30      36.49 us      15.94 us     195.65 us           6995    FINODELK
     18.42      77.70 us      44.59 us     257.26 us           3497       WRITE
     53.28     112.37 us      71.49 us     355.08 us           6995    FXATTROP
Duration: 6 seconds
Data Read: 0 bytes
Data Written: 14323712 bytes
Brick: 172.31.255.1:/dev/shm/brick2
-----------------------------------
Cumulative Stats:
Block Size: 4096b+
No. of Reads: 0
No. of Writes: 102917
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              2     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
     11.05      44.83 us      23.04 us     212.17 us           6968       FSYNC
     17.39      35.28 us      15.63 us     200.36 us          13938    FINODELK
     18.04      73.17 us      42.60 us     261.36 us           6969       WRITE
     53.52     108.54 us      70.47 us     353.13 us          13938    FXATTROP
Duration: 246 seconds
Data Read: 0 bytes
Data Written: 421548032 bytes
Interval 1 Stats:
Block Size: 4096b+
No. of Reads: 0
No. of Writes: 3498
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
     11.24      46.52 us      23.38 us     212.17 us           3497       FSYNC
     17.58      36.37 us      16.13 us     194.07 us           6996    FINODELK
     18.07      74.77 us      42.72 us     237.45 us           3498       WRITE
     53.11     109.90 us      70.87 us     353.13 us           6995    FXATTROP
Duration: 6 seconds
Data Read: 0 bytes
Data Written: 14327808 bytes
Now, the slowest operation is FXATTROP, which seems to be executed at a
2x ratio with respect to FSYNC (13937 FXATTROP calls vs 6969 FSYNC calls
on brick1).
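As an aside, the extended attributes that FXATTROP keeps updating can be
inspected directly on the brick copy of the file, for example (the
trusted.afr.* keys one would expect to see depend on the volume layout):

[root at singularity ~]# getfattr -d -m . -e hex /dev/shm/brick1/test.img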
So, some questions:
- as my fio 4k sync write job runs at ~550 IOPS, why do I see so many
FSYNC and FXATTROP calls?
- are my stats broken/wrong?
- is the log-disabled test still necessary?
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8