[Bugs] [Bug 1664934] New: glusterfs-fuse client not benefiting from page cache on read after write

bugzilla at redhat.com
Thu Jan 10 05:17:17 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1664934

            Bug ID: 1664934
           Summary: glusterfs-fuse client not benefiting from page cache
                    on read after write
           Product: GlusterFS
           Version: 5
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: fuse
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: mpillai at redhat.com
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community



Description of problem:
On a simple single-brick distribute volume, I'm running tests to validate the
glusterfs-fuse client's use of the page cache. The tests indicate that a read
following a write is served from the brick, not from the client cache. In
contrast, a second read does get its data from the client cache.

Version-Release number of selected component (if applicable):

glusterfs-*5.2-1.el7.x86_64
kernel-3.10.0-957.el7.x86_64 (RHEL 7.6)

How reproducible:

Consistently

Steps to Reproduce:
1. Use fio to create a data set that would fit easily in the page cache. My
client has 128 GB RAM; I'll create a 64 GB data set:

fio --name=initialwrite --ioengine=sync --rw=write \
--direct=0 --create_on_open=1 --end_fsync=1 --bs=128k \
--directory=/mnt/glustervol/ --filename_format=f.\$jobnum.\$filenum \
--filesize=16g --size=16g --numjobs=4
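
At this point the 64 GB data set should be resident in the client page cache.
One way to confirm that (a generic check, not part of the original runs) is to
compare cache usage on the client before and after the write:

free -g    # the buff/cache column should grow by roughly 64 GB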

2. Run an fio read test that reads the data set from step 1, without
invalidating the page cache:

fio --name=readtest --ioengine=sync --rw=read --invalidate=0 \
--direct=0 --bs=128k --directory=/mnt/glustervol/ \
--filename_format=f.\$jobnum.\$filenum --filesize=16g \
--size=16g --numjobs=4

Read throughput is much lower than it would be if reading from page cache:
READ: bw=573MiB/s (601MB/s), 143MiB/s-144MiB/s (150MB/s-150MB/s), io=64.0GiB
(68.7GB), run=114171-114419msec

Reads are going over the 10GbE network as shown in (edited) sar output:
05:01:04 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s 
05:01:06 AM       em1 755946.26  40546.26 1116287.75   3987.24      0.00

[There is some read amplification here: the application is getting lower
throughput than what the client is reading over the network. More on that
later]
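
For a cold-cache reference point (an extra check, not one of the numbered
steps), the client page cache can be dropped explicitly before re-running the
step 2 read; if step 2 is indeed not using the cache, its throughput should be
close to this cold-cache number:

sync
echo 3 > /proc/sys/vm/drop_caches    # as root on the client: drops page cache, dentries and inodes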

3. Run the read test from step 2 again. This time read throughput is much
higher, indicating reads served from the client cache rather than over the
network:
READ: bw=14.8GiB/s (15.9GB/s), 3783MiB/s-4270MiB/s (3967MB/s-4477MB/s),
io=64.0GiB (68.7GB), run=3837-4331msec


Expected results:

The read test in step 2 should be reading from the page cache, and should give
throughput close to what we get in step 3.
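
As a control (an assumed cross-check, not run in the steps above), repeating
steps 1 and 2 with --directory pointing at a directory on a local filesystem,
e.g. a hypothetical /mnt/localtest, should show the second fio job reading
from the page cache at memory speeds, isolating the problem to the
glusterfs-fuse client:

fio --name=readtest --ioengine=sync --rw=read --invalidate=0 \
--direct=0 --bs=128k --directory=/mnt/localtest \
--filename_format=f.\$jobnum.\$filenum --filesize=16g \
--size=16g --numjobs=4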

Additional Info:

gluster volume info:

Volume Name: perfvol
Type: Distribute
Volume ID: 7033539b-0331-44b1-96cf-46ddc6ee2255
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 172.16.70.128:/mnt/rhs_brick1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
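
The client mounts the volume as a plain FUSE mount; the exact mount command is
not captured above, but assuming default options it would be along the lines
of:

mount -t glusterfs 172.16.70.128:/perfvol /mnt/glustervol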
