[Bugs] [Bug 1378814] New: Files not being opened with o_direct flag during random read operation (Glusterfs 3.8.2)

bugzilla at redhat.com bugzilla at redhat.com
Fri Sep 23 10:07:17 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1378814

            Bug ID: 1378814
           Summary: Files not being opened with o_direct flag during
                    random read operation (Glusterfs 3.8.2)
           Product: GlusterFS
           Version: 3.9
         Component: open-behind
          Keywords: Reopened
          Assignee: bugs at gluster.org
          Reporter: kdhananj at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    kdhananj at redhat.com, mpillai at redhat.com,
                    pkarampu at redhat.com, psuriset at redhat.com,
                    rcyriac at redhat.com, rgowdapp at redhat.com,
                    rsussman at redhat.com, sasundar at redhat.com,
                    shberry at redhat.com
        Depends On: 1375959
            Blocks: 1377556, 1378695



+++ This bug was initially created as a clone of Bug #1375959 +++

Description of problem:

Hi,

I am running a random-workload test using pbench_fio and passing the direct=1 flag to
bypass the cache. On the server side I have performance.strict-o-direct: on.

The expectation was that for all workloads no activity would be seen in the server
cache. On the contrary, I am seeing the full 48G of server memory being utilized
during the random read workload.

I collected strace output on the brick process and saw no files opened with the
O_DIRECT flag. vmstat data collected on the server also shows full utilization of
the 48G of server memory.
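
For reference, a plain-fio equivalent of this workload would look roughly like the
following (mount point and job parameters here are illustrative; the actual runs were
driven through pbench_fio):

fio --name=randread --directory=/mnt/rep2 --rw=randread --direct=1 \
    --bs=4k --ioengine=libaio --iodepth=16 --numjobs=4 --size=4g \
    --runtime=300 --time_based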

I have a 6-node server setup and am running the workload against it from 5 clients.
The volume is replica 2. Here is the volume info:

Volume Name: rep2
Type: Distributed-Replicate
Volume ID: 08bc410b-fba4-4a81-b918-fe5239947eef
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 172.17.40.13:/bricks/b01/g
Brick2: 172.17.40.14:/bricks/b01/g
Brick3: 172.17.40.15:/bricks/b01/g
Brick4: 172.17.40.16:/bricks/b01/g
Brick5: 172.17.40.22:/bricks/b01/g
Brick6: 172.17.40.24:/bricks/b01/g
Options Reconfigured:
performance.quick-read: off
performance.open-behind: off
performance.strict-o-direct: on
transport.address-family: inet
performance.readdir-ahead: on
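
The non-default options above can be set with the standard gluster CLI, for example:

gluster volume set rep2 performance.strict-o-direct on
gluster volume set rep2 performance.open-behind off
gluster volume set rep2 performance.quick-read off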


Version-Release number of selected component (if applicable):

On server side:

rpm -qa | grep gluster
glusterfs-cli-3.8.2-1.el7.x86_64
glusterfs-3.8.2-1.el7.x86_64
glusterfs-api-3.8.2-1.el7.x86_64
glusterfs-libs-3.8.2-1.el7.x86_64
glusterfs-fuse-3.8.2-1.el7.x86_64
glusterfs-client-xlators-3.8.2-1.el7.x86_64
glusterfs-server-3.8.2-1.el7.x86_64

On client side:

rpm -qa | grep gluster
glusterfs-client-xlators-3.8.2-1.el7.x86_64
glusterfs-libs-3.8.2-1.el7.x86_64
glusterfs-3.8.2-1.el7.x86_64
glusterfs-fuse-3.8.2-1.el7.x86_64


How reproducible:

It is reproducible every time.

Steps to Reproduce:
1. Create a volume and set performance.strict-o-direct: on
2. Run a random read workload using fio or any equivalent workload generator
3. Observe vmstat and collect strace output using strace -ff -T -p <pid> -o
<path-where-you-want-the-logs-captured>. To get the PID: ps aux | grep glusterfsd |
grep <volname> (see the sketch after these steps)
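
A sketch of step 3 as a single sequence (volume name and output paths are
placeholders):

BRICK_PID=$(pgrep -f "glusterfsd.*<volname>" | head -1)
strace -ff -T -p "$BRICK_PID" -o /var/tmp/brick-strace.log &
vmstat 5 > /var/tmp/vmstat.out &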

Actual results:

The full 48G of memory is being used as cache.

Expected results:

With performance.strict-o-direct: on set, the expectation was that the workload would
bypass the cache completely.

Additional info:

Output of strace (note that no file is being opened with the O_DIRECT flag):

 grep -i "o_direct" strace.log.*
strace.log.13755:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000063>
strace.log.13755:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/dirty",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000013>
strace.log.13755:openat(AT_FDCWD,
"/bricks/b01/g/.glusterfs/indices/entry-changes",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000013>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000118>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000114>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000142>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000141>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000141>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 31 <0.000141>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000023>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000089>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000058>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000094>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000104>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000028>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000099>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000018>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000106>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000028>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000077>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000074>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000119>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000028>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000036>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000028>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000109>
strace.log.13764:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/landfill",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000028>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000029>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000015>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000015>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000027>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000050>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 30 <0.000034>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000018>
strace.log.13789:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000128>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000014>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000028>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000014>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000038>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000111>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000014>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000017>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000025>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000028>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000026>
strace.log.14342:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000075>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000022>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000013>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000013>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000025>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000021>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 29 <0.000022>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000016>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 26 <0.000080>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000027>
strace.log.14343:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000018>
strace.log.14358:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000022>
strace.log.14358:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 28 <0.000026>
strace.log.14358:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000017>
strace.log.14359:openat(AT_FDCWD, "/bricks/b01/g/.glusterfs/indices/xattrop",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27 <0.000014>

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-09-14
06:49:16 EDT ---

This bug is automatically being proposed for the current release of Red Hat
Gluster Storage 3 under active development, by setting the release flag
'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change
the proposed release flag.

--- Additional comment from Atin Mukherjee on 2016-09-14 13:28:19 EDT ---

Shekhar - please change the product of this BZ to GlusterFS as the testing was
done with glusterfs 3.8.2 bits.

--- Additional comment from Pranith Kumar K on 2016-09-14 13:39:06 EDT ---

Shekhar,
   You need the following two options for O_DIRECT to be handled properly:

1) performance.strict-o-direct on
2) network.remote-dio off
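
With the gluster CLI this maps to, for example (volume name taken from the vol info
above):

gluster volume set rep2 performance.strict-o-direct on
gluster volume set rep2 network.remote-dio off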

Pranith

--- Additional comment from Shekhar Berry on 2016-09-14 13:45:06 EDT ---

Pranith,

To enable O_DIRECT I turned performance.strict-o-direct on, and network.remote-dio
was already disabled by default.

performance.quick-read: off
performance.open-behind: off

are two options I tried as part of a troubleshooting exercise with Krutika and
Raghavendra.

I am reopening this issue.

--Shekhar

--- Additional comment from Krutika Dhananjay on 2016-09-14 14:17:53 EDT ---

(In reply to Pranith Kumar K from comment #3)
> Shekhar,
>    You need the following two options for O_DIRECT to be handled properly:
> 
> 1) performance.strict-o-direct on
> 2) network.remote-dio off
> 
> Pranith

@Pranith - Part of the problem seems to be that the anonymous fds created by
open-behind don't inherit the original open() flags. When ob winds reads and
writes on these anon fds, posix ends up using the usual GF_ANON_FD_FLAGS
(O_RDWR|O_LARGEFILE) value for the open() it performs just before the invoked
fd operation (mostly read/write).
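
The effect of losing O_DIRECT can be seen independently of gluster with a plain read
on the brick filesystem: a buffered read fills the page cache, while an O_DIRECT read
largely bypasses it. A rough sketch (run as root; the file path is illustrative):

sync; echo 3 > /proc/sys/vm/drop_caches
dd if=/bricks/b01/g/somefile of=/dev/null bs=1M                # buffered read: buff/cache grows in free -m
sync; echo 3 > /proc/sys/vm/drop_caches
dd if=/bricks/b01/g/somefile iflag=direct of=/dev/null bs=1M   # O_DIRECT read: buff/cache stays roughly flat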

FWIW, I did try the same test Shekhar ran on my test cluster, and disabling
open-behind in addition to the two already-known o-direct options you mentioned
above seemed to fix the issue of the cache growing as the test progresses
(although, per Shekhar, it had no effect when he tried the same).

I am yet to confirm the theory (and the presence/absence of O_DIRECT flag at
the level of posix) through strace output of the bricks with and without
open-behind. I will do the same tomorrow and update the bug.

-Krutika

--- Additional comment from Pranith Kumar K on 2016-09-14 15:26:23 EDT ---

(In reply to Shekhar Berry from comment #4)
> Pranith,
> 
> To enable o_direct I turned performance.strict-o-direct on and my
> network.remote-dio was disabled by default. 
> 
> performance.quick-read: off
> performance.open-behind: off
> 
> are two options which I tried as part of troubleshooting exercise with
> Krutika and Raghavendra.
> 
> I am reopening this issue.
> 
> --Shekhar

I was so confident this was not a bug; sorry about closing it before coming to a
conclusion :-(

--- Additional comment from Pranith Kumar K on 2016-09-14 15:30:03 EDT ---

(In reply to Krutika Dhananjay from comment #5)
> (In reply to Pranith Kumar K from comment #3)
> > Shekhar,
> >    You need the following two options for O_DIRECT to be handled properly:
> > 
> > 1) performance.strict-o-direct on
> > 2) network.remote-dio off
> > 
> > Pranith
> 
> @Pranith - Part of the problem seems to be that the anonymous fds created by
> open-behind don't inherit the original open() flags. And when ob does wind
> reads and writes on these anon fds, posix winds up using the usual
> GF_ANON_FD_FLAGS (O_RDWR|O_LARGEFILE) value at the time of open() preceding
> the invoked fd operation (mostly read/write).

As soon as the first write comes, an OPEN FOP with the same flags that came at the
time of open() is sent out from open-behind; from then on even the reads are served
using this fd. Could you confirm whether the test includes only reads and no writes
at all? In that case we have a bug we need to look into.

> 
> FWIW, I did try the same test Shekhar ran on my test cluster and disabling
> open-behind in addition to the two already known o-direct options that you
> mentioned above, seemed to fix the issue of growing cache size as the test
> progresses (although it didn't have any effect as per Shekhar when he tried
> the same).
> 
> I am yet to confirm the theory (and the presence/absence of O_DIRECT flag at
> the level of posix) through strace output of the bricks with and without
> open-behind. I will do the same tomorrow and update the bug.

Yes, this would be important to know.

> 
> -Krutika

--- Additional comment from Krutika Dhananjay on 2016-09-15 09:28:42 EDT ---

(In reply to Krutika Dhananjay from comment #5)
> (In reply to Pranith Kumar K from comment #3)
> > Shekhar,
> >    You need the following two options for O_DIRECT to be handled properly:
> > 
> > 1) performance.strict-o-direct on
> > 2) network.remote-dio off
> > 
> > Pranith
> 
> @Pranith - Part of the problem seems to be that the anonymous fds created by
> open-behind don't inherit the original open() flags. And when ob does wind
> reads and writes on these anon fds, posix winds up using the usual
> GF_ANON_FD_FLAGS (O_RDWR|O_LARGEFILE) value at the time of open() preceding
> the invoked fd operation (mostly read/write).
> 
> FWIW, I did try the same test Shekhar ran on my test cluster and disabling
> open-behind in addition to the two already known o-direct options that you
> mentioned above, seemed to fix the issue of growing cache size as the test
> progresses (although it didn't have any effect as per Shekhar when he tried
> the same).
> 
> I am yet to confirm the theory (and the presence/absence of O_DIRECT flag at
> the level of posix) through strace output of the bricks with and without
> open-behind. I will do the same tomorrow and update the bug.
> 
> -Krutika

I stand corrected. *Even* with open-behind disabled, as Shekhar rightly said,
the issue still exists. I even disabled all perf xls and the problem still
persisted. I found that open()s are being issued on gfid handles with
O_RDONLY|O_DIRECT flags in the bricks' strace output with all these xls
disabled. I still need to check whether pread()s are being issued on the same
file descriptors, though.

Still have some more strace analysis to do. Will update the bug again once the
issue is RC'd.

-Krutika

--- Additional comment from Krutika Dhananjay on 2016-09-15 12:04:21 EDT ---

So I checked the strace output again. The moment I disable open-behind, I start
to see O_DIRECT in the strace logs. I also grepped for other invocations of open()
and realised there were files opened with the O_WRONLY flag (without O_DIRECT, of
course) with a lot of pwrites being invoked on their fds. It took me a while to
realise that these are filenames with patterns like fio_bw*.log, fio_clat*.log,
fio_lat*.log, etc. - in other words, log files created by fio itself which were
ending up on the glusterfs mount point.

Possible dumb question inbound - Shekhar, did you invoke the fio command when
your current working directory was the glusterfs mount point itself? I made
that mistake today which is why I saw the same results as you did. :)

Could you please confirm this?

I ran the test again now, this time from $HOME. Disabling open-behind caused
things to work for me.

The bug report, however, is still valid; I will send out a patch to make
open-behind handle O_DIRECT. Meanwhile, you can work around the issue by
disabling open-behind for now.
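
For example (volume name from the original report):

gluster volume set rep2 performance.open-behind off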

-Krutika

--- Additional comment from Shekhar Berry on 2016-09-15 14:30:27 EDT ---

Krutika,

Thanks for looking into the bug.

Here are a couple of comments:

1) did you invoke the fio command when your current working directory was the
glusterfs mount point itself?

No. I am using pbench_fio for my testing, and it stores run logs under
/var/lib/pbench-agent. Just FYI, I am not invoking pbench_fio from the gluster
mount point itself :)

2) I redid the test with the performance.open-behind: off and
performance.strict-o-direct: on options. The results are no different in my
setup; I am still seeing full cache utilization.

I have not captured strace under this setting. I can capture and share that
with you if required.

--Shekhar

--- Additional comment from Krutika Dhananjay on 2016-09-16 01:12:36 EDT ---

Yes, strace output would be great.

1. Could you capture the strace logs of the bricks (the smaller the brick
count, the easier it would be for me :) ) for the following cases:

i. strict-o-direct on, and everything else remains default (implying that
open-behind will also be on)

ii. strict-o-direct on and open-behind turned off

iii. strict-o-direct on and all perf xls disabled (write-behind, open-behind,
quick-read, read-ahead, io-cache, stat-prefetch).

For capturing the strace output, you can use this:

# strace -ff -f -p <pid-of-brick> -o <path-where-you-want-the-output-captured>

Note that you may have to hit Ctrl+C on the strace terminal after every run.



2. Along with 1, also start the test script (fio in your case) itself using
strace. For this, you need to execute the following:

# strace -ff -f -o fio-strace.log <your-test-invocation-command-here>

-Krutika

--- Additional comment from Shekhar Berry on 2016-09-17 03:54:25 EDT ---

I stand corrected. With open-behind off, I am now seeing files being opened with the
O_DIRECT flag for both the write and read tests.

I redid the test with the following volume settings:

performance.open-behind: off
performance.strict-o-direct: on

Here's the strace output:

grep -i "o_direct" strace.log.* | egrep -v openat

strace.log.25363:open("/bricks/b01/g/.glusterfs/ab/2e/ab2ea2ca-1f6e-494a-b817-faec8627afe3",
O_RDWR|O_DIRECT) = 33 <0.000014>
strace.log.25363:open("/bricks/b01/g/.glusterfs/10/f7/10f7cf7d-bd68-4414-a3c8-cb3b12092452",
O_RDWR|O_DIRECT) = 35 <0.000016>
strace.log.25363:open("/bricks/b01/g/.glusterfs/51/ac/51acd630-6518-42f8-a9cf-c936dc5c3b9c",
O_RDWR|O_DIRECT) = 37 <0.000011>
strace.log.25363:open("/bricks/b01/g/.glusterfs/a1/41/a141600d-8fbb-41f9-b0b8-34318125839e",
O_RDWR|O_DIRECT) = 39 <0.000017>
strace.log.25363:open("/bricks/b01/g/.glusterfs/fa/cd/facda6a3-53c4-484e-a599-2488d0683c31",
O_RDWR|O_DIRECT) = 43 <0.000039>
strace.log.25363:open("/bricks/b01/g/.glusterfs/52/02/5202698e-51d2-40d3-9a61-9b40ce2b427b",
O_RDWR|O_DIRECT) = 44 <0.000014>
strace.log.25363:open("/bricks/b01/g/.glusterfs/ac/2c/ac2ceda0-d035-4b94-87bc-c12bced59c6c",
O_RDONLY|O_DIRECT) = 45 <0.000040>
strace.log.25363:open("/bricks/b01/g/.glusterfs/51/ac/51acd630-6518-42f8-a9cf-c936dc5c3b9c",
O_RDONLY|O_DIRECT) = 52 <0.000021>
strace.log.25454:open("/bricks/b01/g/.glusterfs/ca/47/ca475b51-9f61-4838-ba2c-8930b65a0c04",
O_RDWR|O_DIRECT) = 34 <0.000019>
strace.log.25454:open("/bricks/b01/g/.glusterfs/11/d7/11d7083f-a4d0-41c2-9b48-efbd899b831a",
O_RDWR|O_DIRECT) = 36 <0.000017>
strace.log.25454:open("/bricks/b01/g/.glusterfs/84/63/8463c0d7-3e68-44c7-b360-0a6622070c42",
O_RDWR|O_DIRECT) = 38 <0.000017>
strace.log.25454:open("/bricks/b01/g/.glusterfs/9a/4c/9a4cb41f-487d-464f-9238-9db7385ecf53",
O_RDWR|O_DIRECT) = 40 <0.000012>
strace.log.25454:open("/bricks/b01/g/.glusterfs/30/f9/30f929ce-9b66-4d3e-ab0f-a73c60d4d817",
O_RDWR|O_DIRECT) = 41 <0.000013>
strace.log.25454:open("/bricks/b01/g/.glusterfs/85/50/8550a796-172b-4390-b976-b228eafd9ada",
O_RDWR|O_DIRECT) = 42 <0.000016>
strace.log.25454:open("/bricks/b01/g/.glusterfs/9d/b2/9db21c69-39bc-4565-8793-0833cf21887d",
O_RDWR|O_DIRECT) = 45 <0.000020>
strace.log.25454:open("/bricks/b01/g/.glusterfs/ac/2c/ac2ceda0-d035-4b94-87bc-c12bced59c6c",
O_RDWR|O_DIRECT) = 48 <0.000026>
strace.log.25454:open("/bricks/b01/g/.glusterfs/a5/b6/a5b62db9-0d1a-49cb-b188-a1be216de017",
O_RDWR|O_DIRECT) = 50 <0.000021>
strace.log.25454:open("/bricks/b01/g/.glusterfs/c1/fb/c1fb470d-63dc-4d1e-8079-93877bdcb2a5",
O_RDONLY|O_DIRECT) = 36 <0.000026>
strace.log.25454:open("/bricks/b01/g/.glusterfs/11/d7/11d7083f-a4d0-41c2-9b48-efbd899b831a",
O_RDONLY|O_DIRECT) = 48 <0.000030>
strace.log.25455:open("/bricks/b01/g/.glusterfs/81/04/810408fe-eb40-42e3-b53b-981dbf290af3",
O_RDWR|O_DIRECT) = 49 <0.000037>
strace.log.25455:open("/bricks/b01/g/.glusterfs/c1/fb/c1fb470d-63dc-4d1e-8079-93877bdcb2a5",
O_RDWR|O_DIRECT) = 52 <0.000013>
strace.log.25455:open("/bricks/b01/g/.glusterfs/30/f9/30f929ce-9b66-4d3e-ab0f-a73c60d4d817",
O_RDONLY|O_DIRECT) = 33 <0.000022>
strace.log.25455:open("/bricks/b01/g/.glusterfs/9d/b2/9db21c69-39bc-4565-8793-0833cf21887d",
O_RDONLY|O_DIRECT) = 35 <0.000032>
strace.log.25455:open("/bricks/b01/g/.glusterfs/84/63/8463c0d7-3e68-44c7-b360-0a6622070c42",
O_RDONLY|O_DIRECT) = 49 <0.000027>
strace.log.25456:open("/bricks/b01/g/.glusterfs/2d/35/2d352297-ab44-4c92-8b1d-cb314f5f693b",
O_RDWR|O_DIRECT) = 51 <0.000029>
strace.log.25456:open("/bricks/b01/g/.glusterfs/fa/cd/facda6a3-53c4-484e-a599-2488d0683c31",
O_RDONLY|O_DIRECT) = 34 <0.000029>
strace.log.25457:open("/bricks/b01/g/.glusterfs/e5/13/e5134455-eb71-43b8-8662-355c79d5ef1a",
O_RDWR|O_DIRECT) = 46 <0.000018>
strace.log.25457:open("/bricks/b01/g/.glusterfs/49/3b/493b7a5b-42f7-4c0a-9a4c-48956a05444f",
O_RDWR|O_DIRECT) = 47 <0.000027>
strace.log.25457:open("/bricks/b01/g/.glusterfs/81/04/810408fe-eb40-42e3-b53b-981dbf290af3",
O_RDONLY|O_DIRECT) = 47 <0.000031>
strace.log.25458:open("/bricks/b01/g/.glusterfs/52/02/5202698e-51d2-40d3-9a61-9b40ce2b427b",
O_RDONLY|O_DIRECT) = 39 <0.000026>
strace.log.25458:open("/bricks/b01/g/.glusterfs/ab/2e/ab2ea2ca-1f6e-494a-b817-faec8627afe3",
O_RDONLY|O_DIRECT) = 46 <0.000037>
strace.log.25458:open("/bricks/b01/g/.glusterfs/ca/47/ca475b51-9f61-4838-ba2c-8930b65a0c04",
O_RDONLY|O_DIRECT) = 51 <0.000030>
strace.log.25459:open("/bricks/b01/g/.glusterfs/e5/13/e5134455-eb71-43b8-8662-355c79d5ef1a",
O_RDONLY|O_DIRECT) = 40 <0.000017>
strace.log.25459:open("/bricks/b01/g/.glusterfs/10/f7/10f7cf7d-bd68-4414-a3c8-cb3b12092452",
O_RDONLY|O_DIRECT) = 42 <0.000014>
strace.log.25459:open("/bricks/b01/g/.glusterfs/49/3b/493b7a5b-42f7-4c0a-9a4c-48956a05444f",
O_RDONLY|O_DIRECT) = 43 <0.000044>
strace.log.25460:open("/bricks/b01/g/.glusterfs/85/50/8550a796-172b-4390-b976-b228eafd9ada",
O_RDONLY|O_DIRECT) = 37 <0.000012>
strace.log.25461:open("/bricks/b01/g/.glusterfs/a5/b6/a5b62db9-0d1a-49cb-b188-a1be216de017",
O_RDONLY|O_DIRECT) = 38 <0.000039>
strace.log.25461:open("/bricks/b01/g/.glusterfs/9a/4c/9a4cb41f-487d-464f-9238-9db7385ecf53",
O_RDONLY|O_DIRECT) = 44 <0.000027>
strace.log.25461:open("/bricks/b01/g/.glusterfs/a1/41/a141600d-8fbb-41f9-b0b8-34318125839e",
O_RDONLY|O_DIRECT) = 50 <0.000015>
strace.log.25462:open("/bricks/b01/g/.glusterfs/2d/35/2d352297-ab44-4c92-8b1d-cb314f5f693b",
O_RDONLY|O_DIRECT) = 41 <0.000017>


--Shekhar

--- Additional comment from Worker Ant on 2016-09-23 06:04:16 EDT ---

REVIEW: http://review.gluster.org/15552 (performance/open-behind: Pass O_DIRECT
flags for anon fd reads when required) posted (#1) for review on release-3.8 by
Krutika Dhananjay (kdhananj at redhat.com)


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1375959
[Bug 1375959] Files not being opened with o_direct flag during random read
operation (Glusterfs 3.8.2)
https://bugzilla.redhat.com/show_bug.cgi?id=1377556
[Bug 1377556] Files not being opened with o_direct flag during random read
operation (Glusterfs 3.8.2)
https://bugzilla.redhat.com/show_bug.cgi?id=1378695
[Bug 1378695] Files not being opened with o_direct flag during random read
operation (Glusterfs 3.8.2)