[Bugs] [Bug 1478411] Directory listings on fuse mount are very slow due to small number of getdents () entries

bugzilla at redhat.com bugzilla at redhat.com
Tue Jan 2 12:20:17 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1478411

nh2 <nh2-redhatbugzilla at deditus.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(pgurusid at redhat.c |needinfo?(rgowdapp at redhat.c
                   |om)                         |om)
                   |needinfo?(nh2-redhatbugzill |
                   |a at deditus.de)               |



--- Comment #35 from nh2 <nh2-redhatbugzilla at deditus.de> ---
@ Raghavendra G:

OK, so with the rda-request-size limit bumped in readdir-ahead.c, and set to
1310720 on the volume, I get:

    # gluster volume get myvol performance.rda-request-size
    performance.rda-request-size            1310720

    # strace -Te getdents ls -1U largenames-10k/ > /dev/null
    getdents(3, /* 604 entries */, 131072)  = 43392 <0.102401>
    getdents(3, /* 603 entries */, 131072)  = 43416 <0.001757>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001937>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.007563>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001913>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.003381>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.011234>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.018437>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.003767>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.002458>
    getdents(3, /* 591 entries */, 131072)  = 42552 <0.011717>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.020970>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001548>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001877>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.002279>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001776>
    getdents(3, /* 391 entries */, 131072)  = 28152 <0.001263>
    getdents(3, /* 0 entries */, 131072)    = 0 <0.000211>

That seems to do something!

The getdents() calls still return the same ~40k number of bytes as before, but
the durations of the calls are now changed:

When before every ~2nd getdents took 0.1 seconds, a long call now happens only
at the beginning for largenames-10k/.

However, that initial call now takes much longer, so that the total time for
all getdents() calls together remains the same as in comment #28 and there is
no time improvement overall (see another strace further down this post that
shows this).

With largenames-100k:

    getdents(3, /* 604 entries */, 131072)  = 43392 <0.163892>
    getdents(3, /* 602 entries */, 131072)  = 43344 <0.003254>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.003669>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.010538>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.007300>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.017373>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.003721>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.008421>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.016774>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.004317>
    getdents(3, /* 593 entries */, 131072)  = 42696 <0.006771>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.132223>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003357>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003823>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.008572>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.006946>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005080>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.135234>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.008050>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.017798>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.015384>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.006461>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.079356>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003682>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.008991>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.009290>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005059>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.017135>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.009058>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.012989>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007093>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.006234>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.011052>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.117044>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003382>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003615>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.010157>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.009356>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.053361>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005857>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.018655>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007252>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.015048>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005514>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.099730>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.056949>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003871>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.018724>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003966>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.017172>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003817>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.013045>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.011726>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007532>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.016091>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.086207>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003886>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.042044>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005362>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003997>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.017465>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.009604>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.030751>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007936>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.015482>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007543>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.114697>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003654>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003635>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.010172>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007981>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.016444>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003645>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005491>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.018165>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003652>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.014853>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.109726>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007251>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.009851>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007128>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.018354>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.129458>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.013488>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.006543>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.006349>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.013647>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007909>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.052205>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003597>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003964>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.012631>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003580>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.020874>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003798>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003759>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.014521>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007281>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.015407>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.117141>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003797>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003647>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.012896>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005515>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.015374>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007836>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.013366>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.008682>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.144207>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.016566>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.052704>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003120>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.004119>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.013704>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003501>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003531>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.016954>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005842>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.014438>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007029>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.014823>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.114327>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003237>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.011245>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003815>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.006510>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005199>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.019505>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.008927>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003749>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.015448>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007190>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.179178>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.044503>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.013554>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007634>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005732>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.016517>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005310>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.015247>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.006601>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.010271>
    getdents(3, /* 595 entries */, 131072)  = 42840 <0.008384>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.096027>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003394>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003681>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.011489>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005808>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.018045>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005568>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.013506>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007447>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.006223>
    getdents(3, /* 587 entries */, 131072)  = 42264 <0.005347>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.219264>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.003878>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.014202>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.004581>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.012905>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.030729>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.041575>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.007530>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005656>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005534>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.008494>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.005954>
    getdents(3, /* 598 entries */, 131072)  = 43056 <0.020728>
    getdents(3, /* 140 entries */, 131072)  = 10080 <0.002299>
    getdents(3, /* 0 entries */, 131072)    = 0 <0.001825>

That is changed too: Only around every 10th entry takes 0.1 seconds. But in
sum, this still takes as much time as it did before, just that it's spent in
larger clusters.

Another thing I notice is that performance goes bad when the directory is
written to in parallel to calling the getdents (e.g. with a `touch
largenames-100k/{1..N}` still running in th background):
Then every single getdents() call takes ~0.1 seconds.

I have another question about what you said:

You mentioned that the "translator readdir-ahead tries to hide the readdirplus
latency on bricks". Is readdir-latency on bricks really an issue?
For example, `ls -1U /data/glusterfs/myvol/brick1/brick/largenames-100k/` takes
only 0.15 seconds real time on the brick when the directory is currently being
written to, and 0.06 seconds if it's not being written to.
On the `largenames-10k` on the brick, it's another 10x faster than for 100k, as
expected.
An individual `getdents(3, /* 455 entries */, 32768) = 32760` call on the brick
takes between 40 and 400 micro-seconds (average 80 us) on the brick when the
directory is being written to, and 4x less when it's not being written to.
So I would find it very surprising if readdir latency on the brick FS itself
could make any noticeable impact; can you explain it a bit?

Comparing directly the impact of `rda-request-size` 1310720 vs 131072:

With rda-request-size 1310720 (10x higher than default):

    # gluster vol set myvol performance.rda-request-size 1310720
    volume set: success

    # strace -Te getdents ls -1U largenames-10k/ > /dev/null
    getdents(3, /* 604 entries */, 131072)  = 43392 <0.128277>
    getdents(3, /* 603 entries */, 131072)  = 43416 <0.001492>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001961>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001765>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.006883>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.003683>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.003105>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.015737>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001787>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001689>
    getdents(3, /* 591 entries */, 131072)  = 42552 <0.012255>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.033856>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001385>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001721>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.001668>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.002190>
    getdents(3, /* 391 entries */, 131072)  = 28152 <0.001552>
    getdents(3, /* 0 entries */, 131072)    = 0 <0.000254>
    +++ exited with 0 +++

    # strace -wcfe getdents ls -1U largenames-10k/ > /dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.210427       11690        18           getdents
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.210427                    18           total

With rda-request-size 131072 (default):

    # gluster vol set myvol performance.rda-request-size 131072
    volume set: success

    # strace -Te getdents ls -1U largenames-10k/ > /dev/null
    getdents(3, /* 604 entries */, 131072)  = 43392 <0.021472>
    getdents(3, /* 602 entries */, 131072)  = 43344 <0.001393>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.025918>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.001342>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.027952>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.001415>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.024628>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.001430>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.025017>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.001329>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.033111>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.001497>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.033382>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.001328>
    getdents(3, /* 601 entries */, 131072)  = 43272 <0.034641>
    getdents(3, /* 599 entries */, 131072)  = 43128 <0.001393>
    getdents(3, /* 396 entries */, 131072)  = 28512 <0.013024>
    getdents(3, /* 0 entries */, 131072)    = 0 <0.000068>
    +++ exited with 0 +++

    # strace -wcfe getdents ls -1U largenames-10k/ > /dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.218633       12146        18           getdents
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.218633                    18           total

In the above we can see: The same total time is spent, but with the larger
`rda-request-size 1310720` it's spent mostly upfront.

That begs the question:
What is gluster doing that makes the first `getdents()` call take entire 0.12
seconds?
As mentioned before, `time ls -1U
/data/glusterfs/benacofs-staging/brick1/brick/largenames-10k/ > /dev/null` on
the brick takes 0.007 seconds, so it can't be brick FS latency.
It also can't be network latency, as the ping between the machines is 0.0004
seconds.
And it can't be a network throughput bottleneck, as the total throughput here
is 6 MB/s (comparable to the 5 MB/s I reported earlier), and my has a much
faster link.

My suspicion: Gluster is burning this time as CPU cycles.

The current TIME+ shown in htop for glusterfs and glusterfsd are:

5:47.62 glusterfs
4:52.75 glusterfsd

and not moving.

After I run `time ls -1U largenames-100k/ > /dev/null` which takes `real
`0m3.284s` they are:

5:48.51 glusterfs
4:54.56 glusterfsd

In relative times, the increase is:

0.89 glusterfs
1.81 glusterfsd

This is a very large chunk of the 3.28 seconds that the `ls` takes.

Also consider: The total amount of data returned by getdents() for
largenames-100k/ is around 7 MB. glusterfsd spends 1.81 seconds CPU time to
produce these 7 MB. That makes ~3.8 MB/s CPU throughput. This is not a
reasonable number for CPU processing throughput. Gluster must be doing
something extremely inefficient to spend so much CPU time at so little data.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list