[Bugs] [Bug 1529883] New: glusterfind is extremely slow if there are lots of changes

bugzilla at redhat.com bugzilla at redhat.com
Sat Dec 30 18:54:02 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1529883

            Bug ID: 1529883
           Summary: glusterfind is extremely slow if there are lots of
                    changes
           Product: GlusterFS
           Version: mainline
         Component: glusterfind
          Assignee: bugs at gluster.org
          Reporter: nh2-redhatbugzilla at deditus.de
        QA Contact: bugs at gluster.org
                CC: avishwan at redhat.com, bugs at gluster.org,
                    khiremat at redhat.com



Description of problem:

I noticed that my glusterfind on 3.12.3 ran for hundreds of hours straight
without terminating.

A quick strace showed that there were tons of pread64() syscalls in between
each open() of a CHANGELOG.* file.

Looking in /proc/$(pidof glusterfind)/fd, I found that the file it's
pread64()ing from is the `tmp_output_1` sqlite file. It was clearly reading the
entire database in via those syscalls for each *line* of each CHANGELOG.* file.

To make it very clear, it was doing:

for each CHANGELOG file:
  for each line in that file:
    read in the entire SQL database contents (9 MB in my case)

Looking into the code, it became clear that glusterfind implements a simple
check for whether a line of a CHANGELOG.* file is already in the DB.
That is done by checking whether its `gfid` is already in the `gfid` column.

Unfortunately that column didn't have an SQL index defined, so every check for
whether a line already exists resulted in a full scan over the database.
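
To illustrate the difference an index makes to such a lookup, here is a minimal
sketch using Python's sqlite3 module. The table name `gfidpath`, the index name
`gfid_idx`, and the sample rows are assumptions made up for this example and
are not taken from the glusterfind source or from my patch. It asks sqlite for
its query plan before and after creating the index:

```python
import sqlite3


def lookup_plan(conn, gfid):
    """Return sqlite's query-plan text for a gfid existence check."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT 1 FROM gfidpath WHERE gfid = ?", (gfid,)
    ).fetchall()
    # The human-readable plan description is the last column of each row.
    return " ".join(row[-1] for row in rows)


# Hypothetical in-memory table standing in for glusterfind's temporary DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gfidpath (gfid TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO gfidpath VALUES (?, ?)",
    [(f"gfid-{i}", f"/file-{i}") for i in range(1000)],
)

# Without an index, sqlite has to scan the whole table for each lookup.
plan_without_index = lookup_plan(conn, "gfid-500")

# With an index on the gfid column, the same lookup becomes an index search.
conn.execute("CREATE INDEX gfid_idx ON gfidpath (gfid)")
plan_with_index = lookup_plan(conn, "gfid-500")

print(plan_without_index)
print(plan_with_index)
```

The exact plan wording differs between sqlite versions, but the first plan
should report a SCAN of the table and the second a SEARCH that uses `gfid_idx`.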

If you use sqlite you must really make sure to use indexes, because otherwise
any lookup that should be O(1) or O(log n) turns into an O(n) operation. Doing
such a lookup once per changelog line gives glusterfind O(n²) complexity.
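
As a rough back-of-the-envelope model of that complexity claim (not a
measurement of glusterfind itself), the sketch below counts how many rows are
visited when n new entries are checked one by one against a table that grows
with each insert. The function names are made up for this example, and the
indexed case is approximated by a binary search, which is only an assumption
about how an index lookup behaves:

```python
def rows_visited_full_scan(n):
    """Rows visited when each of n checks scans every row inserted so far."""
    # A check that finds nothing has to look at all existing rows.
    return sum(rows_present for rows_present in range(n))


def rows_visited_indexed(n):
    """Rows visited when each check is a binary search over existing rows."""
    # A binary search over k sorted entries needs at most k.bit_length() probes.
    return sum(rows_present.bit_length() for rows_present in range(n))


n = 1000
full_scan = rows_visited_full_scan(n)
indexed = rows_visited_indexed(n)
print(full_scan, indexed)
```

The full-scan count grows quadratically with n, while the indexed count grows
roughly as n log n, so the gap widens as the number of changes increases.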

I will submit a patch.

It makes glusterfind 150x faster for me.

