[Bugs] [Bug 1401812] New: RFE: Make readdirp parallel in dht

bugzilla at redhat.com bugzilla at redhat.com
Tue Dec 6 07:26:48 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1401812

            Bug ID: 1401812
           Summary: RFE: Make readdirp parallel in dht
           Product: GlusterFS
           Version: mainline
         Component: distribute
          Assignee: bugs at gluster.org
          Reporter: pgurusid at redhat.com
                CC: bugs at gluster.org



Description of problem:
Currently readdirp is sequential at the dht layer. This makes find and recursive
listing of small directories very slow (a small directory being one whose contents
can be accommodated in one readdirp call, e.g. ~600 entries if the buf size is 128k).

The number of readdirp fops required to complete ls -l -R on nested
directories is:
no. of fops = (x + 1) * m * n
where:
n = number of bricks
m = number of directories
x = number of readdirp calls required to fetch the dentries completely
    (this depends on the size of the directory and the readdirp buf size)
1 = the extra readdirp fop that is sent just to detect the end of directory

E.g. let's say we list 800 directories with ~300 files each, with a readdirp
buf size of 128K, on a distribute volume with 6 bricks (x = 1, m = 800, n = 6):
(1+1) * 800 * 6 = 9600 fops
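
The formula above can be checked with a few lines of Python. This is only an
illustrative sketch: the function and parameter names are made up here and do
not correspond to anything in the GlusterFS code base.

```python
def readdirp_fops(num_bricks, num_dirs, calls_per_dir):
    """Return (x + 1) * m * n from the formula above.

    num_bricks    -> n
    num_dirs      -> m
    calls_per_dir -> x, readdirp calls needed to fetch all dentries
    The extra 1 is the readdirp that only detects end of directory.
    """
    return (calls_per_dir + 1) * num_dirs * num_bricks


# 800 directories, 6 bricks, one readdirp call fetches everything (x = 1)
print(readdirp_fops(num_bricks=6, num_dirs=800, calls_per_dir=1))  # 9600
```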

These readdirp fops are not sent to all the bricks in parallel, but
sequentially. This patch is a first step towards making readdirp parallel.
With parallel readdirp, the number of fops may not decrease drastically,
but since they are issued in parallel, throughput will increase.
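
To see why throughput can improve even though the fop count stays the same,
here is a rough, idealized model. It assumes every fop costs one network round
trip, that all bricks answer in the same time, and that directories are still
walked one after another. None of these assumptions come from measurements, and
the function name is hypothetical.

```python
def serialized_round_trips(num_bricks, num_dirs, calls_per_dir, parallel):
    """Count the round trips the client has to wait for one after another.

    Sequential: every fop to every brick is waited on in turn.
    Parallel:   the fops to all bricks of a directory overlap, so only
                (x + 1) rounds per directory are on the critical path.
    """
    rounds_per_dir = calls_per_dir + 1
    if not parallel:
        rounds_per_dir *= num_bricks
    return rounds_per_dir * num_dirs


seq = serialized_round_trips(6, 800, 1, parallel=False)
par = serialized_round_trips(6, 800, 1, parallel=True)
print(seq, par)  # 9600 1600
```

In this model the total number of fops is still 9600 in both cases; only the
number of waits that cannot overlap goes down.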

Why it's not a straightforward problem to solve:
One needs to briefly understand how the directory offset is handled in dht.
[1], [2], [3] are some of the links that hint at this.
- The d_off follows the order of bricks identified by dht. Hence, the dentries
should always be returned in the same order as the bricks, i.e. brick2 entries
shouldn't be returned before brick1 reaches EOD.
- We cannot store any info, such as the offset read so far, in inode_ctx or fd_ctx.
- In the case of very large directories, where the readdirp buf is too small to
hold all the dentries of any brick, parallel readdirp is an overhead. Sequential
readdirp best suits large directories. This demands that dht be aware of, or
speculate, the directory size.
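
The ordering constraint in the first point can be illustrated with a toy
Python sketch: the reads are issued to all bricks at once, but the results are
handed back strictly in brick order. This is not the dht implementation. Bricks
are simulated as in-memory lists, read_brick is a hypothetical stand-in for a
readdirp fop, and the sketch assumes each brick's dentries fit in one call (the
small-directory case) and ignores d_off encoding entirely.

```python
from concurrent.futures import ThreadPoolExecutor


def read_brick(entries):
    # Hypothetical stand-in for one readdirp fop that returns all
    # dentries of a brick in a single call.
    return list(entries)


def parallel_listing(bricks):
    """Read every brick concurrently, return dentries in brick order."""
    with ThreadPoolExecutor(max_workers=max(1, len(bricks))) as pool:
        # Submission order is brick order; all reads run concurrently.
        futures = [pool.submit(read_brick, b) for b in bricks]
        listing = []
        # Consume futures in brick order, not completion order, so that
        # brick2 entries never appear before brick1 is exhausted.
        for future in futures:
            listing.extend(future.result())
    return listing


bricks = [["a", "b"], ["c"], ["d", "e"]]
print(parallel_listing(bricks))  # ['a', 'b', 'c', 'd', 'e']
```

For a large directory the same fan-out would fetch buffers from later bricks
that cannot be returned yet, which is the overhead described in the last point.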


-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

