[Gluster-devel] RFC: d_off encoding at client/protocol layer

Mon Jan 26 19:59:14 UTC 2015

Hi,

Some parts of this topic have been discussed recently, here [1].

The current mechanism, in which each xlator encodes its subvol in the lower
or higher bits of the d_off, has pitfalls, as discussed in those threads and
in this review [2].

Here is a solution design taken from one of the comments Avati posted on
this [3]:

"One example approach (not necessarily the best): Make every xlator 
know the total number of leaf xlators (protocol/clients), and also the 
number of all leaf xlators from each of its subvolumes. This way, the 
protocol/client xlators (alone) do the encoding, by knowing its global 
brick# and total # of bricks. The cluster xlators blindly forward the 
readdir_cbk without any further transformations of the d_offs, and also 
route the next readdir(old_doff) request to the appropriate subvolume 
based on the weighted graph (of counts of protocol/clients in the 
subtrees) till it reaches the right protocol/client to resume the 
enumeration."
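Here is a minimal sketch of the "weighted graph" idea. It assumes a simplified node structure: struct node and assign_leaf_ids are hypothetical and are not the real xlator_t API. A single depth-first walk numbers the protocol/client leaves left to right, which gives each leaf its global brick#. The same walk records, for every xlator, how many leaves sit below it.

```c
#include <assert.h>

#define MAX_CHILDREN 8

/* Hypothetical, simplified stand-in for an xlator graph node; not the
 * real xlator_t. A node with no children is a protocol/client leaf. */
struct node {
    struct node *children[MAX_CHILDREN];
    int          nchildren;
    int          leaf_count;  /* # of protocol/client leaves in this subtree */
    int          first_leaf;  /* global brick# of the leftmost leaf below */
};

/* Depth-first walk that numbers leaves left to right and records, for
 * every node, how many leaves its subtree holds. 'next' is the next free
 * global brick#; the updated counter is returned, so after the call on
 * the top of the graph it equals the total number of bricks. */
static int assign_leaf_ids(struct node *n, int next)
{
    assert(n->nchildren >= 0 && n->nchildren <= MAX_CHILDREN);
    n->first_leaf = next;
    if (n->nchildren == 0) {
        n->leaf_count = 1;
        return next + 1;
    }
    n->leaf_count = 0;
    for (int i = 0; i < n->nchildren; i++) {
        next = assign_leaf_ids(n->children[i], next);
        n->leaf_count += n->children[i]->leaf_count;
    }
    return next;
}
```

Take a DHT over two 2-way AFR sets. The four clients get brick# 0..3, each AFR covers two consecutive IDs, and DHT's leaf_count is 4. Numbering leaves in DFS order makes each subtree's IDs a contiguous range, which a routing step can then rely on.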

The scheme currently proposed, and being worked on, is as follows:
- protocol/client encodes its ID in the d_off, where the ID is its leaf
position (number) in the graph
- no other xlator does any further encoding
- on receiving a subsequent readdir request with that d_off, an xlator
consults the graph (or its immediate children) for the ID encoded in the
d_off, and sends the request down that subvol path
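One possible leaf-level encoding follows Avati's "global brick# and total # of bricks" description, though that is an assumption. The back-end d_off is scaled by the total number of leaves and the leaf ID is added. Then d_off % total yields the ID and d_off / total recovers the original offset. The helper names below are hypothetical; this is a sketch, not the code under review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: only protocol/client would call doff_encode; any
 * xlator may call doff_leaf_id to recover the leaf ID for routing.
 * 'total' is the total number of protocol/client leaves in the graph. */
static bool doff_encode(uint64_t backend_doff, uint32_t leaf_id,
                        uint32_t total, uint64_t *out)
{
    if (total == 0 || leaf_id >= total)
        return false;
    /* Refuse offsets that would overflow 64 bits once scaled. */
    if (backend_doff > (UINT64_MAX - leaf_id) / total)
        return false;
    *out = backend_doff * total + leaf_id;
    return true;
}

/* Leaf (brick#) that produced this d_off. */
static uint32_t doff_leaf_id(uint64_t encoded, uint32_t total)
{
    assert(total > 0);
    return (uint32_t)(encoded % total);
}

/* Original back-end d_off, to be passed to the brick on the next readdir. */
static uint64_t doff_backend(uint64_t encoded, uint32_t total)
{
    assert(total > 0);
    return encoded / total;
}
```

With 8 bricks, a back-end offset of 1000 from brick 3 becomes 8003. The encoding is applied once, at the leaf, so the cost in bits is paid in one place instead of once per cluster xlator. Back-end offsets too large to scale are rejected rather than silently truncated.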

IOW, a common routine takes the d_off and 'this' (i.e. the current xlator)
and returns the subvol that the d_off belongs to. The routine decodes the
leaf ID that the protocol/client layer encoded in the d_off, maps that ID to
the matching subvol of the current xlator, and returns that subvol for
further processing. (It may consult the graph, or store the range of leaf
IDs under each subvol, and return the result accordingly.)
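Here is a minimal sketch of that common routine. It assumes each xlator keeps the [first_leaf, first_leaf + leaf_count) range of every child. The struct and function names are hypothetical. The modulo decode assumes a multiplicative leaf encoding (d_off = back-end d_off * total + leaf ID), which this proposal does not fix. The d_off is passed down untouched; each hop only chooses a child.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 8

/* Hypothetical, simplified xlator node; a node without children is a
 * protocol/client leaf. first_leaf/leaf_count are assumed to have been
 * filled in when the graph was built. */
struct xl {
    struct xl *children[MAX_CHILDREN];
    int        nchildren;
    uint32_t   first_leaf;
    uint32_t   leaf_count;
};

/* The common routine: given the current xlator ('this') and an encoded
 * d_off, return the child subvol whose leaf range holds the encoded ID.
 * The d_off itself is never modified. */
static struct xl *doff_to_subvol(const struct xl *this, uint64_t d_off,
                                 uint32_t total_leaves)
{
    assert(total_leaves > 0);
    uint32_t leaf = (uint32_t)(d_off % total_leaves);

    for (int i = 0; i < this->nchildren; i++) {
        const struct xl *c = this->children[i];
        if (leaf >= c->first_leaf && leaf < c->first_leaf + c->leaf_count)
            return this->children[i];
    }
    return NULL; /* ID not below this xlator: stale or corrupt d_off */
}

/* Apply the routine hop by hop until a protocol/client leaf is reached. */
static const struct xl *route_to_leaf(const struct xl *top, uint64_t d_off,
                                      uint32_t total_leaves)
{
    const struct xl *cur = top;
    while (cur != NULL && cur->nchildren > 0)
        cur = doff_to_subvol(cur, d_off, total_leaves);
    return cur;
}
```

Take a DHT over two 2-way AFR sets with leaves 0..3. The d_off 4 * 77 + 3 carries leaf ID 3, so DHT routes it to the second AFR, which routes it to its second client.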

Given the current situation with ext4 and xfs d_off values, and continuing
with the ID encoding scheme, this seems to be the best way to prevent
multiple xlators' subvol encodings from stomping on each other, and also to
avoid (in a sense) any further loss of bits. This scheme would also let
AFR/EC load-balance readdir requests across their subvols better, rather
than sending them to a static subvol for a long duration.
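The AFR/EC point can be illustrated with a hypothetical replica-side chooser. It assumes the replica's children are protocol/client leaves with consecutive IDs and the same modulo decode as above. A fresh enumeration (d_off 0) can go to any child, picked here round-robin. A continuation must return to the leaf named in its d_off. This is a sketch of the idea, not AFR's actual read-child policy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical replica-set state; not AFR's real private struct. */
struct replica {
    int      nchildren;   /* number of replica subvols */
    uint32_t first_leaf;  /* global brick# of child 0; children assumed to
                             be leaves with consecutive IDs */
    uint32_t next_rr;     /* round-robin cursor for fresh enumerations */
};

/* Pick the child index for a readdir. A d_off of 0 starts a new
 * enumeration (assuming bricks never hand out 0 as a continuation
 * offset), so any child will do and we rotate through them. Any other
 * d_off already names its leaf, so it must go back to that child. */
static int pick_readdir_child(struct replica *r, uint64_t d_off,
                              uint32_t total_leaves)
{
    assert(r->nchildren > 0 && total_leaves > 0);
    if (d_off == 0)
        return (int)(r->next_rr++ % (uint32_t)r->nchildren);

    uint32_t leaf = (uint32_t)(d_off % total_leaves);
    assert(leaf >= r->first_leaf &&
           leaf < r->first_leaf + (uint32_t)r->nchildren);
    return (int)(leaf - r->first_leaf);
}
```

Because the leaf ID travels inside the d_off, the replica no longer needs to stick to one subvol for the whole lifetime of a directory fd. The choice only matters at the start of each enumeration.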

Thoughts/comments?

Shyam

[1] https://www.mail-archive.com/gluster-devel@gluster.org/msg02834.html
[2] review.gluster.org/#/c/8201/4/xlators/cluster/afr/src/afr-dir-read.c
[3] https://www.mail-archive.com/gluster-devel@gluster.org/msg02847.html