[Gluster-devel] RFC: d_off encoding at client/protocol layer

Tue Feb 3 15:18:21 UTC 2015

On 02/02/2015 10:29 PM, Krishnan Parthasarathi wrote:
>>> IOW, given a d_off and a common routine, pass the d_off with this (i.e
>>> current xlator) to get a subvol that the d_off belongs to. This routine
>>> would decode the d_off for the leaf ID as encoded in the client/protocol
>>> layer, and match its subvol relative to this and send that for further
>>> processing. (it may consult the graph or store the range of IDs that any
>>> subvol has w.r.t client/protocol and deliver the result appropriately).
>>
>> What happens to this scheme when bricks are repeatedly added/removed?

The result should be no different from what the current code does, i.e.
encode the subvol ID based on the children of DHT, which is based on
dht_subvol_cnt and therefore, indirectly, on the order of children seen
in the graph.

I would further state that this change does not improve on that
limitation; it just moves the encoding to a single point.
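
To make the dependence on child order concrete, here is a minimal sketch
of a positional encoding. It assumes the child's index is folded into the
offset as brick_off * child_cnt + child_idx; the helper names are
hypothetical and this is not the actual DHT or client/protocol code.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only.
 * child_idx is the position of the child in the graph, child_cnt the
 * number of children. Overflow of brick_off * child_cnt is ignored here. */
static uint64_t doff_encode(uint64_t brick_off, uint32_t child_idx,
                            uint32_t child_cnt)
{
    return brick_off * child_cnt + child_idx;
}

/* Recover the child index that produced this d_off. */
static uint32_t doff_child(uint64_t d_off, uint32_t child_cnt)
{
    return (uint32_t)(d_off % child_cnt);
}

/* Recover the offset to hand back to that child. */
static uint64_t doff_brick_off(uint64_t d_off, uint32_t child_cnt)
{
    return d_off / child_cnt;
}
```

Both the index and the count come from the graph, so wherever the
encoding is done, it is only stable for as long as the set and order of
children is.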

>
> IIUC, the leaf xlator encoding proposed should be performed during graph
> initialization and would remain static for the lifetime of the graph.
> When bricks are added or removed, it would trigger a graph change, and
> the new encoding would be computed. Further, it is guaranteed that
> ongoing (readdir) FOPs would complete in the same (old) graph and therefore
> the encoding should be unaffected by bricks being added/removed.
>

I would differ in the reasoning here. NFS clients store the d_off values
returned on directory scans, so it is possible that they come back with
those d_off values after a graph switch. In that case it would be a fresh
opendir followed by a seek to the d_off provided (with all the subvol ID
decoding etc.).

So in short, we are not immune to this.
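
As an illustration of the stale d_off case, the sketch below decodes the
same value against an old and a new graph. It again assumes a simple
positional encoding (brick_off * count + index); struct graph, the brick
names and the helpers are made up for the example and do not reflect the
real xlator structures.

```c
#include <stdint.h>

/* Hypothetical stand-in for a graph: an ordered list of children. */
struct graph {
    const char *children[4];
    uint32_t    count;
};

/* Assumed encoding: child index folded into the brick's own offset. */
static uint64_t encode_doff(uint64_t brick_off, uint32_t idx, uint32_t cnt)
{
    return brick_off * cnt + idx;
}

/* Which child does this d_off map to in the given graph? */
static const char *owner_of(const struct graph *g, uint64_t d_off)
{
    return g->children[d_off % g->count];
}

/* Which offset would be sent to that child? */
static uint64_t brick_off_of(const struct graph *g, uint64_t d_off)
{
    return d_off / g->count;
}
```

With an old graph of { brick-a, brick-b }, offset 100 on brick-b encodes
to 201. If the client replays 201 after an add-brick has produced
{ brick-a, brick-b, brick-c }, the value decodes to brick-a at offset 67,
i.e. the wrong child and the wrong position.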

Shyam