[Gluster-devel] RFC: d_off encoding at client/protocol layer
Shyam
srangana at redhat.com
Tue Feb 3 15:18:21 UTC 2015
On 02/02/2015 10:29 PM, Krishnan Parthasarathi wrote:
>>> IOW, given a d_off and a common routine, pass the d_off with this (i.e
>>> current xlator) to get a subvol that the d_off belongs to. This routine
>>> would decode the d_off for the leaf ID as encoded in the client/protocol
>>> layer, and match its subvol relative to this and send that for further
>>> processing. (it may consult the graph or store the range of IDs that any
>>> subvol has w.r.t client/protocol and deliver the result appropriately).
>>
>> What happens to this scheme when bricks are repeatedly added/removed?
The result should be no different from what the current scheme in code
does, i.e. encode the subvol ID based on the children of DHT, which is
based on dht_subvol_cnt, and that indirectly means the order of children
seen in the graph.
I would further state that this change does not remove that limitation;
it just moves the encoding to a single point.
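
To make the order dependence concrete, here is a minimal sketch. It is
an illustrative model only, not the GlusterFS code: the function names
are hypothetical, and the multiply-and-add transform is an assumed
stand-in for any scheme that folds a child index and child count into
the d_off.

```python
def encode_d_off(brick_d_off, subvol_index, subvol_count):
    """Fold the child's position into the offset (assumed transform)."""
    return brick_d_off * subvol_count + subvol_index


def decode_d_off(d_off, subvol_count):
    """Recover (brick_d_off, subvol_index) for a given child count."""
    return divmod(d_off, subvol_count)


# Encoded while the parent had 3 children; entry came from child index 2.
cookie = encode_d_off(1001, 2, 3)

# Same child count: the round trip is exact.
same_count = decode_d_off(cookie, 3)

# After one more child is added, the same cookie decodes to a different
# child index and a different brick offset.
grown_count = decode_d_off(cookie, 4)
```

Decoding is only meaningful against the same child count and ordering
that produced the value, whichever layer performs the encoding.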
>
> IIUC, the leaf xlator encoding proposed should be performed during graph
> initialization and would remain static for the lifetime of the graph.
> When bricks are added or removed, it would trigger a graph change, and
> the new encoding would be computed. Further, it is guaranteed that
> ongoing (readdir) FOPs would complete in the same (old) graph and therefore
> the encoding should be unaffected by bricks being added/removed.
>
I would differ with the reasoning here. NFS clients store the d_off
values returned on directory scans, so it is possible that they come
back with those d_off values after a graph switch. In that case it would
be a fresh opendir on the new graph, followed by a seek to the d_off
provided (with all the subvol ID decoding etc.).
So in short, we are not immune to this.
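
A small sketch of that failure mode, under stated assumptions: the
Graph class, the 8-bit leaf ID width, and the brick names are all
hypothetical, and leaf IDs are assumed to be assigned by position at
graph initialization, as in the proposal quoted above.

```python
LEAF_BITS = 8                      # assumed width of the leaf ID field
OFFSET_BITS = 64 - LEAF_BITS       # remaining bits carry the brick d_off
OFFSET_MASK = (1 << OFFSET_BITS) - 1


class Graph:
    """Toy graph: leaf IDs are positions in the brick list, fixed at init."""

    def __init__(self, bricks):
        self.bricks = list(bricks)

    def encode(self, brick, brick_d_off):
        if brick_d_off > OFFSET_MASK:
            raise ValueError("brick d_off does not fit in the offset field")
        leaf_id = self.bricks.index(brick)
        return (leaf_id << OFFSET_BITS) | brick_d_off

    def decode(self, d_off):
        leaf_id = d_off >> OFFSET_BITS
        if leaf_id >= len(self.bricks):
            raise LookupError("leaf ID not present in this graph")
        return self.bricks[leaf_id], d_off & OFFSET_MASK


# An NFS client receives a d_off while the old graph is active.
old_graph = Graph(["brick-a", "brick-b", "brick-c"])
cookie = old_graph.encode("brick-b", 4242)

# brick-b is removed; a new graph is built and leaf IDs are recomputed.
new_graph = Graph(["brick-a", "brick-c"])

# The client replays the stored d_off: fresh opendir, then seek.
before_switch = old_graph.decode(cookie)
after_switch = new_graph.decode(cookie)
```

In this model the stored value resolves to brick-b before the switch
and to brick-c after it, so the seek lands on the wrong brick even
though the encoding stayed static within each graph.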
Shyam