[Gluster-devel] Readdir d_off encoding

Mon Dec 15 20:46:37 UTC 2014

This is regarding the changes present in [1] and [2].

A short explanation of the change: we encode the subvol ID in the
d_off, losing 'n + 1' bits in case the high-order n+1 bits of the d_off
returned by the underlying xlator are not free. (Best to read the commit
message for [1] :) )
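
To make the bit accounting concrete, here is a rough sketch of one way such
an encoding could work. It is not the code from [1]: the layout (top bit as
a "lossy" flag, next n bits as the subvol ID) and the names encode_d_off,
decode_d_off and id_bits are assumptions made up for illustration.

```python
# Illustrative sketch only; the bit layout and names are assumptions,
# not the implementation from the patch.

def id_bits(subvol_count):
    """Number of bits (n) needed to hold a subvol ID."""
    return max(1, (subvol_count - 1).bit_length())

def encode_d_off(d_off, subvol_id, subvol_count):
    """Pack subvol_id into the high bits of a 64-bit d_off."""
    n = id_bits(subvol_count)
    payload_bits = 64 - (n + 1)
    # If the backend already uses the high n+1 bits, make room by
    # dropping the low n+1 bits and flag the result as lossy.
    lossy = (d_off >> payload_bits) != 0
    payload = d_off >> (n + 1) if lossy else d_off
    return (int(lossy) << 63) | (subvol_id << payload_bits) | payload

def decode_d_off(encoded, subvol_count):
    """Return (d_off, subvol_id); d_off is approximate if it was lossy."""
    n = id_bits(subvol_count)
    payload_bits = 64 - (n + 1)
    lossy = encoded >> 63
    subvol_id = (encoded >> payload_bits) & ((1 << n) - 1)
    payload = encoded & ((1 << payload_bits) - 1)
    d_off = payload << (n + 1) if lossy else payload
    return d_off, subvol_id
```

With 4 subvols (n = 2), a small d_off round-trips exactly, while a d_off
that uses all 64 bits comes back with its low 3 bits zeroed.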

Although not related to the latest patch, here is something to consider 
for the future:

We now have DHT, AFR, EC(?), and DHT over DHT (Tier), all of which need
subvol encoding in the returned readdir offset. Stacking these under the
current scheme would eat more bits than we do at present, and that loss
_may_ cause unwanted offset behavior.
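
As a back-of-the-envelope sketch of how the loss adds up, assume each
encoding layer costs its own n + 1 bits in the worst case. The stack and
subvol counts below are hypothetical, not taken from a real volume.

```python
# Worst-case bit budget for stacked encoders; the layer list is a
# made-up example and the per-layer cost of n + 1 is an assumption.

def id_bits(subvol_count):
    """Number of bits (n) needed to hold a subvol ID."""
    return max(1, (subvol_count - 1).bit_length())

def bits_consumed(layers):
    """Sum n + 1 over every (name, subvol_count) layer."""
    return sum(id_bits(count) + 1 for _name, count in layers)

stack = [("tier", 2), ("dht", 16), ("afr", 3)]
used = bits_consumed(stack)
print(used, 64 - used)  # prints "10 54"
```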

Or IOW, we could be invalidating the assumption "both EXT4/XFS are
tolerant in terms of the accuracy of the value presented back in
seekdir(). i.e, a seekdir(val) actually seeks to the entry which has the
"closest" true offset."
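
A toy model of why this matters, assuming "closest" means nearest by
absolute difference. This is not how EXT4 or XFS actually resolve offsets;
seek_closest and the sample offsets are invented for illustration.

```python
import bisect

def seek_closest(true_offsets, val):
    """Return the entry offset nearest to val (true_offsets is sorted)."""
    i = bisect.bisect_left(true_offsets, val)
    candidates = true_offsets[max(0, i - 1):i + 1]
    return min(candidates, key=lambda off: abs(off - val))

def drop_low_bits(d_off, lost_bits):
    """Model the precision lost to subvol encoding."""
    return d_off & ~((1 << lost_bits) - 1)

entries = [10, 27, 45, 60]  # hypothetical backend d_off values
```

In this model, seek_closest(entries, drop_low_bits(27, 3)) still lands on
27, but with 5 bits dropped the same call lands on 10, and a seek for 45
lands on 27. The more bits the stack eats, the easier it is to resume at
the wrong entry.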

Should we reconsider an in-memory, _cookie_-like approach that can help
in this case?

It would invalidate some or all (depending on the implementation) of the
following constraints that the current design satisfies (from [1]):
- Nothing to "remember in memory" or evict "old entries".
- Works fine across NFS server reboots and also NFS head failover.
- Tolerant to seekdir() to arbitrary locations.

But it would provide a more reliable readdir offset for use (when valid
and not evicted, say).
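
For discussion, a minimal sketch of what such a cookie table might look
like, assuming a bounded table with LRU eviction. CookieTable and its
methods are hypothetical names, not an existing Gluster or Ganesha API.

```python
from collections import OrderedDict
import itertools

class CookieTable:
    """Map small opaque cookies to exact (subvol_id, d_off) pairs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()
        self._next_cookie = itertools.count(1)

    def issue(self, subvol_id, d_off):
        """Hand out a cookie; evict the least recently used if full."""
        cookie = next(self._next_cookie)
        self._entries[cookie] = (subvol_id, d_off)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
        return cookie

    def resolve(self, cookie):
        """Return (subvol_id, d_off), or None if evicted or unknown."""
        entry = self._entries.get(cookie)
        if entry is not None:
            self._entries.move_to_end(cookie)
        return entry
```

The full 64-bit d_off survives untouched here, but the table lives in one
process, so a restart or failover to another head loses every cookie
unless the table is persisted or shared somehow.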

How would NFS adapt to this? Does Ganesha need a better scheme when
doing multi-head NFS failover?

Thoughts?

Shyam
[1] http://review.gluster.org/#/c/4711/
[2] http://review.gluster.org/#/c/8201/