[Gluster-devel] GFID2 - Proposal to add extra byte to existing GFID

Mon Dec 19 08:17:05 UTC 2016

On 12/19/2016 07:57 AM, Aravinda wrote:
>
> regards
> Aravinda
>
> On 12/16/2016 05:47 PM, Xavier Hernandez wrote:
>> On 12/16/2016 08:31 AM, Aravinda wrote:
>>> Proposal to add one more byte to GFID to store "Type" information.
>>> The extra byte will represent the type (directory: 00, file: 01,
>>> symlink: 02, etc.).
>>>
>>> For example, if a directory GFID is f4f18c02-0360-4cdc-8c00-0164e49a7afd,
>>> then GFID2 will be 00f4f18c02-0360-4cdc-8c00-0164e49a7afd.
>>>
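A minimal C sketch of the string form described above, assuming GFID2 is simply the type byte printed as two hex digits in front of the canonical 36-character GFID. The function name gfid2_format and the enum are hypothetical; only the 00/01/02 type codes come from the proposal.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type codes, taken from the proposal (dir 00, file 01, symlink 02). */
enum gfid2_type { GFID2_DIR = 0x00, GFID2_FILE = 0x01, GFID2_SYMLINK = 0x02 };

/* Writes "<type as 2 hex digits><canonical GFID>" into out.
 * Returns 0 on success, -1 if gfid is not a 36-char string or
 * out is too small (it needs at least 39 bytes). */
int gfid2_format(uint8_t type, const char *gfid, char *out, size_t outlen)
{
    if (strlen(gfid) != 36)
        return -1;
    int n = snprintf(out, outlen, "%02x%s", (unsigned)type, gfid);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With GFID2_DIR and the example GFID above, this produces 00f4f18c02-0360-4cdc-8c00-0164e49a7afd.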
>>> Changes to Backend store
>>> ------------------------
>>> Existing: .glusterfs/gfid[0:2]/gfid[2:4]/gfid
>>> Proposed: .glusterfs/gfid2[0:2]/gfid2[2:4]/gfid2[4:6]/gfid2
>>>
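Both backend layouts can be sketched as simple path builders. This assumes the IDs are handled in their string form and that .glusterfs is relative to the brick root; gfid_path and gfid2_path are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Existing layout: .glusterfs/gfid[0:2]/gfid[2:4]/gfid */
int gfid_path(const char *gfid, char *out, size_t outlen)
{
    if (strlen(gfid) != 36)
        return -1;
    int n = snprintf(out, outlen, ".glusterfs/%.2s/%.2s/%s",
                     gfid, gfid + 2, gfid);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Proposed layout: .glusterfs/gfid2[0:2]/gfid2[2:4]/gfid2[4:6]/gfid2.
 * gfid2[0:2] is the type byte, so all entries of one type share a subtree. */
int gfid2_path(const char *gfid2, char *out, size_t outlen)
{
    if (strlen(gfid2) != 38)
        return -1;
    int n = snprintf(out, outlen, ".glusterfs/%.2s/%.2s/%.2s/%s",
                     gfid2, gfid2 + 2, gfid2 + 4, gfid2);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For the example directory, the existing layout gives .glusterfs/f4/f1/f4f18c02-0360-4cdc-8c00-0164e49a7afd, while the proposed one gives .glusterfs/00/f4/f1/00f4f18c02-0360-4cdc-8c00-0164e49a7afd, so every directory lands under .glusterfs/00/.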
>>> Advantages:
>>> -----------
>>> - Automatic grouping in .glusterfs directory based on file Type.
>>> - Easy identification of the type by looking at the GFID in logs,
>>>   status output, etc.
>>> - Crawling (Quota/AFR): The list of directories can be fetched easily
>>>   by crawling the `.glusterfs/gfid2[0:2]/` directory. This also makes
>>>   parallel crawling easy.
>>> - Quota - Marker: The marker translator can mark the xtime of the
>>>   current file and its parent directory, with no need to update the
>>>   xtime xattr of all directories up to the root.
>>> - Geo-replication: The crawl can be multithreaded during the initial
>>>   sync. With the marker changes above, crawling will be more effective.
>>>
>>> Please add any other advantages I missed.
>>>
>>> Disadvantages:
>>> --------------
>>> Functionality does not change; only the length of the ID does. I
>>> can't think of any disadvantages other than the code changes needed
>>> to accommodate this. Let me know if I missed anything here.
>>
>> One disadvantage is that 17 bytes is a very ugly number for
>> structures. Compilers will add padding that makes any structure
>> containing a GFID noticeably bigger. This will also cause trouble in
>> all binary formats where a GFID is used, making them incompatible. One
>> clear case of this is the XDR encoding of the gluster protocol.
>> Currently a GFID is defined this way in many places:
>>
>>         opaque gfid[16];
>>
>> This seems to make it quite complex to allow a mix of gluster versions
>> in the same cluster (for example, in the middle of an upgrade).
>>
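To make the padding concern concrete, here is a small C sketch with two hypothetical record layouts (not actual GlusterFS types) that place an 8-byte field after the GFID. On a typical x86-64 Linux build, where uint64_t is 8-byte aligned, the 17-byte variant gets 7 bytes of padding and grows from 24 to 32 bytes; the exact figures depend on the ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative records: a GFID followed by an 8-byte field. */
struct rec16 {
    uint8_t  gfid[16];
    uint64_t ino;
};

struct rec17 {
    uint8_t  gfid[17];
    uint64_t ino;
};

/* Bytes the compiler inserts between the GFID and the next field. */
size_t rec16_padding(void) { return offsetof(struct rec16, ino) - 16; }
size_t rec17_padding(void) { return offsetof(struct rec17, ino) - 17; }
```

A 16-byte GFID never needs padding before an aligned field, while a 17-byte one almost always does, and the same growth repeats in every array or message that embeds it.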
>> What about this alternative approach:
>>
>> Based on RFC 4122 [1], which describes the format of a UUID, we can
>> define a new structure for new GFIDs using the same length.
>>
>> Currently all GFIDs are generated using the "random" method. This
>> means that all GFIDs have this structure:
>>
>>         xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
>>
>> Where M is the UUID version, which is 4, and N is the variant, which
>> can be 8, 9, A or B.
>>
>> There are some special GFIDs that have M=0 and N=0, for example the
>> root GFID.
>>
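For reference, M and N can be read straight from the 16-byte binary GFID. This sketch assumes the byte order of libuuid's uuid_t, which matches the printed string, so M is the high nibble of byte 6 and N the high nibble of byte 8. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* M: version nibble, high nibble of byte 6 ("xxxxxxxx-xxxx-Mxxx-...") */
static inline unsigned gfid_m(const uint8_t g[16]) { return g[6] >> 4; }

/* N: variant nibble, high nibble of byte 8 ("...-Mxxx-Nxxx-...") */
static inline unsigned gfid_n(const uint8_t g[16]) { return g[8] >> 4; }

/* GFID from the current "random" method: M == 4 and N in 8..B (10xx). */
bool gfid_is_random_v4(const uint8_t g[16])
{
    return gfid_m(g) == 4 && (gfid_n(g) & 0xC) == 0x8;
}

/* Special GFID such as the root one: M == 0 and N == 0. */
bool gfid_is_special(const uint8_t g[16])
{
    return gfid_m(g) == 0 && gfid_n(g) == 0;
}
```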
>> What I propose is to use a new variant of GFID, for example E or F
>> (officially marked as reserved for future definition) or even 0 to 7.
>> We could use M as an internal version for the GFID structure (defined
>> by ourselves when needed). Then we could use the first 4 or 8 bits of
>> each GFID as you propose, without needing to extend the current GFID
>> length or risking collisions with existing GFIDs.
>>
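One way this could look in code, as a sketch only: start from 16 random bytes (for example from libuuid's uuid_generate_random()), put the type code in the first byte, set M to a hypothetical internal version 1 and N to the reserved variant E. The constants and function names are illustrative assumptions, not an agreed format.

```c
#include <stdint.h>

#define GFID_NEW_VARIANT 0xE /* RFC 4122: reserved for future definition */
#define GFID_NEW_VERSION 0x1 /* hypothetical internal version */

/* Turns 16 random bytes into a typed GFID in place:
 *   byte 0           type code (e.g. 00 dir, 01 file, 02 symlink)
 *   byte 6 high bits M = internal version
 *   byte 8 high bits N = new variant
 * The low nibbles of bytes 6 and 8 keep their random bits. */
void gfid_mark_typed(uint8_t g[16], uint8_t type)
{
    g[0] = type;
    g[6] = (uint8_t)((GFID_NEW_VERSION << 4) | (g[6] & 0x0F));
    g[8] = (uint8_t)((GFID_NEW_VARIANT << 4) | (g[8] & 0x0F));
}

/* Type code of a typed GFID, or -1 if it does not use the new variant. */
int gfid_typed_type(const uint8_t g[16])
{
    return (g[8] >> 4) == GFID_NEW_VARIANT ? g[0] : -1;
}
```

Because current random GFIDs always have N in 8..B, a GFID marked this way can never equal an existing one.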
>> If we are concerned about the collision probability (quite small, but
>> still bigger than with the current version) because we lose some random
>> bits, we could use N = 0..7 and leave M random. This way we get 5 more
>> random bits, from which we could use 4 to represent the inode type.
>>
>> I think this way everything will work smoothly with older versions
>> with minimal effort.
>>
>> What do you think ?
> That is a really nice suggestion.
>
> To get the crawling advantages mentioned above, we need to make the
> backend store .glusterfs/N/gfid[0:2]/gfid[2:4]/gfid

That's one possibility. Since N is at most 4 bits (one hex digit), it
won't collide with the currently existing subdirectories, which represent
8 bits (two hex digits). Or we could use M. It all depends on the exact
interpretation we give to each field.

One suggestion I would make is to define it in a way that uses the
minimal number of bits to represent what we need now but leaves space for
future extensions, for example by creating a "reserved" value for the
field.

Proposal:

Use N = 00xx for special GFIDs, like the NULL GFID or the ones currently
used in some places. All of these will also have M = 0. All other values
of M will be reserved for future extensions.

Also reserve the remaining values of N in this range (01xx) for future
extensions.

This gives a lot of space to represent many things in the future if 
necessary, while keeping current usage compatible with it.
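The N/M rules above reduce to a couple of bit tests. A sketch, with hypothetical names, that treats N = 10xx as the current RFC 4122 random GFIDs and leaves 11xx outside this scheme:

```c
#include <stdint.h>

enum gfid_class {
    GFID_CLASS_SPECIAL,   /* N = 00xx with M = 0 */
    GFID_CLASS_RESERVED,  /* N = 00xx with M != 0, or N = 01xx */
    GFID_CLASS_RANDOM,    /* N = 10xx: current random GFIDs */
    GFID_CLASS_OTHER      /* N = 11xx: not covered here */
};

enum gfid_class gfid_classify(const uint8_t g[16])
{
    unsigned m = g[6] >> 4;   /* version nibble M */
    unsigned n = g[8] >> 4;   /* variant nibble N */

    switch (n >> 2) {         /* top two bits of N */
    case 0:                   /* 00xx: special GFIDs must have M = 0 */
        return m == 0 ? GFID_CLASS_SPECIAL : GFID_CLASS_RESERVED;
    case 1:                   /* 01xx: reserved for future extensions */
        return GFID_CLASS_RESERVED;
    case 2:                   /* 10xx: existing random GFIDs */
        return GFID_CLASS_RANDOM;
    default:                  /* 11xx */
        return GFID_CLASS_OTHER;
    }
}
```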

For this particular case we could use N = 0000 and define M as follows
(this is a mapping of the POSIX S_IFxxx values):

M = 0000 Current special GFID's
M = 0001 Fifo (S_IFIFO)
M = 0010 Character Device (S_IFCHR)
M = 0100 Directory (S_IFDIR)
M = 0110 Block Device (S_IFBLK)
M = 1000 Regular File (S_IFREG)
M = 1010 Symbolic Link (S_IFLNK)
M = 1100 Socket (S_IFSOCK)

M = xx11 \
M = x1x1  | Reserved for future extensions
M = 1xx1  |
M = 111x /
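Since the table mirrors the S_IFxxx values, M is just the top 4 bits of st_mode. A sketch assuming the traditional S_IFMT == 0170000 encoding (true on Linux); the function names are hypothetical:

```c
/* Expose S_IFSOCK and friends even under strict ISO C modes. */
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <sys/stat.h>

/* M nibble for a file type: the top 4 bits of the 16-bit mode. */
unsigned gfid_m_from_mode(mode_t mode)
{
    return (unsigned)((mode & S_IFMT) >> 12);
}

/* True for the reserved M values (xx11, x1x1, 1xx1, 111x). */
bool gfid_m_is_reserved(unsigned m)
{
    switch (m & 0xF) {
    case 0x0: case 0x1: case 0x2: case 0x4:
    case 0x6: case 0x8: case 0xA: case 0xC:
        return false;   /* special GFIDs and the seven S_IFxxx types */
    default:
        return true;
    }
}
```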

If we use our own mapping instead of the same values as the S_IFxxx
macros, we can get a more compact representation if needed.

In this case the directory structure could be
.glusterfs/M/gfid[0:2]/gfid[2:4]/gfid, using M = 0 for all currently
existing GFIDs, or we could leave existing GFIDs in their current
location.

Or we could even have .glusterfs/NM/gfid[0:2]/gfid[2:4]/gfid. This would 
probably be compatible even with future extensions.
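A sketch of the .glusterfs/M/gfid[0:2]/gfid[2:4]/gfid layout, taking the GFID in canonical string form (M is the character at offset 14, N the one at offset 19). It assumes only N = 0000 GFIDs carry a type in M and sends everything else, including existing random GFIDs whose M is 4, to M = 0 as suggested above. The function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Builds .glusterfs/M/gfid[0:2]/gfid[2:4]/gfid for a canonical
 * 36-char GFID string "xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx".
 * Returns 0 on success, -1 on bad input or a too-small buffer. */
int gfid_typed_path(const char *gfid, char *out, size_t outlen)
{
    if (strlen(gfid) != 36 || gfid[8] != '-' || gfid[13] != '-' ||
        gfid[18] != '-')
        return -1;
    /* Only N = 0000 means "M holds the type"; all else goes under 0. */
    char top = (gfid[19] == '0') ? gfid[14] : '0';
    int n = snprintf(out, outlen, ".glusterfs/%c/%.2s/%.2s/%s",
                     top, gfid, gfid + 2, gfid);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

A hypothetical typed directory GFID f4f18c02-0360-4cdc-0c00-0164e49a7afd (N = 0, M = 4) maps to .glusterfs/4/f4/f1/f4f18c02-0360-4cdc-0c00-0164e49a7afd, while the existing random f4f18c02-0360-4cdc-8c00-0164e49a7afd maps to .glusterfs/0/f4/f1/f4f18c02-0360-4cdc-8c00-0164e49a7afd.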

Xavi

>>
>> Xavi
>>
>> [1] https://www.ietf.org/rfc/rfc4122.txt
>>
>>>
>>> Changes:
>>> ---------
>>> - Code changes to accommodate a 17-byte GFID instead of 16 bytes
>>>   (read and write)
>>> - Migration Tool to upgrade GFIDs in Volume/Cluster
>>>
>>> Let me know your thoughts.
>>>
>>
>