<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Apr 11, 2017 at 2:59 PM, Amar Tumballi <span dir="ltr">&lt;<a href="mailto:amarts@gmail.com" target="_blank">amarts@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Comments inline.<br><br><div><div class="gmail_extra"><div class="gmail_quote">On Mon, Dec 19, 2016 at 1:47 PM, Xavier Hernandez <span dir="ltr">&lt;<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="m_-6259276629431601807HOEnZb"><div class="m_-6259276629431601807h5">On 12/19/2016 07:57 AM, Aravinda wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

regards<br>

Aravinda<br>

<br>

On 12/16/2016 05:47 PM, Xavier Hernandez wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

On 12/16/2016 08:31 AM, Aravinda wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Proposal to add one more byte to GFID to store &quot;Type&quot; information.<br>

Extra byte will represent type(directory: 00, file: 01, Symlink: 02<br>

etc)<br>

<br>

For example, if a directory GFID is f4f18c02-0360-4cdc-8c00-0164e4<wbr>9a7afd<br>

then, GFID2 will be 00f4f18c02-0360-4cdc-8c00-0164<wbr>e49a7afd.<br>

<br>

Changes to Backend store<br>

------------------------<br>

Existing: .glusterfs/gfid[0:2]/gfid[2:4<wbr>]/gfid<br>

Proposed: .glusterfs/gfid2[0:2]/gfid2[2:<wbr>4]/gfid2[4:6]/gfid2<br>

<br>
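<div>A minimal sketch of the two backend layouts above. The helper names <code>existing_path</code> and <code>proposed_path</code> are hypothetical, and the sketch assumes GFID2 is formed by prepending the type byte as two hex digits to the GFID string, as in the example above.</div>
<pre>
```python
def existing_path(gfid):
    # Existing layout: .glusterfs/gfid[0:2]/gfid[2:4]/gfid
    return "/".join([".glusterfs", gfid[0:2], gfid[2:4], gfid])


def proposed_path(gfid, type_byte):
    # Assumption: GFID2 is the type byte (two hex digits) followed by the GFID.
    gfid2 = format(type_byte, "02x") + gfid
    # Proposed layout: .glusterfs/gfid2[0:2]/gfid2[2:4]/gfid2[4:6]/gfid2
    return "/".join([".glusterfs", gfid2[0:2], gfid2[2:4], gfid2[4:6], gfid2])


gfid = "f4f18c02-0360-4cdc-8c00-0164e49a7afd"
print(existing_path(gfid))
print(proposed_path(gfid, 0x00))  # 0x00 is the directory type in the proposal
```
</pre>
<div>With this layout every directory GFID2 would land under <code>.glusterfs/00/</code>, which is what makes the type-based grouping possible.</div>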

Advantages:<br>

-----------<br>

- Automatic grouping in .glusterfs directory based on file Type.<br>

- Easy identification of Type by looking at GFID in logs/status output<br>

  etc.<br></blockquote></blockquote></blockquote></div></div></blockquote><div><br></div><div>The two points above are good enough reasons to bump up the priority of this feature.<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="m_-6259276629431601807HOEnZb"><div class="m_-6259276629431601807h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- Crawling(Quota/AFR): List of directories can be easily fetched by<br>

  crawling `.glusterfs/gfid2[0:2]/` directory. This enables easy<br>

  parallel Crawling.<br></blockquote></blockquote></blockquote></div></div></blockquote><div><br></div><div>With the current design, we still have to do a distributed readdir() to get all <br>the entries in the directory. This layout change, along with proposed <br>DHT2/EHT/DHT2+ (name for me doesn&#39;t matter here) layout, where directory <br>entries would be created in just one place should enhance the performance overall.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="m_-6259276629431601807HOEnZb"><div class="m_-6259276629431601807h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- Quota - Marker: Marker translator can mark xtime of current file and<br>

  parent directory. No need to update xtime xattr of all directories<br>

  till root.<br>

- Geo-replication: - Crawl can be multithreaded during initial sync.<br>

  With marker changes above it will be more effective in crawling.<br>

<br></blockquote></blockquote></blockquote></div></div></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="m_-6259276629431601807HOEnZb"><div class="m_-6259276629431601807h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Please add any more advantages.<br>

<br>

Disadvantages:<br>

----------------<br>

Functionality is not changed with the above change except the length<br>

of the ID. I can&#39;t think of any disadvantages except the code changes<br>

to accommodate this change. Let me know if I missed anything here.<br>

</blockquote>

<br>

One disadvantage is that 17 bytes is a very ugly number for<br>

structures. Compilers will add padding that makes any structure<br>

containing a GFID noticeably bigger. This will also cause trouble in<br>

all binary formats where a GFID is used, making them incompatible. One<br>

clear case of this is the XDR encoding of the gluster protocol.<br>

Currently a GFID is defined this way in many places:<br>

<br>

        opaque gfid[16]<br>

<br>

This seems to make it quite complex to allow a mix of gluster versions<br>

in the same cluster (for example in the middle of an upgrade).<br></blockquote></blockquote></div></div></blockquote><div><br></div><div>Totally agree with Xavier here. Not in support of adding one more byte.<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="m_-6259276629431601807HOEnZb"><div class="m_-6259276629431601807h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>
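<div>The padding concern can be illustrated with a small sketch. The two structures below are hypothetical (they are not taken from the Gluster sources); they only pair a GFID with one 64-bit field and let <code>ctypes</code> apply the platform's C alignment rules.</div>
<pre>
```python
import ctypes


class WithGfid16(ctypes.Structure):
    # Hypothetical structure: a 16-byte GFID followed by a 64-bit field.
    _fields_ = [("gfid", ctypes.c_ubyte * 16), ("size", ctypes.c_uint64)]


class WithGfid17(ctypes.Structure):
    # Same structure with the GFID extended to 17 bytes.
    _fields_ = [("gfid", ctypes.c_ubyte * 17), ("size", ctypes.c_uint64)]


# The extra byte costs more than one byte once alignment padding is added.
print(ctypes.sizeof(WithGfid16), ctypes.sizeof(WithGfid17))
```
</pre>
<div>On x86-64 Linux, where a 64-bit integer is 8-byte aligned, the sizes are 24 and 32 bytes. The wire format has a similar effect: XDR pads fixed-length opaque data to a multiple of four bytes, so a 17-byte opaque field would occupy 20 bytes.</div>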

What about this alternative approach:<br>

<br>

Based on RFC 4122 [1], which describes the format of a UUID, we can<br>

define a new structure for new GFID&#39;s using the same length.<br>

<br>

Currently all GFID&#39;s are generated using the &quot;random&quot; method. This<br>

means that all GFIDs have this structure:<br>

<br>

        xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxx<wbr>xxxxxx<br>

<br>

Where N can be 8, 9, A or B, and M is 4.<br>

<br>

There are some special GFID&#39;s that have a M=0 and N=0, for example the<br>

root GFID.<br>

<br>
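<div>A short sketch of where M and N sit in the 16 bytes, using Python's standard <code>uuid</code> module. The function name is hypothetical. The root GFID is taken here to be 00000000-0000-0000-0000-000000000001.</div>
<pre>
```python
import uuid


def m_and_n(gfid):
    # In xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, M is the high nibble of
    # byte 6 and N is the high nibble of byte 8.
    raw = uuid.UUID(gfid).bytes
    return raw[6] >> 4, raw[8] >> 4


# A randomly generated (version 4) UUID always has M = 4 and N in 8..B.
print(m_and_n(str(uuid.uuid4())))
# The directory GFID used as an example earlier in this thread.
print(m_and_n("f4f18c02-0360-4cdc-8c00-0164e49a7afd"))
# The root GFID has M = 0 and N = 0.
print(m_and_n("00000000-0000-0000-0000-000000000001"))
```
</pre>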

What I propose is to use a new variant of GFID, for example E or F<br>

(officially marked as reserved for future definition) or even 0 to 7.<br>

We could use M as an internal version for the GFID structure (defined<br>

by ourselves when needed). Then we could use the first 4 or 8 bits of<br>

each GFID as you propose, without needing to extend the current GFID<br>

length or risking collisions with existing GFID&#39;s.<br>

<br>

If we are concerned about the collision probability (quite small but<br>

still bigger than the current version) because we lose some random<br>

bits, we could use N = 0..7 and leave M random. This way we get 5 more<br>

random bits, from which we could use 4 to represent the inode type.<br>

<br>
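<div>The bit counting above can be checked with a few lines. This is only a sketch: it uses the usual birthday approximation for the collision probability, and the figure of 10^12 GFIDs is a hypothetical volume size chosen for illustration.</div>
<pre>
```python
def collision_probability(num_ids, random_bits):
    # Birthday approximation p ~ n^2 / 2^(bits + 1); only meaningful while p is small.
    return num_ids ** 2 / 2.0 ** (random_bits + 1)


CURRENT_BITS = 128 - 4 - 2   # random UUID: M (4 bits) and 2 bits of N are fixed
FREE_BITS = 128 - 1          # N = 0..7 fixes only the top bit of N, M stays random
TYPED_BITS = FREE_BITS - 4   # 4 of those bits carry the inode type

print(FREE_BITS - CURRENT_BITS)  # the "5 more random bits"
for bits in (CURRENT_BITS, TYPED_BITS):
    print(bits, collision_probability(10 ** 12, bits))
```
</pre>
<div>Under these assumptions the typed GFID keeps 123 random bits against 122 today, so the approximate collision probability is halved rather than increased.</div>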

I think this way everything will work smoothly with older versions<br>

with minimal effort.<br>

<br>

What do you think ?<br>

</blockquote>

That is a really nice suggestion.<br>

<br>

To get the crawling advantages mentioned above, we need to make<br>

the backend store .glusterfs/N/gfid[0:2]/gfid[2:<wbr>4]/gfid<br>

</blockquote>

<br></div></div>

That&#39;s one possibility. Since N will be 4 bits at most, it won&#39;t collide with currently existing subdirectories that represent 8 bits. Or we could use M. It all depends on the exact interpretation we give to each field.<br>

<br>

One suggestion I would make is to define it in a way that we use the minimal amount of bits to represent what we need now but leave space for future extensions. For example creating a &quot;reserved&quot; value for the field.<br>

<br></blockquote></div></div></div></div></blockquote><div><br></div><div>While discussing this with Aravinda, we realized that if we only change the UUID generation logic, we don&#39;t need to worry about version incompatibility.<br><br></div><div>I also have a question: what are the chances of a UUID collision if we take just 3 bits from the first byte?<br><br></div><div>000 - Unspecified (can be anything).<br></div><div>001 - Directory<br></div><div>010 - Regular File<br></div><div>011 - Special files (symlink, block and char devices, socket files, etc.).<br></div><div>{100 - 111} - Reserved.<br><br></div><div>As a side effect, it reduces the number of metadata directories created inside the .glusterfs directory (to 50% of the current load).<br><br></div><div>-Amar<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Proposal:<br>

<br>

Use N = 00xx for special GFID&#39;s, like NULL GFID, or the ones currently used in some places. All these will also have M = 0. All other values of M will be reserved for future extensions.<br>

<br>

Also reserve all other values of N (01xx) for future extensions.<br>

<br>

This gives a lot of space to represent many things in the future if necessary, while keeping current usage compatible with it.<br>

<br>

For this particular case we could use N = 0000 and define M as (this is a mapping of the posix S_IFxxx values):<br>

<br>

M = 0000 Current special GFID&#39;s<br>

M = 0001 Fifo (S_IFIFO)<br>

M = 0010 Character Device (S_IFCHR)<br>

M = 0100 Directory (S_IFDIR)<br>

M = 0110 Block Device (S_IFBLK)<br>

M = 1000 Regular File (S_IFREG)<br>

M = 1010 Symbolic Link (S_IFLNK)<br>

M = 1100 Socket (S_IFSOCK)<br>

<br>

M = xx11 \<br>

M = x1x1  | Reserved for future extensions<br>

M = 1xx1  |<br>

M = 111x /<br>

<br>
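<div>A rough sketch of how a GFID could be generated under this mapping, with N = 0000 and M derived from the file mode. The function names are hypothetical, and the shift by 12 assumes the Linux values of the S_IFxxx constants. This is not how Gluster generates GFIDs today.</div>
<pre>
```python
import os
import stat
import uuid


def type_nibble(mode):
    # S_IFMT() keeps only the file-type bits; on Linux they sit in bits 12-15.
    return stat.S_IFMT(mode) >> 12


def new_typed_gfid(mode):
    raw = bytearray(os.urandom(16))
    raw[6] = (type_nibble(mode) * 16) | (raw[6] % 16)  # M = inode type
    raw[8] = raw[8] % 16                                # N = 0000
    return uuid.UUID(bytes=bytes(raw))


print(format(type_nibble(stat.S_IFDIR), "04b"))  # 0100, as in the table
print(new_typed_gfid(stat.S_IFREG))
```
</pre>
<div>Because N is 0 here while randomly generated GFIDs have N in 8..B, a GFID produced this way cannot be mistaken for one generated with the current random method.</div>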

If we use our own mapping instead of the same values as the S_IFxxx macros, we can get a more compact representation if needed.<br>

<br>

In this case the directory structure could be .glusterfs/M/gfid[0:2]/gfid[2:<wbr>4]/gfid. And use M = 0 to put all current existing gfid&#39;s, or we could leave existing gfid&#39;s in their current location.<br>

<br>

Or we could even have .glusterfs/NM/gfid[0:2]/gfid[2<wbr>:4]/gfid. This would probably be compatible even with future extensions.<br>
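<div>A sketch of the first of these layouts, with the type directory derived from the GFID itself. The function name is hypothetical, and the GFID in the example is made up for illustration (N = 0, M = 4, which is a directory in the mapping above).</div>
<pre>
```python
import uuid


def typed_backend_path(gfid):
    # .glusterfs/M/gfid[0:2]/gfid[2:4]/gfid, with M read from the GFID.
    m = uuid.UUID(gfid).bytes[6] >> 4
    return "/".join([".glusterfs", format(m, "x"), gfid[0:2], gfid[2:4], gfid])


# Hypothetical directory GFID.
print(typed_backend_path("f4f18c02-0360-4cdc-0c00-0164e49a7afd"))
```
</pre>
<div>The M directory name is a single hex digit, so it cannot collide with the existing two-digit subdirectories of .glusterfs.</div>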

<br></blockquote><div><br></div><div>I would go with only &#39;M&#39; being considered for current layout and keeping N for future developments. Even though we are not considering &#39;N&#39; internally, we can keep directory name as &#39;00MM&#39; (zero zero M M). so that backend layout would be compatible to consider N later if required.<br><br></div><div>One major thing is we need a solid plan for migration from current layout to newer layout.<br><br></div><div>Regards,<br></div><div>Amar<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Xavi<div class="m_-6259276629431601807HOEnZb"><div class="m_-6259276629431601807h5"><br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Xavi<br>

<br>

[1] <a href="https://www.ietf.org/rfc/rfc4122.txt" rel="noreferrer" target="_blank">https://www.ietf.org/rfc/rfc41<wbr>22.txt</a><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Changes:<br>

---------<br>

- Code changes to accommodate 17 bytes GFID instead of 16 bytes(Read<br>

  and Write)<br>

- Migration Tool to upgrade GFIDs in Volume/Cluster<br>

<br>

Let me know your thoughts.<br>

<br>

</blockquote>

<br>

</blockquote>

<br>

</blockquote>

<br>

______________________________<wbr>_________________<br>

Gluster-devel mailing list<br>

<a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a><br>

<a href="http://www.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">http://www.gluster.org/mailman<wbr>/listinfo/gluster-devel</a><br>

</div></div></blockquote></div><br></div></div></div>

</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Amar Tumballi (amarts)<br></div></div></div></div></div>

</div></div>