<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 1, 2017 at 11:20 PM, Shyam <span dir="ltr"><<a href="mailto:srangana@redhat.com" target="_blank">srangana@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 05/01/2017 01:13 PM, Pranith Kumar Karampuri wrote:<br>
</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
<br>
<br>
On Mon, May 1, 2017 at 10:42 PM, Pranith Kumar Karampuri<br></span><span class="">
<<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a> <mailto:<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>>> wrote:<br>
<br>
<br>
<br>
On Mon, May 1, 2017 at 10:39 PM, Gandalf Corvotempesta<br>
<<a href="mailto:gandalf.corvotempesta@gmail.com" target="_blank">gandalf.corvotempesta@gmail.c<wbr>om</a><br></span><span class="">
<mailto:<a href="mailto:gandalf.corvotempesta@gmail.com" target="_blank">gandalf.corvotempesta@<wbr>gmail.com</a>>> wrote:<br>
<br>
2017-05-01 18:57 GMT+02:00 Pranith Kumar Karampuri<br></span>
<<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a> <mailto:<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>>>:<div><div class="h5"><br>
>>>>> Yes, this is precisely what the other SDSes with metadata servers
>>>>> do: they keep a map, on a metadata server, of which servers a
>>>>> particular file/blob is stored on.

>>>> Not exactly. Other SDSes have some servers dedicated to metadata and,
>>>> personally, I don't like that approach.

>>>>> GlusterFS doesn't do that. In GlusterFS, which bricks replicate each
>>>>> other is given up front, and the distribute layer on top of these
>>>>> replica sets does the job of distributing and fetching the data.
>>>>> Because replication happens at the brick level and not at the file
>>>>> level, and distribution happens on top of replication and not per
>>>>> file, there isn't much metadata that needs to be stored per file.
>>>>> Hence there is no need for separate metadata servers.
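
To illustrate the point in that last quoted paragraph, a toy sketch (the
brick names are made up; roughly speaking, the real DHT hashes names against
per-directory layout ranges kept in xattrs rather than a simple modulo, but
the principle is the same): because the replica sets are fixed, every client
can compute a file's location from its name alone, with no metadata database
to consult.

# Toy sketch only -- NOT GlusterFS's actual DHT code.
import hashlib

REPLICA_SETS = [
    ["server1:/bricks/b1", "server2:/bricks/b1"],   # replica pair 1
    ["server3:/bricks/b1", "server4:/bricks/b1"],   # replica pair 2
]

def locate(filename):
    # A deterministic hash of the name picks the replica pair.
    h = int(hashlib.md5(filename.encode()).hexdigest(), 16)
    return REPLICA_SETS[h % len(REPLICA_SETS)]

print(locate("photos/cat.jpg"))   # same answer on every client
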
>>>> And this is great; that's why I'm talking about embedding a sort of
>>>> database stored on all nodes: no metadata servers, only a mapping
>>>> between files and servers.

>>>>> If you know the path of the file, you can always find out where the
>>>>> file is stored using pathinfo; see Method 2 in the following link:
>>>>> https://gluster.readthedocs.io/en/latest/Troubleshooting/gfid-to-path/

>>>>> You don't need any db.
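
As a concrete illustration of that pathinfo query: it is just a read of a
virtual xattr on the FUSE mount. A minimal sketch (the mount point and file
path below are made up):

# Linux-only: read the trusted.glusterfs.pathinfo virtual xattr on a
# FUSE-mounted GlusterFS volume.
import os

info = os.getxattr("/mnt/glustervol/some/file.txt",
                   "trusted.glusterfs.pathinfo")
print(info.decode())   # shows the backend brick location(s) of the file
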
>>>> For the current Gluster, yes. I'm talking about a different thing.

>>>> In a RAID array, you have data stored somewhere on the array, with
>>>> metadata defining how this data should be written or read. Obviously,
>>>> the RAID metadata must be stored at a fixed position, or you won't be
>>>> able to read it.

>>>> Something similar could be added to Gluster (I don't know whether it
>>>> would be hard): you store a file mapping at a fixed position in
>>>> Gluster, and then every Gluster client can find out where a file is by
>>>> looking at this "metadata" stored at the fixed position.

>>>> Like the ".gluster" directory: Gluster already uses some "internal"
>>>> directories for internal operations (".shards", ".gluster", ".trash").
>>>> Would a ".metadata" directory with the file mapping be hard to add?
>>>>> Basically what you want, if I understood correctly, is: if we add a
>>>>> 3rd node with just one disk, the data should automatically rearrange
>>>>> itself, splitting into 3 categories (assuming replica 2):
>>>>> 1) Files that are present on Node1, Node2
>>>>> 2) Files that are present on Node2, Node3
>>>>> 3) Files that are present on Node1, Node3

>>>>> As you can see, we arrive at a contradiction: every node would need at
>>>>> least 2 bricks, but there is only 1 disk per node. We can't do what
>>>>> you are asking without brick splitting, i.e. we need to split each
>>>>> disk into 2 bricks.
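
The counting argument in that quoted explanation can be spelled out in a few
lines of Python (node names are placeholders):

# With 3 nodes and replica 2, covering all node pairs means every node
# appears in 2 replica pairs, i.e. needs 2 bricks -- impossible with a
# single unsplit disk per node.
from itertools import combinations

nodes = ["Node1", "Node2", "Node3"]
pairs = list(combinations(nodes, 2))
bricks_needed = {n: sum(n in pair for pair in pairs) for n in nodes}

print(pairs)          # [('Node1', 'Node2'), ('Node1', 'Node3'), ('Node2', 'Node3')]
print(bricks_needed)  # {'Node1': 2, 'Node2': 2, 'Node3': 2}
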

> Splitting the bricks need not be a post-factum decision; we can start
> with a larger brick count for a given node/disk count, and then spread
> these bricks to newer nodes/disks as they are added.

Let's say we have one disk; we format it with, say, XFS, and at the moment
that becomes the brick. Just curious: what will the relationship between
brick and disk be in this case (leaving LVM out of this example)?
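
A rough sketch of how I read Shyam's suggestion (all node and brick names are
made up, and I'm assuming the brick moves would be done with something like
replace-brick): whole bricks migrate to the new node, rather than files being
re-split.

# Made-up illustration: start with more bricks than nodes (replica 2), then
# move whole bricks to a new node instead of re-splitting data.
# Day 1: two nodes, three bricks each; every pair spans both nodes.
replica_pairs = [
    ["node1:/b1", "node2:/b1"],
    ["node1:/b2", "node2:/b2"],
    ["node1:/b3", "node2:/b3"],
]

# Node 3 arrives: migrate one brick from each of two pairs onto it;
# the remaining pair is untouched.
replica_pairs[1][0] = "node3:/b1"   # was node1:/b2
replica_pairs[2][1] = "node3:/b2"   # was node2:/b3

for pair in replica_pairs:
    print(pair)   # each node now hosts 2 bricks; every pair still spans 2 nodes
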
> If I understand the Ceph PG count correctly, it works on a similar notion,
> until the cluster grows beyond the initial PG count (set for the pool), at
> which point there is a lot more data movement (as the PG count has to be
> increased, and hence existing PGs need to be further partitioned). (I'm
> just using Ceph as an example; a similar approach exists in OpenStack
> Swift with its partition power setting.)
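
A toy version of that fixed-partition idea (not actual Ceph or Swift code;
the numbers are arbitrary):

# Objects hash into a fixed number of partitions; growing the cluster only
# reassigns whole partitions to nodes. Only when the partition count itself
# has to be raised does every partition get re-split, which is the large
# data movement mentioned above.
import hashlib

PARTITIONS = 16   # fixed up front, like a pool's PG count or partition power

def partition(obj):
    return int(hashlib.md5(obj.encode()).hexdigest(), 16) % PARTITIONS

nodes = ["node1", "node2", "node3"]
placement = {p: nodes[p % len(nodes)] for p in range(PARTITIONS)}

p = partition("photos/cat.jpg")
print(p, placement[p])
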

>>>> I don't think so.
>>>> Let's assume replica 2:
>>>>
>>>> S1B1 + S2B1
>>>>
>>>> 1TB each, thus 1TB available (2TB/2).
>>>>
>>>> Adding a third 1TB disk should increase the available space to
>>>> 1.5TB (3TB/2).

>>> I agree it should. The question is how? What will the resulting
>>> brick map be?

>> I don't see any solution that we can do without at least 2 bricks on
>> each of the 3 servers.
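
One concrete layout that gets to 1.5TB usable (the obvious chained
arrangement, with each 1TB disk split into two 0.5TB bricks):

# Three 1TB disks, each split into two 0.5TB bricks; replica-2 pairs are
# chained so that every pair spans two different servers.
pairs = [
    ("S1B1", "S2B1"),
    ("S2B2", "S3B1"),
    ("S3B2", "S1B2"),
]
usable_tb = 0.5 * len(pairs)   # each pair contributes one brick's worth
print(pairs)
print(usable_tb)               # 1.5
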
>>> --
>>> Pranith

>> --
>> Pranith
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users

--
Pranith