[Gluster-devel] Non Shared Persistent Gluster Storage with Kubernetes

Wed Jul 6 09:55:33 UTC 2016

On Wed, Jul 6, 2016 at 12:24 AM, Shyam <srangana at redhat.com> wrote:

> On 07/01/2016 01:45 AM, B.K.Raghuram wrote:
>
>> I have not gone through this implementation or the new iSCSI
>> implementation being worked on for 3.9, but I thought I'd share the
>> design behind a distributed iSCSI implementation that we worked on
>> some time back, based on the istgt code with a libgfapi hook.
>>
>> The implementation used one file to represent each block (of a chosen
>> size), allowing us to use gluster as the backend to store these files
>> while presenting a single block device of potentially unlimited size. We
>> used a fixed file-naming convention based on the block number, which lets
>> the system determine which file(s) need to be operated on for a requested
>> byte offset. This gave us all of gluster's file-based functionality
>> underneath, providing a fully distributed iSCSI implementation.
>>
>> Would this be similar to the new iSCSI implementation that's being worked
>> on for 3.9?
>>
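The one-file-per-block scheme described above comes down to mapping a byte range on the virtual device onto block files. Below is a minimal sketch of that mapping, assuming a hypothetical 4 MiB block size and a hypothetical `block-<number>` naming convention (the original design only says "a chosen size" and "a fixed file naming convention based on the block number"). It computes which files and intra-file ranges a request touches. It does not do any I/O through libgfapi.

```python
from typing import List, Tuple

# Hypothetical block size; the original design only says "a chosen size".
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB


def block_file_name(block_number: int) -> str:
    """Hypothetical naming convention: one file per block, named by its block number."""
    return f"block-{block_number:012d}"


def files_for_request(offset: int, length: int,
                      block_size: int = BLOCK_SIZE) -> List[Tuple[str, int, int]]:
    """Split a byte range on the virtual device into (file, offset_in_file, length) pieces."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    pieces: List[Tuple[str, int, int]] = []
    pos, end = offset, offset + length
    while pos < end:
        block = pos // block_size                      # which block file holds this byte
        in_block = pos % block_size                    # where inside that file to start
        chunk = min(block_size - in_block, end - pos)  # stop at block boundary or request end
        pieces.append((block_file_name(block), in_block, chunk))
        pos += chunk
    return pieces


# A 1 KiB request straddling the first block boundary touches two files.
print(files_for_request(BLOCK_SIZE - 512, 1024))
# [('block-000000000000', 4193792, 512), ('block-000000000001', 0, 512)]
```

Because the name is derived purely from the block number, any client can compute it without a lookup table. The cost is that device-wide properties (size, used/free blocks) are not stored anywhere by default.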
>
> <will let others correct me here, but...>
>
> Ultimately, the idea would be to use sharding, as part of the gluster
> volume graph, to distribute the blocks (or rather, shard them), rather
> than keeping the disk image on a single distribute subvolume, and hence
> scale disk sizes to the size of the cluster. Further, sharding should work
> well here, as this is a single-client access case (or are we past that
> hurdle already?).
>

Not yet; we need a common transaction frame in place to reduce the
latency of synchronization.
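With sharding, gluster itself does the block splitting and placement. The sketch below shows the idea under stated assumptions. It assumes the shard translator's layout: the first shard is the file itself, and later shards are stored as `<gfid>.<index>` under the hidden `/.shard` directory. It also assumes a 64 MiB shard size. Placement uses `zlib.crc32` over equal slices of a 32-bit hash range as a stand-in for DHT; this is not DHT's real hash function or layout. The subvolume names and the GFID are hypothetical.

```python
import zlib
from typing import List, Tuple

# Assumed shard size (the real value comes from the volume's shard block size option).
SHARD_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB


def shard_path(base_path: str, gfid: str, index: int) -> str:
    """Shard 0 is the file itself; later shards are <gfid>.<index> under /.shard."""
    return base_path if index == 0 else f"/.shard/{gfid}.{index}"


def pick_subvolume(name: str, subvolumes: List[str]) -> str:
    """Stand-in for DHT: hash the file name onto equal slices of a 32-bit range,
    one slice per subvolume. crc32 is a placeholder, not DHT's hash."""
    h = zlib.crc32(name.encode()) & 0xFFFFFFFF
    slice_size = (1 << 32) // len(subvolumes)
    return subvolumes[min(h // slice_size, len(subvolumes) - 1)]


def locate(base_path: str, gfid: str, offset: int, subvolumes: List[str],
           shard_size: int = SHARD_BLOCK_SIZE) -> Tuple[str, str, int]:
    """Return (shard path, subvolume, offset inside the shard) for a byte offset."""
    index = offset // shard_size
    path = shard_path(base_path, gfid, index)
    name = path.rsplit("/", 1)[-1]  # placement is by name within the parent directory
    return path, pick_subvolume(name, subvolumes), offset % shard_size


subvols = ["vol-client-0", "vol-client-1", "vol-client-2"]  # hypothetical names
gfid = "6f1c2a9e-0000-4000-8000-000000000001"               # hypothetical GFID
for off in (0, 70 * 1024 * 1024, 200 * 1024 * 1024):
    print(locate("/images/disk.img", gfid, off, subvols))
```

Since each shard is placed independently by name, a single disk image spreads across all subvolumes instead of being bounded by the size of one.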

>
> What this achieves is similar to the iSCSI implementation you describe,
> but with gluster doing the block splitting, and hence the distribution,
> rather than the iSCSI implementation (istgt).
>
> <I did a cursory check of the blog post but did not find a reference to
> sharding, so maybe others who know the direction could pitch in here.>
>

There are two directions, which will eventually converge:
1) A granular data self-heal implementation, so that taking a snapshot
becomes as simple as a reflink.
2) Snapshots of sharded files; this is more involved than the solution
above.

Once 2) is also complete, we will have 1) and 2) combined, so that data
self-heal heals the exact blocks inside each shard.

If users are not worried about snapshots, 2) is the best option.
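To illustrate what "heal the exact blocks" buys, here is a toy two-way replica. It records which fixed-size blocks were written while one copy was down, then copies only those blocks when healing. This is a hypothetical model (`ToyReplicatedFile` and the 4 KiB heal granularity are illustrative assumptions). It is not how AFR actually tracks pending heals.

```python
from typing import List, Set

HEAL_BLOCK = 4096  # hypothetical heal granularity


class ToyReplicatedFile:
    """Two-way replica that remembers which blocks were written while one copy was down.
    A toy model for illustration; it is not how AFR actually tracks pending heals."""

    def __init__(self, size: int, block: int = HEAL_BLOCK):
        self.block = block
        self.copies: List[bytearray] = [bytearray(size), bytearray(size)]
        self.online = [True, True]
        self.dirty: Set[int] = set()  # block indices that need healing

    def write(self, offset: int, buf: bytes) -> None:
        if not buf:
            return
        if offset < 0 or offset + len(buf) > len(self.copies[0]):
            raise ValueError("write outside the file")
        for i, copy in enumerate(self.copies):
            if self.online[i]:
                copy[offset:offset + len(buf)] = buf
        if not all(self.online):
            # Record only the blocks this write touched, not the whole file.
            first = offset // self.block
            last = (offset + len(buf) - 1) // self.block
            self.dirty.update(range(first, last + 1))

    def heal(self, source: int, sink: int) -> int:
        """Copy just the dirty blocks from source to sink; return how many were copied."""
        src, dst = self.copies[source], self.copies[sink]
        for b in sorted(self.dirty):
            start = b * self.block
            dst[start:start + self.block] = src[start:start + self.block]
        healed = len(self.dirty)
        self.dirty.clear()
        self.online[sink] = True
        return healed


f = ToyReplicatedFile(64 * 1024)   # 16 blocks of 4 KiB
f.online[1] = False                # second copy goes down
f.write(4090, b"y" * 10)           # spans blocks 0 and 1
f.write(5000, b"x" * 10)           # block 1 again
print(f.heal(source=0, sink=1))    # 2 (blocks copied instead of all 16)
print(f.copies[0] == f.copies[1])  # True
```

A full data heal would copy all 16 blocks here. Tracking dirty blocks limits the copy to the 2 that actually changed.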

> Further, in your original proposal, how do you maintain device properties,
> such as the size of the device and the used/free blocks? I ask about used
> and free because, if each block is maintained as a separate file, they are
> either an overhead to compute or difficult to keep consistent with the size
> and block updates (as those are separate operations). Just curious.
>
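The trade-off raised in that question can be made concrete with a small sketch. A local temporary directory stands in for the gluster volume, and `ToyBlockStore` and its `crash_before_count` flag are hypothetical. Counting block files is always accurate but requires a full directory scan. A cached counter is cheap to read, but it is updated in a separate operation from the block write, so a crash between the two leaves it stale.

```python
import os
import tempfile


class ToyBlockStore:
    """One file per block in a local directory (a stand-in for files on a gluster volume).
    Names, sizes, and the crash flag are hypothetical, for illustration only."""

    def __init__(self, root: str, total_blocks: int):
        self.root = root
        self.total_blocks = total_blocks  # advertised device size, in blocks
        self.used_counter = 0             # cached "used" count, updated separately

    def _path(self, n: int) -> str:
        return os.path.join(self.root, f"block-{n:012d}")

    def write_block(self, n: int, data: bytes, crash_before_count: bool = False) -> None:
        is_new = not os.path.exists(self._path(n))
        with open(self._path(n), "wb") as f:  # operation 1: write the block file
            f.write(data)
        if crash_before_count:
            return                            # simulated crash between the two operations
        if is_new:
            self.used_counter += 1            # operation 2: update the cached count

    def used_by_scan(self) -> int:
        """Always matches the files present, but costs a full directory listing."""
        return sum(1 for name in os.listdir(self.root) if name.startswith("block-"))

    def free_by_scan(self) -> int:
        return self.total_blocks - self.used_by_scan()


with tempfile.TemporaryDirectory() as d:
    store = ToyBlockStore(d, total_blocks=100)
    store.write_block(0, b"a")
    store.write_block(1, b"b", crash_before_count=True)
    print(store.used_by_scan(), store.free_by_scan(), store.used_counter)  # 2 98 1
```

The scan reports 2 used blocks while the cached counter says 1, which is exactly the consistency gap between the block update and the metadata update.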

-- 
Pranith