[Gluster-devel] Disperse volume : Sequential Writes

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Jun 15 09:50:43 UTC 2017


On Thu, Jun 15, 2017 at 11:51 AM, Ashish Pandey <aspandey at redhat.com> wrote:

> Hi All,
>
> We have been facing some issues in disperse (EC) volumes.
> We know that EC is currently not good for random IO, because it requires a
> READ-MODIFY-WRITE fop cycle whenever an offset or offset+length falls in
> the middle of the stripe size.
>
> Unfortunately, this can also happen with sequential writes.
> Consider an EC volume with a 4+2 configuration. The stripe size for this
> would be 512 * 4 = 2048; that is, 2048 bytes of user data are stored in one
> stripe.
> Let's say 2048 + 512 = 2560 bytes have already been written to a file on
> this volume; the last 512 bytes sit in the second stripe.
> Now, if a sequential write arrives at offset 2560 with a size of 1 byte,
> we have to read the whole stripe, encode it with that 1 byte, and then
> write it back.
> The next write, at offset 2561 with a size of 1 byte, will again
> READ-MODIFY-WRITE the whole stripe. This causes bad performance.
>
> Some tools and workloads generate exactly this kind of load without the
> user being aware of it.
> Examples: fio and zip.
>
> Solution:
> One possible solution to this issue is to keep the last stripe of the file
> in memory.
> That way we need not read it again, and we save a READ fop going over
> the network.
> In the example above, we would have to keep at most the last 2048 bytes
> in memory per file. This should not be a big deal, as we already keep some
> data, such as xattrs and size info, in memory and take decisions based on
> it.
>
> Please provide your thoughts on this and also if you have any other
> solution.
>
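The stripe arithmetic above can be sketched as follows. This is a hypothetical illustration, not Gluster code: the constants and the helper name are assumptions, chosen to match the 4+2 example with 512-byte fragments.

```python
# Hypothetical sketch of EC stripe alignment (not actual Gluster code).
# For a 4+2 volume with 512-byte fragments, one stripe holds 2048 bytes
# of user data.
FRAGMENT_SIZE = 512
DATA_BRICKS = 4
STRIPE_SIZE = FRAGMENT_SIZE * DATA_BRICKS  # 2048

def needs_read_modify_write(offset, length):
    """A write triggers a READ-MODIFY-WRITE cycle when either end of
    the write falls in the middle of a stripe."""
    return offset % STRIPE_SIZE != 0 or (offset + length) % STRIPE_SIZE != 0

# The 1-byte sequential writes from the example above:
assert needs_read_modify_write(2560, 1)       # partial stripe -> RMW
assert needs_read_modify_write(2561, 1)       # RMW again, same stripe
assert not needs_read_modify_write(0, 2048)   # aligned full stripe -> no read
```

So every 1-byte append re-reads and re-encodes the same 2048-byte stripe, which is where the performance loss comes from.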

Just adding more details.
The stripe will be kept in memory only while the lock on the inode is
active. One thing we have yet to decide is whether to read the stripe every
time we acquire the lock, or only after an extending write is performed. I
think caching the stripe right after an extending write is better, as it
doesn't involve an extra network operation.
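The proposed optimization could look roughly like the following. This is only a sketch under stated assumptions: the class, its fields, and the read counter are all hypothetical, and real EC encoding and brick IO are reduced to placeholders.

```python
# Hypothetical sketch of caching the last partial stripe in memory
# (not actual Gluster code). While the cache is warm, appends never
# need a network READ of the tail stripe.
STRIPE_SIZE = 2048  # 4+2 volume, 512-byte fragments

class LastStripeCache:
    def __init__(self):
        self.reads = 0   # counts simulated network READ fops
        self.size = 0    # current end-of-file offset
        self.tail = b""  # cached contents of the last partial stripe

    def _read_stripe_from_bricks(self):
        self.reads += 1  # would be a network round trip in real life
        return self.tail  # placeholder for the decoded stripe data

    def append(self, data):
        if self.size % STRIPE_SIZE != 0 and not self.tail:
            # Cache miss: must READ the partial stripe before encoding.
            self.tail = self._read_stripe_from_bricks()
        self.tail += data
        self.size += len(data)
        # Completed stripes are encoded and written out; only the
        # partial tail stays cached (simplified to a single trim here).
        if len(self.tail) >= STRIPE_SIZE:
            self.tail = self.tail[STRIPE_SIZE:]

cache = LastStripeCache()
cache.append(b"x" * 2560)   # extending write populates the cache
for _ in range(100):
    cache.append(b"y")      # 100 one-byte sequential writes
assert cache.reads == 0     # tail stayed cached: no READ fop needed
```

Populating the cache on the extending write itself, as suggested above, is what keeps `reads` at zero here; reading on every lock acquisition would instead pay one READ per lock even when the data is about to be overwritten.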



>
> ---
> Ashish
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith