[Gluster-devel] Suggestions for improving the block/gluster driver in QEMU

Niels de Vos ndevos at redhat.com
Thu Jul 28 11:13:54 UTC 2016


On Thu, Jul 28, 2016 at 04:19:42PM +0530, Ravishankar N wrote:
> On 07/28/2016 03:32 PM, Niels de Vos wrote:
> > There are some features in QEMU that we could implement with the
> > existing libgfapi functions. Kevin asked me about this a while back, and
> > I have finally (sorry for the delay Kevin!) taken the time to look into
> > it.
> > 
> > There are some optional operations that can be set in the BlockDriver
> > structure. The ones that are missing, or that have useless
> > implementations, are these:
> > 
> >    .bdrv_get_info/.bdrv_refresh_limits:
> >      This seems to set values in the BlockDriverInfo and BlockLimits
> >      structures that are used by QEMU's block layer. By setting the right
> >      values, we can use glfs_discard() and glfs_zerofill() to reduce the
> >      writing of 0-bytes that QEMU falls back on at the moment.
> > 
> >    .bdrv_has_zero_init / qemu_gluster_has_zero_init:
> >      Currently always returns 0. But if a file gets created on a Gluster
> >      volume, it should never have old contents in it. Rewriting it with
> >      0-bytes looks unneeded to me.
> 
> N00b question, what is the need for separate glfs_discard() and
> glfs_zerofill() functions? Can we not just use glfs_fallocate() with
> appropriate flags?

glfs_fallocate() does not have an argument for flags :-/ If we introduce
one now, we'll change the API, and existing libgfapi applications using
the function will fail to compile. It can be done though, and involves
implementing a new glfs_fallocate() with an updated symbol version. But
it'll be painful for the existing applications in any case.
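
To make the mapping concrete, here is a rough sketch of how fallocate(2)-style
flags could be routed to the three existing operations from a separately named
entry point, so the current glfs_fallocate() signature stays untouched.
Everything in it is an assumption for illustration: fallocate_with_flags(),
struct falloc_ops and the recording stand-ins are hypothetical names, the
function-pointer signatures are not copied from the libgfapi headers, and the
stand-ins only record which operation was chosen.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>   /* FALLOC_FL_* flag values */
#include <stddef.h>
#include <sys/types.h>

/* Records which stand-in ran last; only there to make the sketch observable. */
enum { OP_NONE, OP_ALLOCATE, OP_DISCARD, OP_ZEROFILL };
int last_op = OP_NONE;

/* Hypothetical table of the three existing operations. With libgfapi these
 * slots would wrap glfs_fallocate(), glfs_discard() and glfs_zerofill(). */
struct falloc_ops {
    int (*allocate)(void *fd, int keep_size, off_t offset, off_t len);
    int (*discard)(void *fd, off_t offset, off_t len);
    int (*zerofill)(void *fd, off_t offset, off_t len);
};

/* Hypothetical flags-taking entry point: picks one operation per call. */
int fallocate_with_flags(const struct falloc_ops *ops, void *fd, int flags,
                         off_t offset, off_t len)
{
    assert(ops != NULL);

    if (flags & FALLOC_FL_PUNCH_HOLE) {
        /* Linux only accepts PUNCH_HOLE together with KEEP_SIZE. */
        if (flags != (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)) {
            errno = EINVAL;
            return -1;
        }
        return ops->discard(fd, offset, len);
    }
    if (flags & FALLOC_FL_ZERO_RANGE) {
        /* Assumption: the zerofill operation has no keep-size variant. */
        if (flags != FALLOC_FL_ZERO_RANGE) {
            errno = EOPNOTSUPP;
            return -1;
        }
        return ops->zerofill(fd, offset, len);
    }
    if (flags & ~FALLOC_FL_KEEP_SIZE) {
        errno = EOPNOTSUPP;   /* any other mode is not handled here */
        return -1;
    }
    return ops->allocate(fd, (flags & FALLOC_FL_KEEP_SIZE) != 0, offset, len);
}

/* Recording stand-ins so the dispatch can be exercised without a volume. */
static int rec_allocate(void *fd, int keep_size, off_t offset, off_t len)
{
    (void)fd; (void)keep_size; (void)offset; (void)len;
    last_op = OP_ALLOCATE;
    return 0;
}

static int rec_discard(void *fd, off_t offset, off_t len)
{
    (void)fd; (void)offset; (void)len;
    last_op = OP_DISCARD;
    return 0;
}

static int rec_zerofill(void *fd, off_t offset, off_t len)
{
    (void)fd; (void)offset; (void)len;
    last_op = OP_ZEROFILL;
    return 0;
}

const struct falloc_ops recording_ops = {
    rec_allocate, rec_discard, rec_zerofill
};
```

A new name like this would sidestep the compile break, but it is not the
symbol-versioning route mentioned above; that one keeps the name and needs
linker support on top.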

> posix_discard() in gluster seems to be using fallocate() with
> FALLOC_FL_PUNCH_HOLE flag. And posix_zerofill() can be made smarter to use
> FALLOC_FL_ZERO_RANGE and fall back to writing zeroes if ZERO_RANGE is not
> supported.

Oh, nice find! I was expecting that posix_zerofill() uses fallocate()
already... Definitely something that should be improved too. Care to file
a bug for that?
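
For reference, a minimal sketch of the behaviour Ravi describes, using Linux
fallocate(2) on a plain file descriptor. zero_range_with_fallback() is a
hypothetical helper, not the actual posix_zerofill() code; the 4 KiB buffer
and the errno values that trigger the fallback are assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>   /* FALLOC_FL_ZERO_RANGE */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: zero [offset, offset + len) of fd, preferring
 * FALLOC_FL_ZERO_RANGE and writing zeroes when that mode is unavailable.
 * Returns 0 on success, -1 with errno set on failure. */
int zero_range_with_fallback(int fd, off_t offset, off_t len)
{
    static const char zeroes[4096];   /* static storage, so all zero bytes */

    assert(offset >= 0 && len >= 0);

    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len) == 0)
        return 0;

    /* Only fall back when the filesystem or kernel lacks the mode. */
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;

    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeroes) ? (size_t)len
                                                   : sizeof(zeroes);
        ssize_t n = pwrite(fd, zeroes, chunk, offset);

        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted, retry the same chunk */
            return -1;
        }
        if (n == 0) {
            errno = EIO;              /* no progress, avoid looping forever */
            return -1;
        }
        offset += n;
        len -= n;
    }
    return 0;
}
```

The fallback path is what happens unconditionally today if zeroes are always
written; the fallocate() call just gives the filesystem a chance to do it
without transferring the data.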

Thanks,
Niels


> Regards,
> Ravi
> 
> > 
> > With these improvements, certain operations are expected to be a little
> > faster when using the gluster:// URL with QEMU (and now also the new
> > JSON QAPI). Anyone starting to work on this would want to trace the
> > actual operations (on a single-brick volume) with ltrace/wireshark on
> > the system where QEMU runs.
> > 
> > Who is interested in taking this on?
> > Niels
> > 
> > 
> 
> 