[GEDI] [RFC v4 11/11] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint

Stefan Hajnoczi stefanha at redhat.com
Mon Sep 5 20:50:25 UTC 2022


On Fri, Sep 02, 2022 at 10:06:45AM +0200, David Hildenbrand wrote:
> On 30.08.22 22:16, Stefan Hajnoczi wrote:
> > On Thu, Aug 25, 2022 at 09:43:16AM +0200, David Hildenbrand wrote:
> >> On 23.08.22 21:22, Stefan Hajnoczi wrote:
> >>> On Tue, Aug 23, 2022 at 10:01:59AM +0200, David Hildenbrand wrote:
> >>>> On 23.08.22 00:24, Stefan Hajnoczi wrote:
> >>>>> Register guest RAM using BlockRAMRegistrar and set the
> >>>>> BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
> >>>>> accesses in I/O requests.
> >>>>>
> >>>>> This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely
> >>>>> on DMA mapping/unmapping.
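
For context, the virtio-blk side of this is small. Roughly (exact field
names and signatures elided, so treat this as a sketch rather than the
literal diff), it registers guest RAM at realize time via the
BlockRAMRegistrar added earlier in this series and then passes the hint
flag on each request:

  /* Sketch, not the literal patch: blk_ram_registrar_init() comes from
   * the BlockRAMRegistrar patch earlier in this series; the
   * s->blk_ram_registrar field name is my shorthand. */
  static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
  {
      VirtIOBlock *s = VIRTIO_BLK(dev);

      /* ... existing realize code ... */

      /* Register all current and future guest RAM with the BlockBackend */
      blk_ram_registrar_init(&s->blk_ram_registrar, s->blk);
  }

  /* Request submission then passes the hint instead of 0: */
  blk_aio_pwritev(s->blk, offset, &req->qiov, BDRV_REQ_REGISTERED_BUF,
                  virtio_blk_rw_complete, req);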
> >>>>
> >>>> Can you explain why we're monitoring the RAMRegistrar to hook into "guest
> >>>> RAM" instead of going the usual path of the MemoryListener?
> >>>
> >>> The requirements are similar to VFIO, which uses RAMBlockNotifier. We
> >>
> >> Only VFIO NVME uses RAMBlockNotifier. Ordinary VFIO uses the MemoryListener.
> >>
> >> Maybe the difference is that ordinary VFIO has to replicate the actual
> >> guest physical memory layout, and VFIO NVME is only interested in
> >> possible guest RAM inside guest physical memory.
> >>
> >>> need to learn about all guest RAM because that's where I/O buffers are
> >>> located.
> >>>
> >>> Do you think RAMBlockNotifier should be avoided?
> >>
> >> I assume it depends on the use case. For saying "this might be used for
> >> I/O" it might be good enough I guess.
> >>
> >>>
> >>>> What will BDRV_REQ_REGISTERED_BUF actually do? Pin all guest memory in
> >>>> the worst case, as io_uring fixed buffers would do? (I hope not.)
> >>>
> >>> BDRV_REQ_REGISTERED_BUF is a hint that no bounce buffer is necessary
> >>> because the I/O buffer is located in memory that was previously
> >>> registered with bdrv_register_buf().
> >>>
> >>> The RAMBlockNotifier calls bdrv_register_buf() to let the libblkio
> >>> driver know about RAM. Some libblkio drivers ignore this hint, io_uring
> >>> may use the fixed buffers feature, vhost-user sends the shared memory
> >>> file descriptors to the vhost device server, and VFIO/vhost may pin
> >>> pages.
> >>>
> >>> So the blkio block driver doesn't add anything new; it's the union of
> >>> the VFIO/vhost/vhost-user/etc. memory requirements.
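
As a concrete sketch of that registration path (simplified; the
BlockRAMRegistrar fields and the bdrv_register_buf()/bdrv_unregister_buf()
signatures here are assumptions, not the literal code from this series):

  static void ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
                              size_t max_size)
  {
      BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);

      /* Tell the block driver (libblkio, etc.) about this guest RAM range
       * so later I/O with BDRV_REQ_REGISTERED_BUF can avoid bounce buffers. */
      bdrv_register_buf(blk_bs(r->blk), host, max_size);
  }

  static void ram_block_removed(RAMBlockNotifier *n, void *host, size_t size,
                                size_t max_size)
  {
      BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);

      bdrv_unregister_buf(blk_bs(r->blk), host);
  }

  void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk)
  {
      r->blk = blk;
      r->notifier = (RAMBlockNotifier){
          .ram_block_added = ram_block_added,
          .ram_block_removed = ram_block_removed,
      };

      /* Fires for all existing RAM blocks and any added later */
      ram_block_notifier_add(&r->notifier);
  }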
> >>
> >> The issue is if that backend pins memory inside any of these regions.
> >> Then, you're instantly incompatible with anything that relies on sparse
> >> RAMBlocks, such as memory ballooning or virtio-mem, and have to properly
> >> fence it.
> >>
> >> In that case, you'd have to successfully trigger
> >> ram_block_discard_disable(true) first, before pinning. Who would do that,
> >> and do it conditionally, just like e.g., VFIO does?
> >>
> >> io_uring fixed buffers would be one such example that pins memory and is
> >> problematic. vfio (unless on s390x) is another example, as you point out.
> > 
> > Okay, I think libblkio needs to expose a bool property called
> > "mem-regions-pinned" so QEMU knows whether or not the registered buffers
> > will be pinned.
> > 
> > Then the QEMU BlockDriver can do:
> > 
> >   if (mem_regions_pinned) {
> >       if (ram_block_discard_disable(true) < 0) {
> >           ...fail to open block device...
> >       }
> >   }
> > 
> > Does that sound right?
> 
> Yes, I think so.
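
Fleshing out that pseudocode a bit, the open/close paths could look like
this (sketch only: "mem-regions-pinned" is the property proposed above and
doesn't exist yet, the state field name is made up, and error handling is
abbreviated):

  static int blkio_open(BlockDriverState *bs, QDict *options, int flags,
                        Error **errp)
  {
      BDRVBlkioState *s = bs->opaque;
      bool pinned = false;

      /* ... create and connect the libblkio instance ... */

      blkio_get_bool(s->blkio, "mem-regions-pinned", &pinned);
      s->needs_mem_region_pinning = pinned;   /* made-up field name */

      if (pinned) {
          /* Pinned regions go out of sync with discarded RAM, so refuse
           * to coexist with ballooning/virtio-mem style discards. */
          if (ram_block_discard_disable(true) < 0) {
              error_setg(errp, "cannot disable RAM discard");
              return -EBUSY;
          }
      }
      return 0;
  }

  static void blkio_close(BlockDriverState *bs)
  {
      BDRVBlkioState *s = bs->opaque;

      if (s->needs_mem_region_pinning) {
          ram_block_discard_disable(false);
      }

      /* ... destroy the libblkio instance ... */
  }

The matching ram_block_discard_disable(false) in the close path keeps the
accounting balanced when the device goes away.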
> 
> > 
> > Is "pinned" the best word to describe this or is there a more general
> > characteristic we are looking for?
> 
> pinning should be the right term. We want to express that all user page
> tables will immediately get populated and that a kernel subsystem will
> take longterm references on mapped pages that will go out of sync as soon
> as we discard memory, e.g., using madvise(MADV_DONTNEED).
> 
> We just should not confuse it with memlock / locking into memory, which
> has yet different semantics (e.g., don't swap the memory out).
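
Right. A standalone way to see that distinction: MADV_DONTNEED drops the
pages backing an anonymous mapping, so a subsystem holding longterm
references to the old pages (io_uring fixed buffers, VFIO DMA mappings,
...) would now be out of sync, while the mapping itself simply refaults
fresh zero pages. mlock() is unrelated; it only prevents swap-out. Minimal
demo, not QEMU code:

  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>

  int main(void)
  {
      size_t len = 4096;
      char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

      if (p == MAP_FAILED) {
          return 1;
      }
      memset(p, 0xaa, len);

      /* Discard the backing pages. Anything that pinned the old pages
       * still points at them; this mapping refaults new zero pages. */
      madvise(p, len, MADV_DONTNEED);

      printf("after MADV_DONTNEED: 0x%02x\n", (unsigned char)p[0]); /* 0x00 */
      return 0;
  }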
> 
> > 
> >>
> >> This has to be treated with care. Another thing to consider is that
> >> different backends might only support a limited number of such regions.
> >> I assume there is a way for QEMU to query this limit upfront? It might
> >> be required for memory hot(un)plug to figure out how many memory slots
> >> we actually have (for ordinary DIMMs, and if we ever want to make this
> >> compatible with virtio-mem, it might be required as well when the backend
> >> pins memory).
> > 
> > Yes, libblkio reports the maximum number of blkio_mem_regions supported
> > by the device. The property is called "max-mem-regions".
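
For reference, querying that through the public libblkio C API looks
roughly like this (the "io_uring" driver and the image path are just
placeholders, and most error handling is omitted):

  #include <blkio.h>
  #include <inttypes.h>
  #include <stdio.h>

  int main(void)
  {
      struct blkio *b;
      uint64_t max_mem_regions = 0;

      if (blkio_create("io_uring", &b) < 0) {
          fprintf(stderr, "blkio_create: %s\n", blkio_get_error_msg());
          return 1;
      }
      blkio_set_str(b, "path", "test.img");   /* placeholder image */
      blkio_connect(b);

      /* How many separate memory regions this device instance accepts */
      blkio_get_uint64(b, "max-mem-regions", &max_mem_regions);
      printf("max-mem-regions = %" PRIu64 "\n", max_mem_regions);

      blkio_destroy(&b);
      return 0;
  }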
> > 
> > The QEMU BlockDriver currently doesn't use this information. Are there
> > any QEMU APIs that should be called to propagate this value?
> 
> I assume we have to do exactly the same thing as e.g.,
> vhost_has_free_slot()/kvm_has_free_slot() do.
> 
> Especially, hw/mem/memory-device.c needs care and
> slots_limit/used_memslots handling in hw/virtio/vhost.c might be
> relevant as well.
> 
> 
> Note that I have some patches pending that extend that handling, by also
> providing how many used+free slots there are, such as:
> 
> https://lore.kernel.org/all/20211027124531.57561-3-david@redhat.com/

Okay, thanks for explaining. I will make the libblkio driver participate
in free slots accounting.
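
Concretely, I imagine something shaped like
vhost_has_free_slot()/kvm_has_free_slot(), with the limit taken from the
"max-mem-regions" property. Hypothetical sketch, none of these helpers
exist yet:

  /* Hypothetical: mirror vhost_has_free_slot()/kvm_has_free_slot() so
   * memory device hotplug can refuse to exceed the backend's limit. */
  static unsigned int blkio_used_mem_regions;
  static uint64_t blkio_max_mem_regions;   /* from "max-mem-regions" */

  bool blkio_has_free_mem_regions(void)
  {
      return blkio_used_mem_regions < blkio_max_mem_regions;
  }

  /* Called from the RAM registration/unregistration paths */
  void blkio_mem_region_added(void)   { blkio_used_mem_regions++; }
  void blkio_mem_region_removed(void) { blkio_used_mem_regions--; }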

Stefan