[GEDI] [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
Peter Xu
peterx at redhat.com
Wed Jun 5 21:18:38 UTC 2024
On Wed, Jun 05, 2024 at 08:48:28PM +0000, Dr. David Alan Gilbert wrote:
> > > I just noticed this thread; some random notes from a somewhat
> > > fragmented memory of this:
> > >
> > > a) Long long ago, I also tried rsocket;
> > > https://lists.gnu.org/archive/html/qemu-devel/2015-01/msg02040.html
> > > as I remember the library was quite flaky at the time.
> >
> > Hmm, interesting. There also appears to be a thread doing rpoll().
>
> Yeh, I can't actually remember much more about what I did back then!
Heh, that's understandable and fair. :)
> > I hope Lei and his team have tested >4G mem; otherwise it's definitely
> > worth checking. Lei also mentioned in the cover letter that they found
> > rsocket bugs, but I'm not sure what those are about.
>
> It would probably be a good idea to keep track of what bugs
> are in flight with it, and try it on a few RDMA cards to see
> what problems get triggered.
> I think I reported a few at the time, but I gave up after
> feeling it was getting very hacky.
Agreed. Maybe we can have a list of those in the cover letter or even in
QEMU's migration/rdma doc page.
Lei, if you think that makes sense please do so in your upcoming posts.
There needs to be a list of the issues you encountered in the kernel driver,
and it would be even better to include links for further reading on each one.
> > >
> > > e) Someone made a good suggestion (sorry can't remember who) - that the
> > > RDMA migration structure was the wrong way around - it should be the
> > > destination which initiates an RDMA read, rather than the source
> > > doing a write; then things might become a LOT simpler; you just need
> > > to send page ranges to the destination and it can pull it.
> > > That might work nicely for postcopy.
> >
> > I'm not sure whether it'll still be a problem if the rdma recv side is
> > based on zero-copy. It would be a matter of whether atomicity can be
> > guaranteed, since we don't want the guest vcpus to see a partially copied
> > page during in-flight DMAs. UFFDIO_COPY (or a friend) is currently the
> > only solution for that.
>
> Yes, but even ignoring that (and the UFFDIO_CONTINUE idea you mention), if
> the destination can issue an RDMA read itself, it doesn't need to send messages
> to the source to ask for a page fetch; it just goes and grabs it itself,
> that's got to be good for latency.
Oh, that's rdma internals and beyond my knowledge, but from what I can
tell it sounds very reasonable indeed!
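A rough way to picture the latency argument is the toy model below. It is
only a sketch, not QEMU code and not a measurement: the function names and
the microsecond figures are made up for illustration, and it assumes the
source NIC can serve an RDMA read without involving source-side software.

```python
# Toy latency model for fetching one page during postcopy.
# All names and numbers are assumptions for illustration only.


def request_based_fetch_us(one_way_us: float, source_cpu_us: float) -> float:
    """Destination sends a page request message, source software handles
    it, then the source RDMA-writes the page to the destination."""
    return one_way_us + source_cpu_us + one_way_us


def rdma_read_fetch_us(one_way_us: float) -> float:
    """Destination posts an RDMA read; the request still crosses the wire
    and the data comes back, but no source-side software runs."""
    return one_way_us + one_way_us


if __name__ == "__main__":
    one_way_us = 2.0      # assumed one-way fabric latency
    source_cpu_us = 10.0  # assumed cost of the source-side request handler
    print("request-based:", request_based_fetch_us(one_way_us, source_cpu_us))
    print("rdma read:    ", rdma_read_fetch_us(one_way_us))
```

In this model both paths still pay a full round trip on the wire; whatever
gain there is comes from dropping the source-side handling term, so the real
benefit would depend on how expensive that handling is in practice.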
Thanks!
--
Peter Xu