[GEDI] [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
Jinpu Wang
jinpu.wang at ionos.com
Mon May 13 07:30:49 UTC 2024
Hi Peter, Hi Chuan,
On Thu, May 9, 2024 at 4:14 PM Peter Xu <peterx at redhat.com> wrote:
>
> On Thu, May 09, 2024 at 04:58:34PM +0800, Zheng Chuan via wrote:
> > That's good news, to see a socket abstraction for RDMA!
> > When I developed the series above, the biggest pain point was that RDMA migration has no QIOChannel abstraction, so I needed a 'fake channel'
> > for it, which was awkward to implement.
> > So, as far as I know, we can do this by:
> > i. first, evaluate whether rsocket is good enough to satisfy our fundamental QIOChannel abstraction;
> > ii. if it works, see whether it gives us the opportunity to hide the details of the RDMA protocol
> > inside rsocket by removing most of the code in rdma.c, along with some hacks in the migration main process;
> > iii. implement advanced features like multifd and multi-URI for RDMA migration.
> >
> > Since I am not familiar with rsocket, I need some time to look at it and do some quick verification of RDMA migration based on rsocket.
> > But, yes, I am willing to be involved in this refactoring work and to see if we can make this migration feature better :)
>
> Based on what we have now, it looks like we'd better pause the deprecation
> process a bit, so I don't think we need to rush it, at least not in 9.1;
> then we'll see how the refactoring goes.
>
> It'll be perfect if rsocket works; otherwise, supporting multifd with little
> overhead / few exported APIs would also be a good thing in general, whatever
> the approach. And obviously all of this assumes we can get
> resources from companies to support this feature in the first place.
>
> Note that so far nobody has compared RDMA vs. NIC performance, so if
> any of us can provide some test results, please do so. Many people say
> RDMA is better, but I haven't yet seen any numbers comparing it with
> modern TCP networks. I don't want old impressions floating around
> even if things might have changed. When we have consolidated results, we
> should share them and also reflect them in QEMU's migration docs once an
> RDMA documentation page is ready.
I also ran some tests with a Mellanox ConnectX-6 100G RoCE NIC. The
results are mixed: with fewer than 3 streams native Ethernet is faster,
while with more than 3 streams rsocket performs better.
root at x4-right:~# iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 44214 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 52.9 GBytes 45.4 Gbits/sec
root at x4-right:~# iperf -c 1.1.1.16 -P 2
[ 3] local 1.1.1.15 port 33118 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 33130 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0001 sec 45.0 GBytes 38.7 Gbits/sec
[ 4] 0.0000-10.0000 sec 43.9 GBytes 37.7 Gbits/sec
[SUM] 0.0000-10.0000 sec 88.9 GBytes 76.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.172/0.189/0.205/0.172 ms (tot/err) = 2/0
root at x4-right:~# iperf -c 1.1.1.16 -P 4
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 5] local 1.1.1.15 port 50748 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 50734 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 50764 connected with 1.1.1.16 port 5001
[ 3] local 1.1.1.15 port 50730 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 24.7 GBytes 21.2 Gbits/sec
[ 3] 0.0000-10.0004 sec 23.6 GBytes 20.3 Gbits/sec
[ 4] 0.0000-10.0000 sec 27.8 GBytes 23.9 Gbits/sec
[ 5] 0.0000-10.0000 sec 28.0 GBytes 24.0 Gbits/sec
[SUM] 0.0000-10.0000 sec 104 GBytes 89.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.104/0.156/0.204/0.124 ms (tot/err) = 4/0
root at x4-right:~# iperf -c 1.1.1.16 -P 8
[ 4] local 1.1.1.15 port 55588 connected with 1.1.1.16 port 5001
[ 5] local 1.1.1.15 port 55600 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 10] local 1.1.1.15 port 55628 connected with 1.1.1.16 port 5001
[ 15] local 1.1.1.15 port 55648 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 55620 connected with 1.1.1.16 port 5001
[ 3] local 1.1.1.15 port 55584 connected with 1.1.1.16 port 5001
[ 14] local 1.1.1.15 port 55644 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 55610 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0015 sec 8.47 GBytes 7.27 Gbits/sec
[ 4] 0.0000-10.0011 sec 8.62 GBytes 7.40 Gbits/sec
[ 7] 0.0000-10.0000 sec 18.1 GBytes 15.5 Gbits/sec
[ 14] 0.0000-10.0000 sec 8.69 GBytes 7.46 Gbits/sec
[ 5] 0.0000-10.0006 sec 18.5 GBytes 15.9 Gbits/sec
[ 10] 0.0000-10.0006 sec 16.1 GBytes 13.9 Gbits/sec
[ 3] 0.0000-10.0000 sec 17.1 GBytes 14.6 Gbits/sec
[ 15] 0.0000-10.0016 sec 8.54 GBytes 7.34 Gbits/sec
[SUM] 0.0000-10.0017 sec 104 GBytes 89.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.049/0.095/0.213/0.062 ms (tot/err) = 8/0
root at x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 45596 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 37.8 GBytes 32.5 Gbits/sec
root at x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 2
[ 4] local 1.1.1.15 port 46782 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 43237 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0000-10.0000 sec 37.5 GBytes 32.2 Gbits/sec
[ 3] 0.0000-10.0000 sec 40.7 GBytes 34.9 Gbits/sec
[SUM] 0.0000-10.0000 sec 78.2 GBytes 67.2 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.819/6.579/7.340/7.340 ms (tot/err) = 2/0
root at x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 4
[ 4] local 1.1.1.15 port 60385 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 55203 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 35084 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 37253 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 28.4 GBytes 24.4 Gbits/sec
[ 4] 0.0000-10.0000 sec 28.3 GBytes 24.3 Gbits/sec
[ 7] 0.0000-10.0000 sec 28.4 GBytes 24.4 Gbits/sec
[ 3] 0.0000-10.0001 sec 28.2 GBytes 24.3 Gbits/sec
[SUM] 0.0000-10.0001 sec 113 GBytes 97.3 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.311/7.579/10.019/4.165 ms (tot/err) = 4/0
root at x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 8
[ 8] local 1.1.1.15 port 33684 connected with 1.1.1.16 port 5001
[ 10] local 1.1.1.15 port 40620 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 56988 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 51139 connected with 1.1.1.16 port 5001
[ 12] local 1.1.1.15 port 44712 connected with 1.1.1.16 port 5001
[ 5] local 1.1.1.15 port 50838 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 51334 connected with 1.1.1.16 port 5001
[ 9] local 1.1.1.15 port 40611 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 5] 0.0000-10.0001 sec 13.9 GBytes 11.9 Gbits/sec
[ 12] 0.0000-10.0001 sec 13.8 GBytes 11.9 Gbits/sec
[ 10] 0.0000-10.0001 sec 13.9 GBytes 11.9 Gbits/sec
[ 9] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 6] 0.0000-10.0000 sec 13.9 GBytes 11.9 Gbits/sec
[ 8] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 4] 0.0000-10.0001 sec 13.8 GBytes 11.9 Gbits/sec
[SUM] 0.0000-10.0001 sec 111 GBytes 95.1 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.973/10.699/15.943/4.251 ms (tot/err) = 8/0
root at x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 36960 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 41.1 GBytes 35.3 Gbits/sec
root at x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 2
[ 3] local 1.1.1.15 port 32799 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 4] local 1.1.1.15 port 35912 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0000-10.0000 sec 36.6 GBytes 31.4 Gbits/sec
[ 3] 0.0000-10.0000 sec 36.6 GBytes 31.4 Gbits/sec
[SUM] 0.0000-10.0000 sec 73.2 GBytes 62.9 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.172/5.842/6.512/6.512 ms (tot/err) = 2/0
root at x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 4
[ 4] local 1.1.1.15 port 53311 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 37243 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 60801 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 49694 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[ 7] 0.0000-10.0000 sec 28.2 GBytes 24.3 Gbits/sec
[ 3] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[ 4] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[SUM] 0.0000-10.0000 sec 113 GBytes 96.9 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.570/7.762/10.045/4.265 ms (tot/err) = 4/0
root at x4-right:~#
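To make the comparison above easier to scan, here is a quick Python sketch that tabulates the aggregate bandwidths ([SUM] lines, or the single-stream result for -P 1) copied from the iperf runs above, using the first rsocket run for each stream count:

```python
# Aggregate iperf bandwidth in Gbits/sec, copied from the runs above:
# native TCP vs. rsocket (via librspreload.so LD_PRELOAD), per -P count.
native = {1: 45.4, 2: 76.4, 4: 89.4, 8: 89.4}
rsock = {1: 32.5, 2: 67.2, 4: 97.3, 8: 95.1}

for p in sorted(native):
    winner = "native" if native[p] > rsock[p] else "rsocket"
    print(f"-P {p}: native {native[p]:5.1f}  rsocket {rsock[p]:5.1f}  -> {winner}")
```

This reflects the summary at the top: native Ethernet wins at 1-2 streams, and rsocket pulls ahead from 4 streams on.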
>
> Chuan, please check the whole thread discussion, it may help to understand
> what we are looking for on rdma migrations [1]. Meanwhile please feel free
> to sync with Jinpu's team and see how to move forward with such a project.
We are happy to work with the community to improve RDMA migration.
>
> [1] https://lore.kernel.org/qemu-devel/87frwatp7n.fsf@suse.de/
>
> Thanks,
Regards!
>
> --
> Peter Xu
>