[GEDI] [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

Michael Galaxy mgalaxy at akamai.com
Mon Apr 29 20:56:26 UTC 2024


Reviewed-by: Michael Galaxy <mgalaxy at akamai.com>

Thanks Yu Zhang and Peter.

- Michael

On 4/29/24 15:45, Yu Zhang wrote:
> Hello Michael and Peter,
>
> We are very grateful for your quick and kind reply about our plan to take
> over the maintenance of your code. This message presents our plan as a
> basis for working together.
> If we obtain the maintainer's role, our plan is:
>
> 1. Create the necessary unit-test cases and get them integrated into
> the current QEMU GitLab-CI pipeline
> 2. Review and test the code changes by other developers to ensure that
> nothing is broken in the changed code before being merged by the
> community
> 3. Based on our current practice and application scenario, look for
> possible improvements when necessary
>
> Besides that, a patch is attached to announce this change in the community.
>
> With your generous support, we hope that the development community
> will make a positive decision for us.
>
> Kind regards,
> Yu Zhang @ IONOS Cloud
>
> On Mon, Apr 29, 2024 at 4:57 PM Peter Xu <peterx at redhat.com> wrote:
>> On Mon, Apr 29, 2024 at 08:08:10AM -0500, Michael Galaxy wrote:
>>> Hi All (and Peter),
>> Hi, Michael,
>>
>>> My name is Michael Galaxy (formerly Hines). Yes, I changed my last name
>>> (highly irregular for a male) and yes, that's my real last name:
>>> https://urldefense.com/v3/__https://www.linkedin.com/in/mrgalaxy/__;!!GjvTz_vk!TZmnCE90EK692dSjZGr-2cpOEZBQTBsTO2bW5z3rSbpZgNVCexZkxwDXhmIOWG2GAKZAUovQ5xe5coQ$ )
>>>
>>> I'm the original author of the RDMA implementation. I've been discussing
>>> with Yu Zhang for a little bit about potentially handing over maintainership
>>> of the codebase to his team.
>>>
>>> I simply have zero access to RoCE or InfiniBand hardware at all,
>>> unfortunately, so I've never been able to run tests or use what I wrote at
>>> work, and as all of you know, if you don't have a way to test something,
>>> then you can't maintain it.
>>>
>>> Yu Zhang put a (very kind) proposal forward to me to ask the community if
>>> they feel comfortable training his team to maintain the codebase (and run
>>> tests) while they learn about it.
>> The "while learning" part is fine, at least to me.  IMHO "ownership" of
>> the code, or say, taking over the responsibility, may or may not require
>> 100% mastery of the code base first.  There should still be some fundamental
>> confidence to work on the code as a starting point, though; then it's about
>> a serious use case to back this up, and careful testing while getting more
>> familiar with it.
>>
>>> If you don't mind, I'd like to let him send over his (very detailed)
>>> proposal,
>> Yes please, it's exactly the time to share the plan.  The hope is we try to
>> reach a consensus before or around the middle of this release (9.1).
>> Normally QEMU has a 3~4 month window for each release and the 9.1 schedule
>> is not yet out, but I think it means we make a decision before or around
>> the middle of June.
>>
>> Thanks,
>>
>>> - Michael
>>>
>>> On 4/11/24 11:36, Yu Zhang wrote:
>>>>> 1) Either a CI test covering at least the major RDMA paths, or at least
>>>>>       periodic tests for each QEMU release, will be needed.
>>>> We use a batch of regression test cases for the stack, which also covers
>>>> QEMU. I ran these tests for most of the QEMU releases planned as
>>>> candidates for rollout.
>>>>
>>>> The migration test needs a pair of (either physical or virtual) servers with
>>>> an InfiniBand network, which makes it difficult to do on a single server.
>>>> Nested VMs could be a possible approach, for which we may need a virtual
>>>> InfiniBand network. Is SoftRoCE [1] an option? I will try it and let you know.
>>>>
>>>> [1]  https://urldefense.com/v3/__https://enterprise-support.nvidia.com/s/article/howto-configure-soft-roce__;!!GjvTz_vk!VEqNfg3Kdf58Oh1FkGL6ErDLfvUXZXPwMTaXizuIQeIgJiywPzuwbqx8wM0KUsyopw_EYQxWvGHE3ig$
>>>>
>>>> Thanks and best regards!
>>>>
>>>> On Thu, Apr 11, 2024 at 4:20 PM Peter Xu <peterx at redhat.com> wrote:
>>>>> On Wed, Apr 10, 2024 at 09:49:15AM -0400, Peter Xu wrote:
>>>>>> On Wed, Apr 10, 2024 at 02:28:59AM +0000, Zhijian Li (Fujitsu) via wrote:
>>>>>>> on 4/10/2024 3:46 AM, Peter Xu wrote:
>>>>>>>
>>>>>>>>> Is there a document/link about the unit tests/CI for migration? Why
>>>>>>>>> are those tests missing?
>>>>>>>>> Is it hard or very special to set up an environment for that? Maybe we
>>>>>>>>> can help in this regard.
>>>>>>>> See tests/qtest/migration-test.c.  We put most of our migration tests
>>>>>>>> there and that's covered in CI.
>>>>>>>>
>>>>>>>> I think one major issue is that CI systems don't normally have rdma devices.
>>>>>>>> Can an rdma migration test be carried out without real hardware?
>>>>>>> Yeah, RXE, a.k.a. Soft-RoCE, is able to emulate RDMA, for example:
>>>>>>> $ sudo rdma link add rxe_eth0 type rxe netdev eth0  # on host
>>>>>>> then we can get a new RDMA interface "rxe_eth0".
>>>>>>> This new RDMA interface is able to do the QEMU RDMA migration.
>>>>>>>
>>>>>>> Also, the loopback (lo) device is able to emulate the RDMA interface
>>>>>>> "rxe_lo"; however, when I tried (years ago) to do RDMA migration over
>>>>>>> this interface (rdma:127.0.0.1:3333), something went wrong.
>>>>>>> So I gave up enabling the RDMA migration qtest at that time.
>>>>>> Thanks, Zhijian.
>>>>>>
>>>>>> I'm not sure adding an emu-link for rdma is doable for CI systems, though.
>>>>>> Maybe someone more familiar with how CI works can chime in.
>>>>> Some people got dropped from the cc list for an unknown reason; I'm adding
>>>>> them back (Fabiano, Peter Maydell, Phil).  Let's make sure nobody is
>>>>> dropped by accident.
>>>>>
>>>>> I'll try to summarize what is still missing, and I think these will be
>>>>> greatly helpful if we don't want to deprecate rdma migration:
>>>>>
>>>>>     1) Either a CI test covering at least the major RDMA paths, or at least
>>>>>        periodic tests for each QEMU release, will be needed.
>>>>>
>>>>>     2) Some performance tests between modern RDMA and NIC devices would be
>>>>>        welcome.  The current understanding is that a modern NIC can perform
>>>>>        similarly to RDMA, in which case it's debatable why we still maintain
>>>>>        so much rdma-specific code.
>>>>>
>>>>>     3) This one does not need to be solid patchsets, but some plan to
>>>>>        improve the RDMA migration code so that it is not almost isolated
>>>>>        from the rest of the protocols.
>>>>>
>>>>>     4) Someone to look after this code for real.
>>>>>
>>>>> For 2) and 3) more info is here:
>>>>>
>>>>> https://urldefense.com/v3/__https://lore.kernel.org/r/ZhWa0YeAb9ySVKD1@x1n__;!!GjvTz_vk!VEqNfg3Kdf58Oh1FkGL6ErDLfvUXZXPwMTaXizuIQeIgJiywPzuwbqx8wM0KUsyopw_EYQxWpIWYBhQ$
>>>>>
>>>>> Here 4) may be the most important, as Markus pointed out.  We just haven't
>>>>> gotten there yet in the discussions, but maybe Markus is right that we
>>>>> should talk about that first.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> Peter Xu
>>>>>
>> --
>> Peter Xu
>>
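
For reference, the Soft-RoCE (RXE) setup quoted above can be sketched as a
dry run. This is only an illustration under stated assumptions, not a tested
CI recipe: the script prints the commands instead of running them, and the
netdev name eth0, the address 192.0.2.10, port 3333, and the helper function
names are hypothetical. The "..." stands for the rest of the QEMU command
line, which the thread does not specify. Whether migration over such a link
is reliable enough for CI is exactly the open question in the thread.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands for creating a
# Soft-RoCE link and pointing a QEMU migration at it.
# NETDEV, HOST and PORT are illustrative assumptions.

# RXE link name for a netdev, following the rxe_<netdev> pattern in the thread.
rxe_name() {
    printf 'rxe_%s\n' "$1"
}

# Command that creates the RXE link (needs root when actually run).
rxe_add_cmd() {
    printf 'rdma link add %s type rxe netdev %s\n' "$(rxe_name "$1")" "$1"
}

# Migration URI in the rdma:<host>:<port> form used in the thread.
rdma_uri() {
    printf 'rdma:%s:%s\n' "$1" "$2"
}

NETDEV=${NETDEV:-eth0}
HOST=${HOST:-192.0.2.10}
PORT=${PORT:-3333}

echo "sudo modprobe rdma_rxe"
echo "sudo $(rxe_add_cmd "$NETDEV")"
echo "rdma link show"
echo "# destination: qemu-system-x86_64 ... -incoming $(rdma_uri "$HOST" "$PORT")"
echo "# source (HMP monitor): migrate -d $(rdma_uri "$HOST" "$PORT")"
```

Running the script with NETDEV=lo prints the loopback variant (rxe_lo) that
was reported above as not working for migration, so the defaults assume a
regular Ethernet netdev.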

