[automated-testing] reboot_nodes_and_wait_to_come_online bug

Jonathan Holloway jhollowa at redhat.com
Wed Jun 13 17:07:26 UTC 2018


Hey Nigel,

The RPyC server does not restart automatically, and the connection is not
re-established after the server process is restarted.
There are a couple of ways to handle it.
I'll send steps that can be used without a change to Glusto, but right now
I'm testing something that can be quickly injected into Glusto to make this
seamless.
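
For illustration only (this is neither the promised steps nor the Glusto
change): a minimal sketch of one way a caller could re-establish a dropped
connection by retrying. The helper name reconnect_with_retry and its
parameters are hypothetical. The connect argument is any callable that opens
a fresh connection and raises on failure; with plain RPyC that could be
rpyc.classic.connect, but nothing here depends on RPyC being installed.

```python
import time


def reconnect_with_retry(connect, host, attempts=5, delay=1.0, sleep=time.sleep):
    """Call connect(host) until it succeeds or the attempts run out.

    `connect` is a hypothetical injected callable. It is assumed to raise
    an OSError subclass (for example ConnectionRefusedError) or EOFError
    while the remote server process is not yet accepting connections.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Always open a new connection; the old one is assumed dead.
            return connect(host)
        except (OSError, EOFError) as err:
            last_error = err
            if attempt < attempts:
                sleep(delay)
    raise ConnectionError(
        f"could not reconnect to {host} after {attempts} attempts"
    ) from last_error
```

The old connection object is never reused; the sketch assumes the server
process has already been started again by some other means, since it does not
come back on its own.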

Cheers,
Jonathan
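
The logs in the original report quoted below show "All nodes ... are up and
running" printed right after a retry sleep, with no further status check in
between. A minimal sketch of a wait loop that avoids this by re-checking
after every sleep and returning the result of the last check. This is not the
actual reboot_nodes_and_wait_to_come_online code; wait_for_nodes_online and
check_online are hypothetical names, and check_online(nodes) is assumed to
return a dict mapping each node to True or False.

```python
import time


def wait_for_nodes_online(nodes, check_online, timeout=30, interval=5,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until every node is online or the timeout expires.

    Returns (all_online, statuses), both taken from the most recent check,
    so a timeout can never be reported as success.
    """
    deadline = clock() + timeout
    while True:
        statuses = check_online(nodes)  # hypothetical: {node: bool}
        if all(statuses.values()):
            return True, statuses
        if clock() >= deadline:
            # Timed out: report the failure and which nodes are still down.
            return False, statuses
        sleep(interval)
```

The caller decides what to log based on the returned flag, so the "up and
running" message is only reachable when the final check actually passed.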


On Wed, Jun 13, 2018 at 5:15 AM, Nigel Babu <nigelb at redhat.com> wrote:

> Jonathan,
>
> After a machine reboot, will rpyc reconnect automatically? Or are the
> communication issues a symptom of a larger problem: that you can't restart a
> client and expect the connection to still exist when it comes back online?
>
> On Mon, Jun 11, 2018 at 7:36 PM, Jonathan Holloway <jhollowa at redhat.com>
> wrote:
>
>> Hey Vijay,
>>
>> In the AFR test run I started on Saturday, it looks like the 039 system
>> had that communication issue we were tracking down on Friday, and it had
>> just been rebooted as part of the test.
>> Definitely worth re-running AFR after the fix.
>>
>> Cheers,
>> Jonathan
>>
>> On Mon, Jun 11, 2018 at 6:49 AM, Vijay Bhaskar Reddy Avuthu <
>> vavuthu at redhat.com> wrote:
>>
>>> I will take a look.
>>>
>>> Regards,
>>> Vijay A
>>>
>>> On Mon, Jun 11, 2018 at 5:05 PM, Nigel Babu <nigelb at redhat.com> wrote:
>>>
>>>> Oh dear. That's a problem. Vijay, I think you wrote the original code?
>>>> Can you take a look?
>>>>
>>>> On Mon, Jun 11, 2018 at 1:58 PM, Vitalii Koriakov <vkoriako at redhat.com>
>>>> wrote:
>>>>
>>>>> Hello all,
>>>>> I noticed the following behavior:
>>>>>
>>>>> When rebooting nodes with reboot_nodes_and_wait_to_come_online, if the
>>>>> nodes are still not online after the timeout, it reports that all nodes
>>>>> are online.
>>>>> The logs are:
>>>>>
>>>>>
>>>>> 2018-06-08 18:28:00,210 INFO (are_nodes_online) 172.19.2.122 is offline
>>>>> 2018-06-08 18:28:00,211 INFO (reboot_nodes_and_wait_to_come_online) Nodes are offline, Retry after 5 seconds .....
>>>>> 2018-06-08 18:28:05,216 INFO (reboot_nodes_and_wait_to_come_online) All nodes ['172.19.2.86', '172.19.2.126', '172.19.3.113', '172.19.2.122'] are up and running
>>>>>
>>>>> So it doesn't check whether the nodes are online after the 5-second
>>>>> wait; it just reports that all nodes are online.
>>>>>
>>>>> Regards,
>>>>> Vitalii
>>>>> _______________________________________________
>>>>> automated-testing mailing list
>>>>> automated-testing at gluster.org
>>>>> http://lists.gluster.org/mailman/listinfo/automated-testing
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> nigelb
>>>>
>>>
>>>
>>
>
>
> --
> nigelb
>