[Gluster-devel] Release 3.12: Glusto run status

Wed Aug 30 10:55:33 UTC 2017

The return code is in the log message you copy-pasted:

2017-08-28 15:13:58,952 INFO (test_expanding_volume_when_io_in_progress)
Successfully started rebalance on the volume
testvol_distributed-dispersed
2017-08-28 15:13:58,952 INFO (test_expanding_volume_when_io_in_progress)
Checking Rebalance status
2017-08-28 15:13:58,953 INFO (run) root at 172.19.2.86 (cp): gluster volume
rebalance testvol_distributed-dispersed status
2017-08-28 15:13:58,953 DEBUG (_get_ssh_connection) Retrieved connection
from cache: root at 172.19.2.86
*2017-08-28 15:13:58,993 INFO (_log_results) RETCODE (root at 172.19.2.86): 1*

We do not have any delay between 'rebalance start' and the first 'rebalance
status'. Should I introduce a delay?
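
If we do decide to tolerate the window right after 'rebalance start', one
option is to retry the first status call a few times rather than add a fixed
sleep. A rough sketch only: `first_rebalance_status`, its parameters, and the
default values are hypothetical names and numbers, not part of glusto or
glustolibs. It assumes only that the status call returns a (ret, out, err)
tuple, as the test already unpacks it.

```python
import time


def first_rebalance_status(status_fn, retries=5, delay=2.0, sleep=time.sleep):
    """Call status_fn() until it returns exit code 0 or retries run out.

    status_fn is any zero-argument callable returning (ret, out, err),
    the same tuple shape the test unpacks from rebalance_status().
    Returns the last (ret, out, err) tuple and the number of attempts made.
    The retry count and delay are illustrative defaults, not tuned values.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    result = None
    attempt = 0
    for attempt in range(1, retries + 1):
        result = status_fn()
        if result[0] == 0:
            break
        if attempt < retries:
            # Give the rebalance process time to come up before asking again.
            sleep(delay)
    return result, attempt
```

In the test this would be called as
first_rebalance_status(lambda: rebalance_status(self.mnode, self.volname)),
and the existing assertEqual(ret, 0, ...) would then run on the final return
code, so a status that never succeeds still fails the test case.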

On Wed, Aug 30, 2017 at 4:23 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

> Ok, Nigel helped me understand that the traceback time is not something
> we should look at, and that the right way to dig into this problem is to look
> at the glusto logs. In the last rebalance instance in the log I see the
> following:
>
> volume rebalance: testvol_distributed-dispersed: success: Rebalance on
> testvol_distributed-dispersed has been started successfully.   Use
> rebalance status command to check status of the rebalance process.
> ID: 96c645c7-710c-4c4c-a434-4157cbb75753
>
>
>
> 2017-08-28 15:13:58,952 INFO (test_expanding_volume_when_io_in_progress)
> Successfully started rebalance on the volume
> testvol_distributed-dispersed
> 2017-08-28 15:13:58,952 INFO (test_expanding_volume_when_io_in_progress)
> Checking Rebalance status
> 2017-08-28 15:13:58,953 INFO (run) root at 172.19.2.86 (cp): gluster volume
> rebalance testvol_distributed-dispersed status
> 2017-08-28 15:13:58,953 DEBUG (_get_ssh_connection) Retrieved connection
> from cache: root at 172.19.2.86
> 2017-08-28 15:13:58,993 INFO (_log_results) RETCODE (root at 172.19.2.86): 1
> 2017-08-28 15:13:58,994 INFO (_log_results) STDERR (root at 172.19.2.86)...
> volume rebalance: testvol_distributed-dispersed: failed: error
>
> Again, here I see the rebalance status failure was logged at 15:13:58,994,
> whereas rebalance start was triggered at 15:13:58,952.
>
> @Shwetha - could you help me understand how we log the rebalance status
> return code in the glusto log?
>
>
> On Wed, Aug 30, 2017 at 4:07 PM, Atin Mukherjee <amukherj at redhat.com>
> wrote:
>
>> 14:15:57         # Start Rebalance
>> 14:15:57         g.log.info("Starting Rebalance on the volume")
>> 14:15:57         ret, _, _ = rebalance_start(self.mnode, self.volname)
>> 14:15:57         self.assertEqual(ret, 0, ("Failed to start rebalance on the volume "
>> 14:15:57                                   "%s", self.volname))
>> 14:15:57         g.log.info("Successfully started rebalance on the volume %s",
>> 14:15:57                    self.volname)
>> 14:15:57
>> 14:15:57         # Check Rebalance status
>> 14:15:57         g.log.info("Checking Rebalance status")
>> 14:15:57         ret, _, _ = rebalance_status(self.mnode, self.volname)
>> 14:15:57         self.assertEqual(ret, 0, ("Failed to get rebalance status for the "
>> 14:15:57 >                                 "volume %s", self.volname))
>> 14:15:57 E       AssertionError: ('Failed to get rebalance status for the volume %s', 'testvol_distributed-dispersed')
>>
>>
>> The above is the snippet extracted from
>> https://ci.centos.org/view/Gluster/job/gluster_glusto/377/console
>>
>> If we had run the rebalance status check multiple times, I should have
>> seen multiple rebalance_status failure entries, or at least a difference
>> in time, shouldn't I?
>>
>>
>> On Wed, Aug 30, 2017 at 3:39 PM, Shwetha Panduranga <spandura at redhat.com>
>> wrote:
>>
>>> Case:
>>>
>>> 1) add-brick while IO is in progress, then wait for 30 seconds
>>>
>>> 2) Trigger rebalance
>>>
>>> 3) Execute 'rebalance status' (there is no time delay between 2) and 3))
>>>
>>> 4) wait_for_rebalance_to_complete (this gets the XML output of
>>> 'rebalance status' and keeps checking for the status to be 'complete'
>>> every 10 seconds for up to 5 minutes; the 5-minute wait time can be passed
>>> as a parameter)
>>>
>>> At every step we check the exit status of the command. If the
>>> exit status is non-zero we fail the test case.
>>>
>>> -Shwetha
>>>
>>> On Wed, Aug 30, 2017 at 6:06 AM, Sankarshan Mukhopadhyay <
>>> sankarshan.mukhopadhyay at gmail.com> wrote:
>>>
>>>> On Wed, Aug 30, 2017 at 6:03 AM, Atin Mukherjee <amukherj at redhat.com>
>>>> wrote:
>>>> >
>>>> > On Wed, 30 Aug 2017 at 00:23, Shwetha Panduranga <spandura at redhat.com>
>>>> > wrote:
>>>> >>
>>>> >> Hi Shyam, we are already doing it. We wait for the rebalance status to
>>>> >> be complete. We loop and keep checking whether the status is complete
>>>> >> for '20' minutes or so.
>>>> >
>>>> >
>>>> > Are you saying that in this test rebalance status was executed multiple
>>>> > times till it succeeded? If yes, then the test shouldn't have failed.
>>>> > Can I get access to the complete set of logs?
>>>>
>>>> Would you not prefer to look at the specific test under discussion as
>>>> well?
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>>
>>
>>
>