[Gluster-devel] Release 3.12: Glusto run status

Shwetha Panduranga spandura at redhat.com
Wed Aug 30 10:58:52 UTC 2017


Maybe I should change the log message from 'Checking rebalance status' to
'Logging rebalance status', because that first 'rebalance status' step does
exactly that: it just executes 'rebalance status' and records the output. The
actual validation happens in wait_for_rebalance_to_complete, which checks that
rebalance reaches 'completed' within 5 minutes (the default timeout). If that
makes sense, I will make those changes as well, along with introducing the
delay between 'start' and 'status'.
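
Roughly, a sketch of what I have in mind (the sleep duration below is just a
placeholder, and I'm assuming wait_for_rebalance_to_complete returns a
boolean; the exact helper signatures may differ from what's in glustolibs):

    import time

    # Start Rebalance
    g.log.info("Starting Rebalance on the volume")
    ret, _, _ = rebalance_start(self.mnode, self.volname)
    self.assertEqual(ret, 0, ("Failed to start rebalance on the volume "
                              "%s", self.volname))

    # Proposed: small delay between 'start' and 'status', since status
    # may not report back success immediately after start
    time.sleep(5)  # placeholder value

    # Proposed rename: this step only runs the command and logs its output
    g.log.info("Logging Rebalance status")
    ret, _, _ = rebalance_status(self.mnode, self.volname)
    self.assertEqual(ret, 0, ("Failed to get rebalance status for the "
                              "volume %s", self.volname))

    # Actual validation: poll until rebalance reports 'completed',
    # default timeout 5 minutes
    ret = wait_for_rebalance_to_complete(self.mnode, self.volname)
    self.assertTrue(ret, ("Rebalance did not complete on volume %s"
                          % self.volname))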

On Wed, Aug 30, 2017 at 4:26 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

>
>
> On Wed, Aug 30, 2017 at 4:23 PM, Shwetha Panduranga <spandura at redhat.com>
> wrote:
>
>> This is the first check, where we just execute 'rebalance status'. That's
>> the command which failed and hence failed the test case. If you look at the
>> test case, the next step is wait_for_rebalance_to_complete (status --xml),
>> where we keep executing rebalance status for up to 5 minutes, waiting for
>> rebalance to complete. Even before we got to that wait, the very first
>> execution of the status command failed, so the test case failed.
>>
>
> Cool. So there is still a problem in the test case. We can't assume
> rebalance status will report back success immediately after rebalance start,
> and I've explained why in the earlier thread. Why do we need an
> intermediate check of rebalance status before going for
> wait_for_rebalance_to_complete?
>
>
>> On Wed, Aug 30, 2017 at 4:07 PM, Atin Mukherjee <amukherj at redhat.com>
>> wrote:
>>
>>> 14:15:57         # Start Rebalance
>>> 14:15:57         g.log.info("Starting Rebalance on the volume")
>>> 14:15:57         ret, _, _ = rebalance_start(self.mnode, self.volname)
>>> 14:15:57         self.assertEqual(ret, 0, ("Failed to start rebalance on the volume "
>>> 14:15:57                                   "%s", self.volname))
>>> 14:15:57         g.log.info("Successfully started rebalance on the volume %s",
>>> 14:15:57                    self.volname)
>>> 14:15:57
>>> 14:15:57         # Check Rebalance status
>>> 14:15:57         g.log.info("Checking Rebalance status")
>>> 14:15:57         ret, _, _ = rebalance_status(self.mnode, self.volname)
>>> 14:15:57         self.assertEqual(ret, 0, ("Failed to get rebalance status for the "
>>> 14:15:57 >                                 "volume %s", self.volname))
>>> 14:15:57 E       AssertionError: ('Failed to get rebalance status for the volume %s', 'testvol_distributed-dispersed')
>>>
>>>
>>> The above is the snip extracted from
>>> https://ci.centos.org/view/Gluster/job/gluster_glusto/377/console
>>>
>>> If we had gone for rebalance status checks multiple times, I should have
>>> seen multiple rebalance_status failure entries, or at least a difference
>>> in the timestamps, shouldn't I?
>>>
>>>
>>> On Wed, Aug 30, 2017 at 3:39 PM, Shwetha Panduranga <spandura at redhat.com
>>> > wrote:
>>>
>>>> Case:
>>>>
>>>> 1) add-brick while IO is in progress, then wait for 30 seconds
>>>>
>>>> 2) Trigger rebalance
>>>>
>>>> 3) Execute 'rebalance status' (there is no time delay between 2) and 3))
>>>>
>>>> 4) wait_for_rebalance_to_complete (this gets the xml output of
>>>> rebalance status and keeps checking every 10 seconds, for up to 5
>>>> minutes, for the status to be 'complete'; the 5 minute wait time can be
>>>> passed as a parameter -- a rough sketch of this polling loop follows
>>>> below)
>>>>
>>>> At every step we check the exit status of the command. If the exit
>>>> status is non-zero we fail the test case.
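
For illustration, a rough standalone sketch of the kind of polling loop step
4 describes; the helper name and the XML element names below are assumptions
for this sketch, not the actual glustolibs implementation:

    import time
    import subprocess
    import xml.etree.ElementTree as ET

    def wait_for_rebalance_complete(volname, timeout=300, interval=10):
        """Poll 'gluster volume rebalance <vol> status --xml' until the
        aggregate status is 'completed' or the timeout (seconds) expires."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            proc = subprocess.run(
                ["gluster", "volume", "rebalance", volname, "status", "--xml"],
                capture_output=True, text=True)
            if proc.returncode == 0:
                root = ET.fromstring(proc.stdout)
                # 'aggregate/statusStr' is assumed; element names may differ
                status = root.findtext(".//aggregate/statusStr")
                if status == "completed":
                    return True
            time.sleep(interval)
        return False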
>>>>
>>>> -Shwetha
>>>>
>>>> On Wed, Aug 30, 2017 at 6:06 AM, Sankarshan Mukhopadhyay <
>>>> sankarshan.mukhopadhyay at gmail.com> wrote:
>>>>
>>>>> On Wed, Aug 30, 2017 at 6:03 AM, Atin Mukherjee <amukherj at redhat.com>
>>>>> wrote:
>>>>> >
>>>>> > On Wed, 30 Aug 2017 at 00:23, Shwetha Panduranga <
>>>>> spandura at redhat.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> Hi Shyam, we are already doing it. We wait for rebalance status to be
>>>>> >> 'complete'. We loop, checking whether the status is complete, for
>>>>> >> 20 minutes or so.
>>>>> >
>>>>> >
>>>>> > Are you saying that in this test rebalance status was executed
>>>>> > multiple times until it succeeded? If yes, then the test shouldn't
>>>>> > have failed. Can I get access to the complete set of logs?
>>>>>
>>>>> Would you not prefer to look at the specific test under discussion as
>>>>> well?
>>>>> _______________________________________________
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel at gluster.org
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>>
>>>
>>
>

