[Gluster-devel] Release 3.12: Glusto run status

Atin Mukherjee amukherj at redhat.com
Wed Aug 30 11:04:54 UTC 2017


It doesn't make any sense to me to do a one-off rebalance status first and then
go for rebalance status in a loop. Instead we should go straight to running
rebalance status in a loop.
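That "status in a loop immediately" approach could look like the minimal sketch below. The helper name `poll_rebalance_status` and its parameters are hypothetical illustrations, not the actual glusto-tests API; the callable is assumed to return the CLI exit code.

```python
import time

def poll_rebalance_status(get_status, timeout=300, interval=10):
    """Poll a 'rebalance status' callable until it returns exit code 0,
    instead of asserting on the very first invocation."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_status() == 0:   # hypothetical: returns the command's exit code
            return True
        time.sleep(interval)    # give glusterd time to settle after 'start'
    return False                # timed out; the caller fails the test case

# Stub that fails twice before succeeding, mimicking the window right
# after 'rebalance start' where 'rebalance status' can still fail:
attempts = iter([1, 1, 0])
assert poll_rebalance_status(lambda: next(attempts), timeout=5, interval=0)
```

With this shape, a transient failure right after `rebalance start` is retried rather than failing the test immediately.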

On Wed, Aug 30, 2017 at 4:28 PM, Shwetha Panduranga <spandura at redhat.com>
wrote:

> Maybe I should change the log message from 'Checking rebalance status' to
> 'Logging rebalance status', because the first 'rebalance status' command
> does just that: it executes 'rebalance status'. Then
> wait_for_rebalance_to_complete validates that rebalance is 'completed' within 5
> minutes (the default timeout). If that makes sense, I will make those changes
> as well, along with introducing the delay between 'start' and 'status'.
>
> On Wed, Aug 30, 2017 at 4:26 PM, Atin Mukherjee <amukherj at redhat.com>
> wrote:
>
>>
>>
>> On Wed, Aug 30, 2017 at 4:23 PM, Shwetha Panduranga <spandura at redhat.com>
>> wrote:
>>
>>> This is the first check, where we just execute 'rebalance status'.
>>> That's the command which failed, and hence the test case failed. If you look
>>> at the test case, the next step is wait_for_rebalance_to_complete (status
>>> --xml). That is where we execute rebalance status for up to 5 minutes,
>>> waiting for rebalance to complete. Even before waiting for rebalance, the
>>> first execution of the status command failed, hence the test case failed.
>>>
>>
>> Cool. So there is still a problem in the test case. We can't assume that
>> rebalance status will report success immediately after rebalance start,
>> and I've explained why in the earlier thread. Why do we need an
>> intermediate check of rebalance status before going to
>> wait_for_rebalance_to_complete?
>>
>>
>>> On Wed, Aug 30, 2017 at 4:07 PM, Atin Mukherjee <amukherj at redhat.com>
>>> wrote:
>>>
>>>> 14:15:57         # Start Rebalance
>>>> 14:15:57         g.log.info("Starting Rebalance on the volume")
>>>> 14:15:57         ret, _, _ = rebalance_start(self.mnode, self.volname)
>>>> 14:15:57         self.assertEqual(ret, 0, ("Failed to start rebalance on the volume "
>>>> 14:15:57                                   "%s", self.volname))
>>>> 14:15:57         g.log.info("Successfully started rebalance on the volume %s",
>>>> 14:15:57                    self.volname)
>>>> 14:15:57
>>>> 14:15:57         # Check Rebalance status
>>>> 14:15:57         g.log.info("Checking Rebalance status")
>>>> 14:15:57         ret, _, _ = rebalance_status(self.mnode, self.volname)
>>>> 14:15:57         self.assertEqual(ret, 0, ("Failed to get rebalance status for the "
>>>> 14:15:57 >                                 "volume %s", self.volname))
>>>> 14:15:57 E       AssertionError: ('Failed to get rebalance status for the volume %s', 'testvol_distributed-dispersed')
>>>>
>>>>
>>>> The above is a snippet extracted from
>>>> https://ci.centos.org/view/Gluster/job/gluster_glusto/377/console
>>>>
>>>> If we had run the rebalance status check multiple times, I should
>>>> have seen multiple rebalance_status failures, or at least a
>>>> difference in the timestamps, shouldn't I?
>>>>
>>>>
>>>> On Wed, Aug 30, 2017 at 3:39 PM, Shwetha Panduranga <
>>>> spandura at redhat.com> wrote:
>>>>
>>>>> Case:
>>>>>
>>>>> 1) add-brick while IO is in progress, then wait for 30 seconds
>>>>>
>>>>> 2) Trigger rebalance
>>>>>
>>>>> 3) Execute 'rebalance status' (there is no time delay between 2) and 3))
>>>>>
>>>>> 4) wait_for_rebalance_to_complete (this gets the XML output of
>>>>> rebalance status and keeps checking for the rebalance status to be
>>>>> 'complete' every 10 seconds, for up to 5 minutes; the 5-minute wait
>>>>> time can be passed as a parameter)
>>>>>
>>>>> At every step we check the exit status of the command output. If the
>>>>> exit status is non-zero, we fail the test case.
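Step 4 above, polling the XML status output until it reports completion, could be sketched roughly as follows. The XML shape shown and the callable `get_status_xml` are assumptions for illustration, not the exact gluster CLI output or the glusto-tests helper signature:

```python
import time
import xml.etree.ElementTree as ET

def wait_for_rebalance_to_complete(get_status_xml, timeout=300, interval=10):
    """Keep polling the XML output of 'rebalance status' until the
    aggregate status string reads 'completed', or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        root = ET.fromstring(get_status_xml())
        if root.findtext(".//aggregate/statusStr") == "completed":
            return True
        time.sleep(interval)    # re-check every `interval` seconds
    return False                # never completed within `timeout`

# Stub mimicking one 'in progress' poll followed by completion:
xmls = iter([
    "<cliOutput><volRebalance><aggregate><statusStr>in progress"
    "</statusStr></aggregate></volRebalance></cliOutput>",
    "<cliOutput><volRebalance><aggregate><statusStr>completed"
    "</statusStr></aggregate></volRebalance></cliOutput>",
])
assert wait_for_rebalance_to_complete(lambda: next(xmls), timeout=5, interval=0)
```

Note that under this shape, the standalone status check in step 3 adds nothing that step 4's loop does not already cover.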
>>>>>
>>>>> -Shwetha
>>>>>
>>>>> On Wed, Aug 30, 2017 at 6:06 AM, Sankarshan Mukhopadhyay <
>>>>> sankarshan.mukhopadhyay at gmail.com> wrote:
>>>>>
>>>>>> On Wed, Aug 30, 2017 at 6:03 AM, Atin Mukherjee <amukherj at redhat.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > On Wed, 30 Aug 2017 at 00:23, Shwetha Panduranga <
>>>>>> spandura at redhat.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> Hi Shyam, we are already doing it. We wait for rebalance status to be
>>>>>> >> complete. We loop; we keep checking whether the status is complete for
>>>>>> >> '20' minutes or so.
>>>>>> >
>>>>>> >
>>>>>> > Are you saying that in this test rebalance status was executed multiple
>>>>>> > times until it succeeded? If yes, then the test shouldn't have failed.
>>>>>> > Can I get access to the complete set of logs?
>>>>>>
>>>>>> Would you not prefer to look at the specific test under discussion as
>>>>>> well?
>>>>>> _______________________________________________
>>>>>> Gluster-devel mailing list
>>>>>> Gluster-devel at gluster.org
>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

