[Gluster-devel] Release 3.12: Glusto run status

Atin Mukherjee amukherj at redhat.com
Wed Aug 30 10:53:37 UTC 2017


OK, Nigel helped me understand that the traceback time is not something
we should look at, and that the right way to dig into this problem is to
look at the glusto logs. For the last rebalance instance in the log, I see
the following:

volume rebalance: testvol_distributed-dispersed: success: Rebalance on
testvol_distributed-dispersed has been started successfully.   Use
rebalance status command to check status of the rebalance process.
ID:
96c645c7-710c-4c4c-a434-4157cbb75753


2017-08-28 15:13:58,952 INFO (test_expanding_volume_when_io_in_progress)
Successfully started rebalance on the volume
testvol_distributed-dispersed
2017-08-28 15:13:58,952 INFO (test_expanding_volume_when_io_in_progress)
Checking Rebalance status
2017-08-28 15:13:58,953 INFO (run) root@172.19.2.86 (cp): gluster volume
rebalance testvol_distributed-dispersed status
2017-08-28 15:13:58,953 DEBUG (_get_ssh_connection) Retrieved connection
from cache: root@172.19.2.86
2017-08-28 15:13:58,993 INFO (_log_results) RETCODE (root@172.19.2.86):
1
2017-08-28 15:13:58,994 INFO (_log_results) STDERR
(root@172.19.2.86)...

volume rebalance: testvol_distributed-dispersed: failed: error

Again, here I see that the rebalance status failure was logged at
15:13:58,994, whereas the rebalance start was triggered at 15:13:58,952,
roughly 42 ms earlier.
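
As a quick check on the gap between those two log entries, the timestamps
can be subtracted directly. This is only a minimal sketch: the helper name
elapsed_ms is made up for illustration, and the timestamp format is assumed
from the glusto log lines quoted above.

```python
from datetime import datetime

# Assumed format of glusto log timestamps, e.g. "2017-08-28 15:13:58,952".
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_ms(start, end):
    """Return the milliseconds between two glusto-style log timestamps."""
    delta = datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)
    return delta.total_seconds() * 1000


# Values copied from the log excerpt above.
start_logged = "2017-08-28 15:13:58,952"
status_failed = "2017-08-28 15:13:58,994"
print(round(elapsed_ms(start_logged, status_failed)))  # prints 42
```

So the status command was issued and had failed within about 42 ms of the
start being reported as successful.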

@Shwetha - could you help me understand how we log the rebalance status
return code in the glusto log?
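
For discussion only: the case description quoted below runs 'rebalance
status' once, with no delay after the start, and fails the test on a
non-zero exit status. If that single call turns out to race with rebalance
startup, one option would be to retry it for a while before failing. The
sketch below is not glusto code. wait_for_success and run_status are
hypothetical names, run_status is assumed to return the same
(ret, out, err) tuple shape that rebalance_status() returns in the test
snippet, and the 300 second and 10 second defaults simply mirror the
5 minute / 10 second values mentioned for wait_for_rebalance_to_complete.

```python
import time


def wait_for_success(run_status, timeout=300, interval=10, sleep=time.sleep):
    """Call run_status() until it returns exit code 0 or timeout is reached.

    run_status is a hypothetical zero-argument callable returning a
    (ret, out, err) tuple. Returns (succeeded, seconds_waited).
    """
    waited = 0
    while True:
        ret, _out, _err = run_status()
        if ret == 0:
            return True, waited
        if waited >= timeout:
            return False, waited
        sleep(interval)
        waited += interval


# Simulated command results: two failures, then a success.
attempts = iter([
    (1, "", "failed: error"),
    (1, "", "failed: error"),
    (0, "in progress", ""),
])
# sleep is stubbed out so the example runs instantly.
ok, waited = wait_for_success(lambda: next(attempts), sleep=lambda s: None)
print(ok, waited)  # prints: True 20
```

With a loop like this, a repeatedly failing status call would show up in
the log as several RETCODE entries spaced by the polling interval, rather
than as a single failure.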


On Wed, Aug 30, 2017 at 4:07 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

> 14:15:57         # Start Rebalance
> 14:15:57         g.log.info("Starting Rebalance on the volume")
> 14:15:57         ret, _, _ = rebalance_start(self.mnode, self.volname)
> 14:15:57         self.assertEqual(ret, 0, ("Failed to start rebalance on the volume "
> 14:15:57                                   "%s", self.volname))
> 14:15:57         g.log.info("Successfully started rebalance on the volume %s",
> 14:15:57                    self.volname)
> 14:15:57
> 14:15:57         # Check Rebalance status
> 14:15:57         g.log.info("Checking Rebalance status")
> 14:15:57         ret, _, _ = rebalance_status(self.mnode, self.volname)
> 14:15:57         self.assertEqual(ret, 0, ("Failed to get rebalance status for the "
> 14:15:57 >                                 "volume %s", self.volname))
> 14:15:57 E       AssertionError: ('Failed to get rebalance status for the volume %s', 'testvol_distributed-dispersed')
>
>
> The above is the snippet extracted from
> https://ci.centos.org/view/Gluster/job/gluster_glusto/377/console
>
> If we had run the rebalance status check multiple times, I should have
> seen multiple rebalance_status failure entries, or at least a difference
> in the timestamps, shouldn't I?
>
>
> On Wed, Aug 30, 2017 at 3:39 PM, Shwetha Panduranga <spandura at redhat.com>
> wrote:
>
>> Case:
>>
>> 1) add-brick while IO is in progress, then wait for 30 seconds
>>
>> 2) Trigger rebalance
>>
>> 3) Execute 'rebalance status' (there is no time delay between 2) and 3))
>>
>> 4) wait_for_rebalance_to_complete (this gets the XML output of
>> rebalance status and keeps checking every 10 seconds, for up to 5 minutes,
>> for the rebalance status to be 'complete'. The 5-minute wait time can be
>> passed as a parameter)
>>
>> At every step we check the exit status of the command. If the exit
>> status is non-zero, we fail the test case.
>>
>> -Shwetha
>>
>> On Wed, Aug 30, 2017 at 6:06 AM, Sankarshan Mukhopadhyay <
>> sankarshan.mukhopadhyay at gmail.com> wrote:
>>
>>> On Wed, Aug 30, 2017 at 6:03 AM, Atin Mukherjee <amukherj at redhat.com>
>>> wrote:
>>> >
>>> > On Wed, 30 Aug 2017 at 00:23, Shwetha Panduranga <spandura at redhat.com>
>>> > wrote:
>>> >>
>>> >> Hi Shyam, we are already doing it. We wait for the rebalance status
>>> >> to be complete: we loop, checking whether the status is complete, for
>>> >> '20' minutes or so.
>>> >
>>> >
>>> > Are you saying that in this test rebalance status was executed multiple
>>> > times until it succeeded? If yes, then the test shouldn't have failed.
>>> > Can I get access to the complete set of logs?
>>>
>>> Would you not prefer to look at the specific test under discussion as
>>> well?
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-devel