[Gluster-infra] regression machines reporting slowly ? here is the reason ...

Prasanna Kalever pkalever at redhat.com
Sun Apr 24 10:52:55 UTC 2016


On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur <vbellur at redhat.com> wrote:
> On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever <pkalever at redhat.com> wrote:
>> Hi all,
>>
>> I noticed that our regression machines are reporting back really slowly,
>> especially CentOS and Smoke.
>>
>> I found that most of the slaves are marked offline; this could be the
>> biggest reason.
>>
>>
>
> Regression machines are scheduled to be offline if there are no active
> jobs. I wonder if the slowness is related to LVM or related factors as
> detailed in a recent thread?
>

Sorry, the previous mail was sent incomplete (blame some Gmail shortcut).

Hi Vijay,

Honestly, I was not aware that the machines move to the offline state
by themselves; I only knew that they go idle. Thanks for sharing that
information. But we still need to reclaim most of the machines. Here
are the reasons why each of them is offline.


CentOS slaves:     Only 2 of 14 slaves are online [1]

slave20.cloud.gluster.org (online)
slave21.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave22.cloud.gluster.org (online)
slave23.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave24.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave25.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave26.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave27.cloud.gluster.org [Offline Reason: Disconnected by rastar :
rastar taking this down for pranith. Needed for debugging with tar
issue.  Apr 20, 2016 3:44:14 AM]
slave28.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave29.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]

slave32.cloud.gluster.org [Offline Reason: idle]
slave33.cloud.gluster.org [Offline Reason: idle]
slave34.cloud.gluster.org [Offline Reason: idle]

slave46.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]




Smoke slaves:      Only 2 of 15 slaves are online [2]

slave20.cloud.gluster.org (online)
slave21.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave22.cloud.gluster.org (online)
slave23.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave24.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave25.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave26.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave27.cloud.gluster.org [Offline Reason: Disconnected by rastar :
rastar taking this down for pranith. Needed for debugging with tar
issue.  Apr 20, 2016 3:44:14 AM]
slave28.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave29.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]

slave32.cloud.gluster.org [Offline Reason: idle]
slave33.cloud.gluster.org [Offline Reason: idle]
slave34.cloud.gluster.org [Offline Reason: idle]

slave46.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
slave47.cloud.gluster.org [Offline Reason: idle]




NetBSD slaves:     Only 6 of 11 slaves are online [3]

nbslave71.cloud.gluster.org (online)
nbslave72.cloud.gluster.org [Offline Reason: This node is offline
because Jenkins failed to launch the slave agent on it.]
nbslave74.cloud.gluster.org [Offline Reason: Disconnected by kaushal
Mar 21, 2016 10:59:43 PM]
nbslave75.cloud.gluster.org (online)
nbslave77.cloud.gluster.org (online)
nbslave79.cloud.gluster.org (online)

nbslave7c.cloud.gluster.org (online)
nbslave7g.cloud.gluster.org [Offline Reason: Disconnected by rastar :
anoop is using this to debug netbsd related issue Mar 29, 2016 2:27:20
AM]
nbslave7h.cloud.gluster.org [Offline Reason: Disconnected by kaushal
Apr 13, 2016 3:15:06 AM]
nbslave7i.cloud.gluster.org [Offline Reason: Disconnected by jdarcy :
Consistently generating spurious failures due to ping timeouts. This
costs people *hours* for a platform nobody uses except as a test for
perfused. Feb 27, 2016 9:09:09 PM]
nbslave7j.cloud.gluster.org (online)


Summary:

For CentOS regressions: 9/14 slaves were completely down [not just idle]
For Smoke: 9/15 slaves were completely down
For NetBSD regressions: 5/11 slaves were completely down

IIRC, the CentOS regression and Smoke jobs share the same machines, so
9 (CentOS regression + Smoke) + 5 (NetBSD regression) = 14 slaves were
down. In total (CentOS [+ Smoke] + NetBSD), 14/26 machines were down
[not just idle].
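
The totals above can be double-checked with a short script. This is only
a sketch: the statuses are hand-transcribed from the listings in this
mail, and "down" is assumed to mean either a failed agent launch or a
manual disconnect, with idle slaves counted separately. The names
ONLINE/IDLE/DOWN and down_count are made up for illustration.

```python
# Status per slave, transcribed by hand from the listings in this mail.
# Assumption: "down" = agent launch failure or manual disconnect; "idle"
# slaves are tracked separately and do not count as down.
ONLINE, IDLE, DOWN = "online", "idle", "down"

centos = {
    "slave20": ONLINE, "slave21": DOWN, "slave22": ONLINE, "slave23": DOWN,
    "slave24": DOWN, "slave25": DOWN, "slave26": DOWN, "slave27": DOWN,
    "slave28": DOWN, "slave29": DOWN, "slave32": IDLE, "slave33": IDLE,
    "slave34": IDLE, "slave46": DOWN,
}

# The Smoke pool is the CentOS pool plus slave47.
smoke = {**centos, "slave47": IDLE}

netbsd = {
    "nbslave71": ONLINE, "nbslave72": DOWN, "nbslave74": DOWN,
    "nbslave75": ONLINE, "nbslave77": ONLINE, "nbslave79": ONLINE,
    "nbslave7c": ONLINE, "nbslave7g": DOWN, "nbslave7h": DOWN,
    "nbslave7i": DOWN, "nbslave7j": ONLINE,
}


def down_count(pool):
    """Return (number of down slaves, total slaves) for one pool."""
    down = sum(1 for status in pool.values() if status == DOWN)
    return down, len(pool)


# Merging the dicts counts each shared CentOS/Smoke machine only once.
combined = {**centos, **smoke, **netbsd}

for name, pool in [("CentOS", centos), ("Smoke", smoke),
                   ("NetBSD", netbsd), ("All", combined)]:
    down, total = down_count(pool)
    print(f"{name}: {down}/{total} down")
```

Merging the pools by hostname is what keeps the shared CentOS/Smoke
machines from being counted twice in the overall figure.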



[1] https://build.gluster.org/label/rackspace_regression_2gb/
[2] https://build.gluster.org/label/smoke_tests/
[3] https://build.gluster.org/label/netbsd7_regression/

Thanks,
--
Prasanna


> -Vijay

