[Cinder.glusterfs.ci] [Third-party-announce] Nexenta Edge Cinder CI Having Constant Failures

Fri Jun 12 21:07:15 UTC 2015

Hi Mike,

Thank you very much for this update!

I reviewed these failures, and there are a few things going on:

1. A chunk of those failures date from yesterday morning (June 11, PST),
when we were migrating CI environments and ran into some issues, hence a
few quick failures. In the future we will be more careful to minimize
failures when changing CI infrastructure.

2. We were seeing many ssh timeouts in the days before that. After the
migration (a hardware reorganization that we are still working through)
we hit the same timeout again, until we worked out a virtual network
interface driver optimization for our hypervisor. It seems more stable
now. We are also adding SSDs as we speak and tuning the environment
further. It already seems to be running a lot better, and there have been
a few recent successful runs.

3. During our migration, a tweak to the NEDGE backend (an increased
internal timeout) was lost. That tweak greatly improved the success rate
of runs that actually exercised the backend (as opposed to the many 6min
failures caused by our CI infrastructure rather than the backend). We are
currently also reworking the backend hardware and may not need that tweak
in the future, but I recently put it back in and it looks like those
internal timeouts are gone again.

So to summarize, a lot of these failures are due to our CI infrastructure
and are not backend related. Many of the failures before this morning are
<6min failures caused by an ssh timeout to the instance, where the backend
was never touched. There are some issues with the backend itself, but
they are minor, and we are working through them by improving the backend
hardware and continuing work on the NexentaEdge source code.

Finally, we are doing our best to give timely responses to comments and
questions. The Nexenta Edge review was updated with new driver changes
based on comments from today, and all other questions, comments and
suggestions are very welcome and highly appreciated!

Thanks again

--
Zohar

On Fri, Jun 12, 2015 at 9:27 AM, Mike Perez <thingee at gmail.com> wrote:

> This just today:
>
> https://review.openstack.org/#/c/190677/ failed
> https://review.openstack.org/#/c/190725/ failed
> https://review.openstack.org/#/c/187624/ failed
> https://review.openstack.org/#/c/190169/ failed
> https://review.openstack.org/#/c/189135/ failed
> https://review.openstack.org/#/c/185906/ failed
> https://review.openstack.org/#/c/190173/ failed
> https://review.openstack.org/#/c/186580/ failed
> https://review.openstack.org/#/c/189517/ failed
> https://review.openstack.org/#/c/185545/ failed
> https://review.openstack.org/#/c/184596/ failed
> https://review.openstack.org/#/c/182985/ failed
> https://review.openstack.org/#/c/189720/ failed
> https://review.openstack.org/#/c/188649/ failed
> https://review.openstack.org/#/c/178573/ failed
>
> What's going on here, and when will this be corrected? This is
> publishing false failures to Cinder patches.
>
>
> --
> Mike Perez
>
_______________________________________________
Third-party-announce mailing list
Third-party-announce at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/third-party-announce