<div dir="ltr">Thanks misc. I have often seen a pattern where, on a reattempt (recheck centos), the same builder is picked up many times, even though builders are supposed to be picked in a round-robin manner.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 4, 2019 at 7:24 PM Michael Scherer <<a href="mailto:mscherer@redhat.com">mscherer@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thursday, April 4, 2019 at 15:19 +0200, Michael Scherer wrote:<br>
> On Thursday, April 4, 2019 at 13:53 +0200, Michael Scherer wrote:<br>
> > On Thursday, April 4, 2019 at 16:13 +0530, Atin Mukherjee wrote:<br>
> > > Based on what I have seen, any multi-node test case will fail,<br>
> > > and the above one is picked first from that group. If I am<br>
> > > correct, none of the code fixes will go through regression until<br>
> > > this is fixed. I suspect it to be an infra issue again. If we look at<br>
> > > <a href="https://review.gluster.org/#/c/glusterfs/+/22501/" rel="noreferrer" target="_blank">https://review.gluster.org/#/c/glusterfs/+/22501/</a> &<br>
> > > <a href="https://build.gluster.org/job/centos7-regression/5382/" rel="noreferrer" target="_blank">https://build.gluster.org/job/centos7-regression/5382/</a>, peer<br>
> > > handshaking is stuck because 127.1.1.1 never receives a response<br>
> > > back. Did we end up with the firewall or other n/w settings screwed<br>
> > > up? The test never fails locally.<br>
> > <br>
> > The firewall didn't change, and it has had the line<br>
> > "-A INPUT -i lo -j ACCEPT" since the start, so all traffic on the<br>
> > loopback interface is accepted. (I am not even sure netfilter does<br>
> > anything meaningful on the loopback interface, but maybe I am wrong,<br>
> > and I am not keen on digging through kernel code to find out.)<br>
> > <br>
> > <br>
> > Ping seems to work fine as well, so we can exclude a routing issue.<br>
> > <br>
> > Maybe we should look at the socket: does it listen on a specific<br>
> > address or not?<br>
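A quick way to answer that question on the builder (illustrative commands; they assume the iproute2 `ss` tool is installed, and the `glusterd` process name is just an example to filter on):

```shell
# List listening TCP sockets with their owning process; the
# "Local Address:Port" column shows whether a daemon binds the
# wildcard (0.0.0.0 / [::]) or one specific address.
ss -lntp | grep gluster
```
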
> <br>
> So, I looked at the first 20 failures, removed all those not related<br>
> to rebal-all-nodes-migrate.t, and saw that all of them ran on<br>
> builder203, which was freshly reinstalled. As Deepshika noticed today,<br>
> this one had an issue with IPv6, the 2nd issue we were tracking.<br>
> <br>
> Summary: the rpcbind.socket systemd unit listens on IPv6 despite IPv6<br>
> being disabled, and the fix is to reload systemd. We have so far no<br>
> idea why this happens, but suspect it might be related to the network<br>
> issue we did identify, as it happens only after a reboot, which in<br>
> turn happens only if a build is cancelled/crashed/aborted.<br>
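The check and workaround described above can be sketched as shell commands. These are illustrative: the unit names assume a stock CentOS 7 install, and "reload systemd" is taken to mean `systemctl daemon-reload`:

```shell
# If rpcbind is bound on IPv6 despite IPv6 being disabled, a
# listener on [::]:111 shows up here:
ss -lntu | grep ':111'

# Workaround: have systemd re-read its runtime state, then restart
# the socket unit so it re-binds with the current address families.
systemctl daemon-reload
systemctl restart rpcbind.socket rpcbind.service
```
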
> <br>
> I applied the workaround on builder203, so if the culprit is that<br>
> specific issue, I guess that's fixed.<br>
> <br>
> I started a test to see how it goes:<br>
> <a href="https://build.gluster.org/job/centos7-regression/5383/" rel="noreferrer" target="_blank">https://build.gluster.org/job/centos7-regression/5383/</a><br>
<br>
The test just passed, so I would assume the problem was local to<br>
builder203. Not sure why it was always selected, except that it was the<br>
only one failing, so it was always free to pick up new jobs.<br>
<br>
Maybe we should increase the number of builders so this doesn't happen,<br>
as I guess the other builders were busy at that time?<br>
<br>
-- <br>
Michael Scherer<br>
Sysadmin, Community Infrastructure and Platform, OSAS<br>
<br>
<br>
</blockquote></div>