[Gluster-Maintainers] [Gluster-devel] Another regression in release-3.7 and master

Thu Apr 7 15:51:31 UTC 2016

On Thu, Apr 07, 2016 at 07:24:05PM +0530, Kaushal M wrote:
> On Thu, Apr 7, 2016 at 6:23 PM, Kaushal M <kshlmster at gmail.com> wrote:
> > On Thu, Apr 7, 2016 at 6:00 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >>
> >>
> >> On 04/07/2016 05:37 PM, Kaushal M wrote:
> >>>
> >>> On 7 Apr 2016 5:36 p.m., "Niels de Vos" <ndevos at redhat.com> wrote:
> >>>>
> >>>> On Thu, Apr 07, 2016 at 05:13:54PM +0530, Kaushal M wrote:
> >>>> > On Thu, Apr 7, 2016 at 5:11 PM, Kaushal M <kshlmster at gmail.com> wrote:
> >>>> > > We've hit another regression.
> >>>> > >
> >>>> > > With management encryption enabled, daemons like NFS and SHD don't
> >>>> > > start on the current heads of release-3.7 and master branches.
> >>>> > >
> >>>> > > I still have no clear root cause for it, and would appreciate some
> >>>> > > help.
> >>>> >
> >>>> > This was working with 3.7.9 from what I've heard.
> >>>>
> >>>> Do we have a simple test-case for this? If someone writes a script, we
> >>>> should be able to "git bisect" it pretty quickly.
> >>>
> >>> I am doing this right now.
> >> "b33f3c9 glusterd: Bug fixes for IPv6 support" has caused this
> >> regression. I am yet to find the RCA though.
> >
> > git-bisect agrees with this as well.
> >
> > I initially thought it was because GlusterD didn't listen on IPv6
> > (checked using `ss`).
> > This change makes it so that connections to localhost use ::1 instead
> > of 127.0.0.1, and so the connection failed.
> > This should have caused all connection attempts to fail, irrespective
> > of it being encrypted or not.
> > But the failure only happens when management encryption is enabled.
> > So this theory doesn't make sense.
> 
> This is part of the problem!
> 
> The initial IPv6 connection to ::1 fails for non-encrypted connections as well.
> But these connections correctly retry the connect with the next address
> once the first connect attempt fails.
> Since the next address is 127.0.0.1, the connection succeeds, volfile
> is fetched and the daemon starts.
> 
> Encrypted connections, on the other hand, give up after the first
> failure and don't attempt a reconnect.
> This is somewhat surprising to me, as I'd recently fixed an issue
> which caused crashes when encrypted connections attempted a reconnect
> after a failure to connect.
> 
> I'll diagnose this a little bit more and try to find a solution.
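
For illustration only, the difference between the two behaviours described
above can be sketched in a few lines of Python. This is not the GlusterFS
socket transport code (that is C); the function names
connect_with_fallback, connect_first_only and tcp_connect are made up
for this sketch, and the address list is assumed to be whatever name
resolution returned for "localhost", with ::1 ahead of 127.0.0.1.

```python
import socket


def tcp_connect(addr, timeout=2.0):
    """Open a plain TCP connection to a (host, port) pair."""
    return socket.create_connection(addr, timeout=timeout)


def connect_with_fallback(addresses, connect_one=tcp_connect):
    """Try each address in order and return (connection, address) for
    the first one that succeeds. This mirrors the behaviour described
    for non-encrypted connections: ::1 fails, 127.0.0.1 is tried next."""
    last_error = None
    for addr in addresses:
        try:
            return connect_one(addr), addr
        except OSError as err:
            # Remember the failure and move on to the next address.
            last_error = err
    raise ConnectionError(f"all addresses failed, last error: {last_error}")


def connect_first_only(addresses, connect_one=tcp_connect):
    """Try only the first address and let any failure propagate. This
    mirrors the behaviour described for encrypted connections: no retry
    with the next address."""
    addr = addresses[0]
    return connect_one(addr), addr
```

With a listener bound only to 127.0.0.1, the first function would be
expected to end up connected over IPv4, while the second fails on ::1 and
never reaches the IPv4 address.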

Or revert the change since it was introduced in 3.7.10 and nobody relies
on that yet. Try to get it fixed properly for 3.7.12?

Niels