[Gluster-devel] [erik.jacobson at hpe.com: [Gluster-users] gluster forcing IPV6 on our IPV4 servers, glusterd fails (was gluster update question regarding new DNS resolution requirement)]

Erik Jacobson erik.jacobson at hpe.com
Tue Sep 21 13:06:58 UTC 2021


Thank you for the ideas so far everybody. I'll just reply to this email
but I appreciated all the notes.

While it's true that we use "a little bit of IPV6 but mostly IPV4" in
our management infrastructure, these gluster servers don't really need
IPV6 at all at this time. The only IPV6 addresses on these specific
servers are link-local ones that appear to come from SUSE defaults; we
did not intentionally configure any IPV6 addresses ourselves.

The distro stayed the same (SLES15SP3) and we are using a bonded network
interface for the gluster traffic. All I changed was gluster, from 7.9 to
9.3, and I hit this problem. I'm not an expert in the network code, but
the patch snip I sent suggests the code is not detecting the situation
correctly for our specific case.

Coming up in the next few releases, we will start to have more IPV6
options, with the eventual goal of allowing IPV6-only operation for
customers who want it. At this time, we have mostly IPV4 with a couple
of devices that can only talk IPV6, so I expect the problem will only
get worse in the mixed case (which we are not suffering from yet; we're
pure IPV4 right now).

Since this was an upgrade, the volumes were made with IPV4-style IP
addresses for the bricks. I'm not an expert, but that might be an
indication right there that we aren't using IPV6 for the gluster part.
Down the line, if we make new volumes or change existing ones and
specify IPV6 addresses, maybe gluster could see that the brick address
is an IPV6-style literal and pick that stack? I'm just thinking out loud
here; I'm not an expert in IPV6 but will probably know lots more in a
couple of months :) :)
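
To make the idea concrete, here is an untested sketch of the kind of
detection I mean (the helper name and the fallback policy are my own
invention, not gluster code):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Hypothetical helper: guess the address family from the literal
     * text of a brick address. Falls back to AF_INET when the string
     * is neither a valid IPv4 nor a valid IPv6 literal (a hostname,
     * for example), which matches our pure-IPV4 deployments. */
    static int
    guess_family_from_brick(const char *addr)
    {
        struct in_addr v4;
        struct in6_addr v6;

        if (inet_pton(AF_INET, addr, &v4) == 1)
            return AF_INET;
        if (inet_pton(AF_INET6, addr, &v6) == 1)
            return AF_INET6;
        return AF_INET; /* hostname: let the resolver decide later */
    }

With that, "172.23.0.16" would give AF_INET and an address like
"fe80::4adf:37ff:febd:94a0" would give AF_INET6, instead of the family
being assumed up front.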

Thank you all.

We will run with the "incorrect but works for now" patch in the
meantime. Let me know if you want us to try something.

Example ip address information ('bond0' used for the gluster part):

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
14: ens1f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 48:df:37:bd:94:a0 brd ff:ff:ff:ff:ff:ff
    altname enp18s0f0
15: ens1f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 48:df:37:bd:94:a8 brd ff:ff:ff:ff:ff:ff
    altname enp18s0f1
16: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 54:80:28:65:b5:d9 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f0
17: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 54:80:28:65:b5:da brd ff:ff:ff:ff:ff:ff
    altname enp2s0f1
18: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 54:80:28:65:b5:db brd ff:ff:ff:ff:ff:ff
    altname enp2s0f2
19: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 54:80:28:65:b5:dc brd ff:ff:ff:ff:ff:ff
    altname enp2s0f3
20: enp1s0f4u4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 36:7c:3f:0c:98:2d brd ff:ff:ff:ff:ff:ff
21: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 48:df:37:bd:94:a0 brd ff:ff:ff:ff:ff:ff
    inet 172.23.100.11/16 brd 172.23.255.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet 172.24.255.241/16 brd 172.24.255.255 scope global bond0:bmc
       valid_lft forever preferred_lft forever
    inet 172.23.255.243/16 brd 172.23.255.255 scope global secondary bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::4adf:37ff:febd:94a0/64 scope link
       valid_lft forever preferred_lft forever




Example SUSE ifcfg-bond0 file:

STARTMODE='onboot'
BOOTPROTO='static'
IPADDR='172.23.100.11'
NETMASK='255.255.0.0'
# SU leaders require access to BMCs bond0:bmc
IPADDR_bmc=172.24.255.241
NETMASK_bmc=255.255.0.0
LABEL_bmc=bmc
BONDING_MASTER='yes'
BONDING_SLAVE_0='ens1f0'
BONDING_MODULE_OPTS='mode=active-backup all_slaves_active=1  miimon=100' # backup mode
LINK_REQUIRED='no'



Best wishes,

Erik


On Tue, Sep 21, 2021 at 01:15:44PM +0100, Paul Jakma wrote:
> Hi,
> 
> I'd love to have more of a discussion on this. There are issues in the code
> around IPv4 assumptions, and wider issues around identity that make Gluster
> hard to operate on a dual-stack / many-network setup.
> 
> E.g., the assumption that a peer IP will have a hostname that resolves to
> the same IP.
> 
> Paul
> 
> On Tue, 21 Sep 2021, Mohit Agrawal wrote:
> 
> > Hi,
> > 
> > In gluster we support one address family at a time (either IPV4 or
> > IPV6), and it is up to the user which one to use. It is a configurable
> > option; a user can set the value in the volfile. It seems you are
> > facing the issue you described as the code "forcing our IPV4 servers
> > into an IPV6 address family". I think you can avoid the error if you
> > pass address-family=inet6 (xlator-option="transport.address-family=inet6")
> > when mounting a volume; in that case you should not face the issue.
> > For more, please refer to https://github.com/gluster/glusterfs/pull/2666
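> > 
> > For example, a mount along these lines (volume "gv0" on host "server1"
> > are placeholder names, not from this thread) should select the IPV6
> > stack for the client:
> > 
> >     mount -t glusterfs -o xlator-option=transport.address-family=inet6 server1:/gv0 /mnt/gv0
> > 
> > The same family can be set for glusterd itself in
> > /etc/glusterfs/glusterd.vol via "option transport.address-family inet6".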
> > 
> > Thanks,
> > Mohit Agrawal
> > 
> > 
> > 
> > On Tue, Sep 21, 2021 at 2:42 PM Erik Jacobson <erik.jacobson at hpe.com> wrote:
> > 
> > > Dear devel team -
> > > 
> > > I botched the email address here. I type "hpcm-devel" like 30 times a
> > > day so I mistyped that. Sorry about that.
> > > 
> > > Any advice is appreciated; see the attached patch, which "gets it
> > > going for us" but is obviously not something you could release.
> > > 
> > > Erik
> > > 
> > > 
> > > 
> > > ---------- Forwarded message ----------
> > > From: Erik Jacobson <erik.jacobson at hpe.com>
> > > To: gluster-users at gluster.org, hpcm-devel at gluster.org
> > > Cc:
> > > Bcc:
> > > Date: Mon, 20 Sep 2021 16:46:12 -0500
> > > Subject: [Gluster-users] gluster forcing IPV6 on our IPV4 servers,
> > > glusterd fails (was gluster update question regarding new DNS resolution
> > > requirement)
> > > I pretended to be a low-level C programmer with network and filesystem
> > > experience for a few hours.
> > > 
> > > I'm not sure what the right solution is, but what was happening is that
> > > the code was treating our IPV4 hosts as AF_INET6, and that family is
> > > incompatible with our IPV4 IP addresses. Yes, we need to move to IPV6,
> > > but we're hoping to do that on our own time (~50 years, like everybody
> > > else :)
> > > 
> > > I found a chunk of the code that seemed to be force-setting us to
> > > AF_INET6.
> > > 
> > > While I'm sure it is not 100% the correct fix, the patch attached and
> > > pasted below works for me, so I'll integrate it into our internal
> > > build to continue testing.
> > > 
> > > Please let me know if there is a configuration item I missed or a
> > > different way to do this. I added -devel to this email.
> > > 
> > > In the previous thread, you would have seen that we're testing a
> > > hopeful change that will upgrade our deployed customers from gluster
> > > 7.9 to gluster 9.3.
> > > 
> > > Thank you!! Advice on next steps would be appreciated!!
> > > 
> > > 
> > > diff -Narup glusterfs-9.3-ORIG/rpc/rpc-transport/socket/src/name.c glusterfs-9.3-NEW/rpc/rpc-transport/socket/src/name.c
> > > --- glusterfs-9.3-ORIG/rpc/rpc-transport/socket/src/name.c      2021-06-29 00:27:44.381408294 -0500
> > > +++ glusterfs-9.3-NEW/rpc/rpc-transport/socket/src/name.c       2021-09-20 16:34:28.969425361 -0500
> > > @@ -252,9 +252,16 @@ af_inet_client_get_remote_sockaddr(rpc_t
> > >      /* Need to update transport-address family if address-family is not provided
> > >         to command-line arguments
> > >      */
> > > +    /* HPE: This is forcing our IPV4 servers into an IPV6 address
> > > +     * family that is not compatible with IPV4. For now we will just
> > > +     * set it to AF_INET.
> > > +     */
> > > +    /*
> > >      if (inet_pton(AF_INET6, remote_host, &serveraddr)) {
> > >          sockaddr->sa_family = AF_INET6;
> > >      }
> > > +    */
> > > +    sockaddr->sa_family = AF_INET;
> > > 
> > >      /* TODO: gf_resolve is a blocking call. kick in some
> > >         non blocking dns techniques */
> > > 
> > > 
> > > On Mon, Sep 20, 2021 at 11:35:35AM -0500, Erik Jacobson wrote:
> > > > I missed the other important log snip:
> > > > 
> > > > The message "E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Address family for hostname not supported}]" repeated 620 times between [2021-09-20 15:49:23.720633 +0000] and [2021-09-20 15:50:41.731542 +0000]
> > > > 
> > > > So I will dig into the code some here.
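> > > > 
> > > > For reference, family=10 is AF_INET6 on Linux, so getaddrinfo is
> > > > being asked for an IPV6 result for an IPV4 literal. I believe this
> > > > tiny standalone test (my own, not gluster code) reproduces the same
> > > > error:
> > > > 
> > > >     #include <stdio.h>
> > > >     #include <string.h>
> > > >     #include <sys/socket.h>
> > > >     #include <netdb.h>
> > > > 
> > > >     int main(void)
> > > >     {
> > > >         struct addrinfo hints, *res = NULL;
> > > >         memset(&hints, 0, sizeof(hints));
> > > >         hints.ai_family = AF_INET6; /* family=10 in the log above */
> > > > 
> > > >         /* An IPV4 literal cannot satisfy an AF_INET6-only request,
> > > >          * so this fails with "Address family for hostname not
> > > >          * supported" (EAI_ADDRFAMILY on glibc). */
> > > >         int ret = getaddrinfo("172.23.0.16", NULL, &hints, &res);
> > > >         if (ret != 0)
> > > >             printf("getaddrinfo: %s\n", gai_strerror(ret));
> > > >         else
> > > >             freeaddrinfo(res);
> > > >         return 0;
> > > >     }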
> > > > 
> > > > 
> > > > On Mon, Sep 20, 2021 at 10:59:30AM -0500, Erik Jacobson wrote:
> > > > > Hello all! I hope you are well.
> > > > > 
> > > > > We are starting a new software release cycle and I am trying to find a
> > > > > way to upgrade customers from our build of gluster 7.9 to our build of
> > > > > gluster 9.3
> > > > > 
> > > > > When we deploy gluster, we forcibly remove all references to any
> > > > > host names and use only IP addresses. This is because, if a DNS
> > > > > server is unreachable for any reason, glusterd becomes unable to
> > > > > reach its peers properly, even when the peer files contain both IPs
> > > > > and DNS names. We can't really rely on /etc/hosts either, because
> > > > > customers take artistic license with their /etc/hosts files and
> > > > > don't realize the problems that can cause.
> > > > > 
> > > > > So our deployed peer files look something like this:
> > > > > 
> > > > > uuid=46a4b506-029d-4750-acfb-894501a88977
> > > > > state=3
> > > > > hostname1=172.23.0.16
> > > > > 
> > > > > That is, with full intention, we avoid host names.
> > > > > 
> > > > > When we upgrade to gluster 9.3, we fall over with the errors below;
> > > > > gluster becomes partitioned and the updated gluster servers can't
> > > > > reach anybody:
> > > > > 
> > > > > [2021-09-20 15:50:41.731543 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host 172.23.0.16
> > > > > 
> > > > > 
> > > > > As you can see, we defined everything using IPs on purpose, but in
> > > > > 9.3 this method appears to fail. Are there any suggestions short of
> > > > > putting real host names in the peer files?
> > > > > 
> > > > > 
> > > > > 
> > > > > FYI
> > > > > 
> > > > > This supercomputer will be using gluster for part of its system
> > > > > management. It is how we deploy the Image Objects (squashfs images),
> > > > > hosted on NFS and served by gluster leader nodes today, and it is
> > > > > also where we store system logs, console logs, and other data.
> > > > > 
> > > > > https://www.olcf.ornl.gov/frontier/
> > > > > 
> > > > > 
> > > > > Erik
> > > 
> > > 
> > 
> 
> -- 
> Paul Jakma | paul at jakma.org | @pjakma | Key ID: 0xD86BF79464A2FF6A
> Fortune:
> About the only thing on a farm that has an easy time is the dog.

