[Gluster-users] Broken status, peer probe, "DNS resolution failed on host" and "Error disabling sockopt IPV6_V6ONLY: "Protocol not available" after updating from gluster 7.9 to 9.1

Artem Russakovskii archon810 at gmail.com
Fri Jul 23 04:54:22 UTC 2021


Hi Strahil,

I am using repo builds from
https://download.opensuse.org/repositories/filesystems/openSUSE_Leap_15.2/x86_64/
(currently glusterfs-9.1-lp152.88.2.x86_64.rpm) and don't build them myself.

Perhaps the builds at
https://download.opensuse.org/repositories/home:/glusterfs:/Leap15.2-9/openSUSE_Leap_15.2/x86_64/
(currently glusterfs-9.1-lp152.112.1.x86_64.rpm) are better; does anyone
know?

None of the repos currently have 9.3.
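
(For reference, a quick way to see exactly which versions a configured
repo offers, assuming zypper and the filesystems repo above are set up:

zypper se -s glusterfs | grep -i glusterfs

which lists every available glusterfs package version per repository.)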

And regardless, I don't want gluster using IPv6 when IPv4 works fine. Is
there a way to make it stop trying IPv6 and use IPv4 only?
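
The only knob I'm aware of is transport.address-family. A minimal sketch
of what I'd try, assuming /etc/glusterfs/glusterd.vol still honors the
option in 9.x the way the docs describe:

# /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    # pin the management daemon to IPv4 only
    option transport.address-family inet
end-volume

followed by restarting glusterd on each node. Can anyone confirm that 9.1
respects this?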

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | @ArtemR <http://twitter.com/ArtemR>


On Thu, Jul 22, 2021 at 9:09 PM Strahil Nikolov <hunter86_bg at yahoo.com>
wrote:

> Did you try with the latest 9.x? Based on the release notes, that should be
> 9.3.
>
> Best Regards,
> Strahil Nikolov
>
> On Fri, Jul 23, 2021 at 3:06, Artem Russakovskii
> <archon810 at gmail.com> wrote:
> Hi all,
>
> I just filed this ticket https://github.com/gluster/glusterfs/issues/2648,
> and wanted to bring it to your attention. Any feedback would be appreciated.
>
> Description of problem:
> We have a 4-node replicate cluster running gluster 7.9. I'm currently
> setting up a new cluster on a new set of machines and went straight for
> gluster 9.1.
>
> However, I was unable to peer probe any servers due to this error:
>
> [2021-07-17 00:31:05.228609 +0000] I [MSGID: 106487] [glusterd-handler.c:1160:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req nexus2 24007
> [2021-07-17 00:31:05.229727 +0000] E [MSGID: 101075] [common-utils.c:3657:gf_is_local_addr] 0-management: error in getaddrinfo [{ret=Name or service not known}]
> [2021-07-17 00:31:05.230785 +0000] E [MSGID: 106408] [glusterd-peer-utils.c:217:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: Name or service not known
>  [Unknown error -2]
> [2021-07-17 00:31:05.353971 +0000] I [MSGID: 106128] [glusterd-handler.c:3719:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: nexus2 (24007)
> [2021-07-17 00:31:05.375871 +0000] W [MSGID: 106061] [glusterd-handler.c:3488:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
> [2021-07-17 00:31:05.375903 +0000] I [rpc-clnt.c:1010:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2021-07-17 00:31:05.377021 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
> [2021-07-17 00:31:05.377043 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-17 00:31:05.377147 +0000] I [MSGID: 106498] [glusterd-handler.c:3648:glusterd_friend_add] 0-management: connect returned 0
> [2021-07-17 00:31:05.377201 +0000] I [MSGID: 106004] [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer <nexus2> (<00000000-0000-0000-0000-000000000000>), in state <Establishing Connection>, has disconnected from glusterd.
> [2021-07-17 00:31:05.377453 +0000] E [MSGID: 101032] [store.c:464:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
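>
> The getaddrinfo failures made me double-check name resolution outside of
> gluster. A minimal check, assuming the peers are defined in /etc/hosts as
> they are here:
>
> getent ahostsv4 nexus2   # IPv4-only lookup
> getent ahostsv6 nexus2   # IPv6-only lookup; family=10 in the log is AF_INET6
> grep nexus2 /etc/hosts   # confirm how the peer names are defined
>
> Since 7.9 probes these same hosts fine (see below), IPv4 resolution
> clearly works; it's the AF_INET6 lookup that fails.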
>
> I then wiped the /var/lib/glusterd dir to start clean, downgraded to
> 7.9, and attempted to peer probe again. This time it worked fine,
> confirming that 7.9 works here just as it does in prod.
>
> At this point, I created a volume, started it, and tested it to my
> satisfaction. Then I decided to see what would happen if I upgraded this
> working volume from 7.9 to 9.1.
>
> The end result is:
>
>    - gluster volume status is only showing the local gluster node and not
>    any of the remote nodes
>    - data does seem to replicate, so the connection between the servers
>    is actually established
>    - logs are now filled with constantly repeating messages like so:
>
> [2021-07-22 23:29:31.039004 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-22 23:29:31.039212 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> [2021-07-22 23:29:31.039304 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
> The message "E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]" repeated 119 times between [2021-07-22 23:27:34.025983 +0000] and [2021-07-22 23:29:31.039302 +0000]
> [2021-07-22 23:29:34.039369 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
> [2021-07-22 23:29:34.039441 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-22 23:29:34.039558 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> [2021-07-22 23:29:34.039659 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
> [2021-07-22 23:29:37.039741 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-22 23:29:37.039921 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> [2021-07-22 23:29:37.040015 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
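>
> For context, family=10 in those getaddrinfo errors is AF_INET6, so what
> repeats every ~3 seconds is a failing IPv6 lookup for each peer. The rate
> is easy to quantify (log path assumed from a default install):
>
> grep -c 'DNS resolution failed' /var/log/glusterfs/glusterd.log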
>
> When I issue a command in the CLI:
>
> ==> cli.log <==
> [2021-07-22 23:38:11.802596 +0000] I [cli.c:840:main] 0-cli: Started running gluster with version 9.1
> [2021-07-22 23:38:11.804007 +0000] W [socket.c:3434:socket_connect] 0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Operation not supported"
> [2021-07-22 23:38:11.906865 +0000] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}]
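>
> My read of that warning (an assumption, I haven't checked the 9.1 source)
> is that the CLI tries to clear IPV6_V6ONLY on a socket that isn't
> AF_INET6, which fails with EOPNOTSUPP. It's easy to observe which address
> family each socket actually uses:
>
> strace -f -e trace=socket,setsockopt gluster volume status 2>&1 | grep -E 'socket\(|IPV6'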
>
> Mandatory info:
> - The output of the `gluster volume info` command:
>
> gluster volume info
>
> Volume Name: ap
> Type: Replicate
> Volume ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: nexus2:/mnt/nexus2_block1/ap
> Brick2: forge:/mnt/forge_block1/ap
> Brick3: hive:/mnt/hive_block1/ap
> Brick4: citadel:/mnt/citadel_block1/ap
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> cluster.self-heal-daemon: enable
> client.event-threads: 4
> cluster.data-self-heal-algorithm: full
> cluster.lookup-optimize: on
> cluster.quorum-count: 1
> cluster.quorum-type: fixed
> cluster.readdir-optimize: on
> cluster.heal-timeout: 1800
> disperse.eager-lock: on
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> network.inode-lru-limit: 500000
> network.ping-timeout: 7
> network.remote-dio: enable
> performance.cache-invalidation: on
> performance.cache-size: 1GB
> performance.io-thread-count: 4
> performance.md-cache-timeout: 600
> performance.rda-cache-limit: 256MB
> performance.read-ahead: off
> performance.readdir-ahead: on
> performance.stat-prefetch: on
> performance.write-behind-window-size: 32MB
> server.event-threads: 4
> cluster.background-self-heal-count: 1
> performance.cache-refresh-timeout: 10
> features.ctime: off
> cluster.granular-entry-heal: enable
>
> - The output of the gluster volume status command:
>
> gluster volume status
> Status of volume: ap
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick forge:/mnt/forge_block1/ap            49152     0          Y       2622
> Self-heal Daemon on localhost               N/A       N/A        N       N/A
>
> Task Status of Volume ap
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> - The output of the gluster volume heal command:
>
> gluster volume heal ap enable
> Enable heal on volume ap has been successful
>
> gluster volume heal ap
> Launching heal operation to perform index self heal on volume ap has been unsuccessful:
> Self-heal daemon is not running. Check self-heal daemon log file.
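>
> For completeness, the checks I'd run next on the self-heal daemon, with
> default openSUSE paths assumed:
>
> systemctl status glusterd                      # shd is spawned by glusterd
> tail -n 50 /var/log/glusterfs/glustershd.log   # self-heal daemon log
> gluster volume heal ap info                    # per-brick heal state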
>
> - The operating system / glusterfs version:
> openSUSE Leap 15.2, glusterfs 9.1.
>
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | @ArtemR <http://twitter.com/ArtemR>

