[Bugs] [Bug 1524058] New: gluster peer command stops working with unhelpful error messages when DNS doesn't work

Sat Dec 9 17:44:38 UTC 2017

https://bugzilla.redhat.com/show_bug.cgi?id=1524058

            Bug ID: 1524058
           Summary: gluster peer command stops working with unhelpful
                    error messages when DNS doesn't work
           Product: GlusterFS
           Version: mainline
         Component: core
          Assignee: bugs at gluster.org
          Reporter: nh2-redhatbugzilla at deditus.de
                CC: bugs at gluster.org

Description of problem:

Gluster 3.12.3 on Linux.

Consider the following outputs. None of them make any sense to me:

[root@node-1:~]# time gluster peer probe status
peer probe: success. Host status port 24007 already in peer list

real  0m10.060s

[root@node-1:~]# time gluster peer status
peer status: failed

real  0m0.051s

[root@node-1:~]# time gluster pool list
pool list: failed

real  0m0.050s

[root@node-1:~]# gluster peer probe 10.0.0.1
peer probe: success. Probe on localhost not needed

[root@node-1:~]# gluster peer probe 10.0.0.2
peer probe: success. Host 10.0.0.2 port 24007 already in peer list

[root@node-1:~]# gluster peer detach status
peer detach: failed: One of the peers is probably down. Check with 'peer status'

[root@node-1:~]# gluster peer status
peer status: failed

First, when I run `gluster peer probe status` (not a reasonable command, since
it treats `status` as a hostname), why does it say "peer probe: success. Host
status port 24007 already in peer list"? That makes no sense: there is no host
called "status" on my network.

Next, `gluster peer status` fails, and the error message "peer status: failed"
is extremely unhelpful, as it contains no information about the failure.

Later probes of e.g. `10.0.0.1` suggest that there's already a working "peer
list" with some contents, but apparently I have no way at all to list those
peers.

When I try to detach the apparently-attached garbage peer called "status", I
get told to run `peer status`, but it doesn't work.

What's going on here?

The glusterd log (/var/log/glusterfs/glusterd.log) gives some insight:

[2017-12-09 17:34:21.858454] I [MSGID: 106487]
[glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd:
Received cli list req
[2017-12-09 17:34:21.858517] W [dict.c:912:str_to_data]
(-->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x104db4)
[0x7f6f54fdadb4]
-->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/libglusterfs.so.0(dict_set_str+0x16)
[0x7f6f60919be6]
-->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/libglusterfs.so.0(str_to_data+0x82)
[0x7f6f60918122] ) 0-dict: value is NULL [Invalid argument]
[2017-12-09 17:34:23.103687] E [MSGID: 101075]
[common-utils.c:320:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Temporary
failure in name resolution)
[2017-12-09 17:34:23.103714] E [name.c:267:af_inet_client_get_remote_sockaddr]
0-management: DNS resolution failed on host status

Looks like the real error is `getaddrinfo failed`, probably some DNS problem on
my system.
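
One way to confirm this independently of gluster is to call `getaddrinfo`
directly for the hostname from the log. The following is a minimal sketch in
Python, not anything gluster itself provides: `check_resolves` is a
hypothetical helper name, and the injectable `resolver` parameter exists only
so the function can be exercised without a working network.

```python
import socket


def check_resolves(host, port=24007, resolver=socket.getaddrinfo):
    """Return (ok, detail) describing whether `host` resolves.

    `check_resolves` is a hypothetical helper for illustration only.
    Port 24007 is the glusterd management port seen in the output above.
    """
    try:
        results = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # EAI_AGAIN is the "Temporary failure in name resolution" case,
        # which usually points at an unreachable or broken DNS server.
        kind = "temporary" if exc.errno == socket.EAI_AGAIN else "permanent"
        return False, f"{host}: {kind} resolution failure: {exc.strerror}"
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in results})
    return True, f"{host}: resolves to {', '.join(addresses)}"
```

On a node in the state described here, `check_resolves("status")` would
presumably report a temporary resolution failure, while an IP literal such as
`check_resolves("10.0.0.2")` should succeed without consulting DNS at all.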

So:

* Could `gluster peer status` tell me directly about this problem, instead of
saying "failed"?
* Why do `gluster peer status` and `gluster pool list` fail if DNS doesn't
work? I'd assume if there is a list of hosts, I should be able to view it, any
time.
* What's going on with the weird success message of adding a nonexistent host?
* What's up with `0-dict: value is NULL [Invalid argument]`?
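
On the second point, a possible workaround while `gluster peer status` fails
is to read the peer list from glusterd's on-disk state instead. This is a
minimal sketch under assumptions: it assumes peers are stored as `key=value`
files (with keys such as `uuid`, `state` and `hostname1`) under
`/var/lib/glusterd/peers`, which is the usual default but may differ on a
Nix-based install. `parse_peer_file` and `list_peers` are hypothetical helper
names, not part of gluster.

```python
from pathlib import Path


def parse_peer_file(text):
    """Parse `key=value` lines from one peer file into a dict."""
    peer = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip blank lines and lines without '='
            peer[key] = value
    return peer


def list_peers(peers_dir="/var/lib/glusterd/peers"):
    """Return one dict per peer file in `peers_dir` (assumed location).

    Returns an empty list if the directory does not exist.
    """
    directory = Path(peers_dir)
    if not directory.is_dir():
        return []
    return [
        parse_peer_file(path.read_text())
        for path in sorted(directory.iterdir())
        if path.is_file()
    ]
```

Reading these files performs no name resolution, so it should keep working
regardless of the DNS state. It only shows what glusterd has stored, not
whether the peers are currently connected.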
