[Gluster-users] DNS resolution failure at boot

Brian Hicks brian at aster.is
Fri Dec 4 17:24:13 UTC 2015


I’m actually not sure what you’re talking about here. Was it the 
reference to `x-systemd.requires`? Where would that go?

Regardless, the network devices *are* up and configured. I added an 
`ExecStartPre` to the glusterd unit file that reads `/bin/bash -c 
"while ! dig +short {name}.node.consul; do sleep 1; done"`, one for 
each host glusterd needs to contact. The systemd output showed that the 
names were resolved before glusterd started up, but glusterd still 
failed with the same error about not being able to resolve them.
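
One possible gap in the dig loop above: dig queries the nameservers in 
/etc/resolv.conf directly, and per its man page it exits 0 whenever it 
gets a reply, including NXDOMAIN, so the loop can finish without the 
name being resolvable through getaddrinfo, which is the call failing in 
the glusterd log. Whether that is the cause here is not established. A 
minimal sketch of a wait loop that goes through getaddrinfo instead, 
using `getent ahosts`; the helper name `wait_for_name`, the `RESOLVER` 
variable and the 60-second default are assumptions for illustration, 
not anything glusterd or systemd provides:

```shell
#!/bin/bash
# Hypothetical helper: wait until a name resolves through the system
# resolver (NSS / getaddrinfo), or give up after a timeout.

# Lookup command; `getent ahosts` resolves via getaddrinfo().
# Overridable so the loop can be exercised without a network.
RESOLVER=${RESOLVER:-"getent ahosts"}

wait_for_name() {
    local name=$1
    local timeout=${2:-60}   # seconds; assumed default
    local waited=0
    # $RESOLVER is deliberately unquoted so "getent ahosts" splits into words.
    until $RESOLVER "$name" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $name" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "resolved $name after ${waited}s"
}

# When run as a script with arguments, wait for each name in turn.
for name in "$@"; do
    wait_for_name "$name" || exit 1
done
```

Saved as a script (the path is up to you), this could be called from an 
`ExecStartPre` line with one or more `{name}.node.consul` arguments. A 
non-zero exit on timeout makes the unit fail visibly instead of 
blocking boot indefinitely.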

On 4 Dec 2015, at 11:01, Atin Mukherjee wrote:

> You wouldn't need the vdsm service here, as that mail thread was for an
> oVirt use case. Have you tried changing the service file following what
> Kaushal mentioned in that mail?
>
> -Atin
> Sent from one plus one
> On Dec 4, 2015 10:27 PM, "Brian Hicks" <brian at aster.is> wrote:
>
>> Ah, just tried it on some fresh machines. Looks like the solution that
>> worked there isn’t making my cluster any happier. Any other thoughts?
>>
>> (To be clear, it looks like that was adding vdsmd-network.service as an
>> After target and vdsmd.service as a Before target.)
>>
>> On 4 Dec 2015, at 10:06, Atin Mukherjee wrote:
>>
>> You might be experiencing this:
>> https://www.gluster.org/pipermail/gluster-users/2015-November/024292.html
>>
>> -Atin
>> Sent from one plus one
>> On Dec 4, 2015 9:07 PM, "Brian Hicks" <brian at aster.is> wrote:
>>
>> Hi all,
>>
>> I’m running Gluster 3.7.6 on CentOS 7.1, and using Consul for DNS (for
>> example, putting all the glusterd servers at glusterfs.service.consul).
>>
>> I’m seeing odd behavior when I reboot the nodes running glusterd.
>> Basically, it doesn’t seem to be able to resolve names at boot. I have the
>> default settings, plus a systemd drop-in file to make sure that glusterd
>> starts after DNS is active (nothing complex, just After and Requires for
>> consul and dnsmasq). I’ve even tried adding an ExecStartPre with a bash
>> while loop that runs until dig can resolve the addresses listed in the
>> log file below. Nothing seems to help: my etc-glusterfs-glusterd.vol.log
>> always contains these lines, and glusterd fails to start.
>>
>> Oddly, if I run systemctl start glusterd after the boot process
>> completes, it starts just fine. Is there some other network target I need
>> to include in my systemd unit file?
>>
>> [2015-12-02 22:50:17.493630] I [MSGID: 100030] 
>> [glusterfsd.c:2318:main]
>> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 
>> 3.7.6
>> (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
>> [2015-12-02 22:50:17.916025] I [MSGID: 106478] [glusterd.c:1350:init]
>> 0-management: Maximum allowed open file descriptors set to 65536
>> [2015-12-02 22:50:17.916063] I [MSGID: 106479] [glusterd.c:1399:init]
>> 0-management: Using /var/lib/glusterd as working directory
>> [2015-12-02 22:50:17.980724] E 
>> [rpc-transport.c:292:rpc_transport_load]
>> 0-rpc-transport: /usr/lib64/glusterfs/3.7.6/rpc-transport/rdma.so: 
>> cannot
>> open shared object file: No such file or directory
>> [2015-12-02 22:50:17.980743] W 
>> [rpc-transport.c:296:rpc_transport_load]
>> 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is 
>> not
>> valid or not found on this machine
>> [2015-12-02 22:50:17.980753] W 
>> [rpcsvc.c:1597:rpcsvc_transport_create]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2015-12-02 22:50:17.980762] E [MSGID: 106243] [glusterd.c:1623:init]
>> 0-management: creation of 1 listeners failed, continuing with 
>> succeeded
>> transport
>> [2015-12-02 22:50:18.605503] I [MSGID: 106228]
>> [glusterd.c:433:glusterd_check_gsync_present] 0-glusterd: 
>> geo-replication
>> module not installed in the system [No such file or directory]
>> [2015-12-02 22:50:18.669326] I [MSGID: 106513]
>> [glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd: 
>> retrieved
>> op-version: 30706
>> [2015-12-02 22:50:27.786383] I [MSGID: 106498]
>> [glusterd-handler.c:3579:glusterd_friend_add_from_peerinfo] 
>> 0-management:
>> connect returned 0
>> [2015-12-02 22:50:27.809153] I 
>> [rpc-clnt.c:984:rpc_clnt_connection_init]
>> 0-management: setting frame-timeout to 600
>> [2015-12-02 22:50:27.809078] I [MSGID: 106498]
>> [glusterd-handler.c:3579:glusterd_friend_add_from_peerinfo] 
>> 0-management:
>> connect returned 0
>> [2015-12-02 22:50:37.844756] E [MSGID: 101075]
>> [common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo failed 
>> (Name or
>> service not known)
>> [2015-12-02 22:50:37.844822] E
>> [name.c:247:af_inet_client_get_remote_sockaddr] 0-management: DNS
>> resolution failed on host resching-os-control-02.node.consul
>> [2015-12-02 22:50:37.845167] I 
>> [rpc-clnt.c:984:rpc_clnt_connection_init]
>> 0-management: setting frame-timeout to 600
>> [2015-12-02 22:50:37.845259] I [MSGID: 106004]
>> [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: 
>> Peer
>> <resching-os-control-02.node.consul>
>> (<9cf99313-dd68-4ac7-acbb-b018cc167ec2>), in state <Peer in Cluster>, 
>> has
>> disconnected from glusterd.
>> [2015-12-02 22:50:37.845321] E [MSGID: 106155]
>> [glusterd-utils.c:199:glusterd_unlock] 0-management: Cluster lock not 
>> held!
>> [2015-12-02 22:50:47.880585] E [MSGID: 101075]
>> [common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo failed 
>> (Name or
>> service not known)
>> [2015-12-02 22:50:47.880675] E
>> [name.c:247:af_inet_client_get_remote_sockaddr] 0-management: DNS
>> resolution failed on host resching-os-control-01.node.consul
>> [2015-12-02 22:50:47.880870] I [MSGID: 106004]
>> [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: 
>> Peer
>> <resching-os-control-01.node.consul>
>> (<cc7ced64-e3c2-403d-ae01-59ad3f68d6e6>), in state <Peer in Cluster>, 
>> has
>> disconnected from glusterd.
>> [2015-12-02 22:50:47.880910] E [MSGID: 106155]
>> [glusterd-utils.c:199:glusterd_unlock] 0-management: Cluster lock not 
>> held!
>> [2015-12-02 22:50:51.583949] E [MSGID: 101075]
>> [common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo failed 
>> (Name or
>> service not known)
>> [2015-12-02 22:50:51.584013] E
>> [name.c:247:af_inet_client_get_remote_sockaddr] 0-management: DNS
>> resolution failed on host resching-os-control-02.node.consul
>> [2015-12-02 22:50:51.584159] I [MSGID: 106004]
>> [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: 
>> Peer
>> <resching-os-control-02.node.consul>
>> (<9cf99313-dd68-4ac7-acbb-b018cc167ec2>), in state <Peer in Cluster>, 
>> has
>> disconnected from glusterd.
>> [2015-12-02 22:50:57.917351] E [MSGID: 106408]
>> [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname]
>> 0-management: error in getaddrinfo: Name or service not known
>> [Unknown error -2]
>> [2015-12-02 22:51:02.605954] E [MSGID: 101075]
>> [common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo failed 
>> (Name or
>> service not known)
>> [2015-12-02 22:51:02.605990] E
>> [name.c:247:af_inet_client_get_remote_sockaddr] 0-management: DNS
>> resolution failed on host resching-os-control-01.node.consul
>> [2015-12-02 22:51:02.606077] I [MSGID: 106004]
>> [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: 
>> Peer
>> <resching-os-control-01.node.consul>
>> (<cc7ced64-e3c2-403d-ae01-59ad3f68d6e6>), in state <Peer in Cluster>, 
>> has
>> disconnected from glusterd.
>> [2015-12-02 22:51:07.938471] E [MSGID: 101075]
>> [common-utils.c:3127:gf_is_local_addr] 0-management: error in 
>> getaddrinfo:
>> Name or service not known
>>
>> [2015-12-02 22:51:07.938526] E [MSGID: 106187]
>> [glusterd-store.c:4266:glusterd_resolve_all_bricks] 0-glusterd: 
>> resolve
>> brick failed in restore
>> [2015-12-02 22:51:07.938559] E [MSGID: 101019] 
>> [xlator.c:428:xlator_init]
>> 0-management: Initialization of volume 'management' failed, review 
>> your
>> volfile again
>> [2015-12-02 22:51:07.938571] E [graph.c:322:glusterfs_graph_init]
>> 0-management: initializing translator failed
>> [2015-12-02 22:51:07.938579] E [graph.c:661:glusterfs_graph_activate]
>> 0-graph: init failed
>> [2015-12-02 22:51:07.947613] W glusterfsd.c:1236:cleanup_and_exit
>> [0x7fda0f9fc24d] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x126)
>> [0x7fda0f9fc0f6] -->/usr/sbin/glusterd(cleanup_and_exit+0x69)
>> [0x7fda0f9fb6d9] ) 0-: received signum (0), shutting down
>>
>> Thanks,
>> Brian Hicks