[Bugs] [Bug 1325491] Daemons cannot connect to GlusterD when management encryption is enabled
bugzilla at redhat.com
Sun Apr 10 04:55:08 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1325491
--- Comment #2 from Vijay Bellur <vbellur at redhat.com> ---
COMMIT: http://review.gluster.org/13931 committed in release-3.7 by Kaushal M
(kaushal at redhat.com)
------
commit 6a1d6da4588726ea0e1d0b0b6eb204a9d829db19
Author: Kaushal M <kaushal at redhat.com>
Date: Thu Apr 7 20:21:18 2016 +0530
socket: Don't cleanup encrypted transport in socket_connect()
..instead cleanup only in socket_poller()
Backport of be99ddd from master
With commit d117466, socket_poller() was no longer launched from
socket_connect() (for encrypted connections) if connect() failed. This
was done to prevent the socket private data from being unreffed twice,
by the cleanups in both socket_poller() and socket_connect(). This
allowed future reconnects to happen successfully.
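The double-unref hazard can be illustrated with a toy reference count. This is a minimal sketch, not GlusterFS code: `sock_priv_t`, `priv_ref()` and `priv_unref()` are hypothetical names, and the "already released" check stands in for the use-after-free that a real double unref would cause.

```c
#include <assert.h>

/* Hypothetical stand-in for the socket private data; not GlusterFS's type. */
typedef struct {
    int refcount; /* live references */
    int released; /* set once the last reference has been dropped */
} sock_priv_t;

static void priv_ref(sock_priv_t *p) {
    assert(!p->released); /* taking a reference on freed data is a bug */
    p->refcount++;
}

/* Drops one reference. Returns the remaining count, or -1 when the object
 * was already released, i.e. a double unref was detected. */
static int priv_unref(sock_priv_t *p) {
    if (p->released)
        return -1;
    if (--p->refcount == 0)
        p->released = 1; /* a real implementation would free() here */
    return p->refcount;
}
```

If the first priv_unref() plays the role of the cleanup in socket_connect() and a second one that of socket_poller(), the second call finds the object already released; in real code it would touch freed memory instead of returning -1.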
Whether a socket reconnects is effectively decided by the registered rpc
notify function. The above change worked with glusterd, as the glusterd
rpc notify function (glusterd_peer_rpc_notify()) continuously allowed
reconnects on failure.
mgmt_rpc_notify(), the rpc notify function in glusterfsd, behaves
differently.
For a DISCONNECT event, if more volfile servers are available or if more
addresses are available in the DNS cache, it allows reconnects. If not,
it terminates the program.
For a CONNECT event, it attempts to do a volfile fetch rpc request. If
sending this rpc fails, it immediately terminates the program.
One side effect of commit d117466 was that the encrypted socket was
registered with epoll, unintentionally, on a connect failure. A weird
thing happens because of this. The epoll notifier notifies
mgmt_rpc_notify() of a CONNECT event, instead of a DISCONNECT as
expected. This causes mgmt_rpc_notify() to attempt an unsuccessful
volfile fetch rpc request, and terminate.
(I still don't know why the epoll raises the CONNECT event)
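The commit leaves the cause open. As general POSIX background only, and not a confirmed explanation of this bug or GlusterFS's code: when a non-blocking connect() completes, the socket is reported writable whether the connection succeeded or failed, so an event loop must read SO_ERROR before treating it as connected. The helper `nonblocking_connect_ipv4()` below is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking connect to an IPv4 address and port
 * (both in host byte order). Returns 0 if the connection was established,
 * otherwise the errno value describing the failure. */
static int nonblocking_connect_ipv4(uint32_t addr, uint16_t port, int timeout_ms) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    int flags = fcntl(fd, F_GETFL, 0);
    assert(flags >= 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    struct sockaddr_in sa = {0};
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(addr);
    sa.sin_port = htons(port);

    int result = 0;
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        if (errno != EINPROGRESS) {
            result = errno; /* failed immediately */
        } else {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int n = poll(&pfd, 1, timeout_ms);
            if (n <= 0) {
                result = (n == 0) ? ETIMEDOUT : errno;
            } else {
                /* Writable does not mean connected: a failed connect also
                 * wakes the poller, so read the real outcome from SO_ERROR. */
                socklen_t len = sizeof(result);
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &result, &len) < 0)
                    result = errno;
            }
        }
    }
    close(fd);
    return result;
}
```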
Commit 46bd29e fixed some issues with IPv6 in GlusterFS. This caused
address resolution in GlusterFS to request IPv6 addresses as well
(AF_UNSPEC) instead of just IPv4. On most systems, this causes the IPv6
addresses to be returned first.
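The effect of AF_UNSPEC can be checked with the standard getaddrinfo() call. This is a minimal sketch: `resolve_first_family()` is a hypothetical helper that reports the family of the first returned address, which is the one a client typically tries first.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve `host` with AF_UNSPEC (IPv4 and IPv6) and
 * report the family of the first result plus the number of results.
 * Returns 0 on success or a getaddrinfo() error code. */
static int resolve_first_family(const char *host, int *family, int *count) {
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* ask for both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* one entry per address, not per protocol */
    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;
    assert(res != NULL);
    *family = res->ai_family;
    *count = 0;
    for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next)
        (*count)++;
    freeaddrinfo(res);
    return 0;
}
```

Calling it with "localhost" shows the order on a given machine; where /etc/hosts and the resolver configuration put ::1 first, the reported family is AF_INET6.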
GlusterD listens on 0.0.0.0:24007 by default. While this attaches to all
interfaces, it only listens on IPv4 addresses. GlusterFS daemons and
bricks are given 'localhost' as the volfile server. This resolves to
'::1' as the first address.
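The mismatch can be reproduced in a few lines. This is a sketch under stated assumptions: `listen_ipv4_any()` stands in for GlusterD's default 0.0.0.0 listener but uses an ephemeral port rather than 24007, and `connect_first_reachable()` is a hypothetical client that walks every resolved address, roughly the fallback mgmt_rpc_notify() permits when more addresses remain in the cache.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for an IPv4-only wildcard listener. Returns the
 * listening fd and stores the chosen port in *port, or returns -1. */
static int listen_ipv4_any(unsigned short *port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY); /* 0.0.0.0: every IPv4 interface */
    sa.sin_port = htons(0);                 /* ephemeral port instead of 24007 */
    socklen_t len = sizeof(sa);
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)&sa, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(sa.sin_port);
    return fd;
}

/* Try every resolved address in order until one connects. Returns the
 * connected fd (and the family that worked in *family) or -1. */
static int connect_first_reachable(const char *host, unsigned short port, int *family) {
    char service[8];
    snprintf(service, sizeof(service), "%u", (unsigned)port);
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;
    int fd = -1;
    for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            *family = ai->ai_family;
            break;
        }
        /* e.g. ECONNREFUSED on ::1 when only 0.0.0.0 is listening */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

Connecting to "::1" alone fails against such a listener, while "localhost" succeeds only because the IPv4 address is tried after the IPv6 one fails.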
When using management encryption, the above reasons cause the daemon
processes to fail to fetch volfiles and terminate.
Solution
--------
The solution to this is simple. Instead of cleaning up the encrypted
socket in socket_connect(), launch socket_poller() and let it cleanup
the socket instead. This prevents the unintentional registration with
epoll, and socket_poller() sends the correct events to the rpc notify
functions, which allows proper reconnects to happen.
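The change in cleanup ownership can be sketched with one thread per socket. This is a minimal model, assuming hypothetical names (`sketch_sock_t`, `sketch_connect()`, `sketch_poller()`); only socket_connect() and socket_poller() come from the commit. The connect path always starts the poller, and the poller alone reports the event and drops the reference.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the fix. */
typedef enum { EV_NONE, EV_CONNECT, EV_DISCONNECT } sketch_event_t;

typedef struct {
    int refcount;            /* references to the socket private data */
    bool connect_failed;     /* outcome of connect() for this attempt */
    sketch_event_t notified; /* event passed to the rpc notify function */
} sketch_sock_t;

static void sketch_sock_init(sketch_sock_t *s) {
    s->refcount = 1;
    s->connect_failed = false;
    s->notified = EV_NONE;
}

/* Poller thread: the single owner of cleanup. */
static void *sketch_poller(void *arg) {
    sketch_sock_t *s = arg;
    /* Report the real outcome so the notify function can pick reconnect. */
    s->notified = s->connect_failed ? EV_DISCONNECT : EV_CONNECT;
    /* ... on success, the TLS handshake and I/O loop would run here ... */
    assert(s->refcount > 0); /* a second cleanup would trip this */
    s->refcount--;           /* cleanup happens exactly once, here */
    return NULL;
}

/* Connect path after the fix: hand the socket to the poller even when
 * connect() failed, instead of cleaning it up here. */
static int sketch_connect(sketch_sock_t *s, bool connect_ok, pthread_t *thread) {
    s->connect_failed = !connect_ok;
    return pthread_create(thread, NULL, sketch_poller, s);
}
```

Because the socket is never cleaned up on the connect path, there is no second unref to guard against, and the failure reaches the notify function as a DISCONNECT rather than a CONNECT.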
Change-Id: Idb0c0a828743cccca51cfdd1aa6458cfa0a9d100
BUG: 1325491
Signed-off-by: Kaushal M <kaushal at redhat.com>
Reviewed-on: http://review.gluster.org/13931
Smoke: Gluster Build System <jenkins at build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
Tested-by: Gluster Build System <jenkins at build.gluster.com>
CentOS-regression: Gluster Build System <jenkins at build.gluster.com>