[Gluster-devel] Issues with encrypted management on 3.8.0

Mon Jun 20 06:59:34 UTC 2016

On Fri, Jun 17, 2016 at 7:39 PM, Michael Wyraz <michael at wyraz.de> wrote:
> Hi,
>
> I have set up a 3-node-cluster (4 bricks each node, 3 replicas). Without TLS
> on management connection, everything is fine. With TLS I get several errors
> with different symptomps. Basically the errors causes some daemons not to
> start. So e.g. Quota is not working correctly since quota daemon cannot be
> started.
>

This is a known issue, that should have been documented in the release
notes. My bad that I didn't do it.
I've added this issue, cause and workaround to an existing bug about
re-connection issues with encrypted connections in glusterfs [1].

tl;dr:
This happens because the GlusterFS rpc layer is configured in 3.8 to
use AF_UNSPEC getaddrinfo(), which returns both IPv6 and IPv4
addresses.
Most systems are configured to return IPv6 addresses with a higher priority.
GlusterD doesn't listen on IPv6 addresses yet (only on 0.0.0.0). So
the initial connection attempt to glusterd fails.
Encrypted connections have issues right now, that prevent
reconnections with the next addresses (IPv4 mostly) from happening.
The daemons fail to start, because they cannot connect to glusterd to
get volfiles.
This doesn't happen with non-encrypted connections because
reconnection works properly.

2 workarounds,
- Increase preference for IPv4 addresses in getaddinfo by editing /etc/gai.conf
Or,
- Edit /etc/hosts and remove IPv6 entry for localhost

~kaushal

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1333317#c11

> My environment is:
>
> Debian Jessie, "LATEST" glusterfs repository
>
> On each node in /etc/hosts:
>    xxx.xxx.xxx.xxx full.qualified.node.name
> for each node including the local one.
>
> Certs:
> - Created a self signed CA Cert with XCA /etc/ssl/glusterfs.ca
>
> Certificate:
>     Data:
>         Version: 3 (0x2)
>         Serial Number: 1 (0x1)
>     Signature Algorithm: sha224WithRSAEncryption
>         Issuer: CN=cluster1.backups....
>         Validity
>             Not Before: Jun 17 00:00:00 2016 GMT
>             Not After : Jun 16 23:59:59 2041 GMT
>         Subject: CN=cluster1.backups....
>         Subject Public Key Info:
>             Public Key Algorithm: rsaEncryption
>                 Public-Key: (4096 bit)
>                 Modulus:
>                     ...
>         X509v3 extensions:
>             X509v3 Basic Constraints: critical
>                 CA:TRUE
>             X509v3 Subject Key Identifier:
>                 FE:BD:92:1D:8D:B5:DB:42:32:7E:BC:A3:EC:15:0D:D3:9F:64:34:69
>             X509v3 Key Usage:
>                 Certificate Sign, CRL Sign
>             Netscape Cert Type:
>                 SSL CA, S/MIME CA, Object Signing CA
>             Netscape Comment:
>                 xca certificate
>     Signature Algorithm: sha224WithRSAEncryption
>              ....
>
> - Created a Cert for each node /etc/ssl/glusterfs.pem
>
> Certificate:
>     Data:
>         Version: 3 (0x2)
>         Serial Number: 4 (0x4)
>     Signature Algorithm: sha256WithRSAEncryption
>         Issuer: CN=cluster1.backups....
>         Validity
>             Not Before: Jun 17 00:00:00 2016 GMT
>             Not After : Jun 16 23:59:59 2041 GMT
>         Subject: CN=c1-m3.cluster1.backups....
>         Subject Public Key Info:
>             Public Key Algorithm: rsaEncryption
>                 Public-Key: (4096 bit)
>                 Modulus:
>                     ...
>         X509v3 extensions:
>             X509v3 Basic Constraints: critical
>                 CA:FALSE
>             X509v3 Subject Key Identifier:
>                 35:36:9D:37:BC:AA:59:34:94:3D:D9:20:73:17:74:08:CA:AA:9F:FA
>             X509v3 Key Usage:
>                 Digital Signature, Non Repudiation, Key Encipherment
>             Netscape Cert Type:
>                 SSL Server
>             Netscape Comment:
>                 xca certificate
>     Signature Algorithm: sha256WithRSAEncryption
>          ...
>
> - Put the Cert private key to /etc/ssl/glusterfs.key
> - Created 4096 bit dh params to /etc/ssl/dhparam.pem
>
> Here are the corresponding error logs when I start the volume with TLS
> enabled from this node (other nodes are similar):
>
>
> ==> /var/log/glusterfs/cli.log <==
> [2016-06-17 13:44:22.577033] I [cli.c:730:main] 0-cli: Started running
> gluster with version 3.8.0
> [2016-06-17 13:44:22.580865] I [socket.c:4047:socket_init] 0-glusterfs: SSL
> support for glusterd is ENABLED
> [2016-06-17 13:44:22.581855] I [socket.c:4047:socket_init] 0-glusterfs: SSL
> support for glusterd is ENABLED
> [2016-06-17 13:44:22.654191] I [MSGID: 101190]
> [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with
> index 1
> [2016-06-17 13:44:22.654277] W [socket.c:696:__socket_rwv] 0-glusterfs:
> readv on /var/run/gluster/quotad.socket failed (Invalid argument)
>
> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
> [2016-06-17 13:44:22.668903] W [common-utils.c:1805:gf_string2boolean]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.8.0/xlator/mgmt/glusterd.so(+0xe94e0)
> [0x7fc6ef8ab4e0]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.8.0/xlator/mgmt/glusterd.so(+0xb14f0)
> [0x7fc6ef8734f0]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_string2boolean+0x151)
> [0x7fc6f497aeb1] ) 0-management: argument invalid [Invalid argument]
> [2016-06-17 13:44:22.723185] I [socket.c:454:ssl_setup_connection]
> 0-socket.management: peer CN = c1-m1.cluster1.backups.evermind.de
> [2016-06-17 13:44:22.735305] I [MSGID: 106143]
> [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick
> /vol/vol1/brick1 on port 49152
> [2016-06-17 13:44:22.737439] I [rpc-clnt.c:991:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2016-06-17 13:44:22.737547] I [socket.c:4047:socket_init] 0-management: SSL
> support for glusterd is ENABLED
> [2016-06-17 13:44:22.801368] I [socket.c:454:ssl_setup_connection]
> 0-socket.management: peer CN = c1-m1.cluster1.backups.evermind.de
> [2016-06-17 13:44:22.817745] I [MSGID: 106143]
> [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick
> /vol/vol2/brick1 on port 49153
> [2016-06-17 13:44:22.820079] I [rpc-clnt.c:991:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2016-06-17 13:44:22.820169] I [socket.c:4047:socket_init] 0-management: SSL
> support for glusterd is ENABLED
> [2016-06-17 13:44:22.884935] I [socket.c:454:ssl_setup_connection]
> 0-socket.management: peer CN = c1-m1.cluster1.backups.evermind.de
> [2016-06-17 13:44:22.901411] I [MSGID: 106143]
> [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick
> /vol/vol3/brick1 on port 49154
> [2016-06-17 13:44:22.904488] I [rpc-clnt.c:991:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2016-06-17 13:44:22.904589] I [socket.c:4047:socket_init] 0-management: SSL
> support for glusterd is ENABLED
> [2016-06-17 13:44:22.968140] I [socket.c:454:ssl_setup_connection]
> 0-socket.management: peer CN = c1-m1.cluster1.backups.evermind.de
> [2016-06-17 13:44:22.984827] I [MSGID: 106143]
> [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick
> /vol/vol4/brick1 on port 49155
> [2016-06-17 13:44:22.988199] I [rpc-clnt.c:991:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2016-06-17 13:44:22.988299] I [socket.c:4047:socket_init] 0-management: SSL
> support for glusterd is ENABLED
> [2016-06-17 13:44:24.352019] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already
> stopped
> [2016-06-17 13:44:24.352102] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: nfs service is
> stopped
> [2016-06-17 13:44:24.353414] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd
> already stopped
> [2016-06-17 13:44:24.353460] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: glustershd service
> is stopped
> [2016-06-17 13:44:24.353523] I [MSGID: 106567]
> [glusterd-svc-mgmt.c:196:glusterd_svc_start] 0-management: Starting
> glustershd service
> [2016-06-17 13:44:24.359687] W [socket.c:3131:socket_connect] 0-glustershd:
> Ignore failed connection attempt on
> /var/run/gluster/6fe8cafd75bf10ffd386275f0bd67a06.socket, (No such file or
> directory)
> [2016-06-17 13:44:24.362092] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already
> stopped
> [2016-06-17 13:44:24.362145] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: quotad service is
> stopped
> [2016-06-17 13:44:24.362207] I [MSGID: 106567]
> [glusterd-svc-mgmt.c:196:glusterd_svc_start] 0-management: Starting quotad
> service
>
> ==> /var/log/glusterfs/glustershd.log <==
> [2016-06-17 13:44:24.365420] I [MSGID: 100030] [glusterfsd.c:2408:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.0
> (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/glustershd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/6fe8cafd75bf10ffd386275f0bd67a06.socket --xlator-option
> *replicate*.node-uuid=7fa073a8-d641-43c8-b722-c260415a28d9)
>
> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
> [2016-06-17 13:44:24.368724] W [socket.c:3131:socket_connect] 0-quotad:
> Ignore failed connection attempt on
> /var/run/gluster/e3e8332cbfa06af217d491807ac6478e.socket, (No such file or
> directory)
> [2016-06-17 13:44:24.368876] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already
> stopped
> [2016-06-17 13:44:24.368937] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: bitd service is
> stopped
> [2016-06-17 13:44:24.369063] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already
> stopped
> [2016-06-17 13:44:24.369098] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: scrub service is
> stopped
>
> ==> /var/log/glusterfs/glustershd.log <==
> [2016-06-17 13:44:24.370214] I [socket.c:4047:socket_init]
> 0-socket.glusterfsd: SSL support for glusterd is ENABLED
> [2016-06-17 13:44:24.371239] I [socket.c:4047:socket_init] 0-glusterfs: SSL
> support for glusterd is ENABLED
> [2016-06-17 13:44:24.372781] E [socket.c:3143:socket_connect] 0-glusterfs:
> connection attempt on  failed, (Connection refused)
> [2016-06-17 13:44:24.372929] I [MSGID: 101190]
> [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with
> index 1
> [2016-06-17 13:44:24.373051] W [socket.c:696:__socket_rwv] 0-glusterfs:
> writev on ::1:24007 failed (Success)
> [2016-06-17 13:44:24.373408] E [rpc-clnt.c:357:saved_frames_unwind] (-->
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f32827eb543]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f32825b250f]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f32825b262e]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f32825b3e2e]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f32825b4648]
> ))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake)
> op(GETSPEC(2)) called at 2016-06-17 13:44:24.373080 (xid=0x1)
> [2016-06-17 13:44:24.373441] E [glusterfsd-mgmt.c:1686:mgmt_getspec_cbk]
> 0-mgmt: failed to fetch volume file (key:gluster/glustershd)
> [2016-06-17 13:44:24.373494] W [glusterfsd.c:1286:cleanup_and_exit]
> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1fa)
> [0x7f32825b253a] -->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x53a)
> [0x7f3282ce6c1a] -->/usr/sbin/glusterfs(cleanup_and_exit+0x57)
> [0x7f3282ce08d7] ) 0-: received signum (0), shutting down
>
> ==> /var/log/glusterfs/quotad.log <==
> [2016-06-17 13:44:24.374607] I [MSGID: 100030] [glusterfsd.c:2408:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.0
> (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p
> /var/lib/glusterd/quotad/run/quotad.pid -l /var/log/glusterfs/quotad.log -S
> /var/run/gluster/e3e8332cbfa06af217d491807ac6478e.socket --xlator-option
> *replicate*.data-self-heal=off --xlator-option
> *replicate*.metadata-self-heal=off --xlator-option
> *replicate*.entry-self-heal=off)
> [2016-06-17 13:44:24.379241] I [socket.c:4047:socket_init]
> 0-socket.glusterfsd: SSL support for glusterd is ENABLED
> [2016-06-17 13:44:24.380264] I [socket.c:4047:socket_init] 0-glusterfs: SSL
> support for glusterd is ENABLED
> [2016-06-17 13:44:24.381727] E [socket.c:3143:socket_connect] 0-glusterfs:
> connection attempt on  failed, (Connection refused)
> [2016-06-17 13:44:24.381872] I [MSGID: 101190]
> [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with
> index 1
> [2016-06-17 13:44:24.381982] W [socket.c:696:__socket_rwv] 0-glusterfs:
> writev on ::1:24007 failed (Success)
> [2016-06-17 13:44:24.382342] E [rpc-clnt.c:357:saved_frames_unwind] (-->
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f74fb650543]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f74fb41750f]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f74fb41762e]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f74fb418e2e]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f74fb419648]
> ))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake)
> op(GETSPEC(2)) called at 2016-06-17 13:44:24.382010 (xid=0x1)
> [2016-06-17 13:44:24.382377] E [glusterfsd-mgmt.c:1686:mgmt_getspec_cbk]
> 0-mgmt: failed to fetch volume file (key:gluster/quotad)
> [2016-06-17 13:44:24.382432] W [glusterfsd.c:1286:cleanup_and_exit]
> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1fa)
> [0x7f74fb41753a] -->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x53a)
> [0x7f74fbb4bc1a] -->/usr/sbin/glusterfs(cleanup_and_exit+0x57)
> [0x7f74fbb458d7] ) 0-: received signum (0), shutting down
>
> ==> /var/log/glusterfs/cmd_history.log <==
> [2016-06-17 13:44:25.783916]  : v start backups : SUCCESS
>
> ==> /var/log/glusterfs/cli.log <==
> [2016-06-17 13:44:25.784065] I [cli-rpc-ops.c:1414:gf_cli_start_volume_cbk]
> 0-cli: Received resp to start volume
> [2016-06-17 13:44:25.784188] I [input.c:31:cli_batch] 0-: Exiting with: 0
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel