[Bugs] [Bug 1622405] New: Problem with SSL/TLS encryption on Gluster 4.0 & 4.1

Mon Aug 27 05:42:03 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1622405

            Bug ID: 1622405
           Summary: Problem with SSL/TLS encryption on Gluster 4.0 & 4.1
           Product: GlusterFS
           Version: 3.12
         Component: core
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: mchangir at redhat.com
                CC: amukherj at redhat.com, andreihavriliuc at gmail.com,
                    atumball at redhat.com, bugs at gluster.org,
                    david.spisla at iternity.com, jstrunk at redhat.com,
                    mchangir at redhat.com, omar.kohl at iternity.com,
                    pasik at iki.fi
        Depends On: 1601356

+++ This bug was initially created as a clone of Bug #1601356 +++

Hello,

This is my first time reporting a bug on Bugzilla, so let me know if I posted
something wrong.

Description of problem:

I am doing some tests with GlusterFS 4.0 and 4.1 and I can't seem to solve some
SSL/TLS issues. I am trying to set up a 2-node replicated gluster volume
with SSL/TLS. For this setup, I use 3 KVM VMs (2 storage nodes + 1
client node). For the networking part, I use a dedicated private LAN for
the KVM VMs. Each VM is able to ping the others, so there's no problem
with the connectivity.

Version-Release number of selected component (if applicable):

These are the installed packages on gluster-client:

[root@gluster-client ~]# rpm -qa | grep "gluster\|fuse"
glusterfs-4.1.1-1.el7.x86_64
centos-release-gluster41-1.0-1.el7.centos.x86_64
glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64

And these are the installed packages on gluster1 and gluster2 storage nodes:

[root@gluster1 ~]# rpm -qa | grep "gluster\|fuse"
glusterfs-api-4.1.1-1.el7.x86_64
centos-release-gluster41-1.0-1.el7.centos.x86_64
glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-4.1.1-1.el7.x86_64
glusterfs-cli-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64
glusterfs-server-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64

=====================================================

This is the information about the gluster volume:

[root@gluster1 ~]# gluster volume info vol01

Volume Name: vol01
Type: Replicate
Volume ID: ab7426a5-23ab-40ff-91af-a5b977152553
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/data/glusterfs/gluster1/vol01/brick1
Brick2: gluster2:/data/glusterfs/gluster2/vol01/brick1
Options Reconfigured:
ssl.cipher-list: ALL
network.ping-timeout: 5
server.ssl: on
client.ssl: on
auth.ssl-allow: *
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

=====================================================

Here is the peers information:

[root@gluster1 ~]# gluster peer status
Number of Peers: 1

Hostname: gluster2
Uuid: f506bf62-6551-46b0-8a5b-457ae1fde839
State: Peer in Cluster (Connected)

=====================================================

Here is the volume status:

[root@gluster1 ~]# gluster volume status vol01
Status of volume: vol01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1:/data/glusterfs/gluster1/vol
01/brick1                                   49152     0          Y       11196
Brick gluster2:/data/glusterfs/gluster2/vol
01/brick1                                   49152     0          Y       11013
Self-heal Daemon on localhost               N/A       N/A        Y       11315
Self-heal Daemon on gluster2                N/A       N/A        Y       11086

Task Status of Volume vol01
------------------------------------------------------------------------------
There are no active volume tasks

=====================================================

How reproducible:

Steps to Reproduce:
1. Install GlusterFS 4.0 or 4.1.
2. Create a 2-node replicated gluster volume with SSL/TLS.
3. After doing all the necessary settings, try to copy a file to the FUSE mount
on the client node.

I've also attached a .txt file with my procedure for installing the Gluster
nodes and the client. Let me know if you see anything wrong with it.

Actual results:

I receive this error: "Transport endpoint is not connected" after I issue the
copy command.

Expected results:

I expected the file to be copied without a problem, like in version 3.12.

Additional info:

There is a Gluster mailing list thread about this. I will post it here just so
that the two are linked:

https://lists.gluster.org/pipermail/gluster-users/2018-July/034353.html

The mount works fine until I try to copy an archive, multiple smaller files,
or a bigger file onto it (it shows correctly in df -Th, and I can create
several files with "touch file1 file2..."). Basically, after any data
transfer, I get the "Transport endpoint is not connected" error.

I followed the instructions on the Red Hat page:

https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.1/html/administration_guide/chap-network_encryption

UPDATE 1:
I tried the exact same steps with Gluster 3.12 and had no problem.
The steps worked and SSL/TLS was enabled: there was no transport error,
and I verified that SSL/TLS was active. Afterwards, I also tried the new
4.1 release and the problem persists (the same "Transport endpoint is not
connected" error).
Let me know if you need any other info. Any help is much appreciated.

Regards,
Andrei H.

--- Additional comment from Omar K on 2018-07-19 17:37:33 IST ---

I can confirm the same issue. Copying a few small files onto the FUSE mount
works, but as soon as you put any "load" on it (that means more than a few
files, or big files like ISO images) the connection is interrupted with the
error message shown above.

Our current workaround is to disable server.ssl and client.ssl for the volumes.

We never had this problem with Gluster 3.12.

--- Additional comment from Milind Changire on 2018-07-31 11:51:47 IST ---

As per Step 8

8. Set up TLS/SSL encryption on all nodes and clients (gluster1, gluster2, gluster-client):

openssl genrsa -out /etc/ssl/glusterfs.key 2048

In gluster1 node:
openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster1" -out /etc/ssl/glusterfs.pem
In gluster2 node:
openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster2" -out /etc/ssl/glusterfs.pem
In gluster-client node:
openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster-client" -out /etc/ssl/glusterfs.pem

----------

As per Step 15

15. Setup SSL/TLS access to the volume:

gluster volume set vol01 auth.ssl-allow 'gluster01,gluster02,gluster-client'

gluster volume set vol01 client.ssl on
gluster volume set vol01 server.ssl on

gluster volume set vol01 network.ping-timeout "5"

gluster volume start vol01

----------

Please note that the Common Name mentioned during SSL key/cert generation is
"gluster1" but mentioned in auth.ssl-allow is "gluster01". Please note the '0'
prefixed to '1'.

Is this a typo made while reporting the bug, or an actual typo in the volume
configuration?

If this is a typo during volume configuration, it needs to be corrected.
Please set auth.ssl-allow to:

gluster volume set vol01 auth.ssl-allow 'gluster1,gluster2,gluster-client'
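A quick way to catch this kind of mismatch is to read the CN back out of each certificate and compare it with the auth.ssl-allow value. The sketch below is hypothetical: it creates a throwaway certificate with CN=gluster1 in a temporary directory, and the helper names cert_cn and in_allow_list are made up for illustration. It assumes the subject contains only a CN (as in Step 8) and does only exact matching plus a bare "*", which is a simplification, not GlusterFS's own matching logic.

```shell
#!/bin/sh
# Sketch: compare a certificate's Common Name with an auth.ssl-allow value.
set -eu
WORKDIR=$(mktemp -d)

# Throwaway key and self-signed certificate with CN=gluster1, as in Step 8
openssl genrsa -out "$WORKDIR/glusterfs.key" 2048 2>/dev/null
openssl req -new -x509 -key "$WORKDIR/glusterfs.key" -subj "/CN=gluster1" \
    -out "$WORKDIR/glusterfs.pem"

# Hypothetical helper: print the CN of a certificate whose subject is only "CN=..."
cert_cn() {
    openssl x509 -in "$1" -noout -subject -nameopt RFC2253 \
        | sed -e 's/^subject= *//' -e 's/^CN=//'
}

# Hypothetical helper: succeed if $1 is an exact entry of the comma-separated
# list $2, or if $2 is a bare "*" (simplified; not GlusterFS's own matching)
in_allow_list() {
    if [ "$2" = "*" ]; then
        return 0
    fi
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

CN=$(cert_cn "$WORKDIR/glusterfs.pem")
for allow in 'gluster01,gluster02,gluster-client' 'gluster1,gluster2,gluster-client'; do
    if in_allow_list "$CN" "$allow"; then
        echo "CN=$CN is allowed by '$allow'"
    else
        echo "CN=$CN is NOT in '$allow'"
    fi
done
```

With the list from Step 15 the CN gluster1 is rejected, and with the corrected list it is accepted. On real nodes, cert_cn would be pointed at /etc/ssl/glusterfs.pem on each host instead.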

--- Additional comment from Omar K on 2018-07-31 12:32:01 IST ---

We use auth.ssl-allow "*" and we have the same issue, so I'm guessing that's
not the problem...

--- Additional comment from Havri on 2018-08-01 01:40:00 IST ---

Hello,

It's just a typo during bug reporting. I also tried Omar's setting with
auth.ssl-allow "*" and the issue was the same.

Let me know if you need any other info.

Thank you.

--- Additional comment from Atin Mukherjee on 2018-08-11 16:59:45 IST ---

Milind - Please see comment 4. Has any further investigation been done?

--- Additional comment from Milind Changire on 2018-08-17 12:07:42 IST ---

I've built RPMs using the release-4.1 branch with commit
f33a61086da43af5a5de2ba99b4045a63cf5bd79 at HEAD

There are no issues with SSL configuration.

As per the steps listed in the attachment, the server and client pem files are
not signed by a CA.
Since this is an upstream BZ, I'd recommend that the user look at:
https://stackoverflow.com/questions/21297139/how-do-you-sign-a-certificate-signing-request-with-your-certification-authority

-----

Using self-signed certificates also works without problems.
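For reference, a minimal sketch of the self-signed variant of Step 8 for all three hosts, written to a scratch directory instead of /etc/ssl so it can be tried safely (it assumes OpenSSL is installed; the directory layout is illustrative only). Per the Red Hat network-encryption guide linked in the original report, a self-signed setup also needs /etc/ssl/glusterfs.ca holding the certificates of the machines each node talks to; concatenating all three certificates, as done here, is the simplest variant.

```shell
#!/bin/sh
# Sketch: Step 8 for all three hosts, in a scratch directory instead of /etc/ssl
set -eu
WORKDIR=$(mktemp -d)

for host in gluster1 gluster2 gluster-client; do
    mkdir -p "$WORKDIR/$host"
    # 2048-bit RSA private key, as in Step 8
    openssl genrsa -out "$WORKDIR/$host/glusterfs.key" 2048 2>/dev/null
    # Self-signed certificate whose Common Name is the host name
    openssl req -new -x509 -key "$WORKDIR/$host/glusterfs.key" \
        -subj "/CN=$host" -days 365 -out "$WORKDIR/$host/glusterfs.pem"
done

# Self-signed setup: the CA bundle is simply every certificate concatenated.
# On real machines this file would be copied to /etc/ssl/glusterfs.ca.
cat "$WORKDIR"/gluster1/glusterfs.pem "$WORKDIR"/gluster2/glusterfs.pem \
    "$WORKDIR"/gluster-client/glusterfs.pem > "$WORKDIR/glusterfs.ca"

echo "certificates in bundle: $(grep -c 'BEGIN CERTIFICATE' "$WORKDIR/glusterfs.ca")"
```

The script prints "certificates in bundle: 3". On a real cluster, each host's key and pem would go to that host's /etc/ssl/, and the same glusterfs.ca would be distributed to every node and client.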

--- Additional comment from Milind Changire on 2018-08-17 13:38:54 IST ---

I tried copying a 900MB ISO and saw the following errors in the client/mount
log:

[2018-08-17 07:21:20.602283] E [socket.c:2167:__socket_read_frag] 0-rpc: wrong
MSG-TYPE (574) received from 192.168.122.87:24007
[2018-08-17 07:21:20.602297] T [socket.c:2801:socket_poller] 0-patchy-client-0:
disconnecting socket
[2018-08-17 07:21:20.602365] D [MSGID: 0] [client.c:2241:client_rpc_notify]
0-patchy-client-0: got RPC_CLNT_DISCONNECT
[2018-08-17 07:21:20.602379] I [MSGID: 114018]
[client.c:2254:client_rpc_notify] 0-patchy-client-0: disconnected from
patchy-client-0. Client process will keep trying to connect to glusterd until
brick's port is available

On the brick side:

[2018-08-17 07:21:00.723552] E [MSGID: 115067]
[server-rpc-fops_v2.c:1316:server4_writev_cbk] 0-patchy-server: 562: WRITEV 0
(3fd3cf86-419e-43eb-88ad-72b12263fab6), client:
CTX_ID:47717648-2a74-49b5-8e39-4069a86b2246-GRAPH_ID:0-PID:1553-HOST:centos7-2-PC_NAME:patchy-client-0-RECON_NO:-0,
error-xlator: - [Bad file descriptor]
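When checking whether a mount is hitting this problem, the client log can be filtered for error-level entries and disconnect notices. A minimal sketch, assuming the log format shown above (single-letter level in the third whitespace-separated field); it writes the client log lines quoted above (unwrapped) into a temporary file, whereas a real FUSE mount log normally lives under /var/log/glusterfs/ with a name derived from the mount point.

```shell
#!/bin/sh
set -eu
# Sample input: the client log entries quoted above, one entry per line
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[2018-08-17 07:21:20.602283] E [socket.c:2167:__socket_read_frag] 0-rpc: wrong MSG-TYPE (574) received from 192.168.122.87:24007
[2018-08-17 07:21:20.602297] T [socket.c:2801:socket_poller] 0-patchy-client-0: disconnecting socket
[2018-08-17 07:21:20.602365] D [MSGID: 0] [client.c:2241:client_rpc_notify] 0-patchy-client-0: got RPC_CLNT_DISCONNECT
[2018-08-17 07:21:20.602379] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-patchy-client-0: disconnected from patchy-client-0. Client process will keep trying to connect to glusterd until brick's port is available
EOF

# Field 3 is the log level letter (E, W, I, D, T) in these lines
echo "Error-level entries:"
awk '$3 == "E"' "$LOG"

# Count client-side disconnect notices (grep -c exits 1 on zero matches)
DISCONNECTS=$(grep -c 'disconnected from' "$LOG" || true)
echo "disconnect events: $DISCONNECTS"
```

For the sample above it prints the single "wrong MSG-TYPE" error and "disconnect events: 1".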

--- Additional comment from Omar K on 2018-08-24 19:58:01 IST ---

I just re-tested using the commit tagged as v4.1.2 (044f9df65) and the problems
persist as described above. The log messages are the same as the ones Milind is
getting. From the client's perspective, the copy operation of an ISO file aborts
with an error message. A few small files can be copied with no problems.

Milind, do you therefore confirm that a problem exists, or is it still unclear?

--- Additional comment from Milind Changire on 2018-08-24 20:00:58 IST ---

I confirm that the problem is present.

--- Additional comment from Worker Ant on 2018-08-25 18:54:09 IST ---

REVIEW: https://review.gluster.org/20993 (rpc: handle EAGAIN when
SSL_ERROR_SYSCALL is returned) posted (#2) for review on release-4.1 by Milind
Changire

--- Additional comment from Milind Changire on 2018-08-25 18:57:30 IST ---

Please note that the master branch and release-4.1 branch have diverged
significantly, so the patch does not apply to the master branch.

Also, this issue has already been addressed in the master branch.

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1601356
[Bug 1601356] Problem with SSL/TLS encryption on Gluster 4.0 & 4.1