[Gluster-users] Transparent encryption and authentication in distributed systems with non-trusted servers

Edward Shishkin edward at redhat.com
Thu Aug 7 20:43:46 UTC 2014


Hello everyone,

Here we provide some basic background in addition to the manpages at
http://www.gluster.org/community/documentation/index.php?title=Features/disk-encryption

Comments, questions, and reports of your experience are welcome.

Thanks,
Edward.



             Transparent encryption and authentication
         in distributed systems with non-trusted servers



Distributed systems impose stricter requirements on at-rest encryption.
This is because your encrypted data is stored on servers, which are
de facto untrusted. In particular, your private encrypted data can be
subjected to analysis and tampering, which will eventually lead to its
disclosure if it is not properly protected. Specifically, it is usually
not enough to just encrypt data. In distributed systems, serious
protection of your personal data is possible only in conjunction with
a special process called authentication. GlusterFS provides such an
enhanced service: in GlusterFS, encryption is combined with
authentication. Currently we provide protection from "silent
tampering". This is a kind of tampering that is hard to detect
because it doesn't break POSIX compliance. Specifically, we protect
a file's encryption-specific metadata, which includes the file's
unique object id (GFID), cipher algorithm id, cipher block size and
other attributes used by the encryption process.


                          Restrictions


1. We encrypt only file content. The feature of transparent encryption
doesn't protect file names: they are neither encrypted nor verified.
Protection of file names is not as critical as protection of a file's
encryption-specific metadata: any attack based on tampering with
file names will break POSIX compliance and result in massive
corruption, which is easy to detect.

2. The feature of transparent encryption is not supported in NFS
mounts of GlusterFS volumes: NFS's file handles introduce security
issues, which are hard to resolve.

3. The feature of transparent encryption is incompatible with
GlusterFS performance translators (quick-read, write-behind and
open-behind).


               I. Trusted and non-trusted machines


Suppose you are a user of this feature (transparent encryption and
authentication in GlusterFS). You classify every machine as either
trusted or non-trusted.

Examples:

I.1 Your personal laptop, which is under your supervision, is a
    trusted machine.

I.2 Remote GlusterFS servers, which are not under your supervision,
    are non-trusted machines. They are managed by a good admin, but
    you don't trust him with your private data.

I.3 Clouds are an important example of a set of non-trusted machines:
    you don't know what is going on there at all.


               II. Trusted and non-trusted objects


Every machine contains objects (in RAM, on disks, in registers, etc.).

All objects of every non-trusted machine are non-trusted by
definition.

A trusted machine contains both types of objects (trusted and
non-trusted).

Sources of non-trusted objects on your trusted machine:
. non-trusted media;
. non-trusted network;
. social engineering;
. etc.

Sources of trusted objects on your trusted machine:
. trusted media;
. trusted network;
. process of verification of non-trusted objects (authentication).

Examples:

II.1 An email that you have received without any checks is a
     non-trusted object.

II.2 An email with a properly verified digital signature is a trusted
     object.

II.3 Your secret key, properly generated on your trusted machine or
     retrieved from trusted media, is a trusted object.

II.4 You want to look at user accounts on your trusted local
     machine. The string "/etc/passwd" that you pass to open(2) is
     a trusted object.

II.5 Someone you don't know asks you to check whether you have a file
     "/foo/bar" on your trusted local machine. The string "/foo/bar"
     that you pass to readdir(2) is a non-trusted object.

II.6 Encrypted content of any regular file received from the
     non-trusted GlusterFS servers, before processing by the crypt
     translator, is a non-trusted object.

II.7 Decrypted content of your regular file after successful
     processing by the crypt translator is a trusted object.

II.8 The list of file names provided by ls(1) for your mounted
     encrypted GlusterFS volume is a non-trusted object (see item 1
     of the Restrictions above).


The status of some objects on your trusted machine can be changed from
"non-trusted" to "trusted" by a special process called
authentication. Authentication includes creating and checking a
special MAC (Message Authentication Code) for every object that you
will want to verify after it has been stored on non-trusted machines.


          III. Encryption and authentication in GlusterFS


In GlusterFS a special translator (encryption/crypt) is responsible
for both encryption and authentication. The crypt translator works
only on trusted client machines.


                          Data Encryption


We encrypt file content with the AES cipher algorithm in XTS cipher
mode. This mode provides "weak data authentication": tampering with
ciphertext created in this mode leads to unpredictable changes in
the plain text, i.e. to data corruption, which is easy to detect.

Data encryption is performed with a unique per-file cipher key
generated from the master volume key and "salted" with the unique
trusted object id (GFID).
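
The idea of a per-file key "salted" by the GFID can be sketched as
follows. This is only an illustration: the text above does not name
the key derivation function, so HMAC-SHA256 is used here as a stand-in,
and the function name derive_file_key and its parameters are
hypothetical, not part of GlusterFS.

```python
import hashlib
import hmac
import uuid


def derive_file_key(master_key: bytes, gfid: uuid.UUID, key_len: int = 32) -> bytes:
    """Derive a per-file key from the volume master key, salted by the GFID.

    ASSUMPTION: HMAC-SHA256 is an illustrative stand-in for the real
    key derivation, which is not specified in the text above.
    """
    if not 1 <= key_len <= hashlib.sha256().digest_size:
        raise ValueError("key_len must be between 1 and 32 bytes")
    # The GFID (16 bytes) is the per-file input; the master key is the secret.
    return hmac.new(master_key, gfid.bytes, hashlib.sha256).digest()[:key_len]


# Demo values only; a real master key must be random and kept secret.
master = bytes(range(32))
key_a = derive_file_key(master, uuid.UUID(int=1))
key_b = derive_file_key(master, uuid.UUID(int=2))
```

Two files with different GFIDs get different keys from the same
master key, so the scope of each data key is a single file. This is
also why the GFID itself has to be trusted before it is used.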


                      Metadata Authentication


In the feature of transparent encryption the unique object id (GFID)
is an important encryption-specific attribute, which needs protection,
since it is stored on the non-trusted servers.

We protect the GFID by creating and checking MACs. Every such MAC is
"salted" by the trusted absolute file name. This "salt" is needed to
prevent a special kind of tampering, which extends the scope of the
per-file data cipher key (there are known attacks based on such
extension). Every hardlink adds a respective MAC to the file. When
the file is renamed, the respective MAC gets updated. Whenever the
file is opened, we check the per-name MACs up to the first match. No
match means failed verification: GlusterFS will refuse to open such
a file. Otherwise, the status of the file's GFID is changed to
"trusted".
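
The bookkeeping described above (one MAC per name, added on link,
updated on rename, checked on open) can be sketched like this. It is
a simplified model, not the crypt translator's code: the MAC
algorithm (truncated HMAC-SHA256), the key, the way the name is mixed
in, and all function names are assumptions made for illustration.

```python
import hashlib
import hmac
from typing import List

MAC_LEN = 8  # bytes per name in this sketch


def name_mac(key: bytes, gfid: bytes, path: str) -> bytes:
    """MAC over the GFID, salted by the absolute file name (illustrative)."""
    msg = path.encode("utf-8") + b"\0" + gfid
    return hmac.new(key, msg, hashlib.sha256).digest()[:MAC_LEN]


def link(macs: List[bytes], key: bytes, gfid: bytes, new_path: str) -> List[bytes]:
    """A new name appends its own MAC to the array."""
    return macs + [name_mac(key, gfid, new_path)]


def _index_of(macs: List[bytes], mac: bytes) -> int:
    for i, candidate in enumerate(macs):
        if hmac.compare_digest(candidate, mac):
            return i
    raise KeyError("no MAC for this name")


def rename(macs, key, gfid, old_path, new_path):
    """Renaming replaces the MAC of the old name with one for the new name."""
    out = list(macs)
    out[_index_of(out, name_mac(key, gfid, old_path))] = name_mac(key, gfid, new_path)
    return out


def unlink(macs, key, gfid, path):
    """Removing a name drops its MAC."""
    out = list(macs)
    del out[_index_of(out, name_mac(key, gfid, path))]
    return out


def verify_on_open(macs, key, gfid, path) -> bool:
    """Check the per-name MACs up to the first match; no match means refuse."""
    expected = name_mac(key, gfid, path)
    return any(hmac.compare_digest(m, expected) for m in macs)
```

In this model a GFID swapped in by the server, or a file made to
appear under a name it never had, fails verify_on_open() because no
stored MAC matches the (name, GFID) pair.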

Encryption-specific file attributes, including the array of per-name
MACs, are stored on the untrusted server as the file's xattr under the
special key "trusted.glusterfs.crypt.att.cfmt".

Let's create a file in our encrypted volume mounted at /mnt/glusterfs:

# pwd
/mnt/glusterfs

# touch file
# getfattr -n trusted.glusterfs.crypt.att.cfmt -d -e hex file
trusted.glusterfs.crypt.att.cfmt=0x00fe824979347d90af1c8d87d67faae50d243d84d641

The first byte in the string contains the version (format) of the
string (0 in this example). In this format a per-name MAC is 8 bytes
long, and the array of all MACs is located at the end of the string.
In this example the file has only one name, and the string accordingly
contains only one MAC (aae50d243d84d641).

Let's now create a hardlink:

# ln file file-link
# getfattr -n trusted.glusterfs.crypt.att.cfmt -d -e hex file
trusted.glusterfs.crypt.att.cfmt=0x00fe824979347d90af1c8d87d67faae50d243d84d641c66d01a59a6a2e8c

The file has acquired a second name ("file-link"), and the string
has accordingly been extended with a second MAC
(c66d01a59a6a2e8c).

# mv file file-renamed
# getfattr -n trusted.glusterfs.crypt.att.cfmt -d -e hex file-renamed
trusted.glusterfs.crypt.att.cfmt=0x00fe824979347d90af1c8d87d67fb46f237c94cb87dec66d01a59a6a2e8c

We changed the file name, and the respective MAC was updated
(the new value is b46f237c94cb87de).

# rm -f file-renamed
# getfattr -n trusted.glusterfs.crypt.att.cfmt -d -e hex file-link
trusted.glusterfs.crypt.att.cfmt=0x00fe824979347d90af1c8d87d67fc66d01a59a6a2e8c

We removed the hardlink, and the respective MAC (b46f237c94cb87de)
was removed.

The 13 bytes right after the format id (fe824979347d90af1c8d87d67f in
our example) contain the following attributes, which are encrypted in
a special AEAD mode:

. data cipher algorithm id;
. data cipher mode id;
. encoded block size;
. encryption translator id;
. encoded size of the data cipher key.
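
The layout described above (1 byte of format id, 13 bytes of encrypted
attributes, then an array of 8-byte MACs) can be split apart with a
few lines of code. This is a sketch that only slices the hex string
printed by getfattr; it does not decrypt or verify anything, and the
names parse_cfmt and CryptFormat are made up for this example.

```python
from dataclasses import dataclass
from typing import List

ATTRS_LEN = 13  # encrypted attributes right after the format id
MAC_LEN = 8     # per-name MAC length in format 0


@dataclass
class CryptFormat:
    version: int
    encrypted_attrs: bytes
    macs: List[bytes]


def parse_cfmt(hex_value: str) -> CryptFormat:
    """Split a trusted.glusterfs.crypt.att.cfmt value into its fields."""
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) < 1 + ATTRS_LEN:
        raise ValueError("value is too short")
    if raw[0] != 0:
        raise ValueError("only format 0 is handled by this sketch")
    tail = raw[1 + ATTRS_LEN:]
    if len(tail) % MAC_LEN != 0:
        raise ValueError("MAC array is not a multiple of 8 bytes")
    macs = [tail[i:i + MAC_LEN] for i in range(0, len(tail), MAC_LEN)]
    return CryptFormat(raw[0], raw[1:1 + ATTRS_LEN], macs)


# The value shown above after "ln file file-link".
parsed = parse_cfmt(
    "0x00fe824979347d90af1c8d87d67faae50d243d84d641c66d01a59a6a2e8c"
)
print(parsed.version, parsed.encrypted_attrs.hex(), [m.hex() for m in parsed.macs])
```

For that value the sketch reports format 0, the 13 attribute bytes
fe824979347d90af1c8d87d67f, and the two MACs aae50d243d84d641 and
c66d01a59a6a2e8c.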

The GFID is verified only for file operations that invoke crypto
transforms (cipher, authentication, etc). In particular, we need to
verify the GFID during the ->open(), ->read(), ->write(),
->truncate(), ->link(), etc. file operations. An example of a file
operation that doesn't require GFID verification is ->readdir() (see
item 1 of the Restrictions above).

Currently the GFID verification procedure is encapsulated in
FOP->open() of the crypt translator. So the crypt translator always
calls FOP->open() whenever the trusted GFID is required and is not in
the cache (e.g. during FOP->truncate()).
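
The "verify on demand, then cache" pattern described above looks
roughly like this. It is a toy model under stated assumptions: the
class TrustedGfidCache, the open_and_verify callback and the
truncate() helper are hypothetical names, and the real translator
keeps this state in its own structures rather than in a dictionary.

```python
from typing import Callable, Dict, Hashable, Tuple


class TrustedGfidCache:
    """Holds GFIDs whose status has already been changed to "trusted"."""

    def __init__(self, open_and_verify: Callable[[Hashable], bytes]) -> None:
        # open_and_verify stands in for the open-time verification: it
        # returns the verified GFID or raises if no per-name MAC matches.
        self._open_and_verify = open_and_verify
        self._trusted: Dict[Hashable, bytes] = {}

    def trusted_gfid(self, inode: Hashable) -> bytes:
        if inode not in self._trusted:
            # Not cached yet: go through the open path. If verification
            # raises, nothing is stored, so a bad GFID never becomes trusted.
            self._trusted[inode] = self._open_and_verify(inode)
        return self._trusted[inode]


def truncate(cache: TrustedGfidCache, inode: Hashable, new_size: int) -> Tuple[bytes, int]:
    """An operation that needs the trusted GFID before doing any crypto work."""
    gfid = cache.trusted_gfid(inode)
    return gfid, new_size
```

The point of funnelling everything through one verification entry
point is that there is a single place where a GFID can become
trusted, which is easier to audit than per-operation checks.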


         IV. Why we don't support the feature of transparent
            encryption in NFS mounts of GlusterFS volumes


In NFS mounts of GlusterFS volumes file operations usually don't have
file names. They operate on file handles instead (which actually
are GFIDs). Accordingly, we have to be sure that every file handle
in the cache of the client machine is trusted. This is not simple to
implement with a guarantee that future changes in GlusterFS code won't
add a security hole that allows non-verified file handles to appear
in the cache of the client machine.

