[Gluster-devel] Handling EOBs in CloudFS

Thu Jul 14 21:01:43 UTC 2011

Hello everyone,
any comments and suggestions are welcome.

        Handling EOBs (end-of-blocks) for transparent
     encryption, checking integrity and data authentication

                           DRAFT

This was designed for CloudFS, which uses a 2-level protocol (high and
low) supported by xlators that reside on the server and client sides,
respectively.

Definition of EOB. Storage class

If the file size isn't a multiple of the cblock (cipher block) size,
then we also need to store the special padding needed to decrypt its
last block with some cipher modes, such as CBC. This padding contains
a part of the ciphertext and must be considered a part of the file.
We'll call this padding end-of-file (EOF). If the size of the plain
text is a multiple of the cblock size, then the encrypted file won't
have an EOF (or will have an empty one).
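
The size arithmetic above can be sketched in Python. This is only an
illustration: it assumes the EOF is exactly the number of bytes needed
to complete the last, partial cipher block. The name eof_size and the
default 16-byte cblock (the AES block size) are hypothetical choices,
not part of the design.

```python
def eof_size(file_size: int, cblock: int = 16) -> int:
    """Bytes of EOF padding needed to complete the last cipher block.

    Assumption: the EOF fills the last, partial cblock up to a full one.
    """
    if file_size < 0 or cblock <= 0:
        raise ValueError("file_size must be >= 0 and cblock must be > 0")
    tail = file_size % cblock  # bytes of the file in the last, partial cblock
    return 0 if tail == 0 else cblock - tail


print(eof_size(100))  # 12: 100 = 6 * 16 + 4, so 12 bytes complete the block
print(eof_size(32))   # 0: size is a multiple of cblock, so the EOF is empty
```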

Signatures (HMACs, etc.) for checking integrity, data authentication,
etc. have the same nature as the EOF. Every such signature is created
for some logical block in a file. Unlike the EOF, a signature is not
padding, but it is likewise associated with the file's data. So we'll
consider a class of objects that includes EOFs, HMACs, etc., and call
them EOBs (end-of-block).

We define the storage class of EOBs as "data", i.e. an EOB is
considered a part of the file's data: we cannot read/write a data
block without reading/writing its EOB.
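
To make the "one EOB per logical block" idea concrete, here is a
minimal Python sketch that computes one HMAC per logical block and
verifies a block against its EOB. The 4096-byte block size, the choice
of HMAC-SHA256 and the function names are assumptions made for the
example only.

```python
import hashlib
import hmac
from typing import List

BLOCK_SIZE = 4096  # hypothetical logical block size


def block_hmacs(data: bytes, key: bytes,
                block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Return one HMAC-SHA256 tag (EOB) per logical block of data."""
    return [
        hmac.new(key, data[off:off + block_size], hashlib.sha256).digest()
        for off in range(0, len(data), block_size)
    ]


def verify_block(block: bytes, tag: bytes, key: bytes) -> bool:
    """Check a block against its EOB using a constant-time comparison."""
    expected = hmac.new(key, block, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

A block can only be validated together with its tag, which is why the
EOB has to travel with the data block on every read and write.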

Storing EOBs. Approaches and Issues

Approach 1: Storing EOBs as xattr values.

In this case we store a file in parts which are not adjacent
from the standpoint of CloudFS. Therefore we need to split
each read, which makes the operation non-atomic. This means
that read(2) can return data composed of parts of different
"versions".

Example:

Suppose we have a file F stored in 2 different parts F1 and F2.

Process A writes file F (making it version 1);
Process B reads file F (part F1);
Process C writes file F (making it version 2);
Process B reads file F (part F2).

As a result, process B returns data composed of
parts of the different versions 1 and 2.
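
The interleaving above can be replayed deterministically in a few
lines of Python. The dictionary below is only a stand-in for the two
separately stored parts (for example the file body and an xattr
value); all names are hypothetical.

```python
# Two separately stored parts of file F (e.g. body and xattr value).
store = {"F1": b"", "F2": b""}


def write_file(version: int) -> None:
    """Write both parts of F; each part records its version."""
    store["F1"] = b"body-v%d" % version
    store["F2"] = b"eob-v%d" % version


write_file(1)              # process A writes version 1
part1 = store["F1"]        # process B reads part F1
write_file(2)              # process C writes version 2
part2 = store["F2"]        # process B reads part F2

print(part1, part2)        # b'body-v1' b'eob-v2'
```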

This non-atomicity is different from the non-atomicity that takes
place in the kernel (local file systems): the kernel guarantees
that all PAGE_SIZE reads with PAGE_SIZE-aligned offsets are
atomic (because reads and writes in the kernel acquire
page locks), whereas in our case F2 doesn't necessarily
have a PAGE_SIZE-aligned offset.

So we may get complaints from users who don't expect such
non-atomicity. Moreover, when the EOBs are HMACs for checking
integrity or authentication, we'll have false positives, since
nothing guarantees that the versions of an HMAC and of its
data block coincide.
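
A minimal sketch of such a false positive, assuming HMAC-SHA256 and a
made-up key: the reader obtains the data block of one version and the
HMAC of another, so the check fails although nothing was corrupted.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical key, for illustration only


def tag(block: bytes) -> bytes:
    """HMAC-SHA256 of a data block."""
    return hmac.new(KEY, block, hashlib.sha256).digest()


block_v1, block_v2 = b"contents, version 1", b"contents, version 2"

# The reader fetches the data block of version 1, but a writer replaces
# the file before the reader fetches the HMAC, so it gets version 2's.
read_block = block_v1
read_tag = tag(block_v2)

intact = hmac.compare_digest(tag(read_block), read_tag)
print(intact)  # False: the check fails though both versions were valid
```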

Solution:

In this approach we need to serialize truncates, appending
writes and RbRe sequences (read block, read EOB).
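
The serialization can be sketched as follows. A process-local mutex
stands in for whatever locking the xlators would actually use (that
mechanism is not specified here), and the class and method names are
hypothetical.

```python
import threading


class SplitFile:
    """A file stored as a data block plus a separately stored EOB."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._block = b""
        self._eob = b""

    def write(self, block: bytes, eob: bytes) -> None:
        # Stands in for truncates and appending writes.
        with self._lock:
            self._block = block
            self._eob = eob

    def read(self):
        # RbRe: read block, then read EOB, as one critical section.
        with self._lock:
            block = self._block
            eob = self._eob
        return block, eob
```

Because both halves of RbRe run under the same lock as the writers, a
reader always sees a block and an EOB of the same version.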

Approach 2: Storing EOBs in the file's body.

In this case EOBs are stored in the file's body (by appending to
the file in the case of the EOF, or by interspersing the file with
HMACs, etc). So a file with its EOBs is a single whole from the
standpoint of CloudFS, and the atomicity problems specific to
Approach 1 do not arise.

However, in this case all our files maintained by the low-level
local fs will have increased sizes (by the total size of all EOBs).
Therefore the actual file size must be stored as an additional
attribute (e.g. as an xattr value).
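
As an illustration of how the stored size grows, the sketch below
assumes one particular layout that is not fixed by this draft: every
logical block of BLOCK bytes is immediately followed by its HMAC of
HMAC_LEN bytes, and the EOF padding is ignored for simplicity. Both
constants and both function names are hypothetical.

```python
BLOCK = 4096   # hypothetical logical block size
HMAC_LEN = 32  # hypothetical tag length (e.g. HMAC-SHA256)


def physical_offset(logical_off: int) -> int:
    """Map an offset in the plain file to an offset in the stored file."""
    full_blocks = logical_off // BLOCK  # HMACs stored before this offset
    return logical_off + full_blocks * HMAC_LEN


def stored_size(actual_size: int) -> int:
    """Size seen by the local fs: data plus one HMAC per (partial) block."""
    nblocks = -(-actual_size // BLOCK)  # ceiling division
    return actual_size + nblocks * HMAC_LEN


print(physical_offset(4096))  # 4128: one HMAC precedes the second block
print(stored_size(4097))      # 4161: two blocks, so two HMACs are added
```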

The ->open() method of the high-level translator loads the actual
file size into the CloudFS-specific part of the inode by calling
->getxattr(), so that it stays resident in memory on the server.

Any ->truncate() and appending ->write() of the high-level
xlator updates the in-core and on-disk actual sizes simultaneously
(by calling ->setxattr() for the latter). This actual size
is what should be returned to the user by ->fstat(), ->lookup(),
etc. as st_size.
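
The bookkeeping described above can be modelled with a toy inode in
Python. This is not the xlator interface: the dictionary stands in for
on-disk xattr storage, and the xattr name and all method names are
hypothetical.

```python
SIZE_XATTR = "trusted.cloudfs.size"  # hypothetical xattr name


class Inode:
    """Toy inode: on-disk xattrs plus CloudFS-specific in-core state."""

    def __init__(self) -> None:
        self.xattrs = {}          # stands in for on-disk xattr storage
        self.actual_size = None   # in-core copy, filled in by open()

    def open(self) -> None:
        # Load the actual size from the xattr (0 for a new file).
        self.actual_size = int(self.xattrs.get(SIZE_XATTR, b"0").decode())

    def _set_size(self, size: int) -> None:
        # Update the in-core and on-disk sizes together.
        self.actual_size = size
        self.xattrs[SIZE_XATTR] = str(size).encode()

    def truncate(self, size: int) -> None:
        self._set_size(size)

    def append(self, nbytes: int) -> None:
        self._set_size(self.actual_size + nbytes)

    def fstat(self) -> dict:
        # Report the actual size, not the larger stored size.
        return {"st_size": self.actual_size}
```

For example, after open(), append(100) and truncate(40), fstat()
reports an st_size of 40 and the xattr holds b"40", so a later open()
of the same file sees the same size.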