[Gluster-Maintainers] [Gluster-devel] Release 4.0: Release notes (please read and contribute)

Sat Feb 10 01:23:55 UTC 2018


On 02/10/2018 01:24 AM, Shyam Ranganathan wrote:
> On 02/02/2018 10:26 AM, Ravishankar N wrote:
>>>> 2) "Replace MD5 usage to enable FIPS support" - Ravi, Amar
>> + Kotresh who has done most (all to be precise) of the patches listed in
>> https://github.com/gluster/glusterfs/issues/230  in case he would like
>> to add anything.
>>
>> There is a pending work for this w.r.t rolling upgrade support.  I hope
>> to work on this next week, but I cannot commit anything looking at other
>> things in my queue :(.
> I have this confusion reading the issue comments, so if one of the
> servers is updated, and other server(s) in the replica are still old,
> then self heal deamon would work, without the fix?
>
>  From the comment [1], I understand that the new node self heal deamon
> would crash.
>
> If the above is true and the fix is to be at MD5 till <condition> then
> that is a must fix before release, as there is no way to upgrade and
> handle heals, and hence not get into split brains later as I understand.
>
> Where am I going wrong? or, is this understanding correct?
This is right. In the new node, the shd (afr) requests the checksum from 
both bricks (assuming a 1x2 setup). In saving the checksum in its local 
structures in the cbk, (see __checksum_cbk, 
https://github.com/gluster/glusterfs/blob/release-4.0/xlators/cluster/afr/src/afr-self-heal-data.c#L45), 
it will do a memcpy of SHA256_DIGEST_LENGTH bytes even if the older 
brick sends only MD5_DIGEST_LENGTH bytes. It might or not crash but it 
is illegal memory access.

The summary of changes we had in mind are:
- Restore md5sum in the code.
- Have a volume set option for posix xlator tied to 
GD_OP_VERSION_4_0_0.  By default, without this option set, 
posix_rcheksum will still send MD5SUM.
- Amar has introduced a flag in gfx_rchecksum_rsp. At the brick side, 
set that flag to 1 only if we are sending SHA256
- change rchecksum fop_cbk signature to include the flag (or maybe 
capture the flag in response xadata dict instead?).
- In afr depending on whether the flag is set or not, memcpy the 
appropriate length.
- After upgrade is complete and cluster op version becomes 
GD_OP_VERSION_4_0_0, user can set the volume option and from then on 
wards rchecksum will use SHA256.

Regards,
Ravi
>
>> To add more clarity, for fresh setup (clients + servers) in 4.0,
>> enabling FIPS works fine. But we need to handle case of old servers and
>> new clients and vice versa. If this can be considered a bug fix, then
>> here is my attempt at the release notes for this fix:
>>
>> "Previously, if gluster was run on a FIPS enabled system, it used to
>> crash because gluster used MD5 checksum in various places like self-heal
>> and geo-rep. This has been fixed by replacing MD5 with SHA256 which is
>> FIPS compliant."
>>
>> I'm happy to update the above text in doc/release-notes/4.0.0.md and
>> send it on gerrit for review.
> I can take care of this, no worries. I need the information, so that I
> do not misrepresent :) provided any which way is fine...
>
> [1] https://github.com/gluster/glusterfs/issues/230#issuecomment-358293386