[Gluster-devel] Solving Ctime Issue with legacy files [BUG 1593542]

Tue Jun 18 06:33:43 UTC 2019

Hi Xavi,

Reply inline.

On Mon, Jun 17, 2019 at 5:38 PM Xavi Hernandez <jahernan at redhat.com> wrote:

> Hi Kotresh,
>
> On Mon, Jun 17, 2019 at 1:50 PM Kotresh Hiremath Ravishankar <
> khiremat at redhat.com> wrote:
>
>> Hi All,
>>
>> The ctime feature is enabled by default from release gluster-6. But as
>> explained in bug [1], there is a known issue with legacy files, i.e., files
>> created before the ctime feature was enabled. These files do not have the
>> "trusted.glusterfs.mdata" xattr, which maintains the time attributes. So on
>> accessing those files, the xattr gets created with the latest time
>> attributes. This is not correct because all the time attributes (atime,
>> mtime, ctime) get updated instead of only the required ones.
>>
>> There are a couple of approaches to solve this.
>>
>> 1. On accessing the files, let posix update the time attributes from
>> the backend file on the respective replicas. This obviously results in
>> inconsistent "trusted.glusterfs.mdata" xattr values within the replica set.
>> AFR/EC should heal this xattr as part of metadata heal upon accessing the
>> file. It can choose to replicate from any subvolume. Ideally we should
>> consider the highest time from the replica and treat it as source, but I
>> think any subvolume should be fine, as replica time attributes are mostly
>> in sync, with a max difference on the order of a few seconds, if I am not
>> wrong.
>>
>>    But client side self heal is disabled by default because of
>> performance reasons [2]. If we choose to go by this approach, we need to
>> consider enabling at least client side metadata self heal by default.
>> Please share your thoughts on enabling the same by default.
>>
>> 2. Don't let posix update the legacy files from the backend. On lookup
>> cbk, let the utime xlator update the time attributes from statbuf received
>> synchronously.
>>
>> Both approaches are similar as both result in updating the xattr during
>> lookup. Please share your inputs on which approach is better.
>>
>
> I prefer second approach. First approach is not feasible for EC volumes
> because self-heal requires that k bricks (on a k+r configuration) agree on
> the value of this xattr, otherwise it considers the metadata damaged and
> needs manual intervention to fix it. During upgrade, first r bricks will be
> upgraded without problems, but trusted.glusterfs.mdata won't be healed
> because r < k. In fact this xattr will be removed from new bricks because
> the majority of bricks agree on xattr not being present. Once the r+1 brick
> is upgraded, it's possible that posix sets different values for
> trusted.glusterfs.mdata, which will cause self-heal to fail.
>
> Second approach seems better to me if guarded by a new option that enables
> this behavior. utime xlator should only update the mdata xattr if that
> option is set, and that option should only be settable once all nodes have
> been upgraded (controlled by op-version). In this situation the first
> lookup on a file where utime detects that mdata is not set, will require a
> synchronous update. I think this is good enough because it will only happen
> once per file. We'll need to consider cases where different clients do
> lookups at the same time, but I think this can be easily solved by ignoring
> the request if mdata is already present.
>

Initially there were two issues.
1. Upgrade issue with EC volumes, as described by you.
   This is solved with the patch [1]. There was a bug in ctime posix where
   it was creating the xattr even when ctime is not set on the client (during
   the utimes system call). With patch [1], the utimes system call only
   updates the "trusted.glusterfs.mdata" xattr if it is already present;
   otherwise it does not create it. New xattr creation should happen only
   during entry operations (such as create, mknod and others).
   So there won't be any problems with upgrade. I think we don't need a new
   option dependent on op-version, if I am not wrong.

2. After upgrade, how do we update the "trusted.glusterfs.mdata" xattr?
   This mail thread is about this. Which approach is better here? I
   understand that from the EC point of view the second approach is the best
   one. The question I had was: can't EC treat 'trusted.glusterfs.mdata' as a
   special xattr and add logic to heal it from one subvolume (i.e., remove
   the requirement of having consistent data on k subvolumes in a k+r
   configuration)?

   The second approach is independent of AFR and EC. So if we choose this,
   do we need a new option to guard it? If the upgrade procedure is to
   upgrade the servers first and then the clients, I think we don't need the
   guard?
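
To make the two behaviours above concrete, here is a toy Python model. It is
only a sketch and not Gluster code: the names BrickFile, Times, utimes and
lookup_init_mdata are made up for illustration, and a plain dict stands in
for the xattrs of a file on a brick. It shows utimes updating the mdata xattr
only when it already exists (the behaviour described for patch [1]), and a
lookup-time initialization from the statbuf that is ignored when mdata is
already present (the second approach, including the concurrent-lookup case).

```python
from dataclasses import dataclass, field
from typing import Dict

MDATA_KEY = "trusted.glusterfs.mdata"


@dataclass(frozen=True)
class Times:
    """Hypothetical container for the three time attributes."""
    atime: int
    mtime: int
    ctime: int


@dataclass
class BrickFile:
    """Toy model of a file on a brick: backend stat times plus xattrs."""
    stat: Times
    xattrs: Dict[str, Times] = field(default_factory=dict)


def utimes(f: BrickFile, atime: int, mtime: int, now: int) -> bool:
    """Update mdata only if it already exists; never create it here."""
    if MDATA_KEY not in f.xattrs:
        return False  # legacy file: leave it without the xattr
    f.xattrs[MDATA_KEY] = Times(atime, mtime, now)
    return True


def lookup_init_mdata(f: BrickFile, statbuf: Times) -> bool:
    """Seed mdata from the lookup statbuf; ignore if already present."""
    if MDATA_KEY in f.xattrs:
        return False  # another client got there first, keep its value
    f.xattrs[MDATA_KEY] = statbuf
    return True


legacy = BrickFile(stat=Times(100, 100, 100))
print(utimes(legacy, 500, 500, 500))                      # no xattr yet
print(lookup_init_mdata(legacy, legacy.stat))             # first lookup
print(lookup_init_mdata(legacy, Times(999, 999, 999)))    # racing lookup
print(legacy.xattrs[MDATA_KEY])
```

In this model the second, racing lookup does not overwrite the value written
by the first one, so each legacy file is initialized once.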

>
> Xavi
>
>
>>
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1593542
>> [2] https://github.com/gluster/glusterfs/issues/473
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>

-- 
Thanks and Regards,
Kotresh H R