[Gluster-devel] Consistent time attributes (ctime, atime and mtime) across replica set and distribution set

Mohammed Rafi K C rkavunga at redhat.com
Wed Mar 8 09:26:02 UTC 2017


Thanks for your comments.


On 03/07/2017 08:28 PM, Shyam wrote:
> On 02/28/2017 12:51 AM, Mohammed Rafi K C wrote:
>> Hi All,
>>
>>
>> We discussed the problem in $subject in the mail thread [1]. Based on
>> the comments and suggestions, I will summarize the design (made as
>> points for simplicity).
>
> Some generic comment(s),
>
> - I would like to see the design in such a form that, when server side
> replication and disperse become a reality, parts of the
> design/implementation remain the same. This way the work need not be
> redone when anything moves to the server side. (I would have extended
> this to DHT2 as well, but will leave that to you.)
>
> - As this evolves, we need to assign clear responsibilities to the
> different xlators we are discussing. That will eventually happen
> anyway, but I am noting it ahead for clarity as we discuss this further.

I feel this is much more flexible with server-side replication/EC,
because we have a leader where we can control behavior such as
generating the timestamp, syncing the data to the other replicas, etc.
I need to think about DHT2. Nevertheless, I will include these details
in a document.

>
>>
>>
>> 1) As part of each fop, the top layer will generate a timestamp and
>> pass it down along with the other parameters.
>>
>>     1.1) This will bring a dependency on NTP-synced clients along with
>> the servers.
>
> As stated by others and even noted by you, this requirement is a bit
> of a bother, but as the current DHT, AFR, EC architecture stands, this
> is what we need to start with. Just noting my observation as well on
> this.
>
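
To make 1) a bit more concrete, below is roughly what I mean by the top
layer stamping the fop. This is only a sketch; the names here
(fop_stamp_t, stamp_fop) are made up and not existing gluster code.

/* Minimal sketch: the top layer takes the client's wall-clock time
 * once per fop and sends it down along with the fop.  Since this is
 * plain wall-clock time taken on the client, all clients and servers
 * must be NTP synced (point 1.1). */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef struct {
    uint64_t sec;   /* seconds since the epoch */
    uint64_t nsec;  /* nanoseconds */
} fop_stamp_t;

static void
stamp_fop(fop_stamp_t *stamp)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    stamp->sec  = (uint64_t)ts.tv_sec;
    stamp->nsec = (uint64_t)ts.tv_nsec;
}

int
main(void)
{
    fop_stamp_t stamp;

    stamp_fop(&stamp);
    printf("fop stamped at %llu.%09llu\n",
           (unsigned long long)stamp.sec,
           (unsigned long long)stamp.nsec);
    return 0;
}

Note that 1.2) below still applies: the stamp records when the fop
entered the stack, not when it finally reaches the backend.
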
>>
>>     1.2) There can be a difference in time if the fop gets stuck in an
>> xlator for various reasons, for example because of locks.
>>
>>
>> 2) On the server, the posix layer stores the value in memory (inode
>> ctx) and will sync the data periodically to disk as an extended attribute.
>
> I believe this should not be a part of the posix xlator. Our posix
> store definition is that it uses a local file-system underneath that
> is POSIX compliant. This need, for storing the time information
> outside of the POSIX specification (as an xattr or otherwise), is
> something that is gluster specific. As a result we should not fold
> this into the posix xlator.
>
> For example, if we later replace posix with a db store, or a key-value
> store, we would need to code this cache management of time information
> again for those stores, but if we abstract it out, then we do not need
> to do the same.

I agree that we may have to re-implement this if we couple it with the
posix xlator. But this is a very small piece of code where we store the
time in the inode ctx and sync it when required. Also, as Amar pointed
out, each back-end store may have a different behavior. We can write
this in an abstract way so that we can re-use it tomorrow. But IMHO, I
don't see this as a separate xlator.
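
Roughly what I have in mind is sketched below: the cached times live
with the inode, and the on-disk representation hides behind a small ops
table, so a posix/xattr backend or a db backend can plug in without
rewriting the cache logic. All names here (time_ctx_t, time_store_ops_t,
etc.) are made up for illustration, not existing code.

/* Sketch of the abstraction: the cache sits with the inode, the
 * persistent representation sits behind an ops table. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t ctime;
    uint64_t mtime;
    uint64_t atime;
    int      dirty;   /* set on update, cleared on sync */
} time_ctx_t;

/* Backend contract: how the cached times reach stable storage. */
typedef struct {
    int (*load)(const char *gfid, time_ctx_t *ctx);
    int (*sync)(const char *gfid, const time_ctx_t *ctx);
} time_store_ops_t;

/* A posix/xattr flavoured backend would implement these with
 * getxattr/setxattr; a db backend with get/put.  Stubbed here. */
static int
xattr_load(const char *gfid, time_ctx_t *ctx)
{
    printf("load times for %s from the xattr\n", gfid);
    ctx->ctime = ctx->mtime = ctx->atime = 0;
    ctx->dirty = 0;
    return 0;
}

static int
xattr_sync(const char *gfid, const time_ctx_t *ctx)
{
    printf("sync %llu/%llu/%llu for %s to the xattr\n",
           (unsigned long long)ctx->ctime,
           (unsigned long long)ctx->mtime,
           (unsigned long long)ctx->atime, gfid);
    return 0;
}

static const time_store_ops_t xattr_store = { xattr_load, xattr_sync };

int
main(void)
{
    time_ctx_t ctx;

    xattr_store.load("<gfid>", &ctx);   /* point 3: on inode init */
    ctx.mtime = 1488965162;             /* some fop updated mtime */
    ctx.dirty = 1;
    if (ctx.dirty)                      /* periodic or forced sync (2.1) */
        xattr_store.sync("<gfid>", &ctx);
    return 0;
}
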

On 03/08/2017 11:35 AM, Amar Tumballi wrote:
> I see this xattr as similar to 'gfid'. Just as gfid is like our
> 'stat->st_ino', this xattr will be our 'stat->st_{c,m,a}time', which is
> very much a part of the backend support requirement IMO.



>
> I may need to chew on this further, but at the moment I think this
> functionality should not be stuffed into the posix xlator, and rather
> should live on its own.
>
>>
>>      2.1) Of course, a sync call will also force it. And if a fop
>> comes for an inode which is not linked, we do the sync immediately.
>>
>>
>> 3) Each time an inode is created or initialized, we read the data
>> from disk and store it in the inode ctx.
>>
>>
>> 4) Before setting it in the inode_ctx, we compare the stored
>> timestamp with the received timestamp, and only store the received
>> value if it is greater than the stored value.
>>
>>
>> 5) So in the best case, the data will be stored in and retrieved from
>> memory. We replace the values in the iatt with the values in the
>> inode_ctx.
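
To make 4) and 5) concrete, the update and read paths would look
something like this sketch (made-up names again; the real iatt handling
would happen in the fop callbacks):

/* Point 4: only ever move a cached timestamp forward, so a delayed
 * fop can never take a time attribute backwards.
 * Point 5: serve stat times from the cache, not from the backend. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t ia_ctime;
    uint64_t ia_mtime;
    uint64_t ia_atime;
} iatt_t;                 /* stand-in for gluster's struct iatt */

static void
update_if_newer(uint64_t *stored, uint64_t received)
{
    if (*stored < received)
        *stored = received;
}

static void
override_iatt(iatt_t *buf, uint64_t ctime, uint64_t mtime, uint64_t atime)
{
    buf->ia_ctime = ctime;
    buf->ia_mtime = mtime;
    buf->ia_atime = atime;
}

int
main(void)
{
    uint64_t cached = 1488965162;
    iatt_t   buf = { 0, 0, 0 };

    update_if_newer(&cached, 1488965100);   /* older stamp: ignored */
    update_if_newer(&cached, 1488965200);   /* newer stamp: stored  */
    override_iatt(&buf, cached, cached, cached);
    printf("mtime served from cache: %llu\n",
           (unsigned long long)buf.ia_mtime);
    return 0;
}
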
>>
>>
>> 6) File ops that change the parent directory's time attributes need to
>> be consistent across all the distributed directories across the
>> subvolumes (for example, a create call will change the ctime and mtime
>> of the parent directory).
>>
>>      6.1) This has to be handled separately because we only send the
>> fop to the hashed subvolume.
>>
>>      6.2) We can asynchronously send a time-update setattr fop to the
>> other subvolumes and change the values for the parent directory if the
>> file fop is successful on the hashed subvolume.
>
> Am I right in understanding that this is not part of the solution, and
> just a suggestion on what we may do in the future, or is it part of
> the solution proposed?

If we have agreement from the dht maintainers, I'm ready to take the
dht part as well as part of this effort :) .

>
> If the latter (i.e. part of the solution proposed), which layer has the
> responsibility to asynchronously update the other DHT subvolumes?
>
> For example, the posix xlator does not have that knowledge, so it
> should not be that xlator, which means it is some other xlator. Now,
> is that on the client or on the server, and how is it crash
> consistent? These are some things that come to mind when reading
> this, but I will wait for the details before thinking aloud.

Yes, in the proposed solution it is dht that has to initiate the fop to
sync the time attributes to the other subvolumes (synchronously or
asynchronously) after, let's say, a create fop on the hashed subvol
(just an e.g. ;) ).
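
The dht flow I am imagining looks roughly like the sketch below (pure
illustration, made-up names; error handling and the sync/async choice
left open):

/* Sketch of 6.2 from dht's side: after a successful create on the
 * hashed subvol, fan a setattr out to the remaining subvols so the
 * parent directory's ctime/mtime catch up everywhere. */
#include <stdio.h>

#define SUBVOL_COUNT 3

/* Stand-in for winding a setattr to one subvolume. */
static int
send_parent_setattr(int subvol, unsigned long long stamp)
{
    printf("setattr(parent, ctime=mtime=%llu) -> subvol %d\n",
           stamp, subvol);
    return 0;
}

int
main(void)
{
    int hashed = 1;                       /* subvol that got the create */
    unsigned long long stamp = 1488965162ULL;

    /* The create already succeeded on 'hashed'; now update the parent
     * dir times on every other subvol.  Until these complete there is
     * the window of inconsistency mentioned in 6.3. */
    for (int subvol = 0; subvol < SUBVOL_COUNT; subvol++) {
        if (subvol == hashed)
            continue;
        send_parent_setattr(subvol, stamp);
    }
    return 0;
}
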

I'm totally in agreement on the crash-consistency concern, but thinking
broadly, POSIX normally doesn't guarantee persistence of the data
unless there is an explicit sync call. I thought we could treat this
too as a cache-coherence problem. What do you think?



>
>>
>>      6.3) This will have a window where the times are inconsistent
>> across the dht subvolumes (please provide your suggestions).
>>
>>
>> 7) Currently we have a couple of mount options for time attributes,
>> like noatime, relatime, nodiratime, etc. But we do not explicitly
>> handle those options even when they are given as mount options for a
>> gluster mount. [2]
>>
>>      7.1) We always rely on the back-end storage layer's behavior: if
>> you have given those mount options when you mounted your disk, you
>> will get this behaviour.
>>
>>      7.2) Now that we are taking the effort to fix the consistency
>> issue, do we need to honour those options on our own?
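
If we did decide to honour these ourselves, relatime, for instance, is
simple enough to fold into the time-update path. A sketch of the rule
(mirroring the Linux kernel's relatime behaviour; not gluster code):

/* relatime: only update atime if it is older than ctime/mtime, or
 * more than a day stale. */
#include <stdint.h>
#include <stdio.h>

#define RELATIME_WINDOW (24 * 60 * 60)   /* one day, in seconds */

static int
relatime_needs_update(uint64_t atime, uint64_t mtime, uint64_t ctime,
                      uint64_t now)
{
    if (atime <= mtime || atime <= ctime)
        return 1;                 /* file changed since last atime update */
    if (now - atime > RELATIME_WINDOW)
        return 1;                 /* atime is more than a day old */
    return 0;
}

int
main(void)
{
    uint64_t now = 1488965162;

    /* atime newer than mtime/ctime and fresh: skip the update */
    printf("update? %d\n",
           relatime_needs_update(now - 10, now - 100, now - 100, now));
    /* atime older than mtime: update */
    printf("update? %d\n",
           relatime_needs_update(now - 200, now - 100, now - 100, now));
    return 0;
}
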
>>
>>
>> Please provide your comments and suggestions.
>>
>>
>> [1] :
>> http://lists.gluster.org/pipermail/gluster-devel/2016-January/048003.html
>>
>>
>>
>> Regards
>>
>> Rafi KC
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>


