[Gluster-devel] Consistent time attributes (ctime, atime and mtime) across replica set and distribution set

Shyam srangana at redhat.com
Tue Mar 7 14:58:46 UTC 2017


On 02/28/2017 12:51 AM, Mohammed Rafi K C wrote:
> Hi All,
>
>
> We discussed the problem $subject in the mail thread [1]. Based on the
> comments and suggestions I will summarize the design (Made as points for
> simplicity.)

Some generic comment(s),

- I would like to see the design in such a form that, when server side 
replication and disperse become a reality, parts of the 
design/implementation remain the same. This way work need not be redone 
when server side anything happens. (I would have extended this to adapt 
to DHT2 as well, but will leave that to you)

- As this evolves we need to assign clear responsibilities to different 
xlators that we are discussing, which will eventually happen anyway, but 
just noting it ahead for clarity as we discuss this further.

>
>
> 1) As part of each fop, the top layer will generate a time stamp and pass
> it down along with the other params.
>
>     1.1) This will bring a dependency for NTP synced clients along with
> servers

As stated by others and even noted by you, this requirement is a bit of 
a bother, but as the current DHT, AFR, EC architecture stands, this is 
what we need to start with. Just noting my observation as well on this.

>
>     1.2) There can be a diff in time if the fop is stuck in the xlator for
> various reasons, for ex: because of locks.
>
>
> 2) On the server, the posix layer stores the value in memory (inode ctx)
> and will sync the data periodically to the disk as an extended attr

I believe this should not be a part of the posix xlator. Our posix store 
definition is that it uses a local file-system underneath that is POSIX 
compliant. Storing the time information outside of the POSIX 
specification (as an xattr or otherwise) is a need that is gluster 
specific. As a result we should not fold this into the posix xlator.

For example, if we replace posix later with a db store, or a key-value 
store, we would need to code this cache management of time information 
again for these stores, but if we abstract it out, then we do not need 
to do the same.

I may need to chew on this further, but at the moment I think this 
functionality should not be stuffed into posix-xlator, and rather should 
live on its own.
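
To make the layering concrete, here is a minimal sketch in Python (not 
Gluster code). Every name in it (TimeStore, TimeCache, DictStore, 
load_times, save_times) is hypothetical, and it assumes the time cache 
only needs a load and a save operation from whatever store sits 
underneath. Under that assumption, the cache management is written once 
and a posix, db or key-value store only supplies the two operations.

```python
from typing import Dict, Optional, Protocol, Set, Tuple

# (ctime, mtime, atime) as plain numbers; the real stored format is undecided.
Times = Tuple[float, float, float]


class TimeStore(Protocol):
    """Hypothetical minimal interface a backing store would expose."""

    def load_times(self, gfid: str) -> Optional[Times]: ...

    def save_times(self, gfid: str, times: Times) -> None: ...


class DictStore:
    """Stand-in for any backend (posix xattr, db, key-value store)."""

    def __init__(self) -> None:
        self.persisted: Dict[str, Times] = {}

    def load_times(self, gfid: str) -> Optional[Times]:
        return self.persisted.get(gfid)

    def save_times(self, gfid: str, times: Times) -> None:
        self.persisted[gfid] = times


class TimeCache:
    """Store-agnostic cache: keeps times in memory, flushes on demand."""

    def __init__(self, store: TimeStore) -> None:
        self.store = store
        self.cache: Dict[str, Times] = {}
        self.dirty: Set[str] = set()

    def get(self, gfid: str) -> Optional[Times]:
        # Read through to the store only on a cache miss.
        if gfid not in self.cache:
            loaded = self.store.load_times(gfid)
            if loaded is None:
                return None
            self.cache[gfid] = loaded
        return self.cache[gfid]

    def set(self, gfid: str, times: Times) -> None:
        # Update memory only; the store is written on flush().
        self.cache[gfid] = times
        self.dirty.add(gfid)

    def flush(self) -> None:
        # Write every modified entry to the store, then mark all clean.
        for gfid in sorted(self.dirty):
            self.store.save_times(gfid, self.cache[gfid])
        self.dirty.clear()
```

Swapping DictStore for another backend would leave TimeCache untouched, 
which is the point of keeping it out of the posix xlator.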

>
>      2.1) Of course a sync call will also force it. And if a fop comes
> for an inode which is not linked, we do the sync immediately.
>
>
> 3) Each time inodes are created or initialized, it reads the data
> from disk and stores it.
>
>
> 4) Before setting to inode_ctx we compare the timestamp stored and the
> timestamp received, and only store if the stored value is less than
> the received value.
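
The compare-and-store rule in point 4 could look roughly like the 
sketch below. This is Python for illustration only, not Gluster code; 
update_if_newer and the dict used as inode_ctx are hypothetical 
stand-ins.

```python
from typing import Dict, Optional


def update_if_newer(inode_ctx: Dict[str, float], attr: str,
                    received: float) -> bool:
    """Store `received` only when it is newer than the cached value.

    Returns True if the cached value changed.
    """
    stored: Optional[float] = inode_ctx.get(attr)
    if stored is None or stored < received:
        inode_ctx[attr] = received
        return True
    return False


ctx: Dict[str, float] = {}
update_if_newer(ctx, "ctime", 100.0)  # first value is always stored
update_if_newer(ctx, "ctime", 90.0)   # older timestamp is ignored
```

With this rule a fop that was delayed in an xlator (the case in 1.2) 
and arrives with an older timestamp cannot move the time backwards.
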
>
>
> 5) So in best case data will be stored and retrieved from the memory. We
> replace the values in iatt with the values in inode_ctx.
>
>
> 6) File ops that change the parent directory attr times need to be
> consistent across all the distributed directories across the subvolumes.
> (for eg: a create call will change ctime and mtime of parent dir)
>
>      6.1) This has to be handled separately because we only send the fop
> to the hashed subvolume.
>
>      6.2) We can asynchronously send the timeupdate setattr fop to the
> other subvolumes and change the values for the parent directory if the
> file fop is successful on the hashed subvolume.

Am I right in understanding that this is not part of the solution, and 
just a suggestion on what we may do in the future, or is it part of the 
solution proposed?

If the latter (i.e. part of the solution proposed), which layer has the 
responsibility to asynchronously update other DHT subvolumes?

For example, posix-xlator does not have that knowledge, so it should not 
be that xlator, which means it is some other xlator. Is that on the 
client or on the server, and how is it crash consistent? These are some 
things that come to mind when reading this, but I will wait for the 
details before thinking aloud.
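
To illustrate the crash consistency question, here is a small Python 
simulation of one possible reading of 6.2. It is not Gluster code: 
DirTimeUpdater is a hypothetical layer, subvolumes are plain dicts, and 
the pending updates are assumed to live only in memory.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class DirTimeUpdater:
    """Hypothetical layer fanning parent-dir time updates out to subvolumes."""

    def __init__(self, subvols: List[Dict[str, float]], hashed: int) -> None:
        self.subvols = subvols
        self.hashed = hashed
        # Assumption: pending updates are held only in memory.
        self.pending: Deque[Tuple[int, str, float]] = deque()

    def create(self, parent: str, ts: float) -> None:
        # The fop itself goes only to the hashed subvolume (6.1).
        self.subvols[self.hashed][parent] = ts
        # The other subvolumes get a deferred time update (6.2).
        for idx in range(len(self.subvols)):
            if idx != self.hashed:
                self.pending.append((idx, parent, ts))

    def drain(self) -> None:
        # Apply deferred updates, never moving a time backwards.
        while self.pending:
            idx, parent, ts = self.pending.popleft()
            if self.subvols[idx].get(parent, 0.0) < ts:
                self.subvols[idx][parent] = ts

    def crash(self) -> None:
        # Simulated crash: the in-memory queue is lost.
        self.pending.clear()
```

Between create() and drain() the subvolumes disagree (the window in 
6.3), and if crash() happens first they stay inconsistent, unless the 
pending updates are persisted or rediscovered in some way.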

>
>      6.3) This will have a window where the times are inconsistent
> across dht subvolume (Please provide your suggestions)
>
>
> 7) Currently we have a couple of mount options for time attributes like
> noatime, relatime, nodiratime etc. But we do not explicitly handle
> those options even if they are given as mount options for a gluster
> mount. [2]
>
>      7.1) We always rely on back end storage layer behavior; if you
> have given those mount options when you mount your disk, you will get
> this behaviour
>
>      7.2) Now if we are taking the effort to fix the consistency issue,
> do we need to honour those options on our own?
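
For reference, honouring these options ourselves would mean making a 
decision like the one sketched below on every access. This is Python 
for illustration only; should_update_atime is a hypothetical helper, 
and passing a single option string is a simplification. The relatime 
condition follows the Linux behaviour: update atime if it is not newer 
than mtime or ctime, or if it is at least 24 hours old.

```python
DAY_SECONDS = 24 * 60 * 60


def should_update_atime(option: str, atime: float, mtime: float,
                        ctime: float, now: float,
                        is_dir: bool = False) -> bool:
    """Decide whether an access should update atime under a mount option."""
    if option == "noatime":
        # Never update atime on access.
        return False
    if option == "nodiratime":
        # Skip atime updates for directories only.
        return not is_dir
    if option == "relatime":
        # Update only if atime is stale relative to mtime/ctime or a day old.
        return (atime <= mtime or atime <= ctime
                or now - atime >= DAY_SECONDS)
    # Any other value is treated as strictatime: always update.
    return True
```
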
>
>
> Please provide your comments and suggestions.
>
>
> [1] :
> http://lists.gluster.org/pipermail/gluster-devel/2016-January/048003.html
>
>
> Regards
>
> Rafi KC
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>

