[Gluster-devel] Consistent time attributes (ctime, atime and mtime) across replica set and distribution set
Niels de Vos
ndevos at redhat.com
Tue Feb 28 07:20:18 UTC 2017
On Tue, Feb 28, 2017 at 11:21:55AM +0530, Mohammed Rafi K C wrote:
> Hi All,
> We discussed the problem $subject in the mail thread . Based on the
> comments and suggestions I will summarize the design (Made as points for
> 1) As part of each fop, the top layer will generate a time stamp and pass it
> down along with the other parameters.
> 1.1) This will bring a dependency for NTP synced clients along with
What do you mean by "top layer"? Is this on the Gluster client, or
does the time get inserted on the bricks?
I think we should not require a hard dependency on NTP, but have it
strongly suggested. Having a synced time in a clustered environment is
always helpful for reading and matching logs.
> 1.2) There can be a difference in time if the fop is stuck in an xlator for
> various reasons, for example because of locks.
Or just slow networks? Blocking (mandatory?) locks should be handled
correctly. The time a FOP is blocked can be long.
> 2) On the server, the posix layer stores the value in memory (inode ctx)
> and will sync the data periodically to the disk as an extended attr.
> 2.1) Of course, a sync call will also force it. If a fop comes for an
> inode which is not linked, we do the sync immediately.
Does it need to be in the posix layer?
> 3) Each time an inode is created or initialized, it reads the data
> from disk and stores it.
> 4) Before setting the inode_ctx we compare the stored timestamp with the
> received timestamp, and only store the new one if the stored value is less
> than the received value.
> 5) So in the best case the data will be stored in and retrieved from memory.
> We replace the values in iatt with the values in inode_ctx.
> 6) File ops that change the parent directory's time attributes need to be
> consistent across all the distributed directories across the subvolumes.
> (for eg: a create call will change ctime and mtime of the parent dir)
> 6.1) This has to be handled separately because we only send the fop to
> the hashed subvolume.
> 6.2) We can asynchronously send the time-update setattr fop to the
> other subvolumes and change the values for the parent directory if the file
> fop is successful on the hashed subvolume.
> 6.3) This will have a window where the times are inconsistent
> across dht subvolumes (Please provide your suggestions)
Isn't this the same problem for 'normal' AFR volumes? I guess self-heal
needs to know how to pick the right value for the [cm]time xattr.
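To make the "only move forward" rule from point 4 and the self-heal question concrete, here is a minimal sketch. It assumes that "the right value" means the newest value per attribute, which is only one possible policy and not something the design has settled. The names TimeAttrs, merge_newer and pick_for_heal are hypothetical and do not exist in the Gluster code base.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class TimeAttrs:
    """Hypothetical container for the three time attributes (seconds)."""
    atime: float
    mtime: float
    ctime: float


def merge_newer(stored: TimeAttrs, received: TimeAttrs) -> TimeAttrs:
    """Keep, per attribute, whichever value is newer.

    This mirrors point 4: a received timestamp only replaces the
    stored one when the stored value is less than the received value.
    """
    return TimeAttrs(
        atime=max(stored.atime, received.atime),
        mtime=max(stored.mtime, received.mtime),
        ctime=max(stored.ctime, received.ctime),
    )


def pick_for_heal(copies: list[TimeAttrs]) -> TimeAttrs:
    """Assumed heal policy: fold all copies with merge_newer.

    `copies` holds the values read from each replica (or each dht
    subvolume for a directory); it must not be empty.
    """
    return reduce(merge_newer, copies)


if __name__ == "__main__":
    brick_a = TimeAttrs(atime=100.0, mtime=90.0, ctime=90.0)
    brick_b = TimeAttrs(atime=95.0, mtime=120.0, ctime=120.0)
    print(pick_for_heal([brick_a, brick_b]))
```

Because the merge is a per-attribute maximum, the order in which late or asynchronous updates arrive does not change the result, which is what would let the window in 6.3 close on its own once every update has been applied.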
> 7) Currently we have a couple of mount options for time attributes like
> noatime, relatime, nodiratime etc. But we do not explicitly handle
> those options, even if they are given as mount options for a gluster mount.
Where is the URL for ?
> 7.1) We always rely on the behaviour of the back end storage layer: if you
> have given those mount options when you mount your disk, you will get
> this behaviour
These options are for "not writing the atime", so if there is a client
that does not use these options for mounting, the atime will be updated
upon each access. Using these options on the brick-level, and not
through fuse, nfs or smb would prevent it for all clients. Those are two
use-cases, and both probably need to be handled in the future as well.
> 7.2) Now, if we are making the effort to fix the consistency issue, do
> we need to honour those options on our own?
I do not think you need to handle them; you can just rely on the filesystems
(fuse, nfs and smb) to take care of it. However, check if Samba or
NFS-Ganesha have config options for these, in that case, we might need
to be able to tune it too.
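For reference, this is roughly what honouring these options would involve if it ever had to be done outside the kernel. It is only a sketch: the relatime rule below follows the Linux behaviour (update atime when it is not newer than mtime or ctime, or when it is at least a day old), and the function name and the mode strings are made up for illustration.

```python
DAY_SECONDS = 24 * 60 * 60


def should_update_atime(atime: float, mtime: float, ctime: float,
                        now: float, mode: str = "relatime") -> bool:
    """Decide whether an access should write a new atime.

    `mode` is a hypothetical stand-in for the mount option in effect:
    "noatime", "relatime" or "strictatime".
    """
    if mode == "noatime":
        return False
    if mode == "strictatime":
        return True
    if mode != "relatime":
        raise ValueError(f"unknown mode: {mode}")
    # relatime: only update when the old atime carries no information
    # (it is not newer than the last change) or is at least a day old.
    return mtime >= atime or ctime >= atime or (now - atime) >= DAY_SECONDS


if __name__ == "__main__":
    # File modified after it was last read: atime gets updated.
    print(should_update_atime(atime=100, mtime=200, ctime=200, now=300))
```

With nodiratime the same check would simply be skipped for directories.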
> Please provide your comments and suggestions.
Please update https://bugzilla.redhat.com/show_bug.cgi?id=1318493 with
your findings too.
When this is fixed, caching solutions (like FS-Cache for NFS, SMB) will
work much better. As mentioned in the BUG, we would be able to add a
"birth time" attribute as well.
>  :
> Rafi KC