[Gluster-devel] Tracking File Creations, Modifications, and Deletions

Gordan Bobic gordan at bobich.net
Tue Jul 21 00:23:16 UTC 2009


On 21/07/2009 00:28, Drew Morris wrote:
> Hi All...
> We are developing a custom translator to log modifications to files
> (including creation, update and deletion)

mtime attribute?

> into database.

Have you looked into SeznamFS?

> *Our Current Approach:*
> By reviewing the Gluster and FUSE source code and documentation, we
> concluded that the following FOPs should be monitored for this purpose:
> open, create, mknod, truncate, ftruncate, writev, flush, release, unlink and
> rename.

You should really look into SeznamFS.

> We would like to insert one record per each file modification, hence we
> need a mechanism to aggregate multiple operations such as open, writev
> and flush over one file-descriptor into a single update.
>
> For performance's sake and to prevent dirty reads, we would like to do
> the database row insertion in the callback of the very last action that is
> performed. In other words, during write we just set a modified flag
> in the file descriptor context and perform the insert in the very last action.
>
> The major issue is that (as most of the docs and FAQ indicated) there
> is no reliable mechanism to decide which FOP action is the last one.

If I'm following what you are saying, that's not sensibly doable because 
you never know if there will be another operation. You have to treat 
each op as the last one, because you don't know what happens next. So 
you'll have to log all of them, and if you only ever want one of them, 
key them by file path hash in your DB so that each op overwrites the 
previous log. But if you're doing that, you might as well just do a 
recursive scan for mtime to see what's changed and take it from there.
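
To make that concrete, here is a rough sketch of both options in Python 
rather than translator C. Everything in it is an assumption made for 
illustration: the file_log table, the column names, the use of SQLite 
and SHA-1, and the log_op/changed_since helpers. None of it is GlusterFS 
API.

```python
import hashlib
import os
import sqlite3


def open_log(db_path=":memory:"):
    """Create the (hypothetical) one-row-per-file log table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_log ("
        " path_hash TEXT PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " last_op TEXT NOT NULL,"
        " logged_at REAL NOT NULL)"
    )
    return conn


def log_op(conn, path, op, now):
    """Treat every op as the last one: a later op on the same path
    replaces the earlier row because path_hash is the primary key."""
    key = hashlib.sha1(path.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO file_log VALUES (?, ?, ?, ?)",
        (key, path, op, now),
    )
    conn.commit()


def changed_since(root, since):
    """Recursive mtime scan: yield files under root modified after `since`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.lstat(full).st_mtime > since:
                    yield full
            except FileNotFoundError:
                pass  # file vanished between listing and stat
```

Note that the scan only sees files that still exist, so on its own it 
will not report deletions; you would have to diff against the previous 
scan for that.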

> We monitored file system interaction via trace module and noticed
> that the flush action is called several times and release is never invoked
> in many cases.

Bug?

> This issue forced us to log the very first flush which is quite problematic
> for a number of reasons including the fact that we can never be sure the
> operation is finished before triggering any of our asynchronous operations
> and we are slowing down the initial write because we are waiting on the
> log action to complete.

Have you tried it using a dummy FS, rather than piggybacking on 
GlusterFS? If so, did you observe the same flush/release behaviour?

> *Question:*
> Does anyone have a better solution for this issue? Perhaps there should
> be a mechanism to notify us of the closing of a file, otherwise an open
> file descriptor will remain forever.
> We would really love to find any other reliable method that allows us to
> track these operations at a higher level.
>
> We would greatly appreciate any new approach that can overcome these
> deficiencies.

Other than SeznamFS which I mentioned above, perhaps CopyFS might give 
you a better base to work on? The sort of thing you are describing 
doesn't strike me as a major use-case for GlusterFS.

Gordan