[Gluster-users] Synchronous replication, or no?

Thu Apr 9 15:45:19 UTC 2015

> Jeff: I don't really understand how a write-behind translator could keep data
> in memory before flushing to the replication module if the replication is
> synchronous. Or put another way, from whose perspective is the replication
> synchronous? The gluster daemon or the creating client?

That's actually a more complicated question than many would think.  When we
say "synchronous replication" we're talking about *durability* (i.e. does
the disk see it) from the perspective of the replication module.  It does
none of its own caching or buffering.  When it is asked to do a write, it
does not report that write as complete until all copies have been updated.

However, durability is not the same as consistency (i.e. do *other clients*
see it) and the replication component does not exist in a vacuum.  There
are other components both before and after that can affect durability and
consistency.  We've already touched on the "after" part.  There might be
caches at many levels that become stale as the result of a file being
created and written.  Of particular interest here are "negative directory
entries" which indicate that a file is *not* present.  Until those expire,
it is possible to see a file as "not there" even though it does actually
exist on disk.  We can control some of this caching, but not all.

The other side is *before* the replication module, and that's where
write-behind comes in.  POSIX does not require that a write be immediately
durable unless the application asks for it with O_SYNC, fsync(2), and so on.
We do honor those requests when they are made.  However, the most common
user expectation
is that we will defer/batch/coalesce writes, because making every write
individually immediate and synchronous has a very large performance impact.
Therefore we implement write-behind, as a layer above replication.  Absent
any specific request to perform a write immediately, data might sit there
for an indeterminate (but usually short) time before the replication code
even gets to see it.
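
For an application that does need a write pushed through right away, the
"specific request" is just the usual POSIX one.  A minimal sketch in
Python, assuming a Unix-like client; the function name is mine, and opening
the file with os.O_SYNC instead would be the other standard way to ask.

```python
import os


def write_durably(path, data):
    """Write all of `data` to `path`, then ask for it to be flushed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call, layers above the disk are free to hold on to
        # the data for a while, which is what write-behind does.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Calling fsync after every small write is exactly the per-write cost that
deferring and batching is meant to avoid, so it is best kept for the
points where durability actually matters.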

I don't think write-behind is likely to be the issue here, because it
only applies to data within a file.  It will pass create(2) calls through
immediately, so all servers should become aware of the file's existence
right away.  On the other hand, various forms of caching on the *client*
side (even if the clients and servers are the same physical machines) could
still prevent a new file from being seen immediately.
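
If an application on one client has to wait for a file created on another,
one crude way to ride out that window is to poll with a deadline.  This is
only a sketch: the function name, timeout, and interval are arbitrary
choices of mine, and how long a stale entry can actually persist depends on
cache settings I'm not assuming anything about here.

```python
import os
import time


def wait_for_path(path, timeout=5.0, interval=0.1):
    """Return True once `path` is visible, False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```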