[Gluster-devel] RENAME syscall semantics
milindchangire at gmail.com
Fri Dec 11 14:32:55 UTC 2015
Gluster uses changelogs to perform geo-replication. The changelogs record
syscalls, which are forwarded from the master cluster and replayed on the
slave cluster to provide the geo-replication feature.
If two hard-links (h1 and h2) point to the same inode and the Python
statement os.rename(h1, h2) is executed, then no syscall gets logged to
the changelog, i.e. the syscall never reaches Gluster.
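The behavior can be reproduced on a local file system with a small sketch. It assumes Linux or another POSIX platform; the temporary directory and the helper name rename_between_hardlinks are made up for illustration. POSIX specifies that rename() returns successfully without performing any other action when both names refer to the same file, and that is what the sketch observes. It does not by itself show at which layer the call stops.

```python
import os
import tempfile


def rename_between_hardlinks():
    """Rename h1 onto h2 when both are hard links to the same inode.

    Returns (same_inode, directory listing after the rename).
    Hypothetical helper for illustration only; not Gluster code.
    """
    with tempfile.TemporaryDirectory() as d:
        f1 = os.path.join(d, "f1")
        h1 = os.path.join(d, "h1")
        h2 = os.path.join(d, "h2")

        # CREATE f1
        with open(f1, "w") as fh:
            fh.write("data")

        # Two hard links to the same inode as f1.
        os.link(f1, h1)
        os.link(f1, h2)
        same_inode = os.stat(h1).st_ino == os.stat(h2).st_ino

        # On POSIX this succeeds and leaves both names in place.
        os.rename(h1, h2)

        return same_inode, sorted(os.listdir(d))


if __name__ == "__main__":
    print(rename_between_hardlinks())
```

On such a platform, h1 is still present after the call, even though os.rename() reported no error.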
Is this behavior, where renaming between hard-links that point to the same
inode is a no-op, guaranteed to NOT reach the file-system specific code?
I'm repeating myself, but I think an example would help me explain this much
better.
Consider the following sequence of syscalls:
CREATE f1 /* create file f1 */
LINK f1 h1 /* create hard-link h1 pointing to f1 */
RENAME h1 h2 /* rename hard-link h1 to h2 */
All of the above goes well and we have f1 and h2 existing on the master and
slave clusters.
However, if geo-replication is stopped and restarted, then due to clock
synchronization issues between nodes, the last changelog is replayed on the
slave cluster. This replay causes problems during hard-link renames. So,
the previously defined sequence of syscalls is replayed on the slave:
CREATE f1 /* ignored by Gluster because the same gfid already exists */
LINK f1 h1 /* h1 created since it does not exist */
RENAME h1 h2 /* silently ignored: it never reaches Gluster
* because h1 and h2 point to the same inode */
So, at the slave cluster, we now have f1, h1 and h2.
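The replay scenario can be imitated on a local POSIX file system. This is only a rough sketch: apply_changelog, CHANGELOG and replay_twice are hypothetical names, the "skip if it already exists" checks merely stand in for Gluster's gfid-based handling, and no real changelog format or Gluster API is involved.

```python
import os
import tempfile

# The sequence from the example above, as (operation, arguments...) tuples.
CHANGELOG = [("CREATE", "f1"), ("LINK", "f1", "h1"), ("RENAME", "h1", "h2")]


def apply_changelog(root, ops):
    """Apply the operations under root, tolerating already-applied entries."""
    for op, *args in ops:
        paths = [os.path.join(root, a) for a in args]
        if op == "CREATE":
            # Stand-in for "ignored because the same gfid already exists".
            if not os.path.exists(paths[0]):
                open(paths[0], "w").close()
        elif op == "LINK":
            # The link is created only if the name does not exist yet.
            if not os.path.exists(paths[1]):
                os.link(paths[0], paths[1])
        elif op == "RENAME":
            # A no-op on POSIX when both names refer to the same inode.
            os.rename(paths[0], paths[1])


def replay_twice():
    """Return the directory listing after the first and second application."""
    with tempfile.TemporaryDirectory() as slave:
        apply_changelog(slave, CHANGELOG)
        first = sorted(os.listdir(slave))
        apply_changelog(slave, CHANGELOG)  # the unwanted replay
        second = sorted(os.listdir(slave))
        return first, second


if __name__ == "__main__":
    print(replay_twice())
```

After the first pass only f1 and h2 exist; after the second pass h1 has reappeared alongside them, which matches the state described for the slave.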
The issue now is how to get rid of the extra link h1 that accumulates
on the Gluster file-system. Ideally it shouldn't exist after changelogs are
replayed.
Can Gluster assume that if the operands to a RENAME syscall point to the
same inode, then file-system specific code to handle the rename syscall
will never be invoked?