[Gluster-devel] [RFC] on the fly GFID change for an existing file

Thu May 2 06:29:01 UTC 2013

On Fri, Apr 26, 2013 at 3:37 PM, Amar Tumballi <atumball at redhat.com> wrote:

> Hi,
>
> This is an extended discussion on patch http://review.gluster.org/4702
>
> With this patch in, a mount point can access files directly by their
> gfid, and not just by path.
>
> We plan to use this together with Changelog [1] for the enhanced
> geo-replication feature we are developing. The good thing is that this
> combination brings many benefits, chiefly not having to crawl the file
> system to find the changes made in the last N minutes.
>
> The challenge now is upgrading an existing geo-replication setup to the
> newer one, where the files in the 'slave' volume must have exactly the
> same GFIDs as their counterparts in the 'master' volume. To achieve this,
> when we upgrade, we need a *method* to change the GFID of existing files
> in the 'slave' volume on the fly.
>
> We have a couple of options:
>
> 1. delete the '.glusterfs/' directory from the slave volume and use the
> aux-gfid-path based mount to do the lookup (with the proper GLUSTERFS_GFID
> env variable set), so that the gfid is re-created with the new value.
>  * needs a change in the posix xlator to also overwrite an existing
> 'trusted.gfid' attribute.
>

This approach is in line with the gfid healing we already have and requires
very little code change within glusterfs (we might need a script to remove
the existing gfid xattrs and send a stat on each file with the correct gfid
from master). However, if other applications use the mount point in parallel
during the window after the gfid xattr is deleted and before the lookup with
the correct gfid copied from master, a file might end up with a gfid
different from its counterpart in master. Hence, this approach does not bode
well for rolling upgrades (upgrades without downtime).
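
To make the script idea concrete, here is a rough sketch of what it could
look like. It is only an illustration under assumptions, not a tested tool:
it assumes 'trusted.gfid' holds the raw 16-byte UUID, that the script runs
against the slave's backend directory, and that heal() is a hypothetical
callback doing the lookup/stat through the aux-gfid-path mount with the
master's gfid supplied. The names regfid_tree, read_gfid and heal are mine.

```python
import os
import uuid

GFID_XATTR = "trusted.gfid"  # xattr name taken from the thread


def read_gfid(path, getxattr=None):
    """Return the GFID stored on a backend file, or None if it has none."""
    # os.getxattr is Linux-only; reading trusted.* usually needs root.
    getxattr = getxattr or os.getxattr
    try:
        raw = getxattr(path, GFID_XATTR)
    except OSError:
        return None
    # Assumption: the value is the raw 16-byte UUID.
    return uuid.UUID(bytes=raw) if len(raw) == 16 else None


def regfid_tree(master_gfids, slave_root, heal, getxattr=None, removexattr=None):
    """Re-assign GFIDs on the slave so that they match the master.

    master_gfids: {relative path: uuid.UUID} collected on the master.
    heal: hypothetical callback(relpath, gfid) that performs the lookup
          through the aux-gfid-path mount with the master's GFID.
    Returns the relative paths whose GFID was changed.
    """
    removexattr = removexattr or os.removexattr
    changed = []
    for relpath, want in sorted(master_gfids.items()):
        backend = os.path.join(slave_root, relpath)
        if read_gfid(backend, getxattr) == want:
            continue  # already matches the master
        try:
            removexattr(backend, GFID_XATTR)
        except OSError:
            pass  # there was no GFID xattr to remove
        # Race window: until heal() completes, a lookup from any other
        # client can assign a fresh GFID that differs from the master's.
        heal(relpath, want)
        changed.append(relpath)
    return changed
```

The comment before heal() marks the window I am worried about: nothing in
this loop stops another client from looking up the file in between.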

>
> 2. add a setxattr() interface to change the gfid on the fly, based on a
> virtual xattr.
>  * needs extra checks, as the existing inode number (aka gfid) suddenly
> changes and we need to handle that gracefully.
>

This approach is also simple, but requires relatively more code change
within glusterfs (compared to 1). If we happen to change the gfid of an
already looked-up inode, protocol/client fails the revalidate (re-lookup) on
that file with ESTALE and fuse-bridge sends a fresh lookup on that path. The
effect is that we send a new nodeid to the kernel in reply to the revalidate,
and the kernel too creates a new inode replacing the old inode for that file.
Hence, AFAIK, this wouldn't cause any problems. This approach is also good
because we change the gfid atomically, so rolling upgrades are possible with
it.
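
The revalidate behaviour described above can be sketched as a toy model.
This is not glusterfs code: Brick and FuseBridge are made-up classes that
only mimic the flow (revalidate fails with ESTALE, a fresh lookup follows,
the kernel is handed a new nodeid), and set_gfid() merely stands in for the
proposed setxattr-based interface.

```python
import errno
import itertools
import uuid


class Brick:
    """Server side of the toy model: maps each path to its current GFID."""

    def __init__(self):
        self.gfid_of = {}

    def lookup(self, path):
        return self.gfid_of[path]

    def revalidate(self, path, cached_gfid):
        # A cached GFID that no longer matches is reported as stale.
        if self.gfid_of[path] != cached_gfid:
            raise OSError(errno.ESTALE, "stale file handle", path)

    def set_gfid(self, path, new_gfid):
        # Stand-in for the proposed interface: one assignment, so there is
        # no moment at which the file has no GFID.
        self.gfid_of[path] = new_gfid


class FuseBridge:
    """Client side of the toy model: hands one nodeid per inode to the kernel."""

    def __init__(self, brick):
        self.brick = brick
        self.cache = {}  # path -> (nodeid, gfid)
        self._ids = itertools.count(2)  # nodeid 1 is reserved for the root

    def _fresh_lookup(self, path):
        entry = (next(self._ids), self.brick.lookup(path))
        self.cache[path] = entry
        return entry[0]

    def resolve(self, path):
        if path not in self.cache:
            return self._fresh_lookup(path)
        nodeid, gfid = self.cache[path]
        try:
            self.brick.revalidate(path, gfid)
            return nodeid
        except OSError as err:
            if err.errno != errno.ESTALE:
                raise
            # GFID changed underneath us: look up again, return a new nodeid.
            return self._fresh_lookup(path)
```

In this model a path that is resolved, has its gfid changed, and is then
resolved again comes back with a different nodeid, and later resolves keep
returning that new nodeid.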

Given the advantages of approach 2, I would prefer it. Thanks to Csaba for
his input.

> Let me know if someone has better options, or can take a call on which is
> the better approach.
>
> Regards,
> Amar
>
> [1] - http://review.gluster.org/4766

regards,
-- 
Raghavendra G