[Bugs] [Bug 1258377] New: ACL created on a dht.linkto file on a files that skipped rebalance

bugzilla at redhat.com bugzilla at redhat.com
Mon Aug 31 08:52:46 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1258377

            Bug ID: 1258377
           Summary: ACL created on a dht.linkto file on a files that
                    skipped rebalance
           Product: GlusterFS
           Version: 3.7.3
         Component: distribute
          Severity: high
          Priority: high
          Assignee: bugs at gluster.org
          Reporter: nbalacha at redhat.com
                CC: asriram at redhat.com, bkunal at redhat.com,
                    bugs at gluster.org, cbuissar at redhat.com,
                    gluster-bugs at redhat.com, mlawrenc at redhat.com,
                    nbalacha at redhat.com, nsathyan at redhat.com,
                    storage-qa-internal at redhat.com
        Depends On: 1234610, 1247563
            Blocks: 1216951



+++ This bug was initially created as a clone of Bug #1247563 +++

+++ This bug was initially created as a clone of Bug #1234610 +++

DESCRIPTION

When rebalancing a file with ACL, if you are unlucky enough so that the file is
skipped due to __dht_check_free_space() failure (or maybe other reasons), the
ACLs will still be migrated on the destination file, containing the dht.linkto
XATTR.

This leads to issues where on the client, the file will be seen twice :

---8<----
[root at rhs30-node1 severalfiles]# echo bla > file
[root at rhs30-node1 severalfiles]# setfacl -m u:cedric:w file
[root at rhs30-node1 severalfiles]# ll file
-rw-rw-r--+ 1 root root 4 Jun 22 21:42 file
[root at rhs30-node1 severalfiles]# mv file renamed
[root at rhs30-node1 severalfiles]# gluster volume rebalance thingluster start
volume rebalance: thingluster: success: Rebalance on thingluster has been
started successfully. Use rebalance status command to check status of the
rebalance process.
ID: 9277c760-2495-4058-894e-12e95e005b90

[root at rhs30-node1 severalfiles]# ls -li | grep renamed
12641300739152549957 -rw-rw-r-T+ 1 root root        4 Jun 22 21:43 renamed
12641300739152549957 -rw-rw-r-T+ 1 root root        4 Jun 22 21:43 renamed
---->8----

looking on the bricks :

on my node 2 :
524419 -rw-rw-r--+ 2 root root 4 Jun 22 21:42
/mnt/bricks/thiny/gluster/severalfiles/renamed

On my node 4 :
524427 -rw-rw-r-T+ 2 root root 4 Jun 22 21:43
/mnt/bricks/thiny/gluster/severalfiles/renamed


We can see that the node 4 inherited from the ACLs, while this usually does not
happen : regular dht.linkto files do not have ACLs.


--- Additional comment from Cedric Buissart on 2015-06-22 16:02:04 EDT ---

I suspect that the issue is a side effect of
xlators/cluster/dht/src/dht-rebalance.c , function
__dht_rebalance_create_dst_file :

static inline int
__dht_rebalance_create_dst_file (xlator_t *to, xlator_t *from, loc_t *loc,
struct iatt *stbuf,
                                 dict_t *dict, fd_t **dst_fd, dict_t *xattr)
{
[...]
        /* TODO: move all xattr related operations to fd based operations */
        ret = syncop_listxattr (from, loc, &xattr);
        if (ret < 0) {
                gf_msg (this->name, GF_LOG_WARNING, 0,
                        DHT_MSG_MIGRATE_FILE_FAILED,
                        "Migrate file failed:"
                        "%s: failed to get xattr from %s (%s)",
                        loc->path, from->name, strerror (-ret));
                ret = -1;
        }

        /* create the destination, with required modes/xattr */
        ret = __dht_rebalance_create_dst_file (to, from, loc, &stbuf,
                                               dict, &dst_fd, xattr);
        if (ret)
                goto out;

        ret = __dht_check_free_space (to, from, loc, &stbuf, flag);
        if (ret) {
                goto out;
        }


Where the xattr are copied over (syncop_listxattr();
__dht_rebalance_create_dst_file()) but then we might cancel the migration if
__dht_check_free_space() returns a failure ... leading to xattr having been
modified on the destination.

or maybe we want to change the behavior or readdir/getdents ?

--- Additional comment from Bipin Kunal on 2015-06-23 03:47:08 EDT ---

(In reply to Cedric Buissart from comment #2)
> I suspect that the issue is a side effect of
> xlators/cluster/dht/src/dht-rebalance.c , function
> __dht_rebalance_create_dst_file :
> 
I think the function which Cedric wanted to point is :
int
dht_migrate_file (xlator_t *this, loc_t *loc, xlator_t *from, xlator_t *to,
                  int flag)
> {
> [...]
>         /* TODO: move all xattr related operations to fd based operations */
>         ret = syncop_listxattr (from, loc, &xattr);
>         if (ret < 0) {
>                 gf_msg (this->name, GF_LOG_WARNING, 0,
>                         DHT_MSG_MIGRATE_FILE_FAILED,
>                         "Migrate file failed:"
>                         "%s: failed to get xattr from %s (%s)",
>                         loc->path, from->name, strerror (-ret));
>                 ret = -1;
>         }
> 
>         /* create the destination, with required modes/xattr */
>         ret = __dht_rebalance_create_dst_file (to, from, loc, &stbuf,
>                                                dict, &dst_fd, xattr);
>         if (ret)
>                 goto out;
> 
>         ret = __dht_check_free_space (to, from, loc, &stbuf, flag);
>         if (ret) {
>                 goto out;
>         }
> 
> 
> Where the xattr are copied over (syncop_listxattr();
> __dht_rebalance_create_dst_file()) but then we might cancel the migration if
> __dht_check_free_space() returns a failure ... leading to xattr having been
> modified on the destination.
> 
> or maybe we want to change the behavior or readdir/getdents ?

I think we should be calling function __dht_check_free_space() at least prior
to __dht_rebalance_create_dst_file else we should have proper cleanup activity
in the "out" section.


Thanks,
Bipin Kunal

--- Additional comment from Nithya Balachandran on 2015-06-23 06:39:02 EDT ---

What Cedric and Bipin have said is correct. I can reproduce the issue and am
working on a fix. I will update with progress tomorrow.

--- Additional comment from Cedric Buissart on 2015-06-23 10:00:39 EDT ---

(In reply to Bipin Kunal from comment #3)
> 
> I think we should be calling function __dht_check_free_space() at least
> prior to __dht_rebalance_create_dst_file else we should have proper cleanup
> activity in the "out" section.
>  
> 
> Thanks,
> Bipin Kunal

Yes, I was thinking the same. After a quick try, it seems to work :
I have the "data movement attempted from node [..] with higher disk space to a
node" in the logs, but this time, did not lead to dups in the ls output.

However, additionally : should we also look at the readdir behaviour and check
why the ACLs on the linkto file leads to showing dups ?

--- Additional comment from Nithya Balachandran on 2015-06-23 13:23:36 EDT ---

Cedric and Bipin, 
The change discussed above should certainly work but I want to look into a few
more areas to understand the impact - specifically wrt to the new
lookup-optimize option we are introducing in 3.1. I will update the BZ tomorrow
again.

--- Additional comment from Nithya Balachandran on 2015-06-24 04:12:41 EDT ---

This happens only if posix acls are set on the file being rebalanced and is
easily reproduced using the following steps:

1. Create a distributed volume, say vol1
2. Mount it using the following command:
 mount -t glusterfs -o acl <server>:vol1 /mnt/vol1

3. Create a file on the gluster volume from the mount point and set a posix acl
on it
#> cd /mnt/vol1
#> touch file1
#> setfacl -m u:root:rw file1

#>l

total 0
-rw-rw-r--+ 1 root root    0 Jun 24 13:21 file1


4. Write some content into the file and rename the file so it hashes to a
different brick, say file2. This will create a linkto file on the new hashed
subvol.

5. Start a rebalance. Somehow force that file to be skipped (I used gdb).


Post the rebalance, it is seen that the linkto file on the brick now has
additional permissions because of the posix acls that are set on it as part of
the xattr copy in  __dht_rebalance_create_dst_file. Listing the file on the
mount point will show duplicate entries as these additional permissions cause
the IS_DHT_LINKFILE_MODE check to fail. The linkto file is then treated as a
data file.


Before rebalance:

./x1:

---------T 2 root root 0 Jun 24 13:10 file2   <-- Valid linkto file

./x3:

-rw-rwxr--+ 2 root root 3214 Jun 24 13:10 file2



After rebalance:

./x1:

-rw-rwxr-T+ 2 root root 3214 Jun 24 13:11 file2  <---Linkto file perms changed

./x3:

-rw-rwxr--+ 2 root root 3214 Jun 24 13:10 file2  <---  actual data file




Impact:

The user sees 2 entries for the file listing on the mount point. In some cases,
operations are sent to the wrong copy of the file causing data loss/corruption.



Workaround:

Use 

gluster volume rebalance <volname> start force

to prevent the file being skipped because of the space check.


Once this has occurred, the affected linkto files have to be identified and
deleted from the brick (not the mount point).

--- Additional comment from Nithya Balachandran on 2015-06-24 04:29:03 EDT ---

Analysis:

1. The additional permissions set on the linkto file as a result of copying the
posix acl xattrs in __dht_rebalance_create_dst_file cause the
IS_DHT_LINKFILE_MODE check to fail.

2. If the __dht_check_free_space fails post this, the modified linkto file is
not cleaned up.

3. Operations then treat this file as the data file and can cause data
corruption/loss. I see the ops sometimes going to the correct file after 
initially accessing the wrong file but I am still investigating why.


Possible fixes:
1. Reverse the order of calling __dht_rebalance_create_dst_file and
__dht_check_free_space.

This is the simplest way and will fix this specific issue. It however does not
address the problem of not cleaning up the dst file in case of failures later
in the file migration  sequence.

2. Cleaning up a dst file in case of migration skip/failure.
This might have issues where, if acls are set on a file undergoing migration,
it will be set on the target file anyway as part of FOP handling. Subsequent
fops may end up being performed on the wrong file. That this actually happens
however needs to be confirmed.


3. Modify the IS_DHT_LINKFILE_MODE to ignore the other permissions. This would
require extensive study/testing to determine the impact and is not recommended
at this point.

--- Additional comment from Anand Avati on 2015-07-28 06:55:45 EDT ---

REVIEW: http://review.gluster.org/11774 (cluster/dht: Check for space before
creating migration dst file) posted (#1) for review on master by N Balachandran
(nbalacha at redhat.com)

--- Additional comment from Anand Avati on 2015-08-27 03:49:41 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on
linkto files) posted (#1) for review on master by N Balachandran
(nbalacha at redhat.com)

--- Additional comment from Anand Avati on 2015-08-27 03:54:16 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on
linkto files) posted (#2) for review on master by N Balachandran
(nbalacha at redhat.com)

--- Additional comment from Anand Avati on 2015-08-27 06:01:36 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on
linkto files) posted (#3) for review on master by N Balachandran
(nbalacha at redhat.com)

--- Additional comment from Anand Avati on 2015-08-27 06:48:14 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on
linkto files) posted (#4) for review on master by N Balachandran
(nbalacha at redhat.com)

--- Additional comment from Anand Avati on 2015-08-27 06:56:53 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on
linkto files) posted (#5) for review on master by N Balachandran
(nbalacha at redhat.com)

--- Additional comment from Anand Avati on 2015-08-28 07:10:36 EDT ---

REVIEW: http://review.gluster.org/12025 (cluster/dht: Don't set posix acls on
linkto files) posted (#6) for review on master by N Balachandran
(nbalacha at redhat.com)

--- Additional comment from Anand Avati on 2015-08-31 03:28:04 EDT ---

COMMIT: http://review.gluster.org/12025 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit b9c730f3960efd454c8363ee39dc144e4c0dc835
Author: Nithya Balachandran <nbalacha at redhat.com>
Date:   Thu Aug 27 13:10:18 2015 +0530

    cluster/dht: Don't set posix acls on linkto files

    Posix acls on a linkto file change the file's permission
    bits and cause DHT to treat it as a non-linkto file.This
    happens on the migration failure of a file on which posix
     acls were set.

    The fix prevents posix acls from being set on a linkto
    file and copies them across only after a file has
    been successfully migrated.

    Change-Id: Iccf7ff6fba49fe05d691d9b83bf76a240848b212
    BUG: 1247563
    Signed-off-by: Nithya Balachandran <nbalacha at redhat.com>
    Signed-off-by: N Balachandran <nbalacha at redhat.com>
    Reviewed-on: http://review.gluster.org/12025
    Tested-by: NetBSD Build System <jenkins at build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp at redhat.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1234610
[Bug 1234610] ACL created on a dht.linkto file on a files that skipped
rebalance
https://bugzilla.redhat.com/show_bug.cgi?id=1247563
[Bug 1247563] ACL created on a dht.linkto file on a files that skipped
rebalance
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list