[Gluster-devel] 1402538 : Assertion failure during rebalance of symbolic links

Xavier Hernandez xhernandez at datalab.es
Wed Dec 14 09:36:47 UTC 2016


On 12/14/2016 10:28 AM, Pranith Kumar Karampuri wrote:
>
>
> On Wed, Dec 14, 2016 at 2:54 PM, Xavier Hernandez <xhernandez at datalab.es
> <mailto:xhernandez at datalab.es>> wrote:
>
>     On 12/14/2016 10:17 AM, Pranith Kumar Karampuri wrote:
>
>
>
>         On Wed, Dec 14, 2016 at 1:48 PM, Xavier Hernandez
>         <xhernandez at datalab.es <mailto:xhernandez at datalab.es>
>         <mailto:xhernandez at datalab.es <mailto:xhernandez at datalab.es>>>
>         wrote:
>
>             There's another issue with the patch that Ashish sent.
>
>             The original problem is that a setattr on a symbolic link gets
>             transformed to a regular file while the fop is being
>         executed. Even
>             if we apply the Ashish' patch to avoid the assert, the
>         setattr fop
>             will still succeed and incorrectly change the attributes of a
>             gluster special file that shouldn't change.
>
>             I think that's a bigger problem that needs to be addressed
>         globally.
>
>             I'm sure this is not an easy solution, but probably the best way
>             would be to have distinct inodes for the gluster link files
>         and the
>             original file. This way most of these problems should be solved.
>
>
>         Is there any reason why there is a difference in type of the file on
>         hashed/cached subvols? We can have the same type of file on both dht
>         subvolumes? That will prevent unlink of regular file and
>         recreate with
>         the actual type of the file?
>
>
>     I think the problem is not only the type of the inode. There are
>     more things involved. If we allow operations intended for regular
>     files to succeed on the dht link file itself, the operation won't be
>     visible and may affect future actions.
>
>     How it's prevented that the setattr modifies an already created link
>     file ? or at least, are these changes propagated to the real file
>     later and the link is restored to the original state ? if so, how
>     dht detects all this without any locks ? if it's able to detect
>     that, why does it send the setattr request anyway ?
>
>
> I think the assert messages are coming at the time of marking the file
> as '---------T' file. DHT makes sure the actual fop happens on the
> cached subvolume. But this linkto file will be present in hashed
> subvolume indicating it is a linkto file (i.e. 'T' file and there will
> be an extended attribute telling where the actual file is in an extended
> attrbute). With EC in picture this marking of linkto file by doing a
> setattr as '---------T' file is asserting.

If I understand correctly, the real file and the link file not only 
share the gfid but also share the inode structure itself (otherwise dht 
would send the new inode to setattr and that problem won't happen). This 
means that we have a single inode structure to represent a symbolic link 
and a regular file at the same time. This seems very bad to me.

>
>
>
>
>
>
>             Xavi
>
>
>             On 12/14/2016 09:02 AM, Xavier Hernandez wrote:
>
>                 On 12/14/2016 06:10 AM, Raghavendra Gowdappa wrote:
>
>
>
>                     ----- Original Message -----
>
>                         From: "Pranith Kumar Karampuri"
>         <pkarampu at redhat.com <mailto:pkarampu at redhat.com>
>                         <mailto:pkarampu at redhat.com
>         <mailto:pkarampu at redhat.com>>>
>                         To: "Ashish Pandey" <aspandey at redhat.com
>         <mailto:aspandey at redhat.com>
>                         <mailto:aspandey at redhat.com
>         <mailto:aspandey at redhat.com>>>
>                         Cc: "Gluster Devel" <gluster-devel at gluster.org
>         <mailto:gluster-devel at gluster.org>
>                         <mailto:gluster-devel at gluster.org
>         <mailto:gluster-devel at gluster.org>>>, "Shyam Ranganathan"
>                         <srangana at redhat.com
>         <mailto:srangana at redhat.com> <mailto:srangana at redhat.com
>         <mailto:srangana at redhat.com>>>,
>                         "Nithya Balachandran"
>                         <nbalacha at redhat.com
>         <mailto:nbalacha at redhat.com> <mailto:nbalacha at redhat.com
>         <mailto:nbalacha at redhat.com>>>,
>                         "Xavier Hernandez" <xhernandez at datalab.es
>         <mailto:xhernandez at datalab.es>
>                         <mailto:xhernandez at datalab.es
>         <mailto:xhernandez at datalab.es>>>,
>                         "Raghavendra Gowdappa" <rgowdapp at redhat.com
>         <mailto:rgowdapp at redhat.com>
>                         <mailto:rgowdapp at redhat.com
>         <mailto:rgowdapp at redhat.com>>>
>                         Sent: Tuesday, December 13, 2016 9:29:46 PM
>                         Subject: Re: 1402538 : Assertion failure during
>                         rebalance of symbolic
>                         links
>
>                         On Tue, Dec 13, 2016 at 2:45 PM, Ashish Pandey
>                         <aspandey at redhat.com
>         <mailto:aspandey at redhat.com> <mailto:aspandey at redhat.com
>         <mailto:aspandey at redhat.com>>>
>
>                         wrote:
>
>                             Hi All,
>
>                             We have been seeing an issue where re balancing
>                             symbolic links leads
>                             to an
>                             assertion failure in EC volume.
>
>                             The root cause of this is that while migrating
>                             symbolic links to
>                             other sub
>                             volume, it creates a link file (with attributes
>                             .........T) .
>                             This file is a regular file.
>                             Now, during migration a setattr comes to
>         this link
>                             and because of
>                             possible
>                             race, posix_stat return stats of this "T" file.
>                             In ec_manager_seattr, we receive callbacks
>         and check
>                             the type of
>                             entry. If
>                             it is a regular file we try to get size and
>         if it is
>                             not there, we
>                             raise an
>                             assert.
>                             So, basically we are checking a size of the link
>                             (which will not have
>                             size) which has been returned as regular
>         file and we
>                             are ending up when
>                             this condition
>                             becomes TRUE.
>
>                             Now, this looks like a problem with re
>         balance and
>                             difficult to fix at
>                             this point (as per the discussion).
>                             We have an alternative to fix it in EC but
>         that will
>                             be more like a
>                             hack
>                             than an actual fix. We should not modify EC
>                             to deal with an individual issue which is in
>         other
>                             translator.
>
>
>                     I am afraid, dht doesn't have a better way of
>         handling this.
>                     While DHT
>                     maintains abstraction (of a symbolic link) to layers
>         above,
>                     the layers
>                     below it cannot be shielded from seeing the details
>         like a
>                     linkto file
>                     etc.
>
>
>                 That's ok, and I think it's the right thing to do. From
>         the point of
>                 view of EC, it's irrelevant how the file is seen by
>         upper layers. It
>                 only cares about the files below it.
>
>                     If the concern really is that the file is changing
>         its type
>                     in a span
>                     of single fop, we can probably explore the option of
>         locking
>                     (or other
>                     synchronization mechanisms) to prevent migration taking
>                     place, while a
>                     fop is in progress.
>
>
>                 That's the real problem. Some operations receive an inode
>                 referencing a
>                 symbolic link on input but the iatt structures from the
>         callback
>                 reference a regular file. It's even worse because it's an
>                 asynchronous
>                 race so some of the bricks may return a regular file and
>         some
>                 may return
>                 a symbolic link. If there are more than redundancy bricks
>                 returning a
>                 different type, the most probably result will be an I/O
>         error
>                 caused by
>                 inconsistent answers.
>
>                 Ashish wrote a patch to check the type of the inode at
>         the input
>                 instead
>                 of relying on the answers. While this could avoid the
>         assertion
>                 issued
>                 by ec, it doesn't solve the race, leaving room for the
>         I/O errors I
>                 mentioned earlier.
>
>                     But, I assume there will be performance penalties
>         for that too.
>
>
>                 Yes. I don't see any other way to really solve this
>         problem. A
>                 lock is
>                 needed.
>
>                 In ec we already have a problem that will need an
>         additional lock on
>                 rmdir, unlink and rename to avoid some races. This
>         change will
>                 also need
>                 support from locks xlator to avoid granting locks on deleted
>                 inodes. If
>                 dht is using one of these operations to replace the symbolic
>                 link by the
>                 gluster link file, I think this change could solve the I/O
>                 errors, but
>                 I'm not sure we could completely solve the problem.
>
>                 I'm not sure how dht does the transform from a symbolic
>         link to a
>                 gluster link file, but if it involves more than one fop
>         from the
>                 point
>                 of view of ec, there's nothing that ec can do to solve the
>                 problem. If
>                 another client accesses the file, ec can return any
>         intermediate
>                 state.
>                 DHT should take some lock to do all operations
>         atomically and avoid
>                 problems on other clients.
>
>                 I think that the mid-term approach to completely solve
>         the problem
>                 without a performance impact should be to implement some
>         kind of
>                 transaction mechanism that will reuse lock requests.
>         This would
>                 allow,
>                 among other things, that multiple atomic operations could be
>                 performed
>                 by different xlators but sharing the locks instead of
>         requiring each
>                 xlator to take an inodelk on its own.
>
>                 Xavi
>
>
>
>                             Now the question is how to proceed with
>         this? Any
>                             suggestions?
>
>
>                         Raghavendra/Nithya,
>                                  Could one of you explain the
>         difficulties in
>                         fixing this
>                         issue in
>                         DHT so that Xavi will also be caught up with why we
>                         should add this
>                         change
>                         in EC in the short term.
>
>
>
>                             Details on this bug can be found here -
>
>         https://bugzilla.redhat.com/show_bug.cgi?id=1402538
>         <https://bugzilla.redhat.com/show_bug.cgi?id=1402538>
>
>         <https://bugzilla.redhat.com/show_bug.cgi?id=1402538
>         <https://bugzilla.redhat.com/show_bug.cgi?id=1402538>>
>
>                             ----
>                             Ashish
>
>
>
>
>
>
>                         --
>                         Pranith
>
>
>
>
>
>
>         --
>         Pranith
>
>
>
>
>
> --
> Pranith



More information about the Gluster-devel mailing list