[Bugs] [Bug 1469964] New: cluster/dht: Fix hardlink migration failures

bugzilla at redhat.com bugzilla at redhat.com
Wed Jul 12 07:38:55 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1469964

            Bug ID: 1469964
           Summary: cluster/dht: Fix hardlink migration failures
           Product: GlusterFS
           Version: mainline
         Component: distribute
          Assignee: spalai at redhat.com
          Reporter: spalai at redhat.com
                CC: bugs at gluster.org



Description of problem:
There are a few races in the remove-brick hardlink migration code path,
detailed below.

 A brief overview of how hardlink migration works:
    - Different hardlinks (to the same file) may hash to different bricks,
      but their cached subvol will be the same. Rebalance picks up the first
      hardlink, calculates its hash (call it TARGET), and sets the hashed
      subvolume as an xattr on the data file.
    - All hardlinks that come after this fetch that xattr and create linkto
      files on TARGET (all linkto files for the hardlinks will be hardlinks
      to each other on TARGET).
    - When the number of hardlinks on the source equals the number of
      hardlinks on TARGET, the data migration happens.
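
The three steps above can be sketched as a toy model. This is only an
illustration, not GlusterFS code: the brick names, the CRC-based hash, and the
names pick_hashed_brick, DataFile and migrate_hardlink are all assumptions
made for the example.

```python
# Toy model of the hardlink migration steps described above.
# Every name here is hypothetical; this is not the DHT implementation.
import zlib
from dataclasses import dataclass, field
from typing import Optional

BRICKS = ["brick-0", "brick-1", "brick-2"]


def pick_hashed_brick(name: str) -> str:
    """Stand-in for DHT hashing: map a hardlink name to a brick."""
    return BRICKS[zlib.crc32(name.encode()) % len(BRICKS)]


@dataclass
class DataFile:
    source_links: set                       # hardlink names on the source
    target_xattr: Optional[str] = None      # TARGET recorded on the data file
    linkto_on_target: set = field(default_factory=set)
    migrated: bool = False


def migrate_hardlink(f: DataFile, name: str) -> None:
    # Step 1: the first hardlink seen decides TARGET and records it.
    if f.target_xattr is None:
        f.target_xattr = pick_hashed_brick(name)
    # Step 2: every hardlink gets a linkto file on TARGET.
    f.linkto_on_target.add(name)
    # Step 3: data moves only once the link counts match.
    if len(f.linkto_on_target) == len(f.source_links):
        f.migrated = True


links = {"a", "b", "c"}
f = DataFile(source_links=set(links))
for name in sorted(links):
    migrate_hardlink(f, name)
```

In this model the data is marked migrated only after the last of the three
hardlinks has been processed, and TARGET is whatever the first hardlink
hashed to.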

    RACE:1
      Since rebalance is multi-threaded, the first lookup (which decides where
      the TARGET subvol should be) can be issued by two hardlink migrations
      in parallel, and they may end up creating linkto files on two different
      TARGET subvols. The link count on neither TARGET will then match the
      source, so the hardlinks won't be migrated.
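
The interleaving behind RACE:1 can be replayed deterministically. The sketch
below splits the TARGET decision into a read step and a write step and runs
both reads before either write. The hash results in BRICK_OF and the helper
names lookup_target and record_target are assumptions for illustration only.

```python
# Deterministic replay of the RACE:1 interleaving (hypothetical names).
from collections import Counter

BRICK_OF = {"link-a": "brick-1", "link-b": "brick-2"}  # assumed hash results

xattr = {"target": None}  # TARGET xattr on the shared data file


def lookup_target(name):
    """Step 1: read the xattr; fall back to this hardlink's own hash."""
    return xattr["target"] or BRICK_OF[name]


def record_target(target):
    """Step 2: write the chosen TARGET to the xattr."""
    xattr["target"] = target


# Both migration threads run step 1 before either runs step 2.
seen_by_t1 = lookup_target("link-a")
seen_by_t2 = lookup_target("link-b")
record_target(seen_by_t1)
record_target(seen_by_t2)

# Each thread creates its linkto file on the TARGET it saw.
linkto = {"link-a": seen_by_t1, "link-b": seen_by_t2}
per_brick = Counter(linkto.values())

# Migration needs one brick holding as many linkto files as there are links.
can_migrate = any(n == len(linkto) for n in per_brick.values())
```

With this ordering each thread sees an empty xattr, so the two linkto files
land on different bricks and no brick ever reaches the source link count.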


    RACE:2
      The linkto files on TARGET can also be created by other clients if they
      are doing lookups on the hardlinks. Consider a scenario with 100
      hardlinks. While rebalance is migrating the 99th hardlink, continuous
      lookups from another client can make the link count on TARGET equal to
      the source link count. Rebalance will then migrate the data on the 99th
      hardlink itself. When the 100th hardlink is migrated, it will already
      have TARGET as its cached subvolume. If its hash is also the same, a
      migration will be triggered from TARGET to TARGET, leading to data loss.
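
The 100-hardlink scenario in RACE:2 can be walked through with a small
simulation. It assumes every hardlink resolves to the same TARGET and models
only link counts and cached subvolumes. The names client_lookup and
rebalance_visit and the brick names are hypothetical.

```python
# Simulation of the RACE:2 scenario (hypothetical names, not GlusterFS code).
SOURCE, TARGET = "brick-0", "brick-1"
TOTAL = 100

cached = {i: SOURCE for i in range(1, TOTAL + 1)}  # cached subvol per hardlink
target_linkto = set()                               # linkto files on TARGET
state = {"migrated_at": None}


def client_lookup(i):
    """A lookup from another client also creates the linkto file on TARGET."""
    target_linkto.add(i)


def rebalance_visit(i):
    """Rebalance processing hardlink i; the destination is assumed TARGET."""
    src, dst = cached[i], TARGET
    if src == dst:
        return "TARGET -> TARGET"      # the dangerous case from RACE:2
    target_linkto.add(i)
    if len(target_linkto) == TOTAL:    # link counts match: data is moved
        state["migrated_at"] = i
        for k in cached:
            cached[k] = TARGET         # all hardlinks now live on TARGET
    return "ok"


for i in range(1, 99):                 # rebalance handles hardlinks 1..98
    rebalance_visit(i)
client_lookup(100)                     # another client looks up hardlink 100
r99 = rebalance_visit(99)              # counts now match one hardlink early
r100 = rebalance_visit(100)            # source and destination coincide
```

Here the data moves while the 99th hardlink is being processed, and the visit
to the 100th hardlink finds that its source and destination are the same
subvolume.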


This is reproducible intermittently. Since it is related to hardlink
migration, it happens only with the remove-brick process.
