[Bugs] [Bug 1734251] New: Files inaccessible if one rebalance process is killed in a multinode volume

Tue Jul 30 04:56:58 UTC 2019

https://bugzilla.redhat.com/show_bug.cgi?id=1734251

            Bug ID: 1734251
           Summary: Files inaccessible if one rebalance process is killed
                    in a multinode volume
           Product: GlusterFS
           Version: 6
            Status: NEW
         Component: distribute
          Assignee: bugs at gluster.org
          Reporter: nbalacha at redhat.com
                CC: atumball at redhat.com, bugs at gluster.org
        Depends On: 1711764
            Blocks: 1714124
  Target Milestone: ---
    Classification: Community

+++ This bug was initially created as a clone of Bug #1711764 +++

Description of problem:

This is a consequence of https://review.gluster.org/#/c/glusterfs/+/17239/ and
lookup-optimize being enabled.

Rebalance directory processing steps on each node:

1. Set new layout on directory without the commit hash
2. List the files on the local subvol and migrate those that fall into this
process's bucket. A lookup is performed on a file only if the process
determines that it is the one that must migrate that file.
3. When done, update the layout on the local subvol with the layout containing
the commit hash.

When there are multiple rebalance processes processing the same directory, they
finish at different times and one process can update the layout with the commit
hash before the others are done listing and migrating their files.
Clients will therefore see a complete layout even before all files have been
looked up according to the new layout, causing file access to fail.
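
The client-side effect can be illustrated with a small toy model. This is only
a sketch, not the DHT code: hashed_subvol, client_lookup and the CRC-based
placement are made up for illustration, and it assumes (as I understand
lookup-optimize) that a client which sees the commit hash on the layout does
not search the other subvols when the hashed subvol returns a miss.

```python
import zlib


def hashed_subvol(name, subvol_count):
    # Toy placement rule (assumption): real DHT assigns hash ranges per
    # directory; a CRC modulo the subvol count is enough for illustration.
    return zlib.crc32(name.encode("utf-8")) % subvol_count


def client_lookup(name, subvols, layout_has_commit_hash):
    """Return the index of the subvol holding `name`, or None (ENOENT)."""
    target = hashed_subvol(name, len(subvols))
    if name in subvols[target]:
        return target
    if layout_has_commit_hash:
        # The layout is treated as complete, so a miss on the hashed
        # subvol is reported as ENOENT without searching elsewhere.
        return None
    # No commit hash: fall back to searching every subvol.
    for index, contents in enumerate(subvols):
        if name in contents:
            return index
    return None


# A file still sitting on a subvol other than its newly hashed one, as it
# would be if the process responsible for it never looked it up.
subvols = [set(), set(), set()]
name = "file-a"
stale = (hashed_subvol(name, len(subvols)) + 1) % len(subvols)
subvols[stale].add(name)

found_without_hash = client_lookup(name, subvols, layout_has_commit_hash=False)
found_with_hash = client_lookup(name, subvols, layout_has_commit_hash=True)
print(found_without_hash == stale)  # True
print(found_with_hash)              # None
```

In this model the file exists in both cases; it only becomes unreachable once
the layout carries the commit hash.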

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a 2x2 volume spanning 2 nodes. Create some directories and files on
it.
2. Add 2 bricks to convert it to a 3x2 volume.
3. Start a rebalance on the volume and use gdb to break into one rebalance
process before it starts processing the directories.
4. Allow the second rebalance process to complete. Kill the process that is
blocked by gdb.
5. Mount the volume and try to stat the files without listing the directories.

Actual results:

The stat fails for several files with the error:

stat: cannot stat ‘<filename>’: No such file or directory

Expected results:

Additional info:

--- Additional comment from Nithya Balachandran on 2019-05-20 05:05:30 UTC ---

The easiest solution is to have each node do the file lookups before the call
to gf_defrag_should_i_migrate.

Pros:  Simple
Cons: Introduces more lookups, but the count is roughly the same as the number
seen before https://review.gluster.org/#/c/glusterfs/+/17239/
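
A rough sketch of the difference between the two orderings, for illustration
only. Everything here is a simplified stand-in: should_i_migrate is a
hypothetical replacement for gf_defrag_should_i_migrate that uses a fixed
file-to-node table, and the model assumes a file becomes reachable under the
new layout once some rebalance process has looked it up.

```python
def should_i_migrate(node, name, owner):
    # Hypothetical stand-in for gf_defrag_should_i_migrate: each file is
    # assigned to exactly one node's bucket.
    return owner[name] == node


def process_directory(node, files, owner, looked_up, lookup_first):
    """One rebalance process handling one directory (steps 2 and 3)."""
    for name in files:
        if lookup_first:
            looked_up.add(name)      # proposed: look up every listed file
        if not should_i_migrate(node, name, owner):
            continue
        looked_up.add(name)          # current: look up only files to migrate
    return True                      # step 3: layout with commit hash written


def inaccessible_after_kill(lookup_first):
    files = ["f0", "f1", "f2", "f3"]
    owner = {"f0": 0, "f1": 1, "f2": 0, "f3": 1}
    looked_up = set()
    # Node 0 finishes; node 1 is killed before it processes anything.
    process_directory(0, files, owner, looked_up, lookup_first)
    return [name for name in files if name not in looked_up]


print(inaccessible_after_kill(lookup_first=False))  # ['f1', 'f3']
print(inaccessible_after_kill(lookup_first=True))   # []
```

With the lookup moved ahead of the bucket check, the surviving process has
already looked up the files that belonged to the killed process, at the cost
of one lookup per listed file.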

--- Additional comment from Worker Ant on 2019-05-20 10:01:20 UTC ---

REVIEW: https://review.gluster.org/22746 (cluster/dht: Lookup all files when
processing directory) posted (#1) for review on master by N Balachandran

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1711764
[Bug 1711764] Files inaccessible if one rebalance process is killed in a
multinode volume