[Bugs] [Bug 1711764] New: Files inaccessible if one rebalance process is killed in a multinode volume

bugzilla at redhat.com bugzilla at redhat.com
Mon May 20 04:54:54 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1711764

            Bug ID: 1711764
           Summary: Files inaccessible if one rebalance process is killed
                    in a multinode volume
           Product: GlusterFS
           Version: 4.1
            Status: NEW
         Component: distribute
          Assignee: bugs at gluster.org
          Reporter: nbalacha at redhat.com
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community



Description of problem:

This is a consequence of the change in https://review.gluster.org/#/c/glusterfs/+/17239/
combined with cluster.lookup-optimize being enabled.


On each node, the rebalance process handles a directory as follows:

1. Set the new layout on the directory without the commit hash.
2. List the files on the local subvol and migrate those that fall into this
process's bucket. A file is looked up only if the process determines that it
is responsible for migrating that file.
3. When done, update the layout on the local subvol to the layout containing
the commit hash.
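Steps 1-3 can be sketched as a toy Python model of a single rebalance process. This is a simplified illustration under stated assumptions, not GlusterFS code: `Directory`, `hashed_subvol`, `in_my_bucket` and `rebalance_directory` are hypothetical names, CRC32 and Adler-32 merely stand in for DHT's real hash and for however work is actually split between nodes, and the layout is reduced to a single commit hash.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Directory:
    # Simplification (hypothetical): one commit hash for the whole directory;
    # real DHT stores the layout and commit hash per subvolume.
    commit_hash: Optional[int] = None
    # subvol name -> names of files physically present on that subvol
    files: Dict[str, Set[str]] = field(default_factory=dict)


def hashed_subvol(name: str, subvols: List[str]) -> str:
    # Stand-in for DHT's hash and layout ranges (NOT the real GlusterFS hash).
    return subvols[zlib.crc32(name.encode()) % len(subvols)]


def in_my_bucket(name: str, node_index: int, node_count: int) -> bool:
    # Hypothetical split of files between the rebalance processes.
    return zlib.adler32(name.encode()) % node_count == node_index


def rebalance_directory(d: Directory, local_subvol: str, subvols: List[str],
                        node_index: int, node_count: int,
                        vol_commit_hash: int) -> List[str]:
    """Run steps 1-3 for one process; return the names it migrated."""
    # Step 1: the new layout is set without the commit hash.
    d.commit_hash = None
    migrated: List[str] = []
    # Step 2: list files on the local subvol (sorted() makes a copy, so the
    # set can be modified inside the loop) and handle only this bucket.
    for name in sorted(d.files.get(local_subvol, set())):
        if not in_my_bucket(name, node_index, node_count):
            continue  # not this process's file: no lookup, no migration
        target = hashed_subvol(name, subvols)
        if target != local_subvol:
            d.files[local_subvol].discard(name)
            d.files.setdefault(target, set()).add(name)
            migrated.append(name)
    # Step 3: this process is done, so it writes the layout with the commit hash.
    d.commit_hash = vol_commit_hash
    return migrated


subvols = ["dht-0", "dht-1", "dht-2"]
d = Directory(files={"dht-0": {f"f{i}" for i in range(8)},
                     "dht-1": set(), "dht-2": set()})
moved = rebalance_directory(d, "dht-0", subvols, node_index=0, node_count=2,
                            vol_commit_hash=42)
print(d.commit_hash)  # 42
```

Note that in this model step 3 runs as soon as the one process finishes its own bucket; nothing waits for the other processes working on the same directory.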

When multiple rebalance processes handle the same directory, they finish at
different times, so one process can write the layout with the commit hash
before the others have finished listing and migrating their files.
Clients therefore see a complete (committed) layout before all files have been
looked up according to the new layout. With lookup-optimize enabled, a client
that does not find a file on its hashed subvol under a committed layout returns
ENOENT instead of searching all subvols, so access to those files fails.
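The race, and why lookup-optimize turns it into ENOENT, can be shown with a second self-contained toy model. Assumptions: `DirState`, `client_stat` and `migrate` are hypothetical names; files "a" and "b" are assumed to hash to the newly added subvol `dht-2`; linkto files are not modelled; and with lookup-optimize on, a client that misses on the hashed subvol under a committed layout is assumed to return ENOENT rather than search every subvol.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DirState:
    commit_hash: Optional[int] = None
    # subvol name -> names of files physically present on that subvol
    files: Dict[str, Set[str]] = field(default_factory=dict)


def client_stat(d: DirState, name: str, hashed: str, vol_commit_hash: int) -> bool:
    """True if stat succeeds, assuming lookup-optimize is on (simplified)."""
    if name in d.files.get(hashed, set()):
        return True
    if d.commit_hash == vol_commit_hash:
        return False  # layout trusted as complete: ENOENT, no global search
    # Layout not committed: fall back to searching every subvol.
    return any(name in s for s in d.files.values())


def migrate(d: DirState, name: str, src: str, dst: str) -> None:
    d.files[src].discard(name)
    d.files.setdefault(dst, set()).add(name)


VOL_COMMIT = 2
# Hypothetical new layout: both files now hash to the added subvol "dht-2".
new_hashed = {"a": "dht-2", "b": "dht-2"}
d = DirState(files={"dht-0": {"a", "b"}, "dht-1": set(), "dht-2": set()})

# Step 1 (both processes): layout set without the commit hash.
d.commit_hash = None
# Process A owns "a": migrates it, then writes the committed layout.
migrate(d, "a", "dht-0", "dht-2")
d.commit_hash = VOL_COMMIT
# Process B owns "b" but is blocked in gdb and then killed: "b" is never touched.

results = {n: client_stat(d, n, new_hashed[n], VOL_COMMIT) for n in new_hashed}
print(results)  # {'a': True, 'b': False}
```

In this model, clearing the commit hash makes the stat of "b" succeed again through the fallback search, which is the sense in which the early commit, rather than the missing migration itself, causes the failure.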

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a 2x2 volume spanning 2 nodes. Create some directories and files on
it.
2. Add 2 bricks to convert it to a 3x2 volume.
3. Start a rebalance on the volume. Attach gdb to one of the rebalance processes
and break into it before it starts processing the directories.
4. Allow the other rebalance process to complete. Kill the process that is
blocked in gdb.
5. Mount the volume and try to stat the files without listing the directories.


Actual results:

The stat fails for several files with the error:

stat: cannot stat ‘<filename>’: No such file or directory


Expected results:


Additional info:
