[Bugs] [Bug 1670155] New: Tiered volume files disappear when a hot brick is failed/restored until the tier detached.

bugzilla at redhat.com bugzilla at redhat.com
Mon Jan 28 18:04:14 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1670155

            Bug ID: 1670155
           Summary: Tiered volume files disappear when a hot brick is
                    failed/restored until the tier detached.
           Product: GlusterFS
           Version: 5
            Status: NEW
         Component: tiering
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: jbyers at stonefly.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community



Tiered volume files disappear when a hot brick is failed/restored,
until the tier is detached.

Files residing in the hot tier of a volume with a distributed hot
tier disappear when a hot tier brick is failed and restored.
The missing files do not reappear until the tier is detached.

Files resident on the hot tier are expected to disappear when the
brick they are on fails. When the brick is restored they should
come back, but often they do not: they either stop showing up in
an 'ls -lsh' of the mount point entirely, or appear only as
'??????????' entries. In some cases an ls or open on the file's
full path name will bring it back; other times it will not, and
the hot tier has to be detached to get the files back.
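The broken entries can be spotted mechanically: ls prints '?' in every column for a directory entry whose stat() failed. A minimal sketch, not part of the original report; it parses a captured listing (taken from the transcript later in this report) so it is self-contained, where live usage would capture 'ls -lsh' of the mount point instead:

```shell
#!/bin/sh
# Detect directory entries whose stat() failed: ls renders them with
# '?' placeholders in the mode column. The listing below is a capture
# from this report; live: listing=$(ls -lsh /mnt/tiered-vol 2>/dev/null)
listing='total 5.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
   ? ?????????? ? ?    ?       ?            ? file-hot-1'
# Field 2 is the mode string; match entries that are all '?'.
stale=$(echo "$listing" | awk '$2 ~ /^\?+$/ { print $NF }')
echo "$stale"
```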

The problem occurs using either NFS or CIFS/Fuse mounts.

The problem was first seen with the cold tier being a Disperse
volume, but also occurs with a Distributed cold tier volume.

The problem was first seen on GlusterFS 3.12.14, and has been
reproduced on GlusterFS 5.2. Note that this first happened on
a production system, and was then reproduced in a lab
environment.

Test plan below.

# glusterd -V
glusterfs 5.2

##### Create the brick dirs and cold tier volume.

# mkdir /exports/cold-brick-1/dir
# mkdir /exports/cold-brick-2/dir
# mkdir /exports/cold-brick-3/dir
# mkdir /exports/hot-brick-1/dir
# mkdir /exports/hot-brick-2/dir
# mkdir /exports/hot-brick-3/dir
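The six mkdir calls above could equally be written as a loop. A minimal sketch; BASE is a stand-in introduced here (defaulting to a temporary directory for a safe dry run), where the test plan itself uses /exports:

```shell
#!/bin/sh
# Create the three cold and three hot brick directories in one pass.
# BASE is hypothetical: it defaults to a scratch directory so the
# sketch can run anywhere; the report's test plan uses /exports.
BASE="${BASE:-$(mktemp -d)}"
for tier in cold hot; do
    for i in 1 2 3; do
        mkdir -p "$BASE/$tier-brick-$i/dir"
    done
done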

# gluster volume create tiered-vol transport tcp 10.0.0.5:/exports/cold-brick-1/dir
volume create: tiered-vol: success: please start the volume to access data
# gluster volume start tiered-vol
volume start: tiered-vol: success

##### Expand the cold tier volume.

# gluster volume add-brick tiered-vol 10.0.0.5:/exports/cold-brick-2/dir/
volume add-brick: success
# gluster volume add-brick tiered-vol 10.0.0.5:/exports/cold-brick-3/dir/
volume add-brick: success

##### Mount the volume.

# gluster volume set tiered-vol nfs.disable off
volume set: success
# mount 127.0.0.1:tiered-vol /mnt/tiered-vol/

##### Create files on the volume, which is not tiered yet.

# xfs_mkfile 1G /mnt/tiered-vol/file-1
# xfs_mkfile 1G /mnt/tiered-vol/file-2
# xfs_mkfile 1G /mnt/tiered-vol/file-3

# ls -lsh /mnt/tiered-vol/
total 3.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3

# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2

# gluster volume info tiered-vol

Volume Name: tiered-vol
Type: Distribute
Volume ID: 0639e4e4-249d-485c-9995-90aa8be9c94e
Status: Started
Snapshot Count: 0
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.0.0.5:/exports/cold-brick-1/dir
Brick2: 10.0.0.5:/exports/cold-brick-2/dir
Brick3: 10.0.0.5:/exports/cold-brick-3/dir
Options Reconfigured:
transport.address-family: inet
nfs.disable: off

# gluster volume status tiered-vol
Status of volume: tiered-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.0.5:/exports/cold-brick-1/dir    62002     0          Y       120790
Brick 10.0.0.5:/exports/cold-brick-2/dir    62003     0          Y       120929
Brick 10.0.0.5:/exports/cold-brick-3/dir    62004     0          Y       120978
NFS Server on localhost                     2049      0          Y       121103

Task Status of Volume tiered-vol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 13a856c2-f511-475c-b2ff-f6e0190ade50
Status               : completed

##### Kill one of the brick processes, and note that the files
on that brick disappear. This is normal and expected.

# kill 120929
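The PID killed above is read by eye from the Pid column of 'gluster volume status'. A small helper can extract it instead; a sketch, not from the original report, using the status capture shown earlier so it runs without a live cluster:

```shell
#!/bin/sh
# Extract the Pid column for a given brick from 'gluster volume status'
# output. The sample text is the capture from earlier in this report;
# live usage: status=$(gluster volume status tiered-vol)
status='Brick 10.0.0.5:/exports/cold-brick-1/dir    62002     0          Y       120790
Brick 10.0.0.5:/exports/cold-brick-2/dir    62003     0          Y       120929
Brick 10.0.0.5:/exports/cold-brick-3/dir    62004     0          Y       120978'
brick='/exports/cold-brick-2/dir'
# Field 2 is host:path; the PID is the last field on the matching line.
pid=$(echo "$status" | awk -v b="$brick" '$2 ~ b { print $NF }')
echo "$pid"          # then: kill "$pid"
```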

# ls -lsh /mnt/tiered-vol/
total 2.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2

##### Restart the killed brick process, and see that all files
are back.

# gluster volume start tiered-vol force
volume start: tiered-vol: success

# ls -lsh /mnt/tiered-vol/
total 3.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3

##### Attach the hot tier, and create new files that are
stored there.

# gluster volume tier tiered-vol attach 10.0.0.5:/exports/hot-brick-1/dir 10.0.0.5:/exports/hot-brick-2/dir 10.0.0.5:/exports/hot-brick-3/dir
volume attach-tier: success

# xfs_mkfile 1G /mnt/tiered-vol/file-hot-1
# xfs_mkfile 1G /mnt/tiered-vol/file-hot-2
# xfs_mkfile 1G /mnt/tiered-vol/file-hot-3

# ls -lsh /mnt/tiered-vol/
total 6.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3

# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
   0 ---------T 2 root root    0 Jan 28 08:57 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
   0 ---------T 2 root root    0 Jan 28 08:58 /exports/cold-brick-3/dir/file-hot-2
   0 ---------T 2 root root    0 Jan 28 08:58 /exports/cold-brick-3/dir/file-hot-3
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-3/dir/file-hot-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:59 /exports/hot-brick-3/dir/file-hot-3
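The 0-byte '---------T' entries on the cold bricks are DHT pointer (linkto) files: mode 1000, i.e. the sticky bit set with no permission bits, marking where the real data has been migrated; the target itself is kept in a trusted.* extended attribute on the brick copy (readable with getfattr on the brick host, as root). The mode rendering can be reproduced on any filesystem; a minimal illustrative sketch, not from the original report:

```shell
#!/bin/sh
# Reproduce the '---------T' mode string seen on the linkto files:
# sticky bit only (mode 1000) on a regular file.
f=$(mktemp)
chmod 1000 "$f"
# The mode column may carry a trailing '.' or '+' where SELinux
# contexts or ACLs are in effect.
mode=$(ls -l "$f" | awk '{ print $1 }')
echo "$mode"
rm -f "$f"
```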

# gluster volume status tiered-vol
Status of volume: tiered-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.0.0.5:/exports/hot-brick-3/dir     62007     0          Y       127766
Brick 10.0.0.5:/exports/hot-brick-2/dir     62006     0          Y       127744
Brick 10.0.0.5:/exports/hot-brick-1/dir     62003     0          Y       127722
Cold Bricks:
Brick 10.0.0.5:/exports/cold-brick-1/dir    62002     0          Y       120790
Brick 10.0.0.5:/exports/cold-brick-2/dir    62005     0          Y       123087
Brick 10.0.0.5:/exports/cold-brick-3/dir    62004     0          Y       120978
Tier Daemon on localhost                    N/A       N/A        Y       127804
NFS Server on localhost                     2049      0          Y       127795

##### Kill a brick process for the distributed hot tier
volume. See that the files stored there cannot be accessed.
This is normal and expected; in this case things worked as
they should.

# kill 127744
# ls -lsh /mnt/tiered-vol/
ls: cannot access /mnt/tiered-vol/file-hot-1: No such file or directory
total 5.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
   ? ?????????? ? ?    ?       ?            ? file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3

# ls -lsh /mnt/tiered-vol/
ls: cannot access /mnt/tiered-vol/file-hot-1: No such file or directory
total 5.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
   ? ?????????? ? ?    ?       ?            ? file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3

##### Restart the hot tier brick process, and note that all
files are back.

# gluster volume start tiered-vol force
volume start: tiered-vol: success

# ls -lsh /mnt/tiered-vol/
total 6.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3

# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
   0 ---------T 2 root root    0 Jan 28 08:57 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
   0 ---------T 2 root root    0 Jan 28 08:58 /exports/cold-brick-3/dir/file-hot-2
   0 ---------T 2 root root    0 Jan 28 08:58 /exports/cold-brick-3/dir/file-hot-3
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-3/dir/file-hot-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:59 /exports/hot-brick-3/dir/file-hot-3

# gluster volume status tiered-vol
Status of volume: tiered-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.0.0.5:/exports/hot-brick-3/dir     62007     0          Y       127766
Brick 10.0.0.5:/exports/hot-brick-2/dir     62010     0          Y       130185
Brick 10.0.0.5:/exports/hot-brick-1/dir     62003     0          Y       127722
Cold Bricks:
Brick 10.0.0.5:/exports/cold-brick-1/dir    62002     0          Y       120790
Brick 10.0.0.5:/exports/cold-brick-2/dir    62005     0          Y       123087
Brick 10.0.0.5:/exports/cold-brick-3/dir    62004     0          Y       120978
Tier Daemon on localhost                    N/A       N/A        Y       127804
NFS Server on localhost                     2049      0          Y       130217

##### Kill another brick process for the distributed hot tier
volume. See that the files stored there cannot be accessed.
The first 'ls' still lists the missing files as '??????????'
entries, but the second one omits them entirely. This time the
files will *not* come back when the brick is restored. This is
the problem.

# kill 127766
# ls -lsh /mnt/tiered-vol/
ls: cannot access /mnt/tiered-vol/file-hot-2: No such file or directory
ls: cannot access /mnt/tiered-vol/file-hot-3: No such file or directory
total 4.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
   ? ?????????? ? ?    ?       ?            ? file-hot-2
   ? ?????????? ? ?    ?       ?            ? file-hot-3

# ls -lsh /mnt/tiered-vol/
total 4.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1

##### Restore the failed brick, but note that its files are
still missing from the mount point, even though they still
exist on the bricks.

# gluster volume start tiered-vol force
volume start: tiered-vol: success

# ls -lsh /mnt/tiered-vol/
total 4.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1

# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
   0 ---------T 2 root root    0 Jan 28 08:57 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-3/dir/file-hot-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:59 /exports/hot-brick-3/dir/file-hot-3
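Comparing the mount-point listing against the brick contents confirms which files are lost from the namespace but still present on disk. A sketch, not from the original report; the file names are the captures above, where live usage would substitute real ls output from the mount and the hot bricks:

```shell
#!/bin/sh
# Files present on the hot bricks but absent from the mount point are
# the 'lost' ones. comm(1) needs sorted input; both lists are sorted.
mount_files='file-1
file-2
file-3
file-hot-1'
brick_files='file-hot-1
file-hot-2
file-hot-3'
tmp1=$(mktemp); tmp2=$(mktemp)
printf '%s\n' "$mount_files" > "$tmp1"
printf '%s\n' "$brick_files" > "$tmp2"
# comm -13: suppress lines unique to file 1 and lines common to both,
# leaving only names that exist on the bricks but not at the mount.
lost=$(comm -13 "$tmp1" "$tmp2")
rm -f "$tmp1" "$tmp2"
echo "$lost"
```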

##### Accessing the missing files by their full path sometimes
brings them back, but not in this case.

# ls -lsh /mnt/tiered-vol/file-hot-2
ls: cannot access /mnt/tiered-vol/file-hot-2: No such file or directory

# file /mnt/tiered-vol/file-hot-2
/mnt/tiered-vol/file-hot-2: cannot open `/mnt/tiered-vol/file-hot-2' (No such file or directory)

# ls -lsh /mnt/tiered-vol/
total 4.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1

##### Stopping and starting the volume does not help.

# gluster volume stop tiered-vol
Stopping volume will make its data inaccessible. Do you want to continue? (y/n)
y
volume stop: tiered-vol: success
# gluster volume start tiered-vol
volume start: tiered-vol: success

# ls -lsh /mnt/tiered-vol/
total 4.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1

# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
   0 ---------T 2 root root    0 Jan 28 08:57 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-3/dir/file-hot-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:59 /exports/hot-brick-3/dir/file-hot-3

##### Detaching the hot tier usually does bring the missing
files back.

# gluster volume tier tiered-vol detach start
volume detach tier start: success
ID: cec68278-f0b9-4289-81ab-6f3a60246c3e

# gluster volume tier tiered-vol detach status
volume detach tier status: success
        Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
   ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
   localhost                0        0Bytes             7             0             0          in progress            0:00:21
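Since the detach rebalance is still "in progress" here, the commit further below has to wait until the status reports "completed". A small polling helper, sketched with the status command passed in as a parameter so it can be exercised without a live cluster; in practice the argument would be the real command line, e.g. 'gluster volume tier tiered-vol detach status':

```shell
#!/bin/sh
# Poll the detach status until the rebalance reports completed; only
# then is it safe to run 'gluster volume tier <vol> detach commit'.
# The status command is parameterized (an assumption of this sketch)
# so a stub can stand in for the gluster CLI.
wait_for_detach() {
    while true; do
        if eval "$1" | grep -q 'completed'; then
            echo "detach rebalance completed"
            return 0
        fi
        sleep 10
    done
}
```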

# ls -lsh /mnt/tiered-vol/
total 6.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3

# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
1.0G ---------T 2 root root 1.0G Jan 28 09:09 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
1.0G ---------T 2 root root 1.0G Jan 28 09:09 /exports/cold-brick-3/dir/file-hot-2
1.0G ---------T 2 root root 1.0G Jan 28 09:09 /exports/cold-brick-3/dir/file-hot-3
1.1G -rw---S--T 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-2/dir/file-hot-1
1.1G -rw---S--T 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-3/dir/file-hot-2
1.1G -rw---S--T 2 root root 1.0G Jan 28 08:59 /exports/hot-brick-3/dir/file-hot-3

# gluster volume tier tiered-vol detach status
volume detach tier status: success
        Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
   ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
   localhost                3         3.0GB             7             0             0            completed            0:01:25
# gluster volume tier tiered-vol detach commit
volume detach tier commit: success

# ls -lsh /mnt/tiered-vol/
total 6.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3

# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
1.0G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
1.0G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/cold-brick-3/dir/file-hot-2
1.0G -rw------- 2 root root 1.0G Jan 28 08:59 /exports/cold-brick-3/dir/file-hot-3

EOM
