[Bugs] [Bug 1652548] New: Error reading some files - Stale file handle - distribute 2 - replica 3 volume with sharding

bugzilla at redhat.com
Thu Nov 22 10:52:38 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1652548

            Bug ID: 1652548
           Summary: Error reading some files - Stale file handle -
                    distribute 2 - replica 3 volume with sharding
           Product: GlusterFS
           Version: 3.12
         Component: sharding
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: marcoc at prismatelecomtesting.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org



Description of problem:
Error reading some files.
I'm trying to export a VM from the Gluster volume because oVirt keeps pausing the
VM with a storage error, but the export is not possible due to "Stale file handle"
errors.

I mounted the volume on another server:
s23gfs.ovirt:VOL_VMDATA on /mnt/VOL_VMDATA type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)

Trying to read the file with cp, rsync, or qemu-img convert gives the same
result:

# qemu-img convert -p -t none -T none -f qcow2
/mnt/VOL_VMDATA/d4f82517-5ce0-4705-a89f-5d3c81adf764/images/dbb038ee-2794-40e8-877a-a4806c47f11f/f81e0be9-db3e-48ac-876f-57b6f7cb3fe8
-O raw PLONE_active-raw.img
qemu-img: error while reading sector 2448441344: Stale file handle
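
If it helps with triage, here is a rough back-of-envelope mapping of that sector
to a shard block, assuming the reported sector is a 512-byte offset into the
source file (which may not hold exactly for qcow2) and using the volume's
features.shard-block-size of 512MB:

# echo $(( 2448441344 * 512 / (512 * 1024 * 1024) ))
2335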


Version-Release number of selected component (if applicable):
Gluster 3.12.15-1.el7

In the mount log file I see many errors like:
[2018-11-20 03:20:24.471344] E [MSGID: 133010]
[shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on
shard 3558 failed. Base file gfid = 4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14 [Stale
file handle]
[2018-11-20 08:56:21.110258] E [MSGID: 133010]
[shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on
shard 541 failed. Base file gfid = 2c1b6402-87b0-45cd-bd81-2cd3f38dd530 [Stale
file handle]
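
As far as I understand, each shard is stored as a separate file named
<base file gfid>.<shard number> under the hidden .shard directory on each brick,
so the shards from the errors above should be findable there, e.g. (brick path
from my setup):

# find /gluster/VOL_VMDATA/brick/.shard -name '4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14.3558'
# find /gluster/VOL_VMDATA/brick/.shard -name '2c1b6402-87b0-45cd-bd81-2cd3f38dd530.541'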


Is there a way to fix this? It's a distribute 2 - replica 3 volume with
sharding.
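
If useful, I can also compare the trusted.gfid xattr of the same shard file
across the three bricks of its replica set, in case a gfid mismatch between
replicas is what produces the stale handle, e.g.:

# getfattr -d -m . -e hex /gluster/VOL_VMDATA/brick/.shard/4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14.3558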

Thanks,
Marco


Additional info:
# gluster volume info VOL_VMDATA

Volume Name: VOL_VMDATA
Type: Distributed-Replicate
Volume ID: 7bd4e050-47dd-481e-8862-cd6b76badddc
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: s20gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick2: s21gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick3: s22gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick4: s23gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick5: s24gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick6: s25gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Options Reconfigured:
auth.allow: 192.168.50.*,172.16.4.*,192.168.56.203
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: enable
features.shard-block-size: 512MB
cluster.data-self-heal-algorithm: full
nfs.disable: on
transport.address-family: inet


# gluster volume heal VOL_VMDATA info
Brick s20gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0

Brick s21gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0

Brick s22gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0

Brick s23gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0

Brick s24gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0

Brick s25gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0


# gluster volume status VOL_VMDATA
Status of volume: VOL_VMDATA
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick s20gfs.ovirt.prisma:/gluster/VOL_VMDA
TA/brick                                    49153     0          Y       3186 
Brick s21gfs.ovirt.prisma:/gluster/VOL_VMDA
TA/brick                                    49153     0          Y       5148 
Brick s22gfs.ovirt.prisma:/gluster/VOL_VMDA
TA/brick                                    49153     0          Y       3792 
Brick s23gfs.ovirt.prisma:/gluster/VOL_VMDA
TA/brick                                    49153     0          Y       3257 
Brick s24gfs.ovirt.prisma:/gluster/VOL_VMDA
TA/brick                                    49153     0          Y       4402 
Brick s25gfs.ovirt.prisma:/gluster/VOL_VMDA
TA/brick                                    49153     0          Y       3231 
Self-heal Daemon on localhost               N/A       N/A        Y       4192 
Self-heal Daemon on s25gfs.ovirt.prisma     N/A       N/A        Y       63185
Self-heal Daemon on s24gfs.ovirt.prisma     N/A       N/A        Y       39535
Self-heal Daemon on s20gfs.ovirt.prisma     N/A       N/A        Y       2785 
Self-heal Daemon on s23gfs.ovirt.prisma     N/A       N/A        Y       765  
Self-heal Daemon on s22.ovirt.prisma        N/A       N/A        Y       5828 

Task Status of Volume VOL_VMDATA
------------------------------------------------------------------------------
There are no active volume tasks
