[Bugs] [Bug 1480193] New: Running sysbench on vm disk from plain distribute gluster volume causes disk corruption

Thu Aug 10 11:27:03 UTC 2017

https://bugzilla.redhat.com/show_bug.cgi?id=1480193

            Bug ID: 1480193
           Summary: Running sysbench on vm disk from plain distribute
                    gluster volume causes disk corruption
           Product: GlusterFS
           Version: 3.8
         Component: posix
          Keywords: Triaged
          Assignee: bugs at gluster.org
          Reporter: kdhananj at redhat.com
                CC: bugs at gluster.org, johan at kafit.se, kdhananj at redhat.com,
                    pkarampu at redhat.com, rhinduja at redhat.com,
                    rhs-bugs at redhat.com, sabose at redhat.com,
                    storage-qa-internal at redhat.com
        Depends On: 1472757, 1472758
            Blocks: 1458846, 1479692, 1479717

+++ This bug was initially created as a clone of Bug #1472758 +++

+++ This bug was initially created as a clone of Bug #1472757 +++

Description of problem:

I created a VM using disks stored in gluster volume, and ran the sysbench test
on it. This caused I/O errors and the disk to be mounted as read-only disk.

# sysbench prepare --test=oltp --mysql-table-engine=innodb --mysql-password=pwd
--oltp-table-size=500000000 --oltp-dist-type=gaussian

- Errors in dmesg
[ 2838.983763] blk_update_request: I/O error, dev vdb, sector 0
[ 2842.932506] blk_update_request: I/O error, dev vdb, sector 524722736
[ 2842.932577] Aborting journal on device vdb-8.
[ 2842.933882] EXT4-fs error (device vdb): ext4_journal_check_start:56:
Detected aborted journal
[ 2842.934009] EXT4-fs (vdb): Remounting filesystem read-only

On the host's gluster logs:
mount log:
[2017-07-19 07:42:47.219501] W [MSGID: 114031]
[client-rpc-fops.c:2938:client3_3_lookup_cbk] 0-glusterlocal1-client-0: remote
operation failed. Path: /.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.843
(00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-19 07:42:47.219587] E [MSGID: 133010]
[shard.c:1725:shard_common_lookup_shards_cbk] 0-glusterlocal1-shard: Lookup on
shard 843 failed. Base file gfid = c37d9820-0fd8-4d8e-af67-a9a54e5a99af [No
data available]

brick log:
[2017-07-19 07:42:16.094979] E [MSGID: 113020] [posix.c:1361:posix_mknod]
0-glusterlocal1-posix: setting gfid on
/rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.842 failed
[2017-07-19 07:42:47.218982] E [MSGID: 113002] [posix.c:253:posix_lookup]
0-glusterlocal1-posix: buf->ia_gfid is null for
/rhgs/bricks/gv1/.shard/c37d9820-0fd8-4d8e-af67-a9a54e5a99af.843 [No data
available]

gluster vol info:
Volume Name: glusterlocal1
Type: Distribute
Volume ID: 3b0d4b90-10a4-4a91-80c1-27d051daf731
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/rhgs/bricks/gv1
Options Reconfigured:
performance.strict-o-direct: on
storage.owner-gid: 107
storage.owner-uid: 107
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Create an image on gluster volume mount point using qemu-img create qemu-img
create -f qcow2  -o preallocation=off /mnt/glusterlocal1/vm1boot.img 500G
2. Start the VM with additional device as image created in Step 1
3. Install MariaDB and sysbench. Configure database to be on filesystem using
device in Step 1
4. Run sysbench prepare

Actual results:
Fails to create data.

--- Additional comment from Worker Ant on 2017-07-19 09:35:03 EDT ---

REVIEW: https://review.gluster.org/17821 (storage/posix: Use the ret value of
posix_gfid_heal()) posted (#2) for review on master by Krutika Dhananjay
(kdhananj at redhat.com)

--- Additional comment from Worker Ant on 2017-07-24 04:41:50 EDT ---

REVIEW: https://review.gluster.org/17821 (storage/posix: Use the ret value of
posix_gfid_heal()) posted (#3) for review on master by Krutika Dhananjay
(kdhananj at redhat.com)

--- Additional comment from Worker Ant on 2017-07-24 21:15:56 EDT ---

COMMIT: https://review.gluster.org/17821 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit 669868d23eaeba42809fca7be134137c607d64ed
Author: Krutika Dhananjay <kdhananj at redhat.com>
Date:   Wed Jul 19 16:14:59 2017 +0530

    storage/posix: Use the ret value of posix_gfid_heal()

    ... to make the change in commit acf8cfdf truly useful.

    Without this, a race between entry creation fops and lookup
    at posix layer can cause lookups to fail with ENODATA, as
    opposed to ENOENT.

    Change-Id: I44a226872283a25f1f4812f03f68921c5eb335bb
    BUG: 1472758
    Signed-off-by: Krutika Dhananjay <kdhananj at redhat.com>
    Reviewed-on: https://review.gluster.org/17821
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>

--- Additional comment from Johan Bernhardsson on 2017-07-31 04:34:40 EDT ---

This also seem to affect disperse volumes.  Attaching logs from when we try to
copy an image with qemu-img from one storage to another.

We get this on vm storage configured like this:
Volume Name: fs02
Type: Disperse
Volume ID: 7f3d96e7-8d1e-48b8-bad0-dc5b3de13b38
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: vbgsan01:/gluster/fs02/fs02
Brick2: vbgsan02:/gluster/fs02/fs02
Brick3: vbgsan03:/gluster/fs02/fs02

from one brick log:
[2017-07-30 15:29:43.815362] E [MSGID: 113002] [posix.c:266:posix_lookup]
0-fs02-posix: buf->ia_gfid is null for
/gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079 [No data
available]
[2017-07-30 15:29:43.815527] E [MSGID: 115050]
[server-rpc-fops.c:156:server_lookup_cbk] 0-fs02-server: 400238: LOOKUP
/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079
(be318638-e8a0-4c6d-977d-7a937aa84806/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5079)
==> (No data available) 
[No data available]
[2017-07-30 15:29:46.837927] W [MSGID: 113096]
[posix-handle.c:761:posix_handle_hard] 0-fs02-posix: link
/gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5180 ->
/gluster/fs02/fs02/.glusterfs/1e/2e/1e2e59bb-f8e3-41d1-93c6-db795bbc96d4failed 
[File exists]
[2017-07-30 15:29:46.837962] E [MSGID: 113020] [posix.c:1402:posix_mknod]
0-fs02-posix: setting gfid on
/gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5180 failed
[2017-07-30 15:29:51.418415] E [MSGID: 113002] [posix.c:266:posix_lookup]
0-fs02-posix: buf->ia_gfid is null for
/gluster/fs02/fs02/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315 [No data
available]
[2017-07-30 15:29:51.418471] E [MSGID: 115050]
[server-rpc-fops.c:156:server_lookup_cbk] 0-fs02-server: 414456: LOOKUP
/.shard/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315
(be318638-e8a0-4c6d-977d-7a937aa84806/fea7c5e9-f1a7-4684-bd5c-08383af3a2fb.5315)
==> (No data available) 
[No data available]

>From mount log:
[2017-07-30 13:09:35.041843] I [MSGID: 109066] [dht-rename.c:1569:dht_rename]
0-fs02-dht: renaming
/0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/797faedf-b7e2-4b3c-bff6-82264efa11f5.meta.new
(hash=fs02-disperse-0/cache=fs02-disperse-0) =>

/0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/797faedf-b7e2-4b3c-bff6-82264efa11f5.meta
(hash=fs02-disperse-0/cache=fs02-disperse-0)
[2017-07-30 13:09:35.841507] I [MSGID: 109066] [dht-rename.c:1569:dht_rename]
0-fs02-dht: renaming
/0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/4c3cc823-e977-4fa1-b233-718c445d632e.meta.new
(hash=fs02-disperse-0/cache=fs02-disperse-0) =>

/0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/cab4e8d0-a82a-4048-b0e8-a9c1bd2e38bf/4c3cc823-e977-4fa1-b233-718c445d632e.meta
(hash=fs02-disperse-0/cache=<nul>)
[2017-07-30 13:36:57.236163] W [MSGID: 114031]
[client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-0: remote operation
failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1300
(00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.237191] W [MSGID: 114031]
[client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-2: remote operation
failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1300
(00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.237220] W [MSGID: 122053]
[ec-common.c:121:ec_check_status] 0-fs02-disperse-0: Operation failed on 1 of 3
subvolumes.(up=111, mask=111, remaining=000, good=101, bad=010)
[2017-07-30 13:36:57.237229] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-fs02-disperse-0: Heal failed [Invalid argument]
[2017-07-30 13:36:57.248807] W [MSGID: 114031]
[client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-0: remote operation
failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1301
(00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.248973] W [MSGID: 114031]
[client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-fs02-client-1: remote operation
failed. Path: /.shard/3997817a-4678-4e75-8131-438db9faca9a.1301
(00000000-0000-0000-0000-000000000000) [No data available]
[2017-07-30 13:36:57.249502] W [MSGID: 122053]
[ec-common.c:121:ec_check_status] 0-fs02-disperse-0: Operation failed on 1 of 3
subvolumes.(up=111, mask=111, remaining=000, good=011, bad=100)
[2017-07-30 13:36:57.249524] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-fs02-disperse-0: Heal failed [Invalid argument]
[2017-07-30 13:36:57.249535] E [MSGID: 133010]
[shard.c:1725:shard_common_lookup_shards_cbk] 0-fs02-shard: Lookup on shard
1301 failed. Base file gfid = 3997817a-4678-4e75-8131-438db9faca9a [No data
available]
[2017-07-30 13:36:57.249585] W [fuse-bridge.c:2228:fuse_readv_cbk]
0-glusterfs-fuse: 84787: READ => -1 gfid=3997817a-4678-4e75-8131-438db9faca9a
fd=0x7f0cfa192210 (No data available)

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1458846
[Bug 1458846] [gluster] Problem moving vm from one storage domain to
another within the same datacenter
https://bugzilla.redhat.com/show_bug.cgi?id=1472757
[Bug 1472757] Running sysbench on vm disk from plain distribute gluster
volume causes disk corruption
https://bugzilla.redhat.com/show_bug.cgi?id=1472758
[Bug 1472758] Running sysbench on vm disk from plain distribute gluster
volume causes disk corruption
https://bugzilla.redhat.com/show_bug.cgi?id=1479692
[Bug 1479692] Running sysbench on vm disk from plain distribute gluster
volume causes disk corruption
https://bugzilla.redhat.com/show_bug.cgi?id=1479717
[Bug 1479717] Running sysbench on vm disk from plain distribute gluster
volume causes disk corruption
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.