[Bugs] [Bug 1532868] New: gluster upgrade causes vm disk errors

bugzilla at redhat.com bugzilla at redhat.com
Tue Jan 9 22:56:11 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1532868

            Bug ID: 1532868
           Summary: gluster upgrade causes vm disk errors
           Product: GlusterFS
           Version: 3.12
         Component: fuse
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: bugs at josemaas.net
                CC: bugs at gluster.org



Description of problem:
We have a Gluster replica 3 volume that is used for VM storage. After upgrading
from 3.10 to 3.12, oVirt started logging "VM <vm name> has been paused due to
unknown storage error." and pausing the affected VMs. The upgrade was performed
following the Gluster upgrade guide.

After looking into it further, I found the following lines in our mount log,
which suggest an issue with sharding. Note that copying the VM image and using
the new copy seems to fix the problem.

[2018-01-09 15:45:47.098380] E [shard.c:426:shard_modify_size_and_block_count]
(-->/usr/lib64/glusterfs/3.12.3/xlator/cluster/distribute.so(+0x6abed)
[0x7f7241b3dbed]
-->/usr/lib64/glusterfs/3.12.3/xlator/features/shard.so(+0xbafe)
[0x7f72418bbafe]
-->/usr/lib64/glusterfs/3.12.3/xlator/features/shard.so(+0xb35b)
[0x7f72418bb35b] ) 0-tortoise-shard: Failed to get
trusted.glusterfs.shard.file-size for 6493ab88-f4a8-4696-a52e-6425a595fc80
[2018-01-09 15:45:47.098419] W [fuse-bridge.c:874:fuse_attr_cbk]
0-glusterfs-fuse: 4369015: STAT()
/aee19709-5859-4e48-b761-c4f8a140ea61/images/855cfe92-22ad-4f40-94f8-2547fa5e0f8e/e29713bf-557a-4943-a7c5-c29edc141c01
=> -1 (Invalid argument)
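As a rough way to check on the brick side whether the xattr named in the error
above is actually missing, a small check along these lines could be run as root
directly against the brick copy of the image. This is only a sketch; the brick
path is an assumption based on the volume info further down, not something
verified in this report.

#!/usr/bin/env python3
# Sketch (not part of the original report): check whether the
# trusted.glusterfs.shard.file-size xattr is readable on the brick copy of a
# VM image. The shard translator fails the STAT with EINVAL when it cannot
# read this xattr, which matches the mount-log lines above.
import os
import sys

BRICK_IMAGE = ("/tort/brick/aee19709-5859-4e48-b761-c4f8a140ea61/images/"
               "855cfe92-22ad-4f40-94f8-2547fa5e0f8e/"
               "e29713bf-557a-4943-a7c5-c29edc141c01")  # hypothetical brick path
XATTR = "trusted.glusterfs.shard.file-size"

try:
    value = os.getxattr(BRICK_IMAGE, XATTR)
except OSError as err:
    sys.exit(f"{XATTR} missing or unreadable on {BRICK_IMAGE}: {err}")

# The value is an opaque blob maintained by the shard translator; what matters
# here is simply whether it is present and readable on each replica.
print(f"{XATTR}: {len(value)} bytes: {value.hex()}")

If the xattr turns out to be present on all three bricks while the FUSE mount
still fails the STAT, that would point more toward the client-side shard
translator than the on-disk state.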

Version-Release number of selected component (if applicable):
centos-release-gluster310.noarch   1.0-1.el7.centos           @extras           
centos-release-gluster312.noarch   1.0-1.el7.centos           @extras           
glusterfs.x86_64                   3.12.3-1.el7               @centos-gluster312
glusterfs-api.x86_64               3.12.3-1.el7               @centos-gluster312
glusterfs-cli.x86_64               3.12.3-1.el7               @centos-gluster312
glusterfs-client-xlators.x86_64    3.12.3-1.el7               @centos-gluster312
glusterfs-fuse.x86_64              3.12.3-1.el7               @centos-gluster312
glusterfs-gnfs.x86_64              3.12.3-1.el7               @centos-gluster312
glusterfs-libs.x86_64              3.12.3-1.el7               @centos-gluster312
glusterfs-server.x86_64            3.12.3-1.el7               @centos-gluster312
libntirpc.x86_64                   1.5.3-1.el7                @centos-gluster312
nfs-ganesha.x86_64                 2.5.3-1.el6                @centos-gluster312
nfs-ganesha-gluster.x86_64         2.5.3-1.el6                @centos-gluster312
userspace-rcu.x86_64               0.10.0-3.el7               @centos-gluster312

Previous version:
Dec 06 03:50:42 Updated: glusterfs-libs.x86_64 3.10.8-1.el7
Dec 06 03:50:42 Updated: glusterfs.x86_64 3.10.8-1.el7
Dec 06 03:50:42 Updated: glusterfs-client-xlators.x86_64 3.10.8-1.el7
Dec 06 03:50:42 Updated: glusterfs-api.x86_64 3.10.8-1.el7
Dec 06 03:50:42 Updated: glusterfs-fuse.x86_64 3.10.8-1.el7
Dec 06 03:50:42 Updated: glusterfs-cli.x86_64 3.10.8-1.el7
Dec 06 03:50:45 Updated: glusterfs-server.x86_64 3.10.8-1.el7
Dec 06 03:50:45 Updated: glusterfs-ganesha.x86_64 3.10.8-1.el

How reproducible:
It happened on multiple VM images. I do not have another cluster to try to
reproduce it on.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Volume Name: tortoise
Type: Replicate
Volume ID: d4c00537-f1e8-4c43-b21d-90c9a6c5dee9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: stor1.example.com:/tort/brick
Brick2: stor2.example.com:/tort/brick
Brick3: stor3.example.com:/tort/brick (arbiter)
Options Reconfigured:
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.eager-lock: enable
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
network.ping-timeout: 10
cluster.quorum-type: auto
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 36
performance.strict-o-direct: enable
network.remote-dio: enable
transport.address-family: inet
nfs.disable: on
features.shard-block-size: 256MB
cluster.enable-shared-storage: disable
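Since features.shard-block-size is 256MB, the image data past the first block
should live as numbered pieces under the hidden .shard directory on each brick.
A listing along the lines below could be used to compare shard counts and sizes
across the three bricks; the GFID is taken from the shard error above, and the
brick path is an assumption, so treat this purely as a sketch.

#!/usr/bin/env python3
# Sketch (not part of the original report): list the shard pieces for one GFID
# on a brick so their count and sizes can be compared across replicas.
import glob
import os

GFID = "6493ab88-f4a8-4696-a52e-6425a595fc80"   # from the shard error above
SHARD_DIR = "/tort/brick/.shard"                 # hypothetical brick path
BLOCK_SIZE = 256 * 1024 * 1024                   # features.shard-block-size

pieces = sorted(glob.glob(os.path.join(SHARD_DIR, GFID + ".*")))
total = 0
for piece in pieces:
    size = os.path.getsize(piece)
    total += size
    print(f"{piece}\t{size}")

# Rough amount of data held in shards beyond the base file's first block.
print(f"{len(pieces)} shard(s), ~{total} bytes beyond the first "
      f"{BLOCK_SIZE}-byte block")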

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

