[Bugs] [Bug 1339246] New: High IO/load causes VMs to enter Paused state

Tue May 24 13:10:22 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1339246

            Bug ID: 1339246
           Summary: High IO/load causes VMs to enter Paused state
           Product: GlusterFS
           Version: 3.7.11
         Component: sharding
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: Sustugriel at gmail.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org

Description of problem:
Taking a backup image of a running VM's disk consistently causes the VM to
enter a paused state. It can then be resumed with no issues.

The problem started after we enabled the sharding translator, which has
otherwise led to drastically faster heal times.

Version-Release number of selected component (if applicable): 3.7.11-1

How reproducible: 50-100%. It is intermittent: sometimes the machines pause,
other times they don't. It does not seem to be related to disk size.

Steps to Reproduce:
1. Create and install an oVirt environment using GlusterFS as storage on a
Distributed Replicate volume.
2. Use the default volume options, except with the sharding translator enabled.
3. Create a Windows Server 2012 R2 VM and take a backup image using a VSS
capture utility such as BackupExec, Acronis, or Windows Server Backup.

Actual results:
Machines pause seconds after the backup has started. The hosts did not go
down and the bricks did not go down. The VMs can be resumed immediately, and
the resume succeeds.

Expected results:
Machines should not pause.

Additional info:
Distributed replicate volume.
Number of bricks: 6
Replica Count: 3
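
For clarity on the layout: with 6 bricks and a replica count of 3, the volume
should consist of 2 replica sets that files are distributed across. A minimal
sketch of that arithmetic (the function name is made up for illustration and
is not a GlusterFS API):

```python
def replica_sets(bricks: int, replica: int) -> int:
    """Return the number of replica sets in a distributed-replicate volume.

    Hypothetical helper for illustration only; not part of GlusterFS.
    """
    if replica <= 0 or bricks <= 0 or bricks % replica != 0:
        raise ValueError("brick count must be a positive multiple of the replica count")
    return bricks // replica


# Values taken from this report: 6 bricks, replica count 3.
print(replica_sets(6, 3))
```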

Volume options:
cluster.self-heal-window-size: 256
cluster.data-self-heal-algorithm: full
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
server.allow-insecure: on
storage.owner-gid: 36
network.ping-timeout: 10
features.shard-block-size: 512MB
features.shard: on
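
For anyone trying to reproduce this, the options above could be applied to a
test volume with `gluster volume set`. The sketch below only prints the
commands rather than running them. The volume name `vmstore` is a placeholder
I made up; substitute the real volume name.

```python
import shlex

# Placeholder volume name; not taken from the affected environment.
VOLUME = "vmstore"

# Options copied from the "Volume options" list in this report, in the same order.
OPTIONS = {
    "cluster.self-heal-window-size": "256",
    "cluster.data-self-heal-algorithm": "full",
    "diagnostics.count-fop-hits": "on",
    "diagnostics.latency-measurement": "on",
    "performance.quick-read": "off",
    "performance.read-ahead": "off",
    "performance.io-cache": "off",
    "performance.stat-prefetch": "off",
    "cluster.eager-lock": "enable",
    "network.remote-dio": "enable",
    "cluster.quorum-type": "auto",
    "cluster.server-quorum-type": "server",
    "storage.owner-uid": "36",
    "server.allow-insecure": "on",
    "storage.owner-gid": "36",
    "network.ping-timeout": "10",
    "features.shard-block-size": "512MB",
    "features.shard": "on",
}


def volume_set_commands(volume, options):
    """Build one 'gluster volume set' command line per option."""
    return [
        f"gluster volume set {shlex.quote(volume)} {shlex.quote(key)} {shlex.quote(value)}"
        for key, value in options.items()
    ]


for command in volume_set_commands(VOLUME, OPTIONS):
    print(command)
```

Printing the commands first makes it easy to review them before running
anything against a live volume.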
