[Bugs] [Bug 1203739] New: Self-heal of sparse image files on 3-way replica "unsparsifies" the image

Thu Mar 19 14:59:01 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1203739

            Bug ID: 1203739
           Summary: Self-heal of sparse image files on 3-way replica
                    "unsparsifies" the image
           Product: GlusterFS
           Version: 3.6.2
         Component: replicate
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: mriedel at umaryland.edu
                CC: bugs at gluster.org, gluster-bugs at redhat.com

Description of problem:
This is similar to bug 1190633. We have a 3-way replica that is used to store
sparse oVirt disk images. The same servers are also the oVirt nodes.

When one machine is taken down for maintenance, the healing process on a
third, untouched machine "inflates" the sparse images to full size.

In our scenario, we have 3 servers. Each is an oVirt node and hosts a brick
for the replica.

In oVirt, one node is the SPM (Storage Pool Manager), and the other two are
"normal."

We leave the SPM alone and put one of the "normal" machines into maintenance.
We then remove that machine's brick from the volume.

We then reboot the machine and add the brick back to the volume. Oddly, it's
the *third* server that re-inflates the sparse images.

Version-Release number of selected component (if applicable):
glusterfs-server-3.6.2-1.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up 3-node replication with sparse images
2. Set one non-SPM oVirt node to maintenance
3. Remove brick from gluster volume of the server in maintenance
4. Reboot
5. Add the brick back to the volume
6. The SPM and the recently rebooted server behave correctly. The third,
untouched server inflates the sparse images.
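
For step 1, a test image that mimics a mostly empty VM disk can be generated
without oVirt. The following is only a sketch: the function name, the payload,
and the offsets are hypothetical and are not part of the original setup. It
relies on truncate() setting the file length without allocating blocks, which
holds on common Linux filesystems.

```python
import os


def make_sparse_image(path, size_bytes, data_offsets=(), payload=b"\xff" * 4096):
    """Create a file whose apparent size is size_bytes.

    Only the regions at data_offsets receive real data; everything else is
    left as a hole. Hypothetical helper for reproducing the report.
    Offsets are assumed to leave room for the payload inside size_bytes.
    """
    with open(path, "wb") as f:
        # Set the length first; no data blocks are written for the holes.
        f.truncate(size_bytes)
        for offset in data_offsets:
            f.seek(offset)
            f.write(payload)
    return os.stat(path).st_size
```

The resulting file can be copied onto the mounted volume before taking a node
into maintenance.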

Actual results:
Here's a "df" of the machines after rebooting.

SPM machine (not rebooted):
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-ovirtlv
                      500G   12G  488G   3% /gluster/ovirt

"Normal" machine, the one that was rebooted:
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-ovirtlv
                      500G   12G  489G   3% /gluster/ovirt

The third, untouched, "normal" machine:
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-ovirtlv
                      500G   64G  437G  13% /gluster/ovirt
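
"df" only shows the filesystem total. To see which individual images were
inflated, the apparent size of a file can be compared with the space actually
allocated for it. This is a minimal sketch, assuming a Linux brick filesystem
where st_blocks is reported in 512-byte units; the function names and the 0.9
threshold are arbitrary choices for illustration.

```python
import os


def allocation(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units regardless of the
    # filesystem's own block size.
    return st.st_size, st.st_blocks * 512


def looks_inflated(path, threshold=0.9):
    """Guess whether a file is (almost) fully allocated.

    The threshold is an assumption; a fully written, non-sparse image
    would also be reported as inflated.
    """
    apparent, allocated = allocation(path)
    if apparent == 0:
        return False
    return allocated / apparent >= threshold
```

Running the check against the same image under /gluster/ovirt/brick on each
server should show whether only the third server's copy has grown.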

Expected results:
I expect the heal not to inflate sparse images on the third machine (or any of
them, actually).
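
To illustrate the expected behavior, here is a sketch of a copy that keeps
holes intact by seeking over all-zero blocks instead of writing them. It is
not how the GlusterFS self-heal is implemented; the function name and block
size are assumptions made for the example.

```python
import os


def copy_preserving_holes(src, dst, block_size=128 * 1024):
    """Copy src to dst without writing blocks that contain only zeros."""
    zero = bytes(block_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            if chunk == zero[:len(chunk)]:
                # Skip ahead instead of writing zeros, leaving a hole.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # Fix the final length in case the source ends in a hole.
        fout.truncate(fin.tell())
```

A heal that writes every block, including the zero-filled ones, would produce
the same file contents but allocate the full size on disk.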

This is definitely an issue, since VM disk images tend to be overallocated
relative to the physical disk space available.

The only workaround is to move the disk images to another storage domain and
then back to the original domain.

Additional info:
Here's the volume info:
gluster> volume info ovirt

Volume Name: ovirt
Type: Replicate
Volume ID: b39ed03e-0d03-40a2-acad-8384cf0c5cb4
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: server1:/gluster/ovirt/brick
Brick2: server2:/gluster/ovirt/brick
Brick3: server3:/gluster/ovirt/brick
Options Reconfigured:
cluster.data-self-heal-algorithm: diff
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
