[Bugs] [Bug 1784402] New: storage.reserve ignored by self-heal so that bricks are 100% full

bugzilla at redhat.com bugzilla at redhat.com
Tue Dec 17 11:16:35 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1784402

            Bug ID: 1784402
           Summary: storage.reserve ignored by self-heal so that bricks
                    are 100% full
           Product: GlusterFS
           Version: 5
            Status: NEW
         Component: posix
          Assignee: bugs at gluster.org
          Reporter: david.spisla at iternity.com
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community



Created attachment 1645849
  --> https://bugzilla.redhat.com/attachment.cgi?id=1645849&action=edit
Gluster vol info and status, df -hT, heal info, logs of glfsheal and all related
bricks

Description of problem:
Setup: 3-node VMware cluster (2 storage nodes and 1 arbiter node) running a
Distribute-Replica 2 volume with 1 arbiter brick per replica set (see the
attached file for the detailed configuration).
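
For reference, a volume of this shape could be created roughly as follows.
This is only a sketch with hypothetical host names (node1, node2, arbiter1),
brick paths and volume name; the real configuration is in the attached file.
storage.reserve is left at its default of 1%:

  gluster volume create vol1 replica 3 arbiter 1 \
      node1:/gluster/brick1 node2:/gluster/brick1 arbiter1:/gluster/brick1 \
      node1:/gluster/brick2 node2:/gluster/brick2 arbiter1:/gluster/brick2
  gluster volume set vol1 storage.reserve 1   # default; percent of brick space kept free
  gluster volume start vol1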

Version-Release number of selected component (if applicable):
GlusterFS v5.10

How reproducible:
Steps to Reproduce:
1. Mount volume from a dedicated client machine
2. Disable network of node 2
3. Write to node 1 in the volume until it is full. The storage.reserve limit of
the local bricks should take effect, so the bricks keep roughly 1% of their
space free (see the command sketch after this list).
4. Disable network of node 1
5. Enable network of node 2
6. Write to node 2 in the same volume, but put the data into another subfolder
or use completely different data; otherwise you would run into a split-brain
error, which is not the issue here. Again, write data until the bricks reach
the storage.reserve limit.
7. Now the volume is filled up with twice the amount of data
8. Enable network of node 1
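
Purely as an illustration, the reproduction could be driven with commands like
the following (hypothetical interface name, mount point and directory names;
the exact data set does not matter as long as each write phase runs until the
reserve limit is hit):

  # step 1: mount the volume from the client machine
  mount -t glusterfs node1:/vol1 /mnt/vol1

  # step 2: take node 2 off the network (run on node 2)
  ip link set eth0 down

  # step 3: fill the volume; the writes are expected to stop with ENOSPC
  #         once the bricks on node 1 reach the storage.reserve limit
  mkdir -p /mnt/vol1/dir1
  dd if=/dev/urandom of=/mnt/vol1/dir1/data bs=1M

  # steps 4-6: take node 1 off the network, re-enable node 2, then write a
  #            disjoint data set the same way, e.g. into /mnt/vol1/dir2

  # step 8: re-enable node 1; self-heal then starts copying both data sets
  #         onto bricks that only have room for one of them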

Actual results:
storage.reserve was ignored and all bricks were 100% full within a few seconds.
All brick processes died. The volume could no longer be mounted and a heal
could not be triggered.
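
The broken state could be seen, for example, with (hypothetical volume name
and brick paths):

  df -hT /gluster/brick*          # bricks on both storage nodes report 100% use
  gluster volume status vol1      # brick processes are no longer online
  gluster volume heal vol1 info   # cannot reach the dead bricks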

Expected results:
The self-heal process should be blocked by storage.reserve, the brick
processes should keep running, and the volume should remain accessible.
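
In other words, the heal traffic should run into the same ENOSPC barrier that
an ordinary client write runs into once the reserve limit is reached:

  # a plain client write is refused when storage.reserve kicks in; the
  # expectation is that self-heal writes are refused in the same way
  dd if=/dev/zero of=/mnt/vol1/fill bs=1M   # stops with 'No space left on device'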

Additional info:
See attached file

The above scenario was not only reproduced on a VM cluster; we could also
observe it on a real hardware cluster.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

