[Bugs] [Bug 1266877] New: Possible memory leak during rebalance with large quantity of files

Mon Sep 28 10:49:45 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1266877

            Bug ID: 1266877
           Summary: Possible memory leak during rebalance with large
                    quantity of files
           Product: GlusterFS
           Version: mainline
         Component: distribute
          Keywords: Triaged
          Severity: high
          Priority: urgent
          Assignee: bugs at gluster.org
          Reporter: spalai at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com,
                    max at injapan.ru, nbalacha at redhat.com,
                    rgowdapp at redhat.com, rkavunga at redhat.com,
                    spalai at redhat.com
        Depends On: 1261234

+++ This bug was initially created as a clone of Bug #1261234 +++

Description of problem:
Gluster distributed volume with 4 bricks fails to rebalance due to memory
exhaustion.

I have a gluster distributed volume with 4 bricks on one physical server (this
seems strange, but there are reasons for it). The bricks are formatted with
ext4. The volume spans 57T of storage space and currently contains ~2.5T in 30M
files, mostly located on brick 1. Rebalance fix-layout completed successfully,
but the main rebalance fails to complete because the server runs out of memory.

I've tried running:
echo 2 > /proc/sys/vm/drop_caches

After approximately 24 hours the server starts thrashing.

Version-Release number of selected component (if applicable):
glusterfs 3.7.3 built on Jul 28 2015 14:28:57

How reproducible:
Always

Steps to Reproduce:
1. Start rebalance
2. Wait ~24hrs

Actual results:
Server starts thrashing due to memory exhaustion.

Expected results:
Memory occupied by gluster remains relatively constant.

--- Additional comment from Susant Kumar Palai on 2015-09-16 14:26:12 MVT ---

Hi Max,
Can you share the rebalance logs? What was the memory usage of the rebalance
process when it was OOM-killed?

--- Additional comment from Max Gashkov on 2015-09-16 14:31:52 MVT ---

Hi,

The rebalance log is rather large (about 600M). I can grep for specific strings
if needed, or share the whole file privately (please indicate a method for
contacting you directly).

OOM didn't kill the process, I did. It was around 2G RES at the time, and with
the other glusterfsd processes it started swapping to the point where the
system became unstable.
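
For tracking a figure like the RES value above over the course of a long
rebalance, a minimal sketch follows. It assumes a Linux host with /proc
mounted; the function names rss_kb and watch_rss are hypothetical helpers, not
part of GlusterFS, and the PID of the rebalance process has to be supplied by
the operator.

```shell
# Print the resident set size (VmRSS, in kB) of a process, read from /proc.
# Assumption: Linux, where /proc/<pid>/status contains a "VmRSS:" line.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print one timestamped RSS sample every INTERVAL seconds (default 60)
# until the process exits. Both names are hypothetical helpers.
watch_rss() {
    pid=$1
    interval=${2:-60}
    while [ -r "/proc/$pid/status" ]; do
        printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(rss_kb "$pid")"
        sleep "$interval"
    done
}
```

Redirecting the output of watch_rss to a file gives a series that shows
whether the resident size grows steadily or levels off.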

--- Additional comment from Susant Kumar Palai on 2015-09-16 14:35:15 MVT ---

(In reply to Max Gashkov from comment #2)
> Hi,
> 
> Rebalance log is rather large (about 600M), I can grep for specific strings
> if needed or share whole file privately (please indicate method for
> contacting you directly).
Can you grep for error messages in the rebalance log and post the results here?
To contact me: on IRC, channel #gluster, nick spalai.
> 
> OOM didn't kill the process, I did. It was around 2G RES at the time and
> with the other glusterfsd processes it started swapping to the point when
> system became unstable.
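
One way to reduce a log of this size to something that can be posted is to
count error-level lines per source location. The sketch below assumes the usual
GlusterFS log layout, in which the severity letter is the third
whitespace-separated field and the source location follows it (after an
optional "[MSGID: n]" tag). That layout is an assumption here and should be
checked against the actual file. The function name summarize_errors and the
example path are hypothetical.

```shell
# Count error-level lines in a log, grouped by source location.
# Assumed line layout (not confirmed by this thread):
#   [DATE TIME] E [MSGID: n] [file.c:line:func] domain: message
#   [DATE TIME] E [file.c:line:func] domain: message
summarize_errors() {
    awk '$3 == "E" {
             # Skip the optional "[MSGID: n]" tag to reach the source location.
             src = ($4 == "[MSGID:") ? $6 : $4
             count[src]++
         }
         END { for (src in count) print count[src], src }' "$1" | sort -rn
}

# Example with a hypothetical path; the real location depends on the install:
# summarize_errors /var/log/glusterfs/myvol-rebalance.log | head -n 20
```

awk reads the file in a single streaming pass, so the 600M size is not a
problem, and the resulting summary is usually only a few lines long.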

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1261234
[Bug 1261234] Possible memory leak during rebalance with large quantity of
files