[Gluster-devel] Need some advices regarding glusterd memory leak upto 120GB

Wed Nov 12 13:48:44 UTC 2014

That's a really huge leak.

In my experience, 'gluster volume status <volname> fd/inode' commands
can lead to huge rpc responses when run on a volume with many files
being accessed.

I also vaguely remember having some memory problems due to rpc
saved_frames. IIRC the saved_frames continued to grow due to some
connection problems (or something similar), which led to a large
number of concurrent rpc requests being active.

cc'ing krishnan to see if he remembers anything more.

On Wed, Nov 12, 2014 at 6:56 PM, Jaden Liang <jaden1q84 at gmail.com> wrote:
> Hi all,
>
> I am running gluster-3.4.5 on 2 servers. Each of them has 7 2TB HDDs, used
> to build a 7 * 2 distributed + replicated volume.
> I just noticed that glusterd consumed about 120GB of memory and dumped core
> today. Unfortunately, glusterd was not running with --mem-accounting, so all
> I have now is a coredump file to debug. I read the mem_pool code to identify
> which mem_pool consumes such a large amount of memory. Here is the result:
>
> I wrote a gdb script to print out the glusterfsd_ctx->mempool_list:
>
> # gdb script to print every non-zero mem_pool
> set $head = &glusterfsd_ctx->mempool_list
> # byte offset of global_list inside struct mem_pool (offsetof idiom)
> set $offset = (unsigned long)(&((struct mem_pool*)0)->global_list)
> set $pos = (struct mem_pool*)((unsigned long)($head->next) - $offset)
> set $memsum = 0
> while (&$pos->global_list != $head)
>   if ($pos->hot_count + $pos->curr_stdalloc)
>     p *$pos
>     # memory consumed by this single mem_pool
>     set $thismempoolsize = ($pos->hot_count + $pos->curr_stdalloc) * $pos->padded_sizeof_type
>     p $pos->name
>     p $thismempoolsize
>     set $memsum += $thismempoolsize
>   end
>   set $pos = (struct mem_pool*)((unsigned long)($pos->global_list.next) - $offset)
> end
> echo "Total mem used\n"
> p $memsum
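
The script above walks an intrusive linked list: each mem_pool embeds a
global_list node, and the enclosing struct is recovered by subtracting the
member's offset from the node address. A minimal Python sketch of the same
idea, using ctypes, might look like the following. The Pool layout here is a
hypothetical, cut-down stand-in (only the three fields the script reads, plus
the list node), not the real struct mem_pool, and link() and walk() are
illustrative helper names.

```python
import ctypes


class ListHead(ctypes.Structure):
    """Stand-in for a kernel-style struct list_head."""


# Fields are set after the class exists so the node can point to itself.
ListHead._fields_ = [
    ("next", ctypes.POINTER(ListHead)),
    ("prev", ctypes.POINTER(ListHead)),
]


class Pool(ctypes.Structure):
    """Hypothetical, cut-down mem_pool: only the fields the script reads."""
    _fields_ = [
        ("hot_count", ctypes.c_int),
        ("curr_stdalloc", ctypes.c_uint64),
        ("padded_sizeof_type", ctypes.c_uint64),
        ("global_list", ListHead),
    ]


# Same role as $offset in the gdb script: offset of the embedded list node.
OFFSET = Pool.global_list.offset


def link(head, pools):
    """Build a circular doubly linked list: head -> pools... -> head."""
    nodes = [head] + [p.global_list for p in pools]
    for i, node in enumerate(nodes):
        nxt = nodes[(i + 1) % len(nodes)]
        node.next = ctypes.pointer(nxt)
        nxt.prev = ctypes.pointer(node)


def walk(head):
    """Yield each Pool by subtracting OFFSET from the node address."""
    head_addr = ctypes.addressof(head)
    node = head.next
    while ctypes.addressof(node.contents) != head_addr:
        yield Pool.from_address(ctypes.addressof(node.contents) - OFFSET)
        node = node.contents.next


head = ListHead()
# Values copied from the first two pools in the gdb output.
pools = [Pool(64, 16824653, 6116), Pool(16352, 168231365, 52)]
link(head, pools)
sizes = [(p.hot_count + p.curr_stdalloc) * p.padded_sizeof_type
         for p in walk(head)]
print(sizes)
```

The loop stops when it gets back to the list head, which is the same
termination test the gdb while loop uses.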
>
> Then I got this output:
>
> (gdb) source gdb_show_mempool_list.gdb
> $459 = {list = {next = 0x1625a50, prev = 0x1625a50}, hot_count = 64,
> cold_count = 0, lock = 1, padded_sizeof_type = 6116, pool = 0x7ff2c9f94010,
> pool_end = 0x7ff2c9ff3910, real_sizeof_type = 6088,
>   alloc_count = 16919588, pool_misses = 16919096, max_alloc = 64,
> curr_stdalloc = 16824653, max_stdalloc = 16824655, name = 0x1625ad0
> "management:rpcsvc_request_t", global_list = {next = 0x16211f8,
>     prev = 0x1639368}}
> $460 = 0x1625ad0 "management:rpcsvc_request_t"
> $461 = 102899969172
> $462 = {list = {next = 0x7ff2cc0bf374, prev = 0x7ff2cc0bc2b4}, hot_count =
> 16352, cold_count = 32, lock = 1, padded_sizeof_type = 52, pool =
> 0x7ff2cc0bc010, pool_end = 0x7ff2cc18c010,
>   real_sizeof_type = 24, alloc_count = 169845909, pool_misses = 168448980,
> max_alloc = 16384, curr_stdalloc = 168231365, max_stdalloc = 168231560, name
> = 0x1621210 "glusterfs:data_t", global_list = {
>     next = 0x1621158, prev = 0x1625ab8}}
> $463 = 0x1621210 "glusterfs:data_t"
> $464 = 8748881284
> $465 = {list = {next = 0x7ff2cc18e770, prev = 0x7ff2cc18d2fc}, hot_count =
> 16350, cold_count = 34, lock = 1, padded_sizeof_type = 68, pool =
> 0x7ff2cc18d010, pool_end = 0x7ff2cc29d010,
>   real_sizeof_type = 40, alloc_count = 152853817, pool_misses = 151477891,
> max_alloc = 16384, curr_stdalloc = 151406417, max_stdalloc = 151406601, name
> = 0x1621170 "glusterfs:data_pair_t",
>   global_list = {next = 0x16210b8, prev = 0x16211f8}}
> $466 = 0x1621170 "glusterfs:data_pair_t"
> $467 = 10296748156
> $468 = {list = {next = 0x1621050, prev = 0x1621050}, hot_count = 4096,
> cold_count = 0, lock = 1, padded_sizeof_type = 140, pool = 0x7ff2cc29e010,
> pool_end = 0x7ff2cc32a010, real_sizeof_type = 112,
>   alloc_count = 16995288, pool_misses = 16986651, max_alloc = 4096,
> curr_stdalloc = 16820855, max_stdalloc = 16820882, name = 0x16210d0
> "glusterfs:dict_t", global_list = {next = 0x1621018,
>     prev = 0x1621158}}
> $469 = 0x16210d0 "glusterfs:dict_t"
> $470 = 2355493140
> "Total mem used
> "$471 = 124301091752
>
> ----------------------------------------------------------------------
> "management:rpcsvc_request_t" used about 102.9GB
> "glusterfs:data_t" used about 8.7GB
> "glusterfs:data_pair_t" used about 10.3GB
> "glusterfs:dict_t" used about 2.4GB
> Total: about 124.3GB of memory
>
> ----------------------------------------------------------------------
> I assume this happens when a lot of rpc requests are allocated and never
> freed. The leak started several days ago, and I am still trying to figure
> out what happened on my servers at that time.
> I hope someone here has encountered this issue before. Any advice would be
> much appreciated!
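
The per-pool byte counts in the gdb output can be re-derived from the printed
fields with the formula the script uses, (hot_count + curr_stdalloc) *
padded_sizeof_type. The short Python check below does that arithmetic. The
numbers are copied from the output above; the dictionary layout and the
pool_bytes() helper name are only illustrative.

```python
# (hot_count, curr_stdalloc, padded_sizeof_type) as printed by gdb
pools = {
    "management:rpcsvc_request_t": (64, 16824653, 6116),
    "glusterfs:data_t": (16352, 168231365, 52),
    "glusterfs:data_pair_t": (16350, 151406417, 68),
    "glusterfs:dict_t": (4096, 16820855, 140),
}


def pool_bytes(hot_count, curr_stdalloc, padded_sizeof_type):
    """Bytes held by one pool: pooled objects plus std-allocated objects."""
    return (hot_count + curr_stdalloc) * padded_sizeof_type


sizes = {name: pool_bytes(*fields) for name, fields in pools.items()}
total = sum(sizes.values())

for name, size in sizes.items():
    # Show both decimal GB and binary GiB, since the two differ by ~7%.
    print(f"{name}: {size} bytes = {size / 1e9:.1f} GB = {size / 2**30:.1f} GiB")
print(f"total: {total} bytes = {total / 1e9:.1f} GB = {total / 2**30:.1f} GiB")
```

In every pool curr_stdalloc dwarfs hot_count, so nearly all of the memory
comes from objects allocated outside the preallocated pool after it was
exhausted.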
>