[Gluster-devel] finding and fixing memory leaks in xlators

Niels de Vos ndevos at redhat.com
Tue Apr 25 09:42:21 UTC 2017


On Tue, Apr 25, 2017 at 02:55:30PM +0530, Amar Tumballi wrote:
> Thanks for this detailed Email with clear instructions Niels.
> 
> Can we have this as github issues as well? I don't want this information to
> be lost as part of email thread in ML archive. I would like to see if I can
> get some interns to work on these leaks per component as part of their
> college project etc.

This is already reported as a GitHub issue as
https://github.com/gluster/glusterfs/issues/176 . Fixes for the leaks in
the different xlators will need a bug report in Bugzilla. For now I have
been creating new bugs per xlator, and added BZ 1425623 in the "blocks"
field of the xlator bugs.

Many fixes will be part of glusterfs-3.11. At the moment I am focussing
mostly on the Gluster core (libglusterfs) leaks. Leaks in the xlators
tend to be a little easier to fix, and would be more suitable for junior
engineers. I'm happy to explain more details if needed :)

Thanks!
Niels


> 
> -Amar
> 
> On Tue, Apr 25, 2017 at 2:29 PM, Niels de Vos <ndevos at redhat.com> wrote:
> 
> > Hi,
> >
> > with the use of gfapi it has become clear that Gluster was never really
> > developed to be loaded in applications. There are many different memory
> > leaks that get exposed through the usage of gfapi. Before, memory leaks
> > were mostly cleaned up automatically because processes would exit. Now,
> > there are applications that initialize a GlusterFS client to access a
> > volume, and de-init that client once not needed anymore. Unfortunately
> > upon the de-init, not all allocated memory is free'd again. These are
> > often just a few bytes, but for long running processes this can become a
> > real problem.
> >
> > Finding the memory leaks in xlators has been a tricky thing. Valgrind
> > would often not know what function/source the allocation did, and fixing
> > the leak would become a real hunt. There have been some patches merged
> > that make Valgrind work more easily, and a few patches still need a
> > little more review before everything is available. The document below
> > describes how to use a new "sink" xlator to detect memory leaks. This
> > xlator can only be merged when a change for the graph initialization is
> > included too:
> >
> >  - https://review.gluster.org/16796 - glusterfs_graph_prepare
> >  - https://review.gluster.org/16806 - sink xlator and developer doc
> >
> > It would be most welcome if other developers can reviwe the linked
> > changes, so that everyone can easily debug memory leaks from thair
> > favorite xlators.
> >
> > There is a "run-xlator.sh" script in my (for now) personal
> > "gluster-debug" repository that can be used to load an arbitrary xlator
> > along with "sink". See
> > https://github.com/nixpanic/gluster-debug/tree/master/gfapi-load-volfile
> > for more details.
> >
> > Thanks,
> > Niels
> >
> >
> > From doc/developer-guide/identifying-resource-leaks.md:
> >
> > # Identifying Resource Leaks
> >
> > Like most other pieces of software, GlusterFS is not perfect in how it
> > manages
> > its resources like memory, threads and the like. Gluster developers try
> > hard to
> > prevent leaking resources but releasing and unallocating the used
> > structures.
> > Unfortunately every now and then some resource leaks are unintentionally
> > added.
> >
> > This document tries to explain a few helpful tricks to identify resource
> > leaks
> > so that they can be addressed.
> >
> >
> > ## Debug Builds
> >
> > There are certain techniques used in GlusterFS that make it difficult to
> > use
> > tools like Valgrind for memory leak detection. There are some build options
> > that make it more practical to use Valgrind and other tools. When running
> > Valgrind, it is important to have GlusterFS builds that contain the
> > debuginfo/symbols. Some distributions (try to) strip the debuginfo to get
> > smaller executables. Fedora and RHEL based distributions have sub-packages
> > called ...-debuginfo that need to be installed for symbol resolving.
> >
> >
> > ### Memory Pools
> >
> > By using memory pools, there are no allocation/freeing of single structures
> > needed. This improves performance, but also makes it impossible to track
> > the
> > allocation and freeing of srtuctures.
> >
> > It is possible to disable the use of memory pools, and use standard
> > `malloc()`
> > and `free()` functions provided by the C library. Valgrind is then able to
> > track the allocated areas and verify if they have been free'd. In order to
> > disable memory pools, the Gluster sources needs to be configured with the
> > `--enable-debug` option:
> >
> > ```shell
> > ./configure --enable-debug
> > ```
> >
> > When building RPMs, the `.spec` handles the `--with=debug` option too:
> >
> > ```shell
> > make dist
> > rpmbuild -ta --with=debug glusterfs-....tar.gz
> > ```
> >
> > ### Dynamically Loaded xlators
> >
> > Valgrind tracks the call chain of functions that do memory allocations. The
> > addresses of the functions are stored and before Valgrind exits the
> > addresses
> > are resolved into human readable function names and offsets (line numbers
> > in
> > source files). Because Gluster loads xlators dynamically, and unloads then
> > before exiting, Valgrind is not able to resolve the function addresses into
> > symbols anymore. Whenever this happend, Valgrind shows `???` in the output,
> > like
> >
> > ```
> >   ==25170== 344 bytes in 1 blocks are definitely lost in loss record 233
> > of 324
> >   ==25170==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
> >   ==25170==    by 0x52C7C0B: __gf_calloc (mem-pool.c:117)
> >   ==25170==    by 0x12B0638A: ???
> >   ==25170==    by 0x528FCE6: __xlator_init (xlator.c:472)
> >   ==25170==    by 0x528FE16: xlator_init (xlator.c:498)
> >   ...
> > ```
> >
> > These `???` can be prevented by not calling `dlclose()` for unloading the
> > xlator. This will cause a small leak of the handle that was returned with
> > `dlopen()`, but for improved debugging this can be acceptible. For this and
> > other Valgrind features, a `--enable-valgrind` option is available to
> > `./configure`. When GlusterFS is built with this option, Valgrind will be
> > able
> > to resolve the symbol names of the functions that do memory allocations
> > inside
> > xlators.
> >
> > ```shell
> > ./configure --enable-valgrind
> > ```
> >
> > When building RPMs, the `.spec` handles the `--with=valgrind` option too:
> >
> > ```shell
> > make dist
> > rpmbuild -ta --with=valgrind glusterfs-....tar.gz
> > ```
> >
> > ## Running Valgrind against a single xlator
> >
> > Debugging a single xlator is not trivial. But there are some tools to make
> > it
> > easier. The `sink` xlator does not do any memory allocations itself, but
> > contains just enough functionality to mount a volume with only the `sink`
> > xlator. There is a little gfapi application under `tests/basic/gfapi/` in
> > the
> > GlusterFS sources that can be used to run only gfapi and the core GlusterFS
> > infrastructure with the `sink` xlator. By extending the `.vol` file to load
> > more xlators, each xlator can be debugged pretty much separately (as long
> > as
> > the xlators have no dependencies on each other). A basic Valgrind run with
> > the
> > suitable configure options looks like this:
> >
> > ```shell
> > ./autogen.sh
> > ./configure --enable-debug --enable-valgrind
> > make && make install
> > cd tests/basic/gfapi/
> > make gfapi-load-volfile
> > valgrind ./gfapi-load-volfile sink.vol
> > ```
> >
> > Combined with other very useful options to Valgrind, the following
> > execution
> > shows many more useful details:
> >
> > ```shell
> > valgrind \
> >         --fullpath-after= --leak-check=full --show-leak-kinds=all \
> >         ./gfapi-load-volfile sink.vol
> > ```
> >
> > Note that the `--fullpath-after=` option is left empty, this makes Valgrind
> > print the full path and filename that contains the functions:
> >
> > ```
> > ==2450== 80 bytes in 1 blocks are definitely lost in loss record 8 of 60
> > ==2450==    at 0x4C29975: calloc (/builddir/build/BUILD/
> > valgrind-3.11.0/coregrind/m_replacemalloc/vg_replace_malloc.c:711)
> > ==2450==    by 0x52C6F73: __gf_calloc (/usr/src/debug/glusterfs-3.
> > 11dev/libglusterfs/src/mem-pool.c:117)
> > ==2450==    by 0x12F10CDA: init (/usr/src/debug/glusterfs-3.
> > 11dev/xlators/meta/src/meta.c:231)
> > ==2450==    by 0x528EFD5: __xlator_init (/usr/src/debug/glusterfs-3.
> > 11dev/libglusterfs/src/xlator.c:472)
> > ==2450==    by 0x528F105: xlator_init (/usr/src/debug/glusterfs-3.
> > 11dev/libglusterfs/src/xlator.c:498)
> > ==2450==    by 0x52D9D8B: glusterfs_graph_init (/usr/src/debug/glusterfs-3.
> > 11dev/libglusterfs/src/graph.c:321)
> > ...
> > ```
> >
> > In the above example, the `init` function in `xlators/meta/src/meta.c`
> > does a
> > memory allocation on line 231. This memory is never free'd again, and hence
> > Valgrind logs this call stack. When looking in the code, it seems that the
> > allocation of `priv` is assigned to the `this->private` member of the
> > `xlator_t` structure. Because the allocation is done in `init()`, free'ing
> > is
> > expected to happen in `fini()`. Both functions are shown below, with the
> > inclusion of the empty `fini()`:
> >
> >
> > ```
> > 226 int
> > 227 init (xlator_t *this)
> > 228 {
> > 229         meta_priv_t *priv = NULL;
> > 230
> > 231         priv = GF_CALLOC (sizeof(*priv), 1, gf_meta_mt_priv_t);
> > 232         if (!priv)
> > 233                 return -1;
> > 234
> > 235         GF_OPTION_INIT ("meta-dir-name", priv->meta_dir_name, str,
> > out);
> > 236
> > 237         this->private = priv;
> > 238 out:
> > 239         return 0;
> > 240 }
> > 241
> > 242
> > 243 int
> > 244 fini (xlator_t *this)
> > 245 {
> > 246         return 0;
> > 247 }
> > ```
> >
> > In this case, the resource leak can be addressed by adding a single line
> > to the
> > `fini()` function:
> >
> > ```
> > 243 int
> > 244 fini (xlator_t *this)
> > 245 {
> > 246         GF_FREE (this->private);
> > 247         return 0;
> > 248 }
> > ```
> >
> > Running the same Valgrind command and comparing the output will show that
> > the
> > memory leak in `xlators/meta/src/meta.c:init` is not reported anymore.
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-devel
> >
> 
> 
> 
> -- 
> Amar Tumballi (amarts)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-devel/attachments/20170425/fc7d7bc5/attachment.sig>


More information about the Gluster-devel mailing list