[Bugs] [Bug 1373630] New: Unable to profile GlusterFS FUSE client with Valgrind's Massif tool
bugzilla at redhat.com
Tue Sep 6 19:28:32 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1373630
Bug ID: 1373630
Summary: Unable to profile GlusterFS FUSE client with
Valgrind's Massif tool
Product: GlusterFS
Version: 3.7.15
Component: fuse
Severity: low
Assignee: bugs at gluster.org
Reporter: oleksandr at natalenko.name
CC: bugs at gluster.org
Description of problem:
To find out why the GlusterFS FUSE client leaks memory, I would like to use
Valgrind's Massif tool (because Memcheck does not report any meaningful leaks).
So I install the GlusterFS packages plus the debug packages and run the following:
===
valgrind --tool=massif --smc-check=all --trace-children=yes \
  --sim-hints=fuse-compatible /usr/sbin/glusterfs -N \
  --volfile-server=glusterfs.example.com --volfile-id=some_volume \
  /mnt/net/glusterfs/test
===
This command produces instant output:
===
==25482== Massif, a heap profiler
==25482== Copyright (C) 2003-2013, and GNU GPL'd, by Nicholas Nethercote
==25482== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==25482== Command: /usr/sbin/glusterfs -N --volfile-server=glusterfs.la.net.ua
--volfile-id=mail_boxes /mnt/net/glusterfs/test
==25482==
==25483==
==25484==
===
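Because `--trace-children=yes` is set, the output above carries three different `==PID==` prefixes. As a minimal sketch (the helper name `valgrind_pids` is hypothetical, not part of Valgrind), the distinct PIDs can be pulled out of saved Valgrind output like this:

```shell
# Hypothetical helper: list the distinct PIDs that appear as "==PID==" prefixes
# in Valgrind output. Reads the named files, or stdin when no file is given.
valgrind_pids() {
    sed -n 's/^==\([0-9][0-9]*\)==.*/\1/p' "$@" | sort -un
}
```

Each PID listed this way is a process for which a separate massif.out.<PID> file would normally be expected.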
Immediately after this I also get 2 files generated by Valgrind:
===
-rw------- 1 root root 20K Sep 6 22:17 massif.out.25483
-rw------- 1 root root 9.1K Sep 6 22:17 massif.out.25484
===
(both files are attached).
Then I start manipulating files within the mounted volume to provoke the memory
leak. After a while, once memory usage appears to grow in top/htop output, I
unmount the volume to get my memory profile:
===
umount /mnt/net/glusterfs/test
===
Right after this command is executed, Valgrind shows me the following:
===
valgrind: m_mallocfree.c:304 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 1, hi = 0.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata. If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away. Please try that before reporting this as a bug.
host stacktrace:
==25482== at 0x3802FC56: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482== by 0x3802FD84: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482== by 0x3802FF06: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482== by 0x3803D5E1: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482== by 0x3807F6C5: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482== by 0x380349EF: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482== by 0x38034D53: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482== by 0x3808E2D4: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482== by 0x3808E55A: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482== by 0x380B5B0D: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482== by 0xDEADBEEFDEADBEEE: ???
==25482== by 0xDEADBEEFDEADBEEE: ???
==25482== by 0xDEADBEEFDEADBEEE: ???
sched status:
running_tid=3
Thread 3: status = VgTs_Runnable
==25482== at 0x4C29037: free (in /usr/lib64/valgrind/vgpreload_massif-amd64-linux.so)
==25482== by 0x67CE63B: __libc_freeres (in /usr/lib64/libc-2.17.so)
==25482== by 0x4A246B4: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-amd64-linux.so)
==25482== by 0x66A2E2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==25482== by 0x66A2EB4: exit (in /usr/lib64/libc-2.17.so)
==25482== by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308)
==25482== by 0x111914: glusterfs_sigwaiter (glusterfsd.c:2029)
==25482== by 0x606DDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==25482== by 0x6760CEC: clone (in /usr/lib64/libc-2.17.so)
Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.
If that doesn't help, please report this bug to: www.valgrind.org
In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.
===
Something clearly went wrong in PID 25482. Okay, let's check Valgrind's
output files:
===
-rw------- 1 root root 20K Sep 6 22:17 massif.out.25483
-rw------- 1 root root 9.1K Sep 6 22:17 massif.out.25484
===
No changes! The files did not get updated, and no output file for the
misbehaving PID 25482 appeared!
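The check described above (which traced PIDs ended up without a profile file) can be sketched as a small shell function. The name `missing_massif_outputs` is hypothetical, and the sketch assumes Massif's default output file naming, massif.out.<PID>, as seen in the listing:

```shell
# Hypothetical helper: print every PID from the argument list that has no
# massif.out.<PID> file in the given directory (assumes default file naming).
missing_massif_outputs() {
    dir=$1
    shift
    for pid in "$@"; do
        [ -e "$dir/massif.out.$pid" ] || printf '%s\n' "$pid"
    done
}
```

With the directory listing shown above, `missing_massif_outputs . 25482 25483 25484` would be expected to print only 25482.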
I see the 0xDEADBEEFDEADBEEE pattern in Valgrind's output, which suggests that
some memory gets corrupted.
Okay, let's re-run Valgrind with the Memcheck tool, because that is what the
output above suggests:
===
valgrind --leak-check=full --show-leak-kinds=all --log-file="valgrind_fuse.log" \
  /usr/sbin/glusterfs -N --volfile-server=glusterfs.example.com \
  --volfile-id=some_volume /mnt/net/glusterfs/test
===
valgrind_fuse.log is attached as well. In it I noticed the following
warnings/errors for the main PID:
===
==26441== Thread 7:
==26441== Syscall param writev(vector[...]) points to uninitialised byte(s)
==26441== at 0x675FEA0: writev (in /usr/lib64/libc-2.17.so)
==26441== by 0xE664795: send_fuse_iov (fuse-bridge.c:158)
==26441== by 0xE6649B9: send_fuse_data (fuse-bridge.c:197)
==26441== by 0xE666F7A: fuse_attr_cbk (fuse-bridge.c:753)
==26441== by 0xE6671A6: fuse_root_lookup_cbk (fuse-bridge.c:783)
==26441== by 0x1451A937: io_stats_lookup_cbk (io-stats.c:1512)
==26441== by 0x14301B3E: mdc_lookup_cbk (md-cache.c:867)
==26441== by 0x13EEA226: qr_lookup_cbk (quick-read.c:446)
==26441== by 0x13CD9B66: ioc_lookup_cbk (io-cache.c:260)
==26441== by 0x1346515D: dht_revalidate_cbk (dht-common.c:985)
==26441== by 0x1320F0F0: afr_discover_done (afr-common.c:2429)
==26441== by 0x1320F0F0: afr_discover_cbk (afr-common.c:2474)
==26441== by 0x12F9B6F8: client3_3_lookup_cbk (client-rpc-fops.c:2988)
==26441== Address 0x168b538c is on thread 7's stack
==26441== in frame #3, created by fuse_attr_cbk (fuse-bridge.c:723)
==26441==
==26441== Warning: invalid file descriptor -1 in syscall close()
==26441== Thread 3:
==26441== Invalid free() / delete / delete[] / realloc()
==26441== at 0x4C2AD17: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==26441== by 0x67D663B: __libc_freeres (in /usr/lib64/libc-2.17.so)
==26441== by 0x4A246B4: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-amd64-linux.so)
==26441== by 0x66AAE2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==26441== by 0x66AAEB4: exit (in /usr/lib64/libc-2.17.so)
==26441== by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308)
==26441== by 0x111914: glusterfs_sigwaiter (glusterfsd.c:2029)
==26441== by 0x6075DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==26441== by 0x6768CEC: clone (in /usr/lib64/libc-2.17.so)
==26441== Address 0x6a2d3d0 is 0 bytes inside data symbol "noai6ai_cached"
===
Could this be the reason for Massif to fail?
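For a quick overview of which Memcheck error kinds show up in a log such as valgrind_fuse.log, a rough sketch follows. The function name `memcheck_error_kinds` is hypothetical, and the pattern list is an assumption that covers only a few common Memcheck error headers, including the two quoted above:

```shell
# Hypothetical helper: count occurrences of some common Memcheck error headers
# in a log file. The pattern list is illustrative and not exhaustive.
memcheck_error_kinds() {
    # Drop the "==PID== " prefix, keep only lines that start an error report,
    # then count identical headers, most frequent first.
    sed 's/^==[0-9][0-9]*== //' "$1" |
        grep -E '^(Invalid (read|write|free)|Syscall param .* uninitialised|Conditional jump or move depends on uninitialised|Use of uninitialised value)' |
        sort | uniq -c | sort -rn
}
```

Running `memcheck_error_kinds valgrind_fuse.log` would list each matched header with its count, which makes it easier to see whether the invalid free in `__libc_freeres` is a one-off or a recurring report.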
Version-Release number of selected component (if applicable):
GlusterFS 3.7.15, CentOS 7.2.
How reproducible:
Always.
Steps to Reproduce:
See above.
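As a compact restatement of the first step, here is a sketch that rebuilds the Massif invocation from the description. The function name `massif_cmd` and the `RUN` switch are hypothetical conveniences; the options themselves are copied from the command above. By default the command is only printed, not executed:

```shell
# Hypothetical helper: print the Massif invocation used in this report, or
# execute it when RUN=1 is set. Arguments: volfile server, volume, mount point.
massif_cmd() {
    server=$1 volume=$2 mountpoint=$3
    set -- valgrind --tool=massif --smc-check=all --trace-children=yes \
        --sim-hints=fuse-compatible /usr/sbin/glusterfs -N \
        --volfile-server="$server" --volfile-id="$volume" "$mountpoint"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}
```

For example, `massif_cmd glusterfs.example.com some_volume /mnt/net/glusterfs/test` prints the command line; the remaining steps are to work with files on the mount and then run `umount /mnt/net/glusterfs/test`.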
Actual results:
The Massif tool aborts on unmount and does not produce an output file for the main process.
Expected results:
I want my memory to be profiled.
Additional info:
Feel free to ask me for any additional info.