[Gluster-devel] debugging ping timeouts

Niels de Vos ndevos at redhat.com
Tue Mar 25 23:37:52 UTC 2014


On Fri, Mar 21, 2014 at 05:25:29AM -0400, Pranith Kumar Karampuri wrote:
> hi,
>     I do not think glusterfs can currently tell why a ping-timeout 
>     happened. And by the time a user learns that such an event 
>     happened, the client will already have disconnected and 
>     reconnected, so we cannot debug the issue any more. One of the 
>     reasons ping-timeouts may happen is that the epoll thread is busy 
>     doing something, most probably waiting on a mutex lock. So I am 
>     thinking we should record some extra information before and after 
>     acquiring locks, as well as the duration of critical-section 
>     execution, and report it at the time of disconnect.
> 
> pseudo code:
> 
> PTHREAD_MUTEX_LOCK(lock) {
>      get the current time into T1;
>      pthread_mutex_lock (lock);
>      get the current time into T2;
>      if T2-T1 is greater than the longest recorded wait, update it //maybe we should also remember the xlator in which it happened.
> }
> 
> PTHREAD_MUTEX_UNLOCK(lock) {
>      get the current time into T3;
>      pthread_mutex_unlock (lock);
>      if T3-T2 is greater than the longest recorded hold time, update it
> }
> 
> Something similar should be done for spin_locks as well.
> 
> When a disconnect event comes this information will be logged along 
> with disconnect messages.
> 
> If you could think of anything else please add it to the thread and we 
> will make a call after a while to see what all can be done to debug 
> such issues further.
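To make the proposal above more concrete, a minimal C sketch of such
wrapper functions could look like the one below. The gf_lock_stats
structure and all names in it are hypothetical and only meant as an
illustration; this is not existing glusterfs code:

/* illustration only: a per-lock statistics wrapper, all names are hypothetical */
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

struct gf_lock_stats {
        pthread_mutex_t mutex;
        uint64_t        max_wait_usec;   /* longest wait for the lock so far */
        uint64_t        max_hold_usec;   /* longest critical section so far */
        uint64_t        locked_at_usec;  /* when the current holder acquired it */
};

static uint64_t
now_usec (void)
{
        struct timeval tv;

        gettimeofday (&tv, NULL);
        return ((uint64_t) tv.tv_sec * 1000000) + tv.tv_usec;
}

static void
gf_stats_mutex_lock (struct gf_lock_stats *ls)
{
        uint64_t t1, t2;

        t1 = now_usec ();                /* T1: before blocking on the lock */
        pthread_mutex_lock (&ls->mutex);
        t2 = now_usec ();                /* T2: lock acquired */

        if (t2 - t1 > ls->max_wait_usec)
                ls->max_wait_usec = t2 - t1;
        ls->locked_at_usec = t2;
}

static void
gf_stats_mutex_unlock (struct gf_lock_stats *ls)
{
        uint64_t held;

        held = now_usec () - ls->locked_at_usec;  /* T3 - T2 */
        if (held > ls->max_hold_usec)
                ls->max_hold_usec = held;
        pthread_mutex_unlock (&ls->mutex);
}

/* on a disconnect event, the recorded maxima could be logged like this */
static void
gf_stats_mutex_report (const struct gf_lock_stats *ls)
{
        printf ("longest wait: %" PRIu64 "us, longest hold: %" PRIu64 "us\n",
                ls->max_wait_usec, ls->max_hold_usec);
}

int
main (void)
{
        struct gf_lock_stats ls = { .mutex = PTHREAD_MUTEX_INITIALIZER };

        gf_stats_mutex_lock (&ls);
        /* ... critical section ... */
        gf_stats_mutex_unlock (&ls);
        gf_stats_mutex_report (&ls);

        return 0;
}

A real implementation would still need to initialise the structure for
every lock and, as suggested above, probably also remember the xlator
in which the slow lock was observed.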

If I were asked how to debug this ;-), I would write a systemtap 
script. It obviously is not part of the glusterfs code-base, but it is 
one of the options that we as support engineers can use without 
modifying existing deployments.

The attached script should be a good start to debug any locking issues 
in glusterfs processes. It has been tested on Fedora 20, but should 
likely work on any RHEL (or derivative) system too. You need to install 
kernel-devel for the running kernel (systemtap builds a kernel module 
for the tracing), glusterfs-debuginfo and possibly the -debuginfo for 
dependent libraries. The script itself is plain text, and the comments 
in it should be clear enough to use it.

Good luck!


-- 
Niels de Vos
Sr. Software Maintenance Engineer
Support Engineering Group
Red Hat Global Support Services
-------------- next part --------------
/*
 * Goal:
 *
 *   When some event occurs (a specific function gets called) the script should
 *   exit and print all the stacktraces of the threads waiting on
 *   pthread_mutexes.
 *
 *   See: http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/5901
 *
 *
 * Usage:
 *
 * - install systemtap-devel and dependencies
 * - make sure that all needed debuginfo packages have been installed, at least
 *   glusterfs-debuginfo and kernel-debuginfo-$(uname -r) should be there
 * - run with this command (the awk script adds parameters for all libs):
 *     # stap -x $PID \
 *       $(awk '/\.so/ {print " -d " $NF }' /proc/$PID/maps | sort -u) \
 *       log-locks.stp
 *   replace $PID with the glusterfs process that needs debugging
 * 
 *
 * Result:
 *
 *   The output will look something like this, depending on the number of
 *   threads that are active and all that:
 *
 *     ...
 *     13 glusterfs(27217):  <- unlocked the mutex at 0x7f5e55f376f0
 *     0 glusterfs(27225): -- is trying to lock the mutex at 0x7f5e55f376f0
 *     0 glusterfs(27217): -- is trying to lock the mutex at 0x7f5e55f376f0
 *     0 glusterfs(27225):  -> locked the mutex at 0x7f5e55f376f0
 *     42 glusterfs(27225):  <- unlocked the mutex at 0x7f5e55f376f0
 *     Exit requested...
 *     tid 27217 is waiting for 22850us on mutex 0x7f5e55f376f0, stack:
 *     __pthread_mutex_lock+0x0 [libpthread-2.18.so]
 *     inode_table_prune+0x54 [libglusterfs.so.0.0.1]
 *     inode_unref+0x34 [libglusterfs.so.0.0.1]
 *     loc_wipe+0x12 [libglusterfs.so.0.0.1]
 *     dht_local_wipe+0x1e [dht.so]
 *     dht_revalidate_cbk+0x29f [dht.so]
 *     client3_3_lookup_cbk+0x703 [client.so]
 *     rpc_clnt_handle_reply+0x90 [libgfrpc.so.0.0.1]
 *     rpc_clnt_notify+0x171 [libgfrpc.so.0.0.1]
 *     rpc_transport_notify+0x23 [libgfrpc.so.0.0.1]
 *     socket_event_poll_in+0x480 [socket.so]
 *     socket_event_handler+0x164 [socket.so]
 *     event_dispatch_epoll+0x1
 *
 *
 * Extending, adding probes, ...
 *
 * - exit on __fuse_fd_ctx_check_n_create is used for testing, change the probe
 *   for a more suitable use-case, some function that is called on ping-timeout
 * - list the probe points for libpthread and pthread_mutex functions:
 *     # stap -L 'process("/usr/lib64/libpthread-*.so").function("*pthread_mutex*lock")'
 *
 *
 * Date: Tue 25 Mar 2014
 * Author: Niels de Vos <ndevos at redhat.com>
 */


global waiting, tid_stacks;

probe process("/usr/lib64/libpthread-*.so").function("__pthread_mutex_lock")
{
	printf("%s -- is trying to lock the mutex at %p\n", thread_indent(0), $mutex);

	/* remember when this thread started waiting, and its userspace stack */
	waiting[$mutex, tid()] = gettimeofday_us();
	tid_stacks[tid()] = sprint_ubacktrace();
}

probe process("/usr/lib64/libpthread-*.so").function("__pthread_mutex_lock").return
{
	printf("%s -> locked the mutex at %p\n", thread_indent(2), $mutex);

	/* the mutex was acquired, this thread is no longer waiting */
	delete waiting[$mutex, tid()];
	delete tid_stacks[tid()];
}

probe process("/usr/lib64/libpthread-*.so").function("__pthread_mutex_unlock").return
{
	printf("%s <- unlocked the mutex at %p\n", thread_indent(-2), $mutex);
}

probe process("/usr/lib64/glusterfs/*/xlator/mount/fuse.so").function("__fuse_fd_ctx_check_n_create")
{
	/* exit whenever __fuse_fd_ctx_check_n_create() is called */
	exit();
}

probe begin
{
	printf("Tracing started, press CTRL+c to exit...\n")
}

probe end
{
	printf("Exit requested...\n")

	now = gettimeofday_us();

	/* report every thread that is still waiting on a mutex, and for how long */
	foreach ([mutex, tid] in waiting) {
		since = waiting[mutex, tid];

		printf("tid %d is waiting for %dus on mutex %p, stack:\n%s\n\n", tid, now - since, mutex, tid_stacks[tid]);
		delete tid_stacks[tid];
	}

	delete waiting;
}


