[Gluster-devel] debugging ping timeouts
Pranith Kumar Karampuri
pkarampu at redhat.com
Fri Mar 21 09:25:29 UTC 2014
hi,
I do not think glusterfs can currently tell why a ping-timeout happened. By the time a user learns that such an event occurred, the client has already disconnected and reconnected, so we can no longer debug the issue. One reason ping-timeouts may happen is that the epoll thread is busy doing something, most probably waiting on a mutex lock. So I am thinking maybe we should record some extra information around lock acquisition, namely how long each lock took to acquire and how long each critical section ran, and report it at the time of disconnect.
pseudo code:
PTHREAD_MUTEX_LOCK(lock) {
    get the current time to T1;
    pthread_mutex_lock (lock);
    get the current time to T2 and remember it with the lock;
    if T2-T1 is greater than the already recorded wait time, update it //maybe we should also remember the xlator in which it happened.
}
PTHREAD_MUTEX_UNLOCK(lock) {
    get the current time to T3;
    if T3-T2 is greater than the already recorded critical section time, update it
    pthread_mutex_unlock (lock);
}
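The pseudocode above could be sketched in C roughly as follows. This is only an illustration under assumptions: the names timed_mutex_t, timed_mutex_init, timed_mutex_lock, timed_mutex_unlock and timed_mutex_report are hypothetical and are not existing glusterfs APIs, and CLOCK_MONOTONIC is my choice of clock so that wall-clock adjustments do not distort the durations.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical wrapper: a mutex plus the worst timings seen so far. */
typedef struct {
    pthread_mutex_t mutex;
    struct timespec acquired_at; /* T2: when the current holder got the lock */
    uint64_t max_wait_ns;        /* longest T2 - T1 observed */
    uint64_t max_hold_ns;        /* longest T3 - T2 observed */
} timed_mutex_t;

/* Nanoseconds elapsed between two CLOCK_MONOTONIC readings. */
uint64_t elapsed_ns(const struct timespec *from, const struct timespec *to)
{
    int64_t ns = ((int64_t)to->tv_sec - (int64_t)from->tv_sec) * 1000000000LL
               + ((int64_t)to->tv_nsec - (int64_t)from->tv_nsec);
    assert(ns >= 0); /* a monotonic clock never goes backwards */
    return (uint64_t)ns;
}

int timed_mutex_init(timed_mutex_t *tm)
{
    tm->acquired_at.tv_sec = 0;
    tm->acquired_at.tv_nsec = 0;
    tm->max_wait_ns = 0;
    tm->max_hold_ns = 0;
    return pthread_mutex_init(&tm->mutex, NULL);
}

int timed_mutex_lock(timed_mutex_t *tm)
{
    struct timespec t1, t2;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    int ret = pthread_mutex_lock(&tm->mutex);
    if (ret != 0)
        return ret;
    clock_gettime(CLOCK_MONOTONIC, &t2);
    /* We hold the mutex here, so updating the counters is race-free. */
    tm->acquired_at = t2;
    uint64_t waited = elapsed_ns(&t1, &t2);
    if (waited > tm->max_wait_ns)
        tm->max_wait_ns = waited;
    return 0;
}

int timed_mutex_unlock(timed_mutex_t *tm)
{
    struct timespec t3;
    /* Measure before unlocking, while the counters are still protected. */
    clock_gettime(CLOCK_MONOTONIC, &t3);
    uint64_t held = elapsed_ns(&tm->acquired_at, &t3);
    if (held > tm->max_hold_ns)
        tm->max_hold_ns = held;
    return pthread_mutex_unlock(&tm->mutex);
}

/* Intended to be called from the disconnect path. The counters are read
 * without taking the mutex, so the values are only approximate; taking
 * the mutex here could block on the very lock being investigated. */
void timed_mutex_report(const timed_mutex_t *tm, FILE *out)
{
    fprintf(out, "max lock wait: %llu ns, max critical section: %llu ns\n",
            (unsigned long long)tm->max_wait_ns,
            (unsigned long long)tm->max_hold_ns);
}
```

A caller would replace pthread_mutex_lock/pthread_mutex_unlock pairs with timed_mutex_lock/timed_mutex_unlock and call timed_mutex_report(&tm, stderr) when the disconnect is logged. The two extra clock_gettime calls per lock/unlock pair add overhead on hot paths, so this would probably need to be optional.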
Something similar should be done for spin_locks as well.
When a disconnect event comes, this information will be logged along with the disconnect messages.
If you can think of anything else, please add it to the thread; we will make a call after a while on what can be done to debug such issues further.
Pranith