[Bugs] [Bug 1432542] Glusterd crashes when restarted with many volumes

bugzilla at redhat.com
Wed Mar 15 22:49:06 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1432542



--- Comment #11 from Jeff Darcy <jdarcy at redhat.com> ---
Down the rabbit hole we go.

Moving the retries to a separate thread made the attach_brick/send_attach_req
crashes go away, but the __synclock_unlock/list_del_init crashes were still
there.  It's hard to tell from a small sample, but if anything they seemed to
be worse.  After looking at several cores and then running some experiments, I
now strongly suspect that there's a bug in our synclock_lock/synclock_unlock
code.  I'm sure that will come as a shock to just about nobody.

One of the hallmarks of these crashes is that a synclock's waitq - specifically
the one for glusterd's big_lock - has one task on it, but that task has clearly
been freed.  It has 0xdeadcode all over it, indicating that not only has it been
freed but parts of it have since been reallocated.  Somehow the problem seems to be
related to the fact that __synclock_unlock will wake up both a waiting thread
and a waiting synctask if both exist.  If I change it to wake up only one,
giving preference to synctasks, then the crashes go away but I hit some sort of
deadlock (hard to diagnose precisely when none of this has ever been
reproducible other than on our regression machines).  If I change it to wake up
only one, preferring the actual thread, then things seem much better.  The
locking etc. in the synclock_lock/synclock_unlock code *looks* reasonable, but
there must be either some gap or some piece of code that's going around it.
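
To make the three wake-up behaviors above concrete, here is a toy model of the
decision only.  It is not the glusterfs code: toy_lock, wake_policy,
wake_result and toy_unlock_wake are hypothetical names invented for this
sketch, and the waiters are reduced to plain counters, so it says nothing
about where the real race is.  It only shows which waiter(s) each variant
would pick when both a thread and a synctask are waiting.

```c
/* Toy model only: NOT the glusterfs synclock implementation.
 * All names here are hypothetical and exist just for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum wake_policy {
    WAKE_BOTH,          /* behavior described above: wake a thread AND a task */
    WAKE_PREFER_TASK,   /* experiment: wake only one, synctasks first */
    WAKE_PREFER_THREAD  /* experiment: wake only one, real threads first */
};

struct toy_lock {
    int waiting_threads; /* stand-in for threads blocked on the lock */
    int waiting_tasks;   /* stand-in for the number of tasks on the waitq */
};

struct wake_result {
    bool woke_thread;
    bool woke_task;
};

/* Decide who gets woken on unlock and remove them from the waiter counts. */
struct wake_result toy_unlock_wake(struct toy_lock *lock, enum wake_policy policy)
{
    struct wake_result r = { false, false };

    assert(lock != NULL);

    bool have_thread = lock->waiting_threads > 0;
    bool have_task = lock->waiting_tasks > 0;

    switch (policy) {
    case WAKE_BOTH:
        r.woke_thread = have_thread;
        r.woke_task = have_task;
        break;
    case WAKE_PREFER_TASK:
        r.woke_task = have_task;
        r.woke_thread = !have_task && have_thread;
        break;
    case WAKE_PREFER_THREAD:
        r.woke_thread = have_thread;
        r.woke_task = !have_thread && have_task;
        break;
    }

    if (r.woke_thread)
        lock->waiting_threads--;
    if (r.woke_task)
        lock->waiting_tasks--;

    return r;
}
```

With one thread and one task waiting, WAKE_BOTH wakes both of them for a lock
that only one can take, while WAKE_PREFER_THREAD leaves the task on the waitq
until a later unlock.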

For now, the threads-get-priority change as implemented in patchset 10 seems to
be avoiding the problem, but if we don't finish tracking it down then there's a
high probability we'll just hit it again in some other context some day.
