[Bugs] [Bug 1475255] New: [Geo-rep]: Geo-rep hangs in changelog mode
bugzilla at redhat.com
Wed Jul 26 09:56:05 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1475255
Bug ID: 1475255
Summary: [Geo-rep]: Geo-rep hangs in changelog mode
Product: GlusterFS
Version: mainline
Component: geo-replication
Severity: high
Assignee: bugs at gluster.org
Reporter: khiremat at redhat.com
CC: bugs at gluster.org
Description of problem:
The geo-replication worker hangs and doesn't switch to 'Changelog Crawl'.
No data from the master gets synced to the slave. However, geo-rep works
fine with xsync as the change_detector.
Version-Release number of selected component (if applicable):
mainline
How reproducible:
Always
Steps to Reproduce:
1. Setup geo-replication session between two gluster volumes
Actual results:
Geo-rep hangs and no data is synced
Expected results:
Geo-rep should not hang and should sync data
Additional info:
Analysis:
It was found that the culprit is the patch
"https://review.gluster.org/#/c/17779/", which got into mainline. This patch
causes a hang in the 'libgfchangelog' library, which geo-replication uses to
get the changelogs from the brick back end.
The backtrace of the hang is as follows.
Thread 1 (Thread 0x7ffff7fe5700 (LWP 11895)):
#0 pthread_spin_lock () at ../sysdeps/x86_64/nptl/pthread_spin_lock.S:32
#1 0x00007ffff7911af4 in mem_get (mem_pool=0x7ffff7bc5588 <pools+200>) at
mem-pool.c:758
#2 0x00007ffff7911791 in mem_get0 (mem_pool=0x7ffff7bc5588 <pools+200>) at
mem-pool.c:657
#3 0x00007ffff78dcd80 in log_buf_new () at logging.c:284
#4 0x00007ffff78e0c0a in _gf_msg_internal (domain=0x602b80 "gfchangelog",
file=0x7ffff7bd50cd "gf-changelog.c",
function=0x7ffff7bd52a0 <__FUNCTION__.17176>
"gf_changelog_register_generic", line=552, level=GF_LOG_INFO, errnum=0,
msgid=132028, appmsgstr=0x7fffffffd018, callstr=0x0, graph_id=0)
at logging.c:1961
#5 0x00007ffff78e110c in _gf_msg (domain=0x602b80 "gfchangelog",
file=0x7ffff7bd50cd "gf-changelog.c", function=0x7ffff7bd52a0
<__FUNCTION__.17176> "gf_changelog_register_generic",
line=552, level=GF_LOG_INFO, errnum=0, trace=0, msgid=132028,
fmt=0x7ffff7bd51c8 "Registering brick: %s [notify filter: %d]") at
logging.c:2077
#6 0x00007ffff7bcd8dd in gf_changelog_register_generic (bricks=0x7fffffffd1c0,
count=0, ordered=1, logfile=0x400c0d "/tmp/change.log", lvl=9, xl=0x0) at
gf-changelog.c:549
#7 0x00007ffff7bcda84 in gf_changelog_register (brick_path=0x400c2a
"/bricks/brick0/b0", scratch_dir=0x400c1d "/tmp/scratch", log_file=0x400c0d
"/tmp/change.log", log_level=9,
max_reconnects=5) at gf-changelog.c:623
#8 0x00000000004009ff in main (argc=1, argv=0x7fffffffe328) at
get-changes.c:49
The call flow of the first mem_get is as follows:
mem_get --> mem_get_pool_list --> pthread_getspecific(pool_key)
pthread_getspecific should have returned NULL, as pool_key was never created:
mem_pools_init_early/mem_pools_init_late is not called in this code path. But
it returned some value, so the spin-lock initialization didn't happen, causing
this hang.
According to the man page of pthread_getspecific:
"The effect of calling pthread_getspecific() or pthread_setspecific() with
a key value not obtained from pthread_key_create() or after key has been
deleted with pthread_key_delete() is undefined."
So shouldn't we be avoiding this 'if' condition below?
void
mem_pools_init_early (void)
{
        pthread_mutex_lock (&init_mutex);
        /* Use a pthread_key destructor to clean up when a thread exits.
         *
         * We won't increase init_count here, that is only done when the
         * pool_sweeper thread is started too.
         */
        if (pthread_getspecific (pool_key) == NULL) {
                /* key has not been created yet */
                if (pthread_key_create (&pool_key, pool_destructor) != 0) {
                        gf_log ("mem-pool", GF_LOG_CRITICAL,
                                "failed to initialize mem-pool key");
                }
        }
        pthread_mutex_unlock (&init_mutex);
}
And is it now mandatory to call mem_pools_init_early in all code paths, such
as libgfchangelog?
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
More information about the Bugs mailing list