[Bugs] [Bug 1623749] New: Geo-rep: Few workers fails to start with out any failure
bugzilla at redhat.com
Thu Aug 30 06:07:01 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1623749
Bug ID: 1623749
Summary: Geo-rep: Few workers fails to start with out any
failure
Product: Red Hat Gluster Storage
Version: 3.4
Component: geo-replication
Assignee: khiremat at redhat.com
Reporter: atumball at redhat.com
QA Contact: rhinduja at redhat.com
CC: avishwan at redhat.com, bugs at gluster.org,
csaba at redhat.com, khiremat at redhat.com,
rhs-bugs at redhat.com, sankarshan at redhat.com,
storage-qa-internal at redhat.com
Depends On: 1614799
+++ This bug was initially created as a clone of Bug #1614799 +++
Description of problem:
A few workers fail to start without reporting any failure.
Version-Release number of selected component (if applicable):
mainline
How reproducible:
Seen only while running upstream regression test case
prove -v tests/00-geo-rep/georep-basic-dr-rsync.t
Steps to Reproduce:
1. Get the upstream gluster source code
2. Build and install gluster from source
3. Run prove -v tests/00-geo-rep/georep-basic-dr-rsync.t
Actual results:
One of the workers fails to start without logging anything.
Expected results:
No worker should fail to start
Additional info:
--- Additional comment from Worker Ant on 2018-08-10 08:41:14 EDT ---
REVIEW: https://review.gluster.org/20704 (geo-rep: Fix deadlock during worker
start) posted (#1) for review on master by Kotresh HR
--- Additional comment from Worker Ant on 2018-08-12 23:52:34 EDT ---
COMMIT: https://review.gluster.org/20704 committed in master by "Amar Tumballi"
<amarts at redhat.com> with a commit message- geo-rep: Fix deadlock during worker
start
Analysis:
The monitor process spawns monitor threads (one per brick).
Each monitor thread forks worker and agent processes.
Each monitor thread, while initializing, updates the
monitor status file. The update is synchronized using flock.
The race is that one thread can fork a worker while
another thread has the status file open, so the worker
process inherits a reference to that fd.
Cause:
A flock is released either by explicitly unlocking it
or by closing all duplicate fds referring to the file.
The code relied on fd close, so a reference inherited
by a worker/agent process through fork could cause the deadlock.
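
The behaviour described under Cause can be reproduced outside gluster. The following is a minimal sketch, not the gsyncd code: the function name and the temporary file standing in for the monitor status file are assumptions made for illustration. It takes a flock, forks a child that inherits the fd, closes the parent's fd without unlocking, and then checks whether a fresh open of the file can take the lock. It assumes a POSIX system with flock(2) semantics as on Linux.

```python
import fcntl
import os
import tempfile


def lock_survives_close_after_fork():
    """Return True if the flock is still held after the parent closes its fd
    while a forked child keeps an inherited duplicate of that fd."""
    fd, path = tempfile.mkstemp()      # stand-in for the monitor status file
    r, w = os.pipe()                   # used only to keep the child alive
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)

        pid = os.fork()                # stand-in for forking a worker/agent
        if pid == 0:
            # Child: holds the inherited fd and waits until the parent
            # closes the write end of the pipe.
            os.close(w)
            os.read(r, 1)
            os._exit(0)

        os.close(r)
        os.close(fd)                   # parent relies on close() to unlock

        # A fresh open() has its own lock state, so it conflicts with the
        # lock still referenced by the child's inherited fd.
        fd2 = os.open(path, os.O_RDWR)
        try:
            fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
            still_locked = False
        except BlockingIOError:
            still_locked = True
        finally:
            os.close(fd2)

        os.close(w)                    # let the child exit
        os.waitpid(pid, 0)
        return still_locked
    finally:
        os.unlink(path)
```

If the second lock attempt were blocking instead of non-blocking, the caller would wait for as long as the child lives, which matches the hang seen when another monitor thread tries to update the status file.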
Fix:
1. flock is now unlocked explicitly.
2. The status file is now updated in appropriate places so that
the reference is not leaked to worker/agent processes.
With this fix, both the deadlock and possible fd
leaks are resolved.
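
The first point of the fix can be sketched as follows. This is an illustration under assumptions, not the code from the actual change: `locked_status_file` and `explicit_unlock_releases_lock` are hypothetical names, and a temporary file stands in for the monitor status file. The context manager unlocks with `LOCK_UN` before closing, so the lock is released even if a forked child still holds an inherited copy of the fd.

```python
import contextlib
import fcntl
import os
import tempfile


@contextlib.contextmanager
def locked_status_file(path):
    """Hypothetical helper: open path, hold an exclusive flock while the
    body runs, then unlock explicitly instead of relying on close()."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        # Explicit unlock: releases the lock for every duplicate of this
        # fd, including copies inherited by forked children.
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def explicit_unlock_releases_lock():
    """Return True if the lock can be re-acquired although a forked child
    still holds an inherited copy of the original fd."""
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    r, w = os.pipe()                   # used only to keep the child alive
    try:
        with locked_status_file(path):
            pid = os.fork()            # worst case: fork while file is open
            if pid == 0:
                os.close(w)
                os.read(r, 1)          # wait until the parent closes w
                os._exit(0)
        os.close(r)

        fd2 = os.open(path, os.O_RDWR)
        try:
            fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        finally:
            os.close(fd2)

        os.close(w)                    # let the child exit
        os.waitpid(pid, 0)
        return acquired
    finally:
        os.unlink(path)
```

The sketch deliberately forks while the file is open to show that the explicit unlock alone removes the deadlock. The second point of the fix, keeping the status file update away from the fork so the fd is not inherited at all, addresses the leaked reference itself.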
fixes: bz#1614799
Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
Signed-off-by: Kotresh HR <khiremat at redhat.com>
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1614799
[Bug 1614799] Geo-rep: Few workers fails to start with out any failure