[Bugs] [Bug 1623749] Geo-rep: Few workers fails to start with out any failure

bugzilla at redhat.com bugzilla at redhat.com
Sat Sep 22 12:06:29 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1623749

Kotresh HR <khiremat at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Doc Type|If docs needed, set a value |Bug Fix


--- Doc Text *updated* ---
Cause: 
When the monitor starts the workers, they update the status file, using flock to synchronize. While worker one has the status file open for an update, worker two can be forked, so the same fd is also referenced by worker two. Because the code relied on closing the fd to release the lock, worker one's close did not unlock it: the reference still existed in worker two. Worker two then deadlocked while coming up, waiting for that lock.
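
The behaviour described above can be reproduced outside gsyncd. The following is only a minimal sketch, not the geo-replication code: the function names, the temporary file, and the pipe used to keep the child alive are all assumptions made for illustration. It takes a flock, forks a child that inherits the fd, closes the fd in the parent without unlocking, and then checks whether the lock can be taken again through a fresh open().

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try a non-blocking exclusive flock through a fresh open(); report success."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)  # sole reference to this open, so any lock taken is dropped


def close_releases_lock_with_child_alive():
    """Hypothetical demo: returns (free_while_child_alive, free_after_child_exit)."""
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)

    fd = os.open(path, os.O_RDWR)
    fcntl.flock(fd, fcntl.LOCK_EX)  # the "status file" lock

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherits a reference to fd and simply stays alive
        # until the parent writes to (or closes) the pipe.
        os.close(w)
        os.read(r, 1)
        os._exit(0)
    os.close(r)

    os.close(fd)  # relies on close to unlock, with no explicit LOCK_UN
    free_while_child_alive = try_lock(path)

    os.write(w, b"x")  # let the child exit, dropping the last reference
    os.close(w)
    os.waitpid(pid, 0)
    free_after_child_exit = try_lock(path)

    os.unlink(path)
    return free_while_child_alive, free_after_child_exit
```

On Linux this is expected to return (False, True): the lock stays held for as long as the forked child keeps its inherited reference, and becomes free only once the child exits.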


Consequence: 
The worker fails to come up because it waits indefinitely for the lock.

Fix: 
1. The flock is now released explicitly instead of relying on closing the fd.
2. The status file is now updated only in appropriate places, so that the reference is not leaked to the worker/agent process.
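
The first point of the fix can be sketched as follows. This is an illustrative helper under stated assumptions, not the actual patch: the name update_status, the JSON payload, and the file mode are hypothetical. The point it shows is that the lock is dropped with an explicit LOCK_UN before the fd is closed, so a reference inherited by another process cannot keep the lock held.

```python
import fcntl
import json
import os


def update_status(path, status):
    """Hypothetical helper: rewrite a status file under an exclusive flock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            os.ftruncate(fd, 0)  # discard the previous status
            os.write(fd, json.dumps(status).encode())
        finally:
            # Unlock explicitly: LOCK_UN releases the lock even if a copy
            # of this fd has leaked into a forked process.
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

The second point of the fix is about where such a helper is called: keeping the open/lock/unlock/close sequence out of the window in which a worker or agent is forked avoids leaking the reference in the first place.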

Result:
All workers come up without fail.



