[Bugs] [Bug 1420623] [RHV-RHGS]: Application VM paused after add brick operation and VM didn't come up after power cycle.

bugzilla at redhat.com
Thu Feb 9 06:14:56 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1420623



--- Comment #1 from Raghavendra G <rgowdapp at redhat.com> ---
Pursuing further the RCA of fuse-bridge not waiting till the new graph is up
before directing fops to it, I do see the code where fuse_graph_sync waits
on priv->sync_cond after setting the new graph as active_subvol. However, the
"notify" function, which broadcasts on priv->sync_cond whenever it receives a
CHILD_DOWN/CHILD_UP, doesn't check which graph the event was received on
before setting priv->event_recvd to true. For example, consider this
scenario:

* fuse_graph_sync is waiting for a CHILD_UP/CHILD_DOWN on the new graph by
doing pthread_cond_wait on priv->sync_cond
* notify receives a CHILD_DOWN on the old graph and signals priv->sync_cond

In the above scenario, fuse_graph_sync wakes up even though no
CHILD_UP/CHILD_DOWN was received on the new graph, and starts directing fops
to the new graph. Those fops keep failing until the new graph is up.
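
The scenario can be modelled with a small sketch. This is not the
fuse-bridge source: struct wakeup_model, model_notify and
model_graph_sync_wait are hypothetical names standing in for priv, notify and
the wait inside fuse_graph_sync, and graph ids are assumed to be plain
integers. The woken_by_graph field exists only in the model, to make the
premature wakeup observable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant fields of fuse-bridge's priv. */
struct wakeup_model {
    pthread_mutex_t sync_mutex;
    pthread_cond_t  sync_cond;
    bool            event_recvd;
    int             woken_by_graph; /* model-only: graph id of the last event */
};

/* Models notify(): any CHILD_UP/CHILD_DOWN sets event_recvd,
 * whichever graph the event came from. */
void model_notify(struct wakeup_model *m, int graph_id)
{
    assert(m != NULL);
    pthread_mutex_lock(&m->sync_mutex);
    m->event_recvd = true;
    m->woken_by_graph = graph_id;
    pthread_cond_broadcast(&m->sync_cond);
    pthread_mutex_unlock(&m->sync_mutex);
}

/* Models the wait in fuse_graph_sync. Returns the id of the graph whose
 * event released the waiter. */
int model_graph_sync_wait(struct wakeup_model *m)
{
    int graph_id;

    assert(m != NULL);
    pthread_mutex_lock(&m->sync_mutex);
    while (!m->event_recvd)
        pthread_cond_wait(&m->sync_cond, &m->sync_mutex);
    graph_id = m->woken_by_graph;
    pthread_mutex_unlock(&m->sync_mutex);
    return graph_id;
}
```

Taking 1 as the old graph and 2 as the new one, a single model_notify(&m, 1)
is enough to release model_graph_sync_wait, which then reports graph 1 even
though nothing has been heard from graph 2.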

There is some evidence in the logs too. Note that we started seeing failed fops
immediately after the CHILD_DOWN event from afr to its parents (fuse-bridge),
and there were no errors before that.

[2017-02-08 10:25:21.404411] E [MSGID: 108006] [afr-common.c:4681:afr_notify]
0-andromeda-replicate-0: All subvolumes are down. Going offline until atleast
one of them comes back up.
[2017-02-08 10:25:21.438033] W [fuse-bridge.c:2312:fuse_writev_cbk]
0-glusterfs-fuse: 38318: WRITE => -1 gfid=d04d7083-bdfe-4424-be50-a8ce01caa8a1
fd=0x7f83c804b0f8 (Input/output error)
[2017-02-08 10:25:21.438541] W [fuse-bridge.c:2312:fuse_writev_cbk]
0-glusterfs-fuse: 38320: WRITE => -1 gfid=d04d7083-bdfe-4424-be50-a8ce01caa8a1
fd=0x7f83c804b0f8 (Input/output error)
[2017-02-08 10:25:21.455715] W [fuse-bridge.c:767:fuse_attr_cbk]
0-glusterfs-fuse: 38290: STAT() <gfid:8dad6ee2-a57f-47b8-ac06-648931200375> =>
-1 (Input/output error)
[2017-02-08 10:25:21.455821] W [fuse-bridge.c:2312:fuse_writev_cbk]
0-glusterfs-fuse: 38312: WRITE => -1 gfid=8dad6ee2-a57f-47b8-ac06-648931200375
fd=0x7f83c804b06c (Input/output error)
[2017-02-08 10:25:21.456344] W [fuse-bridge.c:2312:fuse_writev_cbk]
0-glusterfs-fuse: 38324: WRITE => -1 gfid=8dad6ee2-a57f-47b8-ac06-648931200375
fd=0x7f83c804b06c (Input/output error)
[2017-02-08 10:25:21.456692] W [fuse-bridge.c:2312:fuse_writev_cbk]
0-glusterfs-fuse: 38326: WRITE => -1 gfid=8dad6ee2-a57f-47b8-ac06-648931200375
fd=0x7f83c804b06c (Input/output error)


I'll wait for debug logs from sas to confirm the above RCA. If the RCA turns
out to be correct, the fix would be to make fuse_graph_sync wait for a
CHILD_UP/CHILD_DOWN event on the new graph it just set as active_subvol,
instead of waking up on a CHILD_UP/CHILD_DOWN received on _any_ graph.
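
One possible shape for such a fix, as a rough sketch only: the waiter records
the id of the graph it has just made active, and the notify path sets the flag
only when the event's graph id matches. All names here (struct graph_wait,
graph_wait_begin, graph_wait_notify, graph_wait_block) are hypothetical and
the integer graph ids are an assumption; the actual patch may look quite
different.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state; not the real fuse-bridge private structure. */
struct graph_wait {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             awaited_graph; /* graph just set as active_subvol */
    bool            event_recvd;   /* true only after an event on that graph */
};

/* Called when a new graph is made active: start waiting for its events. */
void graph_wait_begin(struct graph_wait *w, int new_graph_id)
{
    assert(w != NULL);
    pthread_mutex_lock(&w->lock);
    w->awaited_graph = new_graph_id;
    w->event_recvd = false;
    pthread_mutex_unlock(&w->lock);
}

/* Called for every CHILD_UP/CHILD_DOWN. Events from other graphs are
 * ignored. Returns true if the event released the waiter. */
bool graph_wait_notify(struct graph_wait *w, int graph_id)
{
    bool matched;

    assert(w != NULL);
    pthread_mutex_lock(&w->lock);
    matched = (graph_id == w->awaited_graph);
    if (matched) {
        w->event_recvd = true;
        pthread_cond_broadcast(&w->cond);
    }
    pthread_mutex_unlock(&w->lock);
    return matched;
}

/* Blocks until an event has arrived on the awaited graph. */
void graph_wait_block(struct graph_wait *w)
{
    assert(w != NULL);
    pthread_mutex_lock(&w->lock);
    while (!w->event_recvd)
        pthread_cond_wait(&w->cond, &w->lock);
    pthread_mutex_unlock(&w->lock);
}
```

With this predicate, a CHILD_DOWN on the old graph leaves event_recvd false, so
the waiter stays blocked until the new graph itself reports CHILD_UP or
CHILD_DOWN.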
