[Gluster-devel] Race with volfile notification and stopping of brick

Hari Gowtham hgowtham at redhat.com
Mon Jul 3 06:50:46 UTC 2017


Hi,

I came across a situation where a few IOs were sent to a subvolume that
was no longer available. It happens as follows.

During remove-brick commit, three things happen: the brick is stopped, the
new volfile is created, and the volfile change is notified to the client.

The order in which this happens is:
1) the brick is stopped.
2) the volfiles are created and then the notification goes to the client.
This leaves a window between the brick being stopped and the clients being
notified that the brick has been stopped.

During that window the brick is unavailable, but IO still goes to the
stopped brick because the client is not yet aware of the volfile change.
This results in IO failures.
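
To make the window concrete, here is a toy timeline model. It is only a
sketch, not glusterd code: the function name failed_ios, the timestamps,
and the assumption that an IO fails exactly when it reaches the removed
brick after the stop and before the client is notified are all made up for
illustration.

```python
def failed_ios(io_times, brick_stop_time, client_notified_time):
    """Return the IO timestamps that hit the removed brick while it is down.

    Toy model (assumption): until the client is notified, it keeps sending
    IO to the removed brick; after notification it uses the other subvols.
    """
    return [
        t for t in io_times
        if brick_stop_time <= t < client_notified_time
    ]


# Current order: brick stops at t=2, client is notified at t=5.
io_times = [0, 1, 2, 3, 4, 5, 6]
print(failed_ios(io_times, brick_stop_time=2, client_notified_time=5))
# prints [2, 3, 4]
```

Every IO issued inside the window [2, 5) fails in this model; the wider the
gap between the stop and the notification, the more IOs are affected.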

So I feel it's better to do it in the following order:
1) create the volfile.
2) notify the client.
3) stop the brick.

This way the clients are notified and IO starts going to the right
subvol while the brick is still available. Since the brick is stopped only
after that, the window is closed.
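
The two orderings can be compared with a small simulation. Again this is a
hedged sketch under simplifying assumptions, not the actual glusterd
implementation: ToyCluster, its methods, and run are hypothetical names,
there is a single client, and the notification is treated as taking effect
instantly.

```python
class ToyCluster:
    """Hypothetical stand-in for one brick and one client; not Gluster code."""

    def __init__(self):
        self.brick_running = True
        self.volfile_has_brick = True   # volfile as generated on the server
        self.client_uses_brick = True   # what the client currently believes

    def create_volfile(self):
        # The new volfile no longer contains the removed brick.
        self.volfile_has_brick = False

    def notify_client(self):
        # The client picks up whatever volfile currently exists.
        self.client_uses_brick = self.volfile_has_brick

    def stop_brick(self):
        self.brick_running = False

    def io_ok(self):
        # IO fails only when the client targets a brick that is down.
        return self.brick_running or not self.client_uses_brick


def run(order):
    """Apply the steps in `order`, issuing one IO after each step."""
    cluster = ToyCluster()
    results = []
    for step in order:
        getattr(cluster, step)()
        results.append((step, cluster.io_ok()))
    return results


current = run(["stop_brick", "create_volfile", "notify_client"])
proposed = run(["create_volfile", "notify_client", "stop_brick"])
print(current)
print(proposed)
```

In this model the current order reports failed IO after the first two
steps, while the proposed order reports success after every step. It does
not cover what a real client does with IO already in flight to the old
brick when the notification arrives.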

As this changes basic functionality, I thought of bringing it to
everyone's notice here.
If you find anything that could break because of this change, or feel
there is a better way to handle this, do let me know.

Thanks to Du, Atin, Kaushal and Nithya for helping me with this.

Regards,
Hari.