[Gluster-devel] Implementing multiplexing for self heal client.

Sankarshan Mukhopadhyay sankarshan.mukhopadhyay at gmail.com
Fri Dec 21 13:26:31 UTC 2018


On Fri, Dec 21, 2018 at 6:30 PM RAFI KC <rkavunga at redhat.com> wrote:
>
> Hi All,
>
> What is the problem?
> As of now, the self-heal client runs as one daemon per node; even if
> there are multiple volumes, there is only one self-heal daemon. So to
> pick up each configuration change in the cluster, the self-heal daemon
> has to be reconfigured, but it has no ability to reconfigure
> dynamically. This means that when you have a lot of volumes in the
> cluster, every management operation that involves configuration
> changes, like volume start/stop, add/remove brick, etc., results in a
> self-heal daemon restart. If such operations are executed often, this
> not only slows down self-heal for a volume, but also grows the
> self-heal logs substantially.

What number of volumes do you mean when you write "lot of volumes"?
1000 volumes, or more?

>
>
> How to fix it?
>
> We are planning to follow a procedure similar to the dynamic graph
> attach/detach used for brick multiplexing. The detailed steps are as
> below:
>
> 1) The first step is to make shd a per-volume daemon, generating and
> reconfiguring volfiles on a per-volume basis.
>
>    1.1) This will make it easy to attach the volfiles to an existing
> shd daemon
>
>    1.2) This will make it easy to send notifications to the shd
> daemon, as each volinfo keeps the daemon object
>
>    1.3) Reconfiguring a particular subvolume is easier, as we can
> check the topology better
>
>    1.4) With this change, the volfiles will be moved to the
> workdir/vols/ directory.
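A rough sketch of steps 1.2 and 1.4: each volinfo holds a per-volume daemon object, and the shd volfile path is derived under workdir/vols/. All names here (shd_daemon_t, shd_build_volfile_path, the exact path layout) are hypothetical simplifications, not the actual glusterd structures:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down shapes; the real glusterd structures differ. */
typedef struct shd_daemon {
    int  online;          /* is the shd process running for this volume? */
    char volfile[256];    /* path of this volume's shd volfile           */
} shd_daemon_t;

typedef struct volinfo {
    char         volname[64];
    shd_daemon_t shd;     /* step 1.2: each volinfo keeps the daemon object */
} volinfo_t;

/* Step 1.4: per-volume shd volfiles live under workdir/vols/<volname>/. */
static void
shd_build_volfile_path(const char *workdir, volinfo_t *vol)
{
    snprintf(vol->shd.volfile, sizeof(vol->shd.volfile),
             "%s/vols/%s/%s-shd.vol", workdir, vol->volname, vol->volname);
}
```

With the daemon object hanging off volinfo, a notification for one volume can be routed straight to its shd handle instead of restarting a node-wide daemon.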
>
> 2) Write new RPC requests, like an attach/detach_client_graph
> function, to support client attach/detach
>
>    2.1) Functions like graph reconfigure and mgmt_getspec_cbk also
> have to be modified
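To illustrate step 2, the server side of such RPCs could dispatch on a procedure number and register or unregister a client graph in the shd process. The procedure numbers, names, and the in-memory registry below are all hypothetical; the real requests would be defined in the glusterfs RPC/XDR layer and would build or destroy an actual xlator graph:

```c
#include <string.h>

#define SHD_MAX_GRAPHS 64

/* Hypothetical procedure numbers for the new shd RPC program. */
typedef enum {
    SHD_ATTACH_CLIENT_GRAPH = 1,
    SHD_DETACH_CLIENT_GRAPH = 2,
} shd_rpc_proc_t;

/* In-memory view of the client graphs one shd process multiplexes. */
static char shd_graphs[SHD_MAX_GRAPHS][64];
static int  shd_graph_count;

/* Sketch of the server-side handler: attach registers a client graph
 * by volume name, detach removes it. Returns 0 on success, -1 on
 * failure (registry full, or detach of an unknown graph). */
static int
shd_rpc_dispatch(shd_rpc_proc_t proc, const char *volname)
{
    if (proc == SHD_ATTACH_CLIENT_GRAPH) {
        if (shd_graph_count >= SHD_MAX_GRAPHS)
            return -1;
        strncpy(shd_graphs[shd_graph_count++], volname, 63);
        return 0;
    }
    for (int i = 0; i < shd_graph_count; i++) {
        if (strcmp(shd_graphs[i], volname) == 0) {
            /* close the gap left by the detached graph */
            memmove(&shd_graphs[i], &shd_graphs[i + 1],
                    (shd_graph_count - i - 1) * sizeof(shd_graphs[0]));
            shd_graph_count--;
            return 0;
        }
    }
    return -1;
}
```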
>
> 3) Safely detaching a subvolume when there are pending frames to unwind.
>
>    3.1) We can mark the client disconnected and unwind all the
> pending frames with ENOTCONN
>
>    3.2) Or we can wait for all the I/O to unwind until the new,
> updated subvol attaches
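Option 3.1 can be sketched as follows. The frame and client types here are hypothetical stand-ins for the real call_frame_t and client structures in libglusterfs; the point is only to show the shape of "mark disconnected, then fail every pending frame with ENOTCONN":

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down frame and client shapes for illustration. */
typedef struct sketch_frame {
    int op_ret;
    int op_errno;
    struct sketch_frame *next;
} sketch_frame_t;

typedef struct sketch_client {
    int connected;
    sketch_frame_t *pending;   /* frames not yet unwound */
} sketch_client_t;

/* Option 3.1: mark the client disconnected and unwind every pending
 * frame with ENOTCONN, so the graph can be detached right away.
 * Option 3.2 would instead block until `pending` drains naturally.
 * Returns the number of frames unwound. */
static int
shd_client_force_detach(sketch_client_t *client)
{
    int unwound = 0;

    client->connected = 0;
    for (sketch_frame_t *f = client->pending; f != NULL; f = f->next) {
        f->op_ret = -1;
        f->op_errno = ENOTCONN;   /* callers observe a disconnect */
        unwound++;
    }
    client->pending = NULL;
    return unwound;
}
```

The trade-off between the two options is latency versus transparency: 3.1 detaches immediately but surfaces errors to in-flight heals, while 3.2 is error-free but can stall the detach on slow I/O.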
>
> 4) Handle scenarios like glusterd restart, node reboot, etc
>
>
>
> At the moment we are not planning to limit the number of heal
> subvolumes per process, because with the current approach heal for
> every volume was already done from a single process. We have not
> heard any major complaints about this.

Is the plan to never limit it, or to have a throttle set to a high(er)
default value? How would system resources be impacted if the proposed
design is implemented?

