[Gluster-devel] Implementing multiplexing for self heal client.

RAFI KC rkavunga at redhat.com
Tue Jan 8 06:53:04 UTC 2019

I have completed the patches and pushed them for review. Please feel free
to raise any concerns or suggestions in the reviews.

Rafi KC

On 12/24/18 3:58 PM, RAFI KC wrote:
> On 12/21/18 6:56 PM, Sankarshan Mukhopadhyay wrote:
>> On Fri, Dec 21, 2018 at 6:30 PM RAFI KC <rkavunga at redhat.com> wrote:
>>> Hi All,
>>> What is the problem?
>>> As of now, the self-heal client runs as one daemon per node. This means
>>> that even if there are multiple volumes, there is only one self-heal
>>> daemon. To apply each configuration change in the cluster, the
>>> self-heal daemon has to be reconfigured, but it does not have the
>>> ability to reconfigure dynamically. So when you have a lot of volumes
>>> in the cluster, every management operation that involves configuration
>>> changes, like volume start/stop, add/remove brick, etc., results in a
>>> self-heal daemon restart. If such operations are executed frequently,
>>> this not only slows down self-heal for a volume but also substantially
>>> increases the size of the self-heal logs.
>> What number of volumes do you mean when you write "lot of volumes"?
>> 1000 volumes, more, etc.?
> Yes, more than 1000 volumes. It also depends on how often you execute
> the glusterd management operations mentioned above. Each time the
> self-heal daemon is restarted, it prints the entire graph. These graph
> traces account for the majority of the log's size.
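To make the log-growth point concrete, here is a rough back-of-the-envelope model in C. It rests on assumptions that are ours, not the thread's: a restart of the single per-node daemon re-prints the graph for every volume it serves, each volume contributing a roughly fixed number of log lines, while attaching or reconfiguring one volume in a multiplexed daemon prints only that volume's graph. All function names and the per-graph line count are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: log lines written when the single per-node
 * self-heal daemon restarts and re-prints the graph for all volumes. */
size_t log_lines_full_restart(size_t n_volumes, size_t lines_per_graph)
{
    return n_volumes * lines_per_graph;
}

/* Hypothetical model: log lines written when only one volume's graph
 * is attached or reconfigured in a multiplexed daemon. */
size_t log_lines_single_attach(size_t lines_per_graph)
{
    return lines_per_graph;
}

/* Total log lines over n_ops management operations under either model.
 * multiplexed == 0 selects the restart-everything model. */
size_t total_log_lines(size_t n_ops, size_t n_volumes,
                       size_t lines_per_graph, int multiplexed)
{
    size_t per_op = multiplexed
                        ? log_lines_single_attach(lines_per_graph)
                        : log_lines_full_restart(n_volumes, lines_per_graph);
    /* Guard against size_t overflow in the multiplication below. */
    assert(n_ops == 0 || per_op <= SIZE_MAX / n_ops);
    return n_ops * per_op;
}
```

With the hypothetical values of 1000 volumes, 40 lines per graph and 10 operations, `total_log_lines(10, 1000, 40, 0)` is 400000 while `total_log_lines(10, 1000, 40, 1)` is 400: under this model, log growth scales with volumes times operations in the restart case, but only with operations in the multiplexed case.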
>>> How to fix it?
>>> We are planning to follow a procedure similar to attaching/detaching
>>> graphs dynamically, as brick multiplexing does. The detailed steps are
>>> as follows:
>>> 1) First, make shd a per-volume daemon, so that volfiles are
>>>    generated/reconfigured on a per-volume basis.
>>>     1.1) This will make it easy to attach the volfiles to an existing
>>>          shd daemon.
>>>     1.2) This will help to send notifications to the shd daemon, as
>>>          each volinfo keeps the daemon object.
>>>     1.3) Reconfiguring a particular subvolume is easier, as we can
>>>          check the topology better.
>>>     1.4) With this change, the volfiles will be moved to the
>>>          workdir/vols/ directory.
>>> 2) Write new RPC requests, like an attach/detach_client_graph
>>>    function, to support attaching/detaching clients.
>>>     2.1) Functions like graph reconfigure and mgmt_getspec_cbk also
>>>          have to be modified.
>>> 3) Safely detach a subvolume when there are pending frames to unwind.
>>>     3.1) We can mark the client disconnected and make all the frames
>>>          unwind with ENOTCONN.
>>>     3.2) We can wait for all the I/O to unwind until the new, updated
>>>          subvolume attaches.
>>> 4) Handle scenarios like glusterd restart, node reboot, etc.
>>> At the moment, we are not planning to limit the number of heal
>>> subvolumes per process, because with the current approach, heal for
>>> every volume is already done from a single process. We have not heard
>>> any major complaints about this, have we?
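The attach/detach flow in steps 1 to 3 above can be sketched as a small, self-contained C model. Everything here is hypothetical and heavily simplified: `shd_proc_t`, `vol_graph_t`, `shd_attach_graph()`, `shd_queue_frame()` and `shd_detach_graph()` are illustrative stand-ins for the attach/detach_client_graph RPCs, not GlusterFS APIs, and real call frames and xlator graphs are replaced with plain structs and callbacks. It shows one process holding one graph per volume, so attaching or detaching a volume touches only that volume's graph, and a detach that follows option 3.1: mark the graph disconnected, then unwind every pending frame with ENOTCONN.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an in-flight heal request; on unwind the
 * callback receives 0 on success or a negative errno. */
typedef struct frame {
    void (*cbk)(struct frame *f, int op_errno);
    int result;               /* records what the callback was given */
    struct frame *next;
} frame_t;

/* Hypothetical per-volume graph attached to the shared shd process. */
typedef struct vol_graph {
    char volname[64];
    int connected;
    frame_t *pending;         /* frames not yet unwound */
    struct vol_graph *next;
} vol_graph_t;

/* Hypothetical multiplexed self-heal daemon: one process, many graphs. */
typedef struct {
    vol_graph_t *graphs;
    size_t n_graphs;
} shd_proc_t;

static vol_graph_t *shd_find_graph(shd_proc_t *p, const char *volname)
{
    for (vol_graph_t *g = p->graphs; g; g = g->next)
        if (strcmp(g->volname, volname) == 0)
            return g;
    return NULL;
}

/* Steps 1-2: attach one volume's graph without restarting the process. */
int shd_attach_graph(shd_proc_t *p, const char *volname)
{
    if (shd_find_graph(p, volname))
        return -EEXIST;
    vol_graph_t *g = calloc(1, sizeof(*g));   /* zeroed, so NUL-terminated */
    if (!g)
        return -ENOMEM;
    strncpy(g->volname, volname, sizeof(g->volname) - 1);
    g->connected = 1;
    g->next = p->graphs;
    p->graphs = g;
    p->n_graphs++;
    return 0;
}

/* Queue an in-flight frame on a connected volume graph. */
int shd_queue_frame(shd_proc_t *p, const char *volname, frame_t *f)
{
    vol_graph_t *g = shd_find_graph(p, volname);
    if (!g || !g->connected)
        return -ENOTCONN;
    f->next = g->pending;
    g->pending = f;
    return 0;
}

/* Step 3.1: mark the graph disconnected, unwind all pending frames with
 * ENOTCONN, then unlink and free only this volume's graph. */
int shd_detach_graph(shd_proc_t *p, const char *volname)
{
    vol_graph_t **pp = &p->graphs;
    while (*pp && strcmp((*pp)->volname, volname) != 0)
        pp = &(*pp)->next;
    if (!*pp)
        return -ENOENT;
    vol_graph_t *g = *pp;
    g->connected = 0;
    while (g->pending) {
        frame_t *f = g->pending;
        g->pending = f->next;
        f->next = NULL;
        f->cbk(f, -ENOTCONN);
    }
    *pp = g->next;
    free(g);
    assert(p->n_graphs > 0);
    p->n_graphs--;
    return 0;
}

/* Example callback that records the unwind result. */
void record_cbk(frame_t *f, int op_errno)
{
    f->result = op_errno;
}
```

Option 3.2 (holding I/O until the updated subvolume attaches) would instead keep `pending` intact across the swap; the sketch models only 3.1.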
>> Is the plan to never limit it, or to have a throttle set to a default
>> high(er) value? How would system resources be impacted if the proposed
>> design is implemented?
> The plan is to implement it in a way that can support more than one
> multiplexed self-heal daemon. As of now, the throttling function returns
> the same process to multiplex into, but it can easily be modified to
> create a new process.
> This multiplexing logic won't use any additional resources beyond what
> the self-heal daemon currently uses.
> Rafi KC
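The throttling hook described in this reply can be sketched as follows. The names (`shd_daemon_t`, `shd_pool_t`, `shd_select_process()`, `max_graphs_per_proc`) are hypothetical, and spawning a process is only simulated by allocating a struct. With the limit set to 0, treated here as "no limit", it always returns the existing process, matching the current behaviour; a non-zero limit makes it create a new process once every existing one is full.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle for one multiplexed self-heal daemon process. */
typedef struct shd_daemon {
    int id;
    size_t n_graphs;              /* volumes currently attached */
    struct shd_daemon *next;
} shd_daemon_t;

/* Hypothetical pool of running daemon processes on this node. */
typedef struct {
    shd_daemon_t *procs;
    int next_id;
    size_t max_graphs_per_proc;   /* 0 means "no limit" */
} shd_pool_t;

/* Stand-in for spawning a new daemon process. */
static shd_daemon_t *shd_spawn(shd_pool_t *pool)
{
    shd_daemon_t *d = calloc(1, sizeof(*d));
    if (!d)
        return NULL;
    d->id = pool->next_id++;
    d->next = pool->procs;
    pool->procs = d;
    return d;
}

/* Pick the process that a newly started volume should attach to. */
shd_daemon_t *shd_select_process(shd_pool_t *pool)
{
    assert(pool != NULL);
    for (shd_daemon_t *d = pool->procs; d; d = d->next)
        if (pool->max_graphs_per_proc == 0 ||
            d->n_graphs < pool->max_graphs_per_proc)
            return d;             /* reuse an existing process */
    return shd_spawn(pool);       /* all full (or none yet): create one */
}
```

Keeping the policy behind one selection function means that changing it later, for example to a high default limit, would not touch the attach/detach path.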
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel