[Bugs] [Bug 1605066] RFE: Need to optimize on time taken by heal info to display o/p when large number of entries exist

Mon Nov 19 04:16:40 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1605066

Vijay Bellur <vbellur at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vbellur at redhat.com

--- Comment #1 from Vijay Bellur <vbellur at redhat.com> ---
Description of problem:
=======================
Originally by nchilaka at redhat dot com:

On a simple 1x(4+2) EC volume with about 60k files to be healed, heal info
takes a long time to display the total number of entries.
The problem is that to learn the total number of entries, we have to wait for
the scan of the xattrops on every brick to finish.
I feel there is room for optimization here.
Below are some very crude approaches; take them as hints for working on the
problem rather than as finished designs.
1) Tag the files requiring heal and hardlink them into a separate directory, so
that we don't have to scan the xattrops every time we fetch the heal info list
(yes, there will be corner cases, and discussion may turn up a better
approach).
2) Record the heal info list with some background mechanism every few minutes
and display that output when the user requests it. Yes, the data can be stale
rather than real-time, but that can be handled by warning the user that the
output is a few minutes old. It would still give the admin an idea of the
approximate number of files requiring heal.
3) When we know a brick is down, the xattrops of all the bricks that are up get
marked for the files modified during that time, so heal info will list all of
those files on each of them. If we can capture which bricks were down at a
given time, we don't have to scan the xattrops of every brick.
E.g.: In the case below, b1 and b2 are down, and IOs were run and completed
while they were down. That means all the up bricks have the same pending-heal
list, so we don't need to scan each brick to get the same output, and we save
time.
I understand that this is the happy case and that we need to consider many
other scenarios, such as bricks going down in a cyclic fashion, where the lists
need not be the same. But we can work on such optimizations.
[root@dhcp35-45 ~]# gluster v heal ecv info|grep ntries
Number of entries: -
Number of entries: -
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
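A minimal Python sketch of proposal 3, under the happy-case assumption stated
above: one set of bricks stayed down for the whole IO window, so every up brick
holds the same pending list. The names heal_counts, scan and single_outage are
hypothetical and not part of GlusterFS; scan stands in for whatever per-brick
xattrop walk heal info performs today, and deciding when single_outage really
holds is exactly the hard part the reporter flags.

```python
from typing import Callable, Dict, List, Optional, Set


def heal_counts(
    bricks: List[str],
    down: Set[str],
    scan: Callable[[str], int],
    single_outage: bool,
) -> Dict[str, Optional[int]]:
    """Per-brick pending-heal counts; None for bricks that are down.

    Hypothetical sketch: if single_outage is True (the same bricks stayed down
    for the whole IO window), all up bricks share one pending list, so only the
    first up brick is scanned and its count is reused for the rest.
    """
    up = [b for b in bricks if b not in down]
    if single_outage and up:
        shared = scan(up[0])  # one scan stands in for every up brick
        per_up = {b: shared for b in up}
    else:
        per_up = {b: scan(b) for b in up}  # general case: scan each up brick
    # None plays the role of the "Number of entries: -" lines for down bricks.
    return {b: per_up.get(b) for b in bricks}


# Usage with a fake scan that records which bricks it was asked to walk.
calls: List[str] = []


def fake_scan(brick: str) -> int:
    calls.append(brick)
    return 61943


print(heal_counts(["b1", "b2", "b3", "b4", "b5", "b6"], {"b1", "b2"}, fake_scan, True))
# {'b1': None, 'b2': None, 'b3': 61943, 'b4': 61943, 'b5': 61943, 'b6': 61943}
print(calls)
# ['b3']
```

The fake scan runs once instead of four times. When the happy-case condition
cannot be established, the sketch falls back to scanning every up brick, which
is the current cost.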

It took me about 10 minutes to get the output below, with no IOs running:
[root@dhcp35-45 ~]# time gluster v heal ecv info|grep ntries
Number of entries: -
Number of entries: -
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943
Number of entries: 61943

real    10m8.552s
user    2m35.365s
sys    3m17.962s
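The 10-minute run above is the wait that proposal 2 would hide from the admin.
A minimal Python sketch of that idea follows, assuming some periodic background
job (not shown; a cron entry or a timer would do) calls refresh(), and that
collect stands in for the existing full heal-info scan. HealInfoCache,
Snapshot, refresh and report are hypothetical names, not GlusterFS APIs, and
the sketch keeps a single total rather than the per-brick counts heal info
prints.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Snapshot:
    entries: int      # total pending-heal entries found by the full scan
    taken_at: float   # clock time when that scan finished


class HealInfoCache:
    """Hypothetical sketch of proposal 2: serve the last heal-info result
    immediately and tell the admin how old it is."""

    def __init__(self, collect: Callable[[], int],
                 clock: Callable[[], float] = time.time) -> None:
        self._collect = collect  # the slow full scan heal info does today
        self._clock = clock
        self._last: Optional[Snapshot] = None

    def refresh(self) -> None:
        # Intended to be run by a background job every few minutes.
        entries = self._collect()
        self._last = Snapshot(entries, self._clock())

    def report(self) -> str:
        if self._last is None:
            return "No cached heal info yet; run a full scan."
        age_min = int((self._clock() - self._last.taken_at) // 60)
        return (f"Number of entries: {self._last.entries} "
                f"(WARNING: collected {age_min} min ago, may be stale)")


# Usage with a fake clock so the example runs instantly.
now = [0.0]
cache = HealInfoCache(collect=lambda: 61943, clock=lambda: now[0])
cache.refresh()
now[0] = 300.0  # five minutes later
print(cache.report())
# Number of entries: 61943 (WARNING: collected 5 min ago, may be stale)
```

Stamping the snapshot when the scan finishes rather than when it starts is a
design choice; entries can change during a long scan, so the warning is an
approximation either way, which matches the reporter's point that an
approximate count is still useful to the admin.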
