[Gluster-devel] counters in tiering / request for comments

Mon Aug 29 14:01:41 UTC 2016

Below is a write-up on tiering counters (bz 1275917). I give three options. I think options (1) and (3) are doable; (2) is harder and would need more discussion.

Currently the counters give limited information on tiering behavior. They are just a raw count of the number of files moved in each direction, which makes the overall feature much less usable.

In general, the counters should also work with future tiering use cases, e.g. tiering according to location or some other policy.

$ gluster volume tier vol1 status
Node                 Promoted files       Demoted files        Status              
---------            ---------            ---------            ---------           
localhost            20                   30                   in progress         
172.17.60.18         0                    0                    in progress         
172.17.60.19         0                    0                    in progress         
172.17.60.20         0                    0                    in progress   

(1)

Customers want to know the total number of files / MB on a tier at any given time. I propose that we query the database on the bricks of each tier to get the file count.

$ gluster volume tier vol1 status
Node                 Promoted files / hot count      Demoted files / cold count        Status
---------            ---------                       ---------                         ---------
localhost            20 / 500                        30 / 2000                         in progress
172.17.60.18         0                               0                                 in progress
172.17.60.19         0                               0                                 in progress
172.17.60.20         0                               0                                 in progress
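
A minimal sketch of the per-tier query in (1), written in Python for brevity. It assumes each brick's database is a SQLite file holding one row per file. The table name `file_table`, the function name, and the list of database paths are hypothetical and not the actual schema. It also assumes the caller passes one brick per replica set, since otherwise replicated files would be counted more than once.

```python
import sqlite3


def count_files_on_tier(db_paths):
    """Sum the file counts recorded in each brick database of one tier."""
    total = 0
    for path in db_paths:
        conn = sqlite3.connect(path)
        try:
            # "file_table" is a hypothetical table with one row per file.
            (n,) = conn.execute("SELECT COUNT(*) FROM file_table").fetchone()
            total += n
        finally:
            conn.close()
    return total
```

Calling this once with the hot-tier bricks and once with the cold-tier bricks would give the two "count" figures shown in the status output above.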

(2)

Administrators need to know the ratio of I/Os served by the hot tier to those served by the cold tier. If 90% of the I/Os go to the hot tier, that is good. If only 20% are served by the hot tier, that is bad and suggests a misconfiguration.

Something like this is what we want:

$ gluster volume tier vol1 status
Node                 Promoted files       Demoted files        Read hit rate   Write hit rate     Status
---------            ---------            ---------            ---------       ---------          ---------
localhost            0                    0                    80%             75%                in progress

The difficulty is how to capture that. When we read a large file, it is broken up into multiple individual reads. Each piece is a single read FOP. Should we consider each FOP individually? Or does only the first "hit" to the hot tier count?  

Also, when an FOP comes in, it will first look on one tier, and then the other tier. The callback to the FOP checks success or failure. It is only when the file is found on none of the subvolumes that the FOP returns an error. New code needs to deal with this complexity. If there is failure on the cold tier but success on the hot tier, the "hit count" should be bumped.

We probably do not want to update the "hit rate" on all FOPs. 
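
To make the bookkeeping concrete, here is a rough sketch in Python of one possible answer to the questions above. It assumes every completed FOP is counted individually (one of the open choices, not a decision), and that the callback knows whether the hot and cold lookups succeeded. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TierHitCounter:
    """Counts FOPs served by the hot tier versus all FOPs that succeeded."""
    hot_hits: int = 0
    total: int = 0

    def record(self, hot_ok: bool, cold_ok: bool) -> None:
        """Record one completed FOP from its per-tier results."""
        if not (hot_ok or cold_ok):
            # File found on neither tier: the FOP failed, so it is
            # neither a hit nor a miss.
            return
        self.total += 1
        if hot_ok:
            # Success on the hot tier bumps the hit count.
            self.hot_hits += 1

    def hit_rate(self) -> float:
        """Percentage of successful FOPs served by the hot tier."""
        return 100.0 * self.hot_hits / self.total if self.total else 0.0
```

One instance for reads and one for writes would yield the two percentages in the status output. Sampling only a subset of FOPs, or counting only the first hit per file, would change `record` but not the rest.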

(3)

A simpler new counter to implement is the number of MB promoted or demoted. I think that could be handled in a separate patch and done more quickly.
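
As a sketch of what (3) amounts to, in Python: keep a byte total next to the existing file count and bump both on each migration. The class name and the rounding down to whole MB (1 MB taken as 1024 * 1024 bytes) are assumptions for illustration only.

```python
class MigrationCounter:
    """Tracks files and bytes moved in one direction (promoted or demoted)."""

    def __init__(self):
        self.files = 0
        self.bytes_moved = 0

    def record(self, size_bytes):
        """Account for one migrated file of the given size."""
        self.files += 1
        self.bytes_moved += size_bytes

    def cell(self):
        """Render a 'files/MB' status cell, rounding down to whole MB."""
        mb = self.bytes_moved // (1024 * 1024)
        return f"{self.files}/{mb}MB"
```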

The output with (2) and (3) would look like this:

$ gluster volume tier vol1 status
Node                 Promoted files/MB    Demoted files/MB     Read hit rate   Write hit rate     Status
---------            ---------            ---------            ---------       ---------          ---------
localhost            120/2033MB           50/1044MB            80%             75%                in progress