[Gluster-devel] Improving Geo-replication Status and Checkpoints

Sahina Bose sabose at redhat.com
Wed Apr 1 09:20:54 UTC 2015


On 04/01/2015 02:30 PM, Aravinda wrote:
> Hi,
>
> In each node of the Master Cluster there is one Monitor process and 
> one or more worker processes, one per brick in that node.
> The Monitor has a status file, which is updated by glusterd. 
> Possible status values in the monitor_status file are Created, 
> Started, Paused and Stopped.
>
> Geo-rep cannot be paused if the monitor status is not "Started".
>
> Based on monitor_status, we need to hide some of the information in 
> the brick status file from the user. For example, if the monitor 
> status is "Stopped", it makes no sense to show "Crawl Status" in the 
> Geo-rep Status output. I created a matrix of possible status values 
> based on the status of the Monitor. VALUE represents the actual, 
> unchanged value from the brick status file.
>
> Monitor Status --->        Created   Started   Paused    Stopped
> -----------------------------------------------------------------
> session                    VALUE     VALUE     VALUE     VALUE
> brick                      VALUE     VALUE     VALUE     VALUE
> node                       VALUE     VALUE     VALUE     VALUE
> node_uuid                  VALUE     VALUE     VALUE     VALUE
> volume                     VALUE     VALUE     VALUE     VALUE
> slave_user                 VALUE     VALUE     VALUE     VALUE
> slave_node                 N/A       VALUE     VALUE     N/A
> status                     Created   VALUE     Paused    Stopped
> last_synced                N/A       VALUE     VALUE     VALUE
> crawl_status               N/A       VALUE     N/A       N/A
> entry                      N/A       VALUE     N/A       N/A
> data                       N/A       VALUE     N/A       N/A
> meta                       N/A       VALUE     N/A       N/A
> failures                   N/A       VALUE     VALUE     VALUE
> checkpoint_completed       N/A       VALUE     VALUE     VALUE
> checkpoint_time            N/A       VALUE     VALUE     VALUE
> checkpoint_completed_time  N/A       VALUE     VALUE     VALUE
>
> Where:
> session - Complete session URL used in the Create command (XML output only)
> brick - Master Brick Node
> node - Master Node
> node_uuid - Master Node UUID (XML output only)
> volume - Master Volume
> slave_user - Slave User
> slave_node - Slave node to which the respective master worker is connected
> status - Created/Initializing../Active/Passive/Faulty/Paused/Stopped
> last_synced - Last synced time
> crawl_status - Hybrid/History/Changelog
> entry - Number of entry ops pending (per session; counter resets on 
> worker restart)
> data - Number of data ops pending (per session; counter resets on 
> worker restart)
> meta - Number of meta ops pending (per session; counter resets on 
> worker restart)
> failures - Number of failures (a count above 0 is an action item for 
> the admin to look into the log files)
> checkpoint_completed - Checkpoint status: Yes/No/N/A
> checkpoint_time - Checkpoint set time, or N/A
> checkpoint_completed_time - Checkpoint completed time, or N/A
>
> In addition to the monitor_status rules above, if the brick status is 
> Faulty, the following fields will be displayed as N/A:
> active, paused, slave_node, crawl_status, entry, data, metadata
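>
> A minimal sketch of this masking logic in Python (a hypothetical 
> helper, not the actual gsyncd code; field and status names follow 
> the matrix above):
>
>     # Brick-status fields masked as N/A per monitor status (see matrix)
>     MASKED_WHEN = {
>         "slave_node":                {"Created", "Stopped"},
>         "last_synced":               {"Created"},
>         "crawl_status":              {"Created", "Paused", "Stopped"},
>         "entry":                     {"Created", "Paused", "Stopped"},
>         "data":                      {"Created", "Paused", "Stopped"},
>         "meta":                      {"Created", "Paused", "Stopped"},
>         "failures":                  {"Created"},
>         "checkpoint_completed":      {"Created"},
>         "checkpoint_time":           {"Created"},
>         "checkpoint_completed_time": {"Created"},
>     }
>
>     def display_status(monitor_status, brick_status):
>         # brick_status: dict of raw values from the brick status file
>         out = dict(brick_status)
>         for field, masked in MASKED_WHEN.items():
>             if monitor_status in masked:
>                 out[field] = "N/A"
>         # Monitor status overrides the per-worker status unless Started
>         if monitor_status != "Started":
>             out["status"] = monitor_status
>         # A Faulty worker shows no live sync details
>         if out.get("status") == "Faulty":
>             for field in ("slave_node", "crawl_status",
>                           "entry", "data", "meta"):
>                 out[field] = "N/A"
>         return out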

Some questions -

* Would monitor status also have "Initializing" state?
* What's the difference between brick and master node above?
* Is the last_synced time returned in UTC?
* active and paused - are these fields or status values?

>
> Let me know your thoughts.
>
> -- 
> regards
> Aravinda
>
>
> On 02/03/2015 11:00 PM, Aravinda wrote:
>> Today we discussed the Geo-rep Status design; a summary of the 
>> discussion:
>>
>> - There is no use case for the "Deletes pending" column; should we 
>> retain it?
>> - No separate column for Active/Passive. A worker can be 
>> Active/Passive only when it is Stable (it can't be Faulty and Active).
>> - Rename the "Not Started" status to "Created".
>> - Checkpoint columns will be retained in the Status output until we 
>> support multiple checkpoints: three columns (Completed, Checkpoint 
>> time and Completion time) instead of a single column.
>> - There is still confusion about "Files Pending" and "Files Synced" 
>> and what numbers they should show; Geo-rep can't map the numbers to 
>> exact counts on disk.
>>   Venky suggested showing Entry, Data and Metadata pending as three 
>> columns (and removing "Files Pending" and "Files Synced").
>> - Rename "Files Skipped" to "Failures".
>>
>> Status output proposed:
>> -----------------------
>> MASTER NODE - Master node hostname/IP
>> MASTER VOL - Master volume name
>> MASTER BRICK - Master brick path
>> SLAVE USER - Slave user with which the geo-rep session is established
>> SLAVE - Slave host and Volume name (HOST::VOL format)
>> STATUS - Created/Initializing../Started/Active/Passive/Stopped/Faulty
>> LAST SYNCED - Last synced time (based on stime xattr)
>> CRAWL STATUS - Hybrid/History/Changelog
>> CHECKPOINT STATUS - Yes/No/N/A
>> CHECKPOINT TIME - Checkpoint set time
>> CHECKPOINT COMPLETED - Checkpoint completion time
>>
>> Not yet decided
>> ---------------
>> FILES SYNCD - Number of Files Synced
>> FILES PENDING - Number of Files Pending
>> DELETES PENDING - Number of Deletes Pending
>> FILES SKIPPED - Number of Files skipped
>> ENTRIES - Entry operations (CREATE/DELETE/MKDIR/RENAME etc.)
>> DATA - Data operations
>> METADATA - Metadata operations (SETATTR, SETXATTR etc.; counters 
>> sketched below)
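>>
>> For illustration, a sketch of how ops could be bucketed into these 
>> three counters. The E/D/M record-type letters follow the changelog 
>> format; the parsing here is a simplified assumption, not the gsyncd 
>> implementation:
>>
>>     from collections import Counter
>>
>>     def pending_counters(records):
>>         # records: iterable of parsed changelog records, where the
>>         # first field is the record type: E(ntry), D(ata), M(eta)
>>         names = {"E": "entry", "D": "data", "M": "meta"}
>>         counts = Counter(names[r[0]] for r in records if r[0] in names)
>>         return {k: counts[k] for k in ("entry", "data", "meta")}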
>>
>> Let me know your suggestions.
>>
>> -- 
>> regards
>> Aravinda
>>
>>
>> On 02/02/2015 04:51 PM, Aravinda wrote:
>>> Thanks Sahina, replied inline.
>>>
>>> -- 
>>> regards
>>> Aravinda
>>>
>>> On 02/02/2015 12:55 PM, Sahina Bose wrote:
>>>>
>>>> On 01/28/2015 04:07 PM, Aravinda wrote:
>>>>> Background
>>>>> ----------
>>>>> We have `status` and `status detail` commands for GlusterFS 
>>>>> geo-replication. This mail is about fixing the existing issues in 
>>>>> these command outputs. Let us know if we need any other columns 
>>>>> which would help users get a meaningful status.
>>>>>
>>>>> Existing output
>>>>> ---------------
>>>>> Status command output
>>>>>     MASTER NODE - Master node hostname/IP
>>>>>     MASTER VOL - Master volume name
>>>>>     MASTER BRICK - Master brick path
>>>>>     SLAVE - Slave host and Volume name (HOST::VOL format)
>>>>>     STATUS - Stable/Faulty/Active/Passive/Stopped/Not Started
>>>>>     CHECKPOINT STATUS - Details about Checkpoint completion
>>>>>     CRAWL STATUS - Hybrid/History/Changelog
>>>>>
>>>>> Status detail -
>>>>>     MASTER NODE - Master node hostname/IP
>>>>>     MASTER VOL - Master volume name
>>>>>     MASTER BRICK - Master brick path
>>>>>     SLAVE - Slave host and Volume name (HOST::VOL format)
>>>>>     STATUS - Stable/Faulty/Active/Passive/Stopped/Not Started
>>>>>     CHECKPOINT STATUS - Details about Checkpoint completion
>>>>>     CRAWL STATUS - Hybrid/History/Changelog
>>>>>     FILES SYNCD - Number of Files Synced
>>>>>     FILES PENDING - Number of Files Pending
>>>>>     BYTES PENDING - Bytes pending
>>>>>     DELETES PENDING - Number of Deletes Pending
>>>>>     FILES SKIPPED - Number of Files skipped
>>>>>
>>>>>
>>>>> Issues with existing status and status detail:
>>>>> ----------------------------------------------
>>>>>
>>>>> 1. Active/Passive and Stable/Faulty status are mixed up - the same 
>>>>> column is used to show both the Active/Passive status and the 
>>>>> Stable/Faulty status. If an Active node goes faulty, it is 
>>>>> difficult to tell from the status whether the Active node or the 
>>>>> Passive one is faulty.
>>>>> 2. No info about the last synced time - unless we set a checkpoint 
>>>>> it is difficult to know up to what time data has been synced to 
>>>>> the slave. For example, if an admin wants to know whether all the 
>>>>> files created 15 mins ago have been synced, it is not possible 
>>>>> without setting a checkpoint.
>>>>> 3. Wrong values in metrics.
>>>>> 4. When multiple bricks are present in the same node, Status shows 
>>>>> Faulty when any one of the workers in that node is faulty.
>>>>>
>>>>> Changes:
>>>>> --------
>>>>> 1. Active nodes will be prefixed with * to identify them as active 
>>>>> nodes. (In XML output an active tag will be introduced with values 
>>>>> 0 or 1.)
>>>>> 2. A new column will show the last synced time, which minimizes 
>>>>> the need for the checkpoint feature. Checkpoint status will be 
>>>>> shown only in status detail.
>>>>> 3. Checkpoint Status is removed; a separate Checkpoint command 
>>>>> will be added to the gluster CLI. (We can introduce a 
>>>>> multiple-checkpoints feature with this change.)
>>>>> 4. Status values will be "Not 
>>>>> Started/Initializing/Started/Faulty/Stopped". "Stable" is renamed 
>>>>> to "Started".
>>>>> 5. A Slave User column will be introduced to show the user with 
>>>>> which the geo-rep session is established. (Useful in non-root 
>>>>> geo-rep.)
>>>>> 6. The Bytes Pending column will be removed. It is not possible to 
>>>>> identify the delta without simulating the sync. For example, we 
>>>>> use rsync to sync data from master to slave; to know how much data 
>>>>> is to be transferred, we would have to run the rsync command with 
>>>>> the --dry-run flag before running the actual command (see the 
>>>>> sketch after this list). With tar-ssh we would have to stat all 
>>>>> the files identified for syncing to calculate the total bytes. 
>>>>> Both are costly operations which degrade geo-rep performance. (In 
>>>>> future we can include these columns.)
>>>>> 7. Files Pending, Files Synced and Deletes Pending are only 
>>>>> session information of the worker; these numbers will not match 
>>>>> the number of files present in the filesystem. If the worker 
>>>>> restarts, the counters reset to zero. When the worker restarts, it 
>>>>> logs the previous session's stats before resetting them.
>>>>> 8. Files Skipped is a persistent count across sessions and shows 
>>>>> the exact number of files skipped. (The list of skipped GFIDs can 
>>>>> be obtained from the log file.)
>>>>> 9. Can the "Deletes Pending" column be removed?
>>>>
>>>> Is there any way to know if there are errors syncing any of the 
>>>> files? Which column would that reflect in?
>>> "Skipped" Column shows number of files failed to sync to Slave.
>>>
>>>> Is the last synced time the least of the synced times across the 
>>>> nodes?
>>> Status output will have one entry for each brick, so we are planning 
>>> to display the last synced time of that brick.
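>>>
>>> For illustration, a sketch of where that per-brick time comes from 
>>> (the stime xattr). The exact xattr key and its binary layout are 
>>> assumptions here, shown only to make the mechanism concrete:
>>>
>>>     import datetime
>>>     import os
>>>     import struct
>>>
>>>     def last_synced(brick_root, master_uuid, slave_uuid):
>>>         # Assumed key; stime is taken as two network-order 32-bit
>>>         # ints (seconds, nanoseconds) stored on the brick root
>>>         key = "trusted.glusterfs.%s.%s.stime" % (master_uuid,
>>>                                                  slave_uuid)
>>>         sec, _nsec = struct.unpack("!II",
>>>                                    os.getxattr(brick_root, key))
>>>         return datetime.datetime.utcfromtimestamp(sec)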
>>>>
>>>>
>>>>>
>>>>> Example output
>>>>>
>>>>>     MASTER NODE  MASTER VOL  MASTER BRICK  SLAVE USER  SLAVE           STATUS   LAST SYNCED          CRAWL
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>>   * fedoravm1    gvm         /gfs/b1       root        fedoravm3::gvs  Started  2014-05-10 03:07 pm  Changelog
>>>>>     fedoravm2    gvm         /gfs/b2       root        fedoravm4::gvs  Started  2014-05-10 03:07 pm  Changelog
>>>>>
>>>>> New Status columns
>>>>>
>>>>>     ACTIVE_PASSIVE - * if Active else none.
>>>>>     MASTER NODE - Master node hostname/IP
>>>>>     MASTER VOL - Master volume name
>>>>>     MASTER BRICK - Master brick path
>>>>>     SLAVE USER - Slave user with which the geo-rep session is established
>>>>>     SLAVE - Slave host and Volume name (HOST::VOL format)
>>>>>     STATUS - Stable/Faulty/Active/Passive/Stopped/Not Started
>>>>>     LAST SYNCED - Last synced time (based on stime xattr)
>>>>>     CHECKPOINT STATUS - Details about Checkpoint completion
>>>>>     CRAWL STATUS - Hybrid/History/Changelog
>>>>>     FILES SYNCD - Number of Files Synced
>>>>>     FILES PENDING - Number of Files Pending
>>>>>     DELETES PENDING - Number of Deletes Pending
>>>>>     FILES SKIPPED - Number of Files skipped
>>>>>
>>>>>
>>>>> XML output
>>>>>     active
>>>>>     master_node
>>>>>     master_node_uuid
>>>>>     master_brick
>>>>>     slave_user
>>>>>     slave
>>>>>     status
>>>>>     last_synced
>>>>>     crawl_status
>>>>>     files_syncd
>>>>>     files_pending
>>>>>     deletes_pending
>>>>>     files_skipped
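>>>>>
>>>>> A minimal sketch of consuming this XML from a script, using 
>>>>> gluster's generic --xml option (the element names follow the list 
>>>>> above; the per-brick wrapper element name is an assumption):
>>>>>
>>>>>     import subprocess
>>>>>     import xml.etree.ElementTree as ET
>>>>>
>>>>>     def georep_status(volume, slave):
>>>>>         xml_out = subprocess.run(
>>>>>             ["gluster", "volume", "geo-replication", volume,
>>>>>              slave, "status", "--xml"],
>>>>>             capture_output=True, text=True, check=True).stdout
>>>>>         root = ET.fromstring(xml_out)
>>>>>         fields = ("active", "master_node", "master_brick",
>>>>>                   "slave_user", "slave", "status",
>>>>>                   "last_synced", "crawl_status")
>>>>>         # "pair" as the per-brick element is assumed here
>>>>>         return [{f: pair.findtext(f) for f in fields}
>>>>>                 for pair in root.iter("pair")]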
>>>>>
>>>>>
>>>>> Checkpoints
>>>>> ===========
>>>>> A new set of Gluster CLI commands will be introduced for Checkpoints.
>>>>>
>>>>>     gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint create <NAME> <DATE>
>>>>>     gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint delete <NAME>
>>>>>     gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint delete all
>>>>>     gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint status [<NAME>]
>>>>>     gluster volume geo-replication <VOLNAME> checkpoint status   # For all geo-rep sessions of that volume
>>>>>     gluster volume geo-replication checkpoint status             # For all geo-rep sessions of all volumes
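>>>>>
>>>>> For clarity, a sketch of how completion of a checkpoint could be 
>>>>> decided from the per-brick last-synced times (hypothetical helper; 
>>>>> the rule implied above is that a checkpoint set at time T is 
>>>>> Completed once every brick has synced past T):
>>>>>
>>>>>     def checkpoint_completed(checkpoint_time, last_synced_per_brick):
>>>>>         # last_synced_per_brick: one timestamp per brick, or None
>>>>>         # for a brick that has not synced anything yet
>>>>>         return all(ts is not None and ts >= checkpoint_time
>>>>>                    for ts in last_synced_per_brick)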
>>>>>
>>>>>
>>>>> Checkpoint Status:
>>>>>
>>>>>     SESSION                   NAME   Completed   Checkpoint Time       Completion Time
>>>>> -----------------------------------------------------------------------------------------
>>>>>     gvm->root@fedoravm3::gvs  Chk1   Yes         2014-11-30 11:30 pm   2014-12-01 02:30 pm
>>>>>     gvm->root@fedoravm3::gvs  Chk2   No          2014-12-01 10:00 pm   N/A
>>>>
>>>> Can the time information have the timezone information as well? Or 
>>>> is this UTC time?
>>>> (Same comment for last synced time)
>>> Sure. Will have UTC time in Status output.
>>>>
>>>>>
>>>>> XML output:
>>>>>     session
>>>>>     master_uuid
>>>>>     name
>>>>>     completed
>>>>>     checkpoint_time
>>>>>     completion_time
>>>>>
>>>>>
>>>>> -- 
>>>>> regards
>>>>> Aravinda
>>>>
>>>
>>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel


