[Gluster-devel] Improving Geo-replication Status and Checkpoints
Aravinda
avishwan at redhat.com
Wed Jan 28 10:37:21 UTC 2015
Background
----------
We have `status` and `status detail` commands for GlusterFS
geo-replication. This mail proposes fixes for the existing issues in
these commands' output. Let us know if any other columns would help
users get a meaningful status.
Existing output
---------------
Status command output
MASTER NODE - Master node hostname/IP
MASTER VOL - Master volume name
MASTER BRICK - Master brick path
SLAVE - Slave host and Volume name(HOST::VOL format)
STATUS - Stable/Faulty/Active/Passive/Stopped/Not Started
CHECKPOINT STATUS - Details about Checkpoint completion
CRAWL STATUS - Hybrid/History/Changelog
Status detail
MASTER NODE - Master node hostname/IP
MASTER VOL - Master volume name
MASTER BRICK - Master brick path
SLAVE - Slave host and Volume name(HOST::VOL format)
STATUS - Stable/Faulty/Active/Passive/Stopped/Not Started
CHECKPOINT STATUS - Details about Checkpoint completion
CRAWL STATUS - Hybrid/History/Changelog
FILES SYNCD - Number of Files Synced
FILES PENDING - Number of Files Pending
BYTES PENDING - Bytes pending
DELETES PENDING - Number of Deletes Pending
FILES SKIPPED - Number of Files skipped
Issues with existing status and status detail:
----------------------------------------------
1. Active/Passive and Stable/Faulty status are mixed up - the same
column is used to show both the Active/Passive status and the
Stable/Faulty status. If an Active node goes Faulty, it is difficult to
tell from the status whether the Active or the Passive node is Faulty.
2. No information about the last synced time - unless a checkpoint is
set, it is difficult to know up to what time data has been synced to the
slave. For example, if an admin wants to know whether all files created
15 minutes ago have been synced, that is not possible without setting a
checkpoint.
3. Wrong values in metrics.
4. When multiple bricks are present on the same node, status shows
Faulty when any one of the workers on that node is faulty.
Changes:
--------
1. Active nodes will be prefixed with * to identify them as active. (In
the XML output, an active tag will be introduced with values 0 or 1.)
2. A new column will show the last synced time, which minimizes the need
for the checkpoint feature. Checkpoint status will be shown only in
status detail.
3. Checkpoint Status is removed; a separate checkpoint command will be
added to the gluster CLI. (With this change we can introduce a
multiple-checkpoints feature.)
4. Status values will be "Not
Started/Initializing/Started/Faulty/Stopped". "Stable" is changed to
"Started".
5. A Slave User column will be introduced to show which user the geo-rep
session is established as. (Useful in non-root geo-rep.)
6. The Bytes Pending column will be removed. It is not possible to
identify the delta without simulating the sync. For example, when rsync
is used to sync data from master to slave, knowing how much data remains
to be transferred requires running rsync with the --dry-run flag before
the actual run. With tar-ssh we would have to stat all the files
identified for syncing to calculate the total bytes to be synced. Both
are costly operations that degrade geo-rep performance. (We can
reintroduce these columns in the future.)
7. Files Pending, Files Synced and Deletes Pending are session
information of the worker only; these numbers will not match the number
of files present in the filesystem. If the worker restarts, the counters
reset to zero. On restart, the worker logs the previous session's stats
before resetting them.
8. Files Skipped is persistent across sessions and shows the exact count
of files skipped. (The list of skipped GFIDs can be obtained from the
log file.)
9. Can the "Deletes Pending" column be removed?
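The counter semantics described in points 7 and 8 above can be sketched
roughly as follows (a simplified illustration with hypothetical names,
not the actual worker implementation; the real worker would persist the
skipped count on disk rather than in a plain attribute):

```python
import logging

logging.basicConfig(level=logging.INFO)


class WorkerStats:
    """Sketch of per-worker sync counters: files_synced and files_pending
    are session-scoped and reset when the worker restarts, while
    files_skipped survives across sessions (point 8)."""

    def __init__(self):
        self.files_synced = 0
        self.files_pending = 0
        self.files_skipped = 0  # persistent across restarts

    def restart(self):
        # Log the previous session's stats before resetting them,
        # as described in point 7.
        logging.info("previous session: synced=%d pending=%d",
                     self.files_synced, self.files_pending)
        self.files_synced = 0
        self.files_pending = 0
        # files_skipped is intentionally NOT reset (point 8)
```

This is why the synced/pending numbers shown by `status detail` cannot
be compared against a filesystem listing: they only describe the current
worker session.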
Example output

MASTER NODE  MASTER VOL  MASTER BRICK  SLAVE USER  SLAVE           STATUS   LAST SYNCED          CRAWL
------------------------------------------------------------------------------------------------------
* fedoravm1  gvm         /gfs/b1      root        fedoravm3::gvs  Started  2014-05-10 03:07 pm  Changelog
  fedoravm2  gvm         /gfs/b2      root        fedoravm4::gvs  Started  2014-05-10 03:07 pm  Changelog
New Status columns
ACTIVE_PASSIVE - * if Active else none.
MASTER NODE - Master node hostname/IP
MASTER VOL - Master volume name
MASTER BRICK - Master brick path
SLAVE USER - Slave user to which geo-rep is established.
SLAVE - Slave host and Volume name(HOST::VOL format)
STATUS - Not Started/Initializing/Started/Faulty/Stopped
LAST SYNCED - Last synced time(Based on stime xattr)
CHECKPOINT STATUS - Details about Checkpoint completion
CRAWL STATUS - Hybrid/History/Changelog
FILES SYNCD - Number of Files Synced
FILES PENDING - Number of Files Pending
DELETES PENDING - Number of Deletes Pending
FILES SKIPPED - Number of Files skipped
XML output
active
master_node
master_node_uuid
master_brick
slave_user
slave
status
last_synced
crawl_status
files_syncd
files_pending
deletes_pending
files_skipped
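The proposed XML fields could be consumed roughly as follows. This is a
sketch only: the element nesting, the enclosing `pair` tag, and the
sample values are assumptions based on the field list above, not the
actual output of `gluster ... status --xml`:

```python
import xml.etree.ElementTree as ET

# Hypothetical per-brick status entry built from the proposed field
# names; real output from the CLI may be nested differently.
sample = """
<pair>
  <active>1</active>
  <master_node>fedoravm1</master_node>
  <master_node_uuid>uuid-placeholder</master_node_uuid>
  <master_brick>/gfs/b1</master_brick>
  <slave_user>root</slave_user>
  <slave>fedoravm3::gvs</slave>
  <status>Started</status>
  <last_synced>2014-05-10 03:07 pm</last_synced>
  <crawl_status>Changelog</crawl_status>
  <files_syncd>120</files_syncd>
  <files_pending>3</files_pending>
  <deletes_pending>0</deletes_pending>
  <files_skipped>1</files_skipped>
</pair>
"""


def parse_pair(xml_text):
    """Parse one brick's status entry into a plain dict."""
    elem = ET.fromstring(xml_text)
    info = {child.tag: child.text for child in elem}
    # <active> carries 0 or 1, as proposed in change #1
    info["active"] = info["active"] == "1"
    return info


pair = parse_pair(sample)
print(pair["master_node"], pair["status"], pair["active"])
```

A boolean `active` flag like this lets scripts distinguish the
Active/Passive state from the worker status, which change #1 separates.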
Checkpoints
===========
A new set of Gluster CLI commands will be introduced for checkpoints.
gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint create <NAME> <DATE>
gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint delete <NAME>
gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint delete all
gluster volume geo-replication <VOLNAME> <SLAVEHOST>::<SLAVEVOL> checkpoint status [<NAME>]
gluster volume geo-replication <VOLNAME> checkpoint status   # for all geo-rep sessions of that volume
gluster volume geo-replication checkpoint status             # for all geo-rep sessions of all volumes
Checkpoint Status:

SESSION                   NAME  COMPLETED  CHECKPOINT TIME      COMPLETION TIME
--------------------------------------------------------------------------------
gvm->root@fedoravm3::gvs  Chk1  Yes        2014-11-30 11:30 pm  2014-12-01 02:30 pm
gvm->root@fedoravm3::gvs  Chk2  No         2014-12-01 10:00 pm  N/A
XML output:
session
master_uuid
name
completed
checkpoint_time
completion_time
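The relation between the checkpoint fields above can be sketched as
follows: a checkpoint is complete once the last-synced (stime-based)
timestamp has reached the checkpoint time. This is a simplified
illustration of the Completed and Completion Time columns, not gluster's
actual logic (the real implementation tracks the moment the sync crossed
the checkpoint, which we approximate here with the last-synced time):

```python
from datetime import datetime


def checkpoint_status(checkpoint_time, last_synced):
    """Return the completed flag and completion time for one checkpoint,
    given the checkpoint's target time and the session's last-synced
    timestamp (both datetime objects)."""
    completed = last_synced >= checkpoint_time
    return {
        "completed": completed,
        # Approximation: report the last-synced time as completion time
        "completion_time": last_synced if completed else None,
    }


# Chk1 from the example table: synced past the checkpoint time
chk1 = checkpoint_status(datetime(2014, 11, 30, 23, 30),
                         datetime(2014, 12, 1, 14, 30))
print(chk1["completed"])
```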
--
regards
Aravinda