[Gluster-devel] New project on the Forge - gstatus

Sun Feb 9 20:30:56 UTC 2014

Hi, 

I've started a new project on the forge, called gstatus. The wiki page is https://forge.gluster.org/gstatus/pages/Home

The idea is to provide admins with a single command to assess the state of the components of a cluster - nodes, bricks and volume states - together with capacity information. 

It's the kind of feature that would be great (IMO) as a sub-command of gluster, i.e. gluster status, but as a stopgap here's the Python project (we could even use this as a prototype!)

On the wiki page, you'll find some additional volume status definitions that I've dreamt up (online-degraded and online-partial) to describe the effect brick down events have on a volume's data availability. There are output examples on the wiki, but here are some examples to show you what you currently get from the tool.

On my test 4-way cluster, this is what a healthy state looks like:

[root@rhs1-1 gstatus]# ./gstatus.py
Analysis complete 

Cluster Summary: 
Version - 3.4.0.44rhs Nodes - 4/ 4 Bricks - 4/ 4 Volumes - 1/ 1 

Volume Summary 
myvol ONLINE (4/4 bricks online) - Distributed-Replicate 
Capacity: 64.53 MiB/19.97 GiB (used,total) 

Status Messages 
Cluster is healthy, all checks successful 

And then if I take down two nodes that provide bricks to the same replica set, I see:

Analysis complete 

Cluster Summary: 
Version - 3.4.0.44rhs Nodes - 2/ 4 Bricks - 2/ 4 Volumes - 0/ 1 

Volume Summary 
myvol ONLINE_PARTIAL (2/4 bricks online) - Distributed-Replicate 
Capacity: 32.27 MiB/9.99 GiB (used,total) 

Status Messages 
- rhs1-4 is down 
- rhs1-2 is down 
- Brick rhs1-4:/gluster/brick1 is down/unavailable 
- Brick rhs1-2:/gluster/brick1 is down/unavailable 
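
To make the extra volume states a bit more concrete, here is a minimal sketch of how a volume state could be derived from brick states. It is not the gstatus code: the function name, the input shape and the OFFLINE state are made up for illustration, and the rules are my reading of the output above (ONLINE_PARTIAL when a whole replica set is down), with ONLINE_DEGRADED assumed to mean that bricks are down but every replica set still has at least one brick up. The wiki page has the real definitions.

```python
def classify_volume(replica_sets):
    """Return a volume state from brick states (illustrative only).

    replica_sets is a list of replica sets, each a list of booleans
    where True means the brick is online. This layout is an assumption
    made for the sketch, not the structure gstatus uses internally.
    """
    all_bricks = [up for rs in replica_sets for up in rs]

    # No bricks up at all: nothing is reachable (hypothetical state name).
    if not any(all_bricks):
        return "OFFLINE"

    # Every brick is up.
    if all(all_bricks):
        return "ONLINE"

    # Some bricks are down, but each replica set still has a survivor,
    # so all data should still be reachable (assumed meaning).
    if all(any(rs) for rs in replica_sets):
        return "ONLINE_DEGRADED"

    # At least one replica set is entirely down: some data is unavailable.
    return "ONLINE_PARTIAL"


# 2x2 distributed-replicate volume with one whole replica set down,
# as in the second output above.
print(classify_volume([[True, True], [False, False]]))
```

Working on replica sets rather than a flat brick count matters here: 2/4 bricks down can be either degraded or partial depending on which bricks they are.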

Pretty much all the data for the volume, bricks and nodes gets mapped into objects within the code, so other checks can easily be added for things like:

- filesystem type recommendations - not using XFS, or not using LVM ... make a recommendation 

- check the brick mount options are correct and best practice 

- show volume info in more detail - raw and usable capacity with a raw vs usable ratio, brick size stats - are they all the same? 

- show volume layout (like lsgvt does), illustrating replica set relationships 

- you could add a message based on space usage on the brick (high watermark warning, or overpopulated brick - please run rebalance type stuff) 

etc etc 

- add an option to write the data out in compact form, and then run it at intervals through cron to create a log file - the log file could then be picked up by Justin's analytics tool to give volume space usage and component availability over time - a bit quick and dirty, I know ;) 
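
As a rough illustration of two of the ideas above (a high watermark check and a compact one-line record for a cron-driven log), here is a small sketch. None of it is taken from gstatus: the Brick class, its fields, the 90% threshold and the key=value record layout are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Brick:
    """Hypothetical brick object; field names are illustrative."""
    node: str
    path: str
    online: bool
    used_bytes: int
    size_bytes: int

    @property
    def used_pct(self):
        # Guard against a zero-sized brick to avoid dividing by zero.
        if not self.size_bytes:
            return 0.0
        return 100.0 * self.used_bytes / self.size_bytes


def check_high_watermark(bricks, threshold_pct=90.0):
    """Return one warning message per online brick at or above the threshold."""
    messages = []
    for b in bricks:
        if b.online and b.used_pct >= threshold_pct:
            messages.append(
                f"Brick {b.node}:{b.path} is {b.used_pct:.0f}% full "
                f"(threshold {threshold_pct:.0f}%)"
            )
    return messages


def compact_record(timestamp, volume, bricks):
    """Build a single key=value line suitable for appending to a log file."""
    online = [b for b in bricks if b.online]
    used = sum(b.used_bytes for b in online)
    size = sum(b.size_bytes for b in online)
    return (
        f"{timestamp} vol={volume} bricks_up={len(online)}/{len(bricks)} "
        f"used_bytes={used} size_bytes={size}"
    )


bricks = [
    Brick("rhs1-1", "/gluster/brick1", True, 95, 100),
    Brick("rhs1-2", "/gluster/brick1", False, 0, 100),
]
for msg in check_high_watermark(bricks):
    print(" - " + msg)
print(compact_record("2014-02-09T20:30:56Z", "myvol", bricks))
```

Only online bricks are counted towards capacity in the record, which matches the way the capacity figure drops in the second output above when two bricks go away.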

At the moment testing consists of VMs on my laptop - so who knows what bugs you may find :) 

Anyway, if it's of interest, give it a go.

Cheers, 

Paul C 
