[Gluster-devel] Troubleshooting and Diagnostic tools for Gluster

Aravinda avishwan at redhat.com
Mon Oct 26 08:03:42 UTC 2015


regards
Aravinda

On 10/23/2015 11:50 PM, Shyam wrote:
>
>
> On 10/23/2015 06:46 AM, Aravinda wrote:
>> Hi Gluster developers,
>>
>> In this mail I am proposing troubleshooting documentation and
>> Gluster Tools infrastructure.
>>
>> Tool to search in documentation
>> ===============================
>> We recently added message IDs to each error message in Gluster. Some
>> of the error messages are self-explanatory, but others require
>> manual intervention to fix the issue. How about identifying the
>> error messages that need more explanation and creating
>> documentation for them? Even where information about an error is
>> available in the documentation, it is very difficult to search for
>> and relate to the error message. It would be very useful to create a
>> tool which looks up the documentation and tells us exactly what to do.
>>
>> For example (illustrative purpose only),
>> glusterdoc --explain GEOREP0003
>>
>>      SSH configuration issue. This error is seen when pem keys from all
>>      Master nodes are not distributed properly to Slave
>>      nodes. Use the Geo-replication create command with the force option
>>      to redistribute the keys. If the issue still persists, look for any
>>      errors from running hook scripts in the Glusterd log file.
>>
>>
>> Note: Inspired by the rustc --explain command
>> https://twitter.com/jaredforsyth/status/626960244707606528
>>
>> If we don't know the message ID, we can still search the
>> available documentation, like,
>>
>>      glusterdoc --search <SEARCH_KEY_WORD>
>>
>> These commands can be consumed programmatically; for example,
>> `--json` will return the output in JSON format. This enables UI
>> developers to automatically show help messages when they display
>> errors.
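
To make the proposed interface concrete, here is a minimal sketch of how
`--explain`, `--search` and `--json` could fit together. The in-memory
CATALOG, the function names and the JSON field names are assumptions for
illustration only; nothing like this exists yet, and a real tool would load
the catalog from a shipped file.

```python
import argparse
import json

# Hypothetical catalog with a single entry taken from the example above.
CATALOG = {
    "GEOREP0003": (
        "SSH configuration issue. This error is seen when pem keys from "
        "all Master nodes are not distributed properly to Slave nodes."
    ),
}


def explain(msg_id):
    """Return the explanation for a message ID, or None if it is unknown."""
    return CATALOG.get(msg_id.upper())


def search(keyword):
    """Return the sorted message IDs whose explanation mentions the keyword."""
    needle = keyword.lower()
    return sorted(mid for mid, text in CATALOG.items() if needle in text.lower())


def main(argv=None):
    parser = argparse.ArgumentParser(prog="glusterdoc")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--explain", metavar="MSGID")
    group.add_argument("--search", metavar="SEARCH_KEY_WORD")
    parser.add_argument("--json", action="store_true", help="emit JSON output")
    args = parser.parse_args(argv)

    if args.explain:
        result = {"id": args.explain.upper(), "explanation": explain(args.explain)}
    else:
        result = {"keyword": args.search, "matches": search(args.search)}

    if args.json:
        # Machine-readable form, e.g. for a UI that shows help next to errors.
        print(json.dumps(result))
    elif args.explain:
        print(result["explanation"] or "No documentation found for this ID.")
    else:
        print("\n".join(result["matches"]) or "No matches.")
    return 0
```

Calling `main(["--explain", "GEOREP0003", "--json"])` prints one JSON object,
which a UI could parse instead of scraping the human-readable text.
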
>
> The message ID based logging was created for this exact purpose (maybe 
> not so elegant a purpose, but leaning towards this :) ). So I am all 
> for it.
>
> (suggestion) The intention of documenting messages with text in
> Doxygen format was to be able to extract this information from the
> headers and create a catalog that can then be searched, etc. This
> catalog can be processed and shipped as part of the gluster RPMs,
> which the tool above can use.
I am also in favor of documentation staying with the code. It makes it easy
to change the documentation whenever the code/algorithm changes.
As you mentioned, we can parse the documentation from the header files.
The Geo-replication Python code is yet to adopt the new MSGID changes.
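
A rough sketch of that extraction step follows. It assumes each message is
documented by a `/*! ... */` comment placed directly before its `#define`,
with `@messageid`, `@diagnosis` and `@recommendedaction` tags; that layout is
an assumption for illustration and should be checked against the real
headers. The function names are made up as well.

```python
import json
import re

# A "/*! ... */" comment immediately followed by "#define NAME".
BLOCK_RE = re.compile(
    r"/\*!(?P<body>(?:(?!\*/).)*)\*/\s*#define\s+(?P<name>\w+)",
    re.DOTALL,
)
# "@tag" followed by free text up to the next "@" or the end of the comment.
TAG_RE = re.compile(r"@(\w+)[ \t]*([^@]*)")


def parse_header(text):
    """Build {message id: {tag: text, ..., "name": macro}} from header text."""
    catalog = {}
    for match in BLOCK_RE.finditer(text):
        # Drop the leading " * " decoration from every comment line.
        lines = [ln.strip().lstrip("*").strip()
                 for ln in match.group("body").splitlines()]
        body = " ".join(ln for ln in lines if ln)
        entry = {tag: " ".join(value.split())
                 for tag, value in TAG_RE.findall(body)}
        entry["name"] = match.group("name")
        # Fall back to the macro name when no @messageid tag is present.
        catalog[entry.get("messageid", entry["name"])] = entry
    return catalog


def dump_catalog(catalog):
    """Serialize the catalog so it can be shipped alongside the packages."""
    return json.dumps(catalog, indent=2, sort_keys=True)
```

The JSON produced by `dump_catalog` is the kind of file the search tool could
read at run time, so the text keeps living next to the code.
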

>
>>
>> Gluster Tools infrastructure
>> ============================
>> Are our Gluster log files sufficient for root-causing issues? Is
>> that error caused by a misconfiguration? Geo-replication status is
>> showing Faulty; where do we find the reason for Faulty?
>>
>> Sac (surs AT redhat.com) mentioned that he is working on gdeploy, and
>> many developers are using their own tools. How about providing a common
>> infrastructure (say gtool/glustertool) to host all these tools?
>>
>> From my toolkit, the following tools are available; I am planning to
>> create more such tools for Geo-replication and Gluster.
>>
>>      volinfo [<VOLNAME>] - Enhanced version of Gluster Volume info
>>      command (http://aravindavk.in/blog/glusterfs-tools/ )
>>
>>      df - df for Gluster Volumes
>>      (http://aravindavk.in/blog/glusterdf-df-for-gluster-volumes/ )
>>
>>      georepsetup - A tool to create a Geo-replication session easily
>>      (http://aravindavk.in/blog/introducing-georepsetup/ )
>>
>>      gdash - A simple Dashboard for Gluster
>>      (http://aravindavk.in/blog/introducing-gdash/ )
>>
>>      gfid <PATH> - To get the GFID of a file, works both in Mount and
>>      Backend (https://github.com/aravindavk/gluster_georep_scripts )
>>
>>      clparser <PATH> - Parse the backend Changelog and print in human
>>      readable format
>>      (https://github.com/aravindavk/gluster_georep_scripts )
>>
>>      xtime <PATH> - To get the XTIME xattr from the given path
>>      (https://github.com/aravindavk/gluster_georep_scripts )
>>
>>      stime <PATH> - To get the STIME xattr from the given path (used by
>>      Geo-replication)
>>      (https://github.com/aravindavk/gluster_georep_scripts )
>>
>>      volmark <VOLNAME> - To get the Volume Mark of the given Volume (used
>>      by Geo-replication)
>>      (https://github.com/aravindavk/gluster_georep_scripts )
>>
>>
>> Geo-replication developers are already using some tools like Changelog
>> parser, `arequal-checksum` etc.
>>
>> Initial idea for Tools Framework:
>> ---------------------------------
>> A Shell/Python script which looks for the tool in the plugins
>> subdirectory and, if it exists, passes all the arguments and calls
>> that script.
>>
>> `glustertool help` triggers a Python script plugins/help.py, which reads
>> the plugins.yml file to get the list of tools and the help messages
>> associated with them.
>>
>> No restrictions on the choice of programming language to create a
>> tool. It can be bash, Python, Go, Rust, awk, sed etc.
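
The dispatcher idea above can be sketched in a few lines. This is only an
illustration under assumptions: the `find_tool` and `dispatch` names, the
exit codes and the "any extension, must be executable" rule are hypothetical
choices, not a settled design.

```python
import os
import subprocess
import sys


def find_tool(plugins_dir, name):
    """Return the path of an executable plugin called `name`, or None.

    The extension is ignored, so plugins/volinfo.py and plugins/volinfo.sh
    would both answer to "volinfo"; this keeps the language choice open.
    """
    if not os.path.isdir(plugins_dir):
        return None
    for entry in sorted(os.listdir(plugins_dir)):
        stem, _ext = os.path.splitext(entry)
        path = os.path.join(plugins_dir, entry)
        if stem == name and os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def dispatch(plugins_dir, argv):
    """Run the plugin named by argv[0] with the remaining arguments."""
    if not argv:
        print("usage: glustertool <tool> [args...]", file=sys.stderr)
        return 2
    tool = find_tool(plugins_dir, argv[0])
    if tool is None:
        print("unknown tool: %s" % argv[0], file=sys.stderr)
        return 1
    # The plugin is executed directly, so its shebang line decides the
    # interpreter and the wrapper stays language-agnostic.
    return subprocess.call([tool] + argv[1:])
```

A wrapper script would then end with something like
`sys.exit(dispatch("plugins", sys.argv[1:]))`, and the plugin's exit status
becomes the exit status of `glustertool`.
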
>>
>> Challenges:
>> - Each plugin may have different dependencies; installing all tools
>> may pull in all of their dependencies.
>> - Multiple programming languages may be difficult to maintain/build.
>> - Maintenance of third-party tools.
>> - Creating a plugin registry to discover tools created by other
>> developers.
>>
>> Tool Ideas:
>> -----------
>> If you are interested in working on tools for Gluster, I am listing a
>> few ideas to start with; feel free to add your ideas to the list.
>>
>> - A tool to analyze the log file and identify issues. For example,
>>    glustertool georep_log_analize <LOG FILE PATH> --after-date <TIMESTAMP>
>>
>>    Example output: (Illustrative purpose only)
>>
>>    Number of workers in this node: 2
>>    Number of restarts: 5
>>    Errors: 10
>>    Python Tracebacks: 5
>>    Last state: Active
>>    Files Skipped: 0
>>    Setup issue: No
>>
>> - Extract skipped GFIDs from Geo-replication logs and re-trigger sync.
>>
>>    For example,
>>    glustertool georep_extract_skipped <LOG_FILE> --after-date <TIMESTAMP>
>>
>>    This command will
>>    1. Extract the list of skipped GFIDs
>>    2. Mount the Master Volume
>>    3. Convert each GFID to a path
>>    4. Set a virtual xattr to re-trigger the sync
>>
>> - A tool to detect Split brain
>>
>> - A tool to convert GFID to Path
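
For the GFID-to-path idea, the first step on the backend is cheap: a brick
keeps an entry for every GFID under `.glusterfs/<first two hex
digits>/<next two>/<gfid>` (a hardlink for regular files, a symlink for
directories). A minimal sketch of that mapping, with a hypothetical function
name and brick path:

```python
import os
import uuid


def gfid_backend_path(brick_root, gfid):
    """Return the .glusterfs entry on a brick for the given GFID.

    uuid.UUID() validates the input and normalizes it to the lowercase,
    hyphenated form used for the on-disk name.
    """
    canonical = str(uuid.UUID(gfid))
    return os.path.join(
        brick_root, ".glusterfs", canonical[0:2], canonical[2:4], canonical
    )
```

For example, `gfid_backend_path("/bricks/b1",
"a1b2c3d4-e5f6-4789-abcd-ef0123456789")` gives
`/bricks/b1/.glusterfs/a1/b2/a1b2c3d4-e5f6-4789-abcd-ef0123456789`. Getting
from there to the user-visible path still needs a second step, such as
following the symlink for a directory or finding the other hardlink of a
file, which this sketch does not attempt.
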
>>
>> Created an etherpad to record the available tools and ideas:
>> https://public.pad.fsfe.org/p/gluster-tools
>> Will update once I make some progress in creating the infrastructure.
>>
>> Comments and Suggestions welcome.
>>
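
For the log-analysis idea in the list above, counting errors and Python
tracebacks after a given timestamp could start out like the sketch below. It
assumes log lines that begin with `[YYYY-MM-DD HH:MM:SS.ffffff] <LEVEL> `
and tracebacks that follow as unprefixed lines; both the format and the
`summarize` name are assumptions that need checking against real
Geo-replication logs.

```python
import re
from datetime import datetime

# Assumed line prefix: "[2015-10-23 06:46:12.123456] E ..."
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)?\] ([A-Z]) "
)


def summarize(lines, after=None):
    """Count error lines and tracebacks, optionally only after `after`."""
    summary = {"errors": 0, "tracebacks": 0}
    # Lines without a timestamp inherit the decision made for the last
    # timestamped line, so a traceback stays with the entry that logged it.
    include = after is None
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            include = after is None or stamp > after
            if include and match.group(2) == "E":
                summary["errors"] += 1
        if include and "Traceback (most recent call last)" in line:
            summary["tracebacks"] += 1
    return summary
```

It can be fed an open file directly, for example
`summarize(open(log_path), after=datetime(2015, 10, 23))`. The other fields
in the example output (workers, restarts, last state) would need knowledge
of specific log messages and are left out here.
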


