[Gluster-devel] Tips and Tricks for Gluster Developer

Tue Jan 26 00:04:28 UTC 2016

On 01/22/2016 09:13 AM, Raghavendra Talur wrote:
> Hi All,
>
> I am sure there are many tricks hidden under sleeves of many Gluster
> developers.
> I realized this when speaking to new developers. It would be good to have a
> searchable thread of such tricks.
>
> Just reply back on this thread with the tricks that you have and I
> promise I will collate them and add them to developer guide.
>
>
> Looking forward to being amazed!
>

Things that I normally do:

1. Visualizing flow through the stack is one of the first steps that I 
use in debugging. Tracing a call from the origin (application) to the 
underlying filesystem is usually helpful in isolating most problems. 
Looking at logs emanating from the endpoints of each stack (fuse/nfs 
etc. + client protocols, server + posix) helps in identifying the stack 
that might be the source of a problem. For understanding the nature of 
fops happening, you can use the wireshark plugin or the trace translator 
at appropriate locations in the graph.
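
As a minimal sketch of the log check described above: the helper below keeps only error and critical lines from one log file, so the endpoint logs can be compared quickly. The function name is made up for illustration, and it assumes the usual "[timestamp] LEVEL [source] ..." line layout with logs under /var/log/glusterfs; adjust both if your setup differs.

```shell
# Hypothetical helper: print only error (E) and critical (C) lines
# from a Gluster log file.
# Assumes each line looks like "[timestamp] LEVEL [source] message".
gluster_log_errors() {
    grep -E '^\[[^]]+\] [EC] ' "$1"
}
```

Run it against the client log and the brick log of the same volume in turn to see which end of the stack reports the failure first.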

2. Use statedump/meta for understanding internal state. For servers, 
statedump is the only recourse; on fuse clients you can get a meta view 
of the filesystem with

cd /mnt/point/.meta

and you get to view the statedump information in a hierarchical fashion 
(including information from individual xlators).
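
A rough sketch of both routes, under some assumptions: the function names are hypothetical, the GLUSTER variable exists only so the command can be swapped out for a dry run, and the dump files are assumed to land in the default statedump directory (typically /var/run/gluster).

```shell
# Command used to talk to glusterd; override it (e.g. GLUSTER=echo)
# to preview what would be run.
GLUSTER=${GLUSTER:-gluster}

# Hypothetical helper: ask the bricks of a volume to write a statedump.
dump_server_state() {
    "$GLUSTER" volume statedump "$1"
}

# Hypothetical helper: list the entries under the .meta directory of
# a fuse mount, two levels deep, as a starting point for browsing.
meta_overview() {
    find "$1/.meta" -maxdepth 2 | sort
}
```

For example, dump_server_state myvol on a server node and meta_overview /mnt/point on a client, where myvol is a placeholder volume name.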

3. Reproduce a problem by minimizing the number of nodes in a graph. 
This can be done by disabling translators that can be disabled through 
the volume set interface and by using custom volume files.
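
One way to trim the client graph is sketched below, assuming the performance translators are the ones you want out of the picture. The function name and the GLUSTER override are illustrative only, and the option list is a common subset rather than a complete one.

```shell
# Command used to talk to glusterd; override it (e.g. GLUSTER=echo)
# to preview the commands without changing the volume.
GLUSTER=${GLUSTER:-gluster}

# Hypothetical helper: switch off a set of performance translators
# on the given volume, one "volume set" call per option.
disable_perf_xlators() {
    volname=$1
    for opt in performance.quick-read performance.io-cache \
               performance.read-ahead performance.write-behind \
               performance.stat-prefetch performance.open-behind; do
        "$GLUSTER" volume set "$volname" "$opt" off
    done
}
```

Running it with GLUSTER=echo first shows the exact commands; re-enable each option with "on" once the problem has been isolated.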

4. Use error-gen while developing new code to inject faults into fops 
and exercise the error paths.

5. If a problem happens only at scale, try reproducing the problem by 
reducing default limits in code (timeouts, inode table limits etc.). 
Some of them do require re-compilation of code.

6. Use the wealth of tools available on *nix systems for understanding a 
performance problem better. This infographic [1] and page [2] by Brendan 
Gregg are quite handy for picking the right tool at the right layer.

7. For isolating regression test failures:

      - use tests/utils/testn.sh to quickly identify the failing test
      - grep for the last "Test Summary Report" in the jenkins report
        for a failed regression run. That usually provides a pointer to
        the failing test.
      - In case of a failure due to a core, the gdb command provided in
        the jenkins report is quite handy to get a backtrace after
        downloading the core and its runtime to your laptop.
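
The "Test Summary Report" lookup can be sketched as follows, assuming the console output of the failed run has been saved to a local file (console.txt in the usage line is a placeholder, and the function name is made up).

```shell
# Hypothetical helper: print everything from the last
# "Test Summary Report" line to the end of a saved console log.
last_test_summary() {
    awk '
        /Test Summary Report/ { buf = ""; seen = 1 }  # restart at each report
        seen { buf = buf $0 "\n" }                    # collect lines after it
        END { printf "%s", buf }
    ' "$1"
}
```

Calling last_test_summary console.txt then shows only the final summary, which names the failing .t file and the failed test numbers.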

8. Get necessary information to debug a problem as soon as a new bug is 
logged (a day or two is ideal). If we miss that opportunity, users could 
have potentially moved on to other things and obtaining information can 
prove to be difficult.

9. Be paranoid about any code that you write ;-). Anything that is not 
tested by us will come back to haunt us sometime in the future.

10. Use terminator [3] for concurrently executing the same command on 
multiple nodes.

Will fill in more when I recollect something useful.

-Vijay

[1] http://www.brendangregg.com/Perf/linux_observability_tools.png

[2] http://www.brendangregg.com/perf.html

[3] https://launchpad.net/terminator