[Gluster-users] Meta-discussion

Whit Blauvelt whit.gluster at transpect.com
Wed Jan 2 13:38:30 UTC 2013


There's a strong trend against documentation of software, and not just in
open source. I'm old enough to remember when anything modestly complex came
with hundreds of pages of manuals, often over several volumes. Now, I can
understand why commercial software with constrained GUIs wants to pretend
that what's underneath is as simple as the GUI suggests, so as not to scare
away customers. And I can understand why some open source projects might
want to withhold knowledge to motivate consulting contracts, as cynical as
that may be.

But something on the scale of Gluster should have someone hired full-time to
do nothing but continuously write and update documentation. If you need a
business model for that, print the results in a set of thick books, and sell
it for $250 or so. Print just-in-time so you can track point releases. What
Brian asks for should be the core of it. Even when stuff breaks for people
who have paid for their Red Hat Solution Architect, it will give that
architect a place to look up the fix quickly, rather than having to go bother
the development team, who are more profitably deployed in development.

Best,
Whit


On Wed, Jan 02, 2013 at 01:19:17PM +0100, Fred van Zwieten wrote:
> +1 for 2b.
> 
> I am in the planning stages for an RHS 2.0 deployment, and I too have
> suggested a "cookbook"-style guide of step-by-step procedures to my Red Hat
> Solution Architect.
> 
> What can I do to get this moved up the priority list?
> 
> Cheers,
> Fred
> 
> 
> On Wed, Jan 2, 2013 at 12:49 PM, Brian Candler <B.Candler at pobox.com> wrote:
> 
>     On Thu, Dec 27, 2012 at 06:53:46PM -0500, John Mark Walker wrote:
>     > I invite all sorts of disagreeable comments, and I'm all for public
>     > discussion of things - as can be seen in this list's archives.  But,
>     > for better or worse, we've chosen the approach that we have.  Anyone
>     > who would like to challenge that approach is welcome to take up that
>     > discussion with our developers on gluster-devel.  This list is for
>     > those who need help using glusterfs.
>     >
>     > I am sorry that you haven't been able to deploy glusterfs in production.
>     > Discussing how and why glusterfs works - or doesn't work - for particular
>     > use cases is welcome on this list.  Starting off a discussion about how
>     > the entire approach is unworkable is kind of counter-productive and not
>     > exactly helpful to those of us who just want to use the thing.
> 
>     For me, the biggest problems with glusterfs are not in its design, feature
>     set or performance; they are around what happens when something goes wrong.
>     As I perceive them, the issues are:
> 
>     1. An almost total lack of error reporting, beyond incomprehensible entries
>     in log files on a completely different machine, made very difficult to find
>     because they are mixed in with all sorts of other incomprehensible log
>     entries.
> 
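>     To make that concrete: tracking down the cause of a failed operation
>     today means grepping every server's logs by hand, with something like
>     the sketch below (hostnames are placeholders; the ' E ' severity field
>     is simply what I have observed in 3.x logs, since the format itself is
>     undocumented):
> 
>         # Run from an admin host with ssh access to every server in the
>         # pool - the relevant entry may be on any of them.
>         for h in server1 server2 server3; do
>             echo "== $h =="
>             ssh "$h" "grep ' E ' /var/log/glusterfs/*.log" | tail -n 20
>         done
> 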
>     2. Incomplete documentation. This breaks down further as:
> 
>     2a. A total lack of architecture and implementation documentation - such as
>     what the translators are and how they work internally, what a GFID is, what
>     xattrs are stored where and what they mean, and all the on-disk states you
>     can expect to see during replication and healing.  Without this level of
>     documentation, it's impossible to interpret the log messages from (1) short
>     of reverse-engineering the source code (which is also very minimalist when
>     it comes to comments); and hence it's impossible to reason about what has
>     happened when the system is misbehaving, and what would be the correct and
>     safe intervention to make.
> 
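>     As an example of the gap: the only way I know to inspect that on-disk
>     state is to poke at a brick directly and guess at what the attributes
>     mean (a sketch - the path is an example, and the meanings given are my
>     own inferences from 3.3, not anything documented):
> 
>         # On a brick, dump the gluster-owned extended attributes of a file:
>         getfattr -d -m . -e hex /export/brick1/some/file
>         # trusted.gfid                  - the file's volume-wide UUID, which
>         #                                 also names a hardlink under the
>         #                                 brick's .glusterfs/ directory
>         # trusted.afr.<volume>-client-N - per-replica pending-change
>         #                                 counters consulted by self-heal
> 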
>     glusterfs 2.x actually had fairly comprehensive internals documentation,
>     but this has all been stripped out in 3.x to turn it into a "black box".
>     Conversely, development on 3.x has diverged enough from 2.x to make the
>     2.x documentation unusable.
> 
>     2b. An almost total lack of procedural documentation, such as "to replace a
>     failed server with another one, follow these steps" (which in that case
>     involves manually copying peer UUIDs from one server to another), or "if
>     volume rebalance gets stuck, do this".  When you come across any of these
>     issues you end up asking the list, and to be fair the list generally
>     responds promptly and helpfully - but that approach doesn't scale, and
>     doesn't necessarily help if you have a storage problem at 3am.
> 
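>     For the record, here is roughly the procedure I had to piece together
>     from list posts for that first case (a sketch of what worked for me on
>     3.3, replacing a dead server with a new one of the same hostname - not
>     an official procedure, which is exactly the problem):
> 
>         # 1. On the new server: install glusterfs, then stop glusterd.
>         service glusterd stop
> 
>         # 2. Give it the dead server's UUID, recovered from a file under
>         #    /var/lib/glusterd/peers/ on any surviving peer:
>         echo "UUID=<uuid-of-dead-server>" > /var/lib/glusterd/glusterd.info
> 
>         # 3. Restart glusterd, reconnect to the pool, and trigger a heal:
>         service glusterd start
>         gluster peer probe <surviving-server>
>         gluster volume heal <volname> full
> 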
>     For these reasons, I am holding back from deploying any of the more
>     interesting features of glusterfs, such as replicated volumes and
>     distributed volumes which might grow and need rebalancing.  And without
>     those, I may as well go back to standard NFS and rsync.
> 
>     And yes, I have raised a number of bug reports for specific issues, but
>     reporting a bug whenever you come across a problem in testing or
>     production is not the right answer.  It seems to me that all these edge
>     and error cases and recovery procedures should already have been
>     developed and tested *as a matter of course*, for a service as critical
>     as storage.
> 
>     I'm not saying there is no error handling in glusterfs, because that's
>     clearly not true.  What I'm saying is that any complex system is bound
>     to have states where processes cannot proceed without external
>     assistance, and these cases all need to be tested, and you need to have
>     good error reporting and good documentation.
> 
>     I know I'm not the only person to have been affected, because there is a
>     steady stream of people on this list asking for help coping with
>     replication and rebalancing failures.
> 
>     Please don't consider the above as non-constructive. I count myself amongst
>     "those of us who just want to use the thing".  But right now, I cannot
>     wholeheartedly recommend it to my colleagues, because I cannot confidently
>     say that I or they would be able to handle the failure scenarios I have
>     already experienced, or other ones which may occur in the future.
> 
>     Regards,
> 
>     Brian.
>     _______________________________________________
>     Gluster-users mailing list
>     Gluster-users at gluster.org
>     http://supercolony.gluster.org/mailman/listinfo/gluster-users
> 



