[Gluster-devel] Performance Translators' Stability and Usefulness
Geoff Kassel
gkassel at users.sourceforge.net
Sat Jul 4 18:07:13 UTC 2009
Hi Gordan,
> There is an argument somewhere in there about deploying things that
> aren't production ready at time of deployment. But that's a different
> story.
GlusterFS was represented as being stable, with its 1.x version number and
website spiel. Before the original deployment, I tested extensively and
exhaustively on test platforms identical to my production system without
issue. (I did my due care and diligence.)
However, as GlusterFS was 'improved', backwards-incompatible and unstable
changes crept in. Once the stability issues were noticeable (and seemingly
intractable in my situation), it was too late to go back to a stable version
without manually migrating hundreds of gigabytes of client data.
Given how lengthy and disruptive that task would have proven for the client
base, it was deemed best to hope that GlusterFS would improve in the long
run.
So I made the system as stable as possible, and scheduled potentially
crash-causing operations (necessary from a business perspective) for times
when they would least disrupt clients.
I have also made repeated efforts to improve the software myself, as you can
see in the archives. The resulting patches for numerous potential and actual
issues were basically ignored, seemingly because the devs dislike my code on
the grounds that it contains code comments. (Figure that one out.)
After a few months of trying to maintain my patches against the current
development tree, I gave up. It was simply too much effort for too little
reward, and higher-priority, revenue-generating tasks were expected of me.
> It is the only feature of it that I am looking into using, too, but it
> is plausible that somebody with a large distributed server farm focused
> on performance rather than redundancy may see it differently.
The story of DHT is not much better than AFR's, if you read through the
archives.
> There's a strong argument there for implementing syslog based logging.
> How do you do log rotation, BTW? Do you have to issue a HUP? Or restart
> the glusterfsd process? As I said, I have seen issues with restarting
> server processes in different orders. Sometimes things will lock up and
> the glusterfsd process has to be killed and restarted. It seems to work
> when servers come up in priority order, but other orderings can be hit
> and miss.
I have tried various logrotate and GlusterFS server restart approaches for
the log issue. None of them stopped the crashes - especially since they
apparently happen at random during high I/O load.
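To be concrete, the sort of logrotate stanza I mean is sketched below -
copytruncate at least avoids having to HUP or restart glusterfsd just to
rotate logs, at the cost of possibly losing a line or two during the
truncate. (The log path is only a placeholder; adjust it to wherever your
install writes its logs.)

    # /etc/logrotate.d/glusterfs -- illustrative sketch only
    /var/log/glusterfs/*.log {
        daily
        rotate 7
        compress
        missingok
        notifempty
        copytruncate    # truncate in place instead of signalling glusterfsd
    }

It doesn't help with the underlying crashes, of course, but it takes the
restart-ordering problem out of the log rotation path entirely.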
> Why do you use namespaces for straight AFR?
This is a hangover from 1.3. It used to be mandatory back then. (I'm frankly
not too game to experiment using GlusterFS without it, given how hard it
would be to go back afterwards.)
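(For reference, the kind of 'straight' AFR client spec without a namespace
volume that you're describing would look roughly like the sketch below. The
host names are hypothetical and the exact option spellings differ a little
between 1.3.x and 2.x, so treat it as illustrative only.)

    volume remote1
      type protocol/client
      option transport-type tcp            # spelled tcp/client in the 1.3.x series
      option remote-host server1.example.com
      option remote-subvolume brick
    end-volume

    volume remote2
      type protocol/client
      option transport-type tcp
      option remote-host server2.example.com
      option remote-subvolume brick
    end-volume

    volume afr
      type cluster/afr                     # renamed cluster/replicate in 2.x
      subvolumes remote1 remote2
    end-volume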
> Sounds like a lot of effort and micro-downtime compared to a migration
> to something else. Have you explored other options like PeerFS, GFS and
> SeznamFS? Or NFS exports with failover rather than Gluster clients, with
> Gluster only server-to-server?
These options are not production ready for what I need (as I believe has
already been pointed out to the list); and NFS would defeat the point of the
redundancy in the first place. (GFS is also not compatible with the kernel
patchset I need to use.)
I have tried AFR on the server side and the client side. Both display similar
issues.
An older version of GlusterFS - as buggy as it is for me - is unfortunately
still the best option.
(That doesn't mean I can't complain about the lack of progress towards
stability and reliability, though :)
> OK, I haven't actually checked. A "make test" feature listing all bugs
> by bugzilla ID as it goes through the testing process would go a long
> way toward providing some quality reassurance.
Agreed. We have both asked for this before, if I recall correctly.
> One of the problems is that some tests in this case are impossible to
> carry out without having multiple nodes up and running, as a number of
> bugs have been arising in cases where nodes join/leave or cause race
> conditions. It would require a distributed test harness which would be
> difficult to implement so that they run on any client that builds the
> binaries. Just because the test harness doesn't ship with the sources
> doesn't mean it doesn't exist on a test rig the developers use.
Okay, so what about the volume of test cases that can be tested without a
distributed test harness? I don't see any sign of testing mechanisms for
that.
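Even a single-node sanity check along the lines of the sketch below - write a
file through the mount, then compare checksums of the original data, the file
as read back through the mount, and the copy on the backend export - would
catch gross data corruption. (The mount point and backend path are
placeholders for whatever a real test rig uses.)

    #!/usr/bin/env python
    # Single-node integrity check sketch: compares checksums of a file written
    # through the GlusterFS mount against the copy on the backend export.
    import hashlib
    import os

    MOUNT = "/mnt/glusterfs"       # placeholder client mount point
    BACKEND = "/data/export"       # placeholder storage/posix export directory

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    payload = os.urandom(16 * 1024 * 1024)            # 16 MiB of random data
    expected = hashlib.sha256(payload).hexdigest()

    test_path = os.path.join(MOUNT, "integrity-test.bin")
    with open(test_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())

    via_mount = sha256_of(test_path)
    via_backend = sha256_of(os.path.join(BACKEND, "integrity-test.bin"))

    if expected == via_mount == via_backend:
        print("OK: %s" % expected)
    else:
        print("MISMATCH: %s / %s / %s" % (expected, via_mount, via_backend))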
And wouldn't it be prudent anyway - given how often the GlusterFS devs do not
have access to the platform with the reported problem - to provide this
harness so that people can generate the appropriate test results the devs
need for themselves? (Giving a complete stranger from overseas root access is
a legal minefield for those who have to work with data held in confidence.)
It really makes me wonder about the existence of said testing tools.
> As I explained before, you can't sensibly come up with QA tests for
> timing based issues and race conditions, because those will always be
> heisenbuggy to some extent. I'm not saying such tests shouldn't exist, and
> at least perform some hammering for extended periods known to
> trigger the known issues, but that only counts statistically - it won't
> provide conclusive evidence of absence of the bug.
I'm well aware of the maxims concerning proving the absence of bugs with
testing. (I used to teach formal methods subjects at Uni.)
However, long-running tests under real-world load on multiple platforms, care
with locking protocols, different input data lengths, use of code-quality
testing tools, requests at random intervals, and various other defensive
programming techniques will go a long way towards catching the majority of
cases that users will experience in regular (or even certain irregular)
circumstances.
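As a crude illustration of the 'hammering' end of that: a handful of workers
writing, re-reading, and verifying files at random intervals for a few hours
will reproduce the kind of sustained I/O load under which I see the crashes.
A rough sketch (mount point, worker count, and run time are placeholders, not
tuned values):

    #!/usr/bin/env python
    # Crude load-generation sketch: workers repeatedly write a random payload
    # through the mount, read it back, verify it, and pause for a random interval.
    import multiprocessing
    import os
    import random
    import time

    MOUNT = "/mnt/glusterfs"         # placeholder client mount point
    DURATION = 4 * 60 * 60           # run for four hours
    MAX_SIZE = 8 * 1024 * 1024       # up to 8 MiB per file
    WORKERS = 8

    def worker(worker_id):
        deadline = time.time() + DURATION
        count = 0
        while time.time() < deadline:
            data = os.urandom(random.randint(1, MAX_SIZE))
            path = os.path.join(MOUNT, "hammer-%d-%d.bin" % (worker_id, count))
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            with open(path, "rb") as f:
                readback = f.read()
            if readback != data:                   # verify what came back
                print("MISMATCH on %s" % path)
            os.remove(path)
            count += 1
            time.sleep(random.uniform(0.0, 2.0))   # random gap between requests

    if __name__ == "__main__":
        procs = [multiprocessing.Process(target=worker, args=(i,))
                 for i in range(WORKERS)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

It won't prove anything by itself, as you say, but a run like that left going
overnight on a couple of nodes catches a lot more than nothing at all.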
It's been my impression, though, that the relevant bugs are not heisenbugs or
race conditions.
(I'm judging that by the speed of the follow-up patch, by the way - race
conditions can notoriously take a long time to track down.)
Geoff.
On Sun, 5 Jul 2009, Gordan Bobic wrote:
> Geoff Kassel wrote:
> > (If it wasn't for that migrating to another solution would cause
> > considerable, business-destroying downtime for my client base, I would
> > have done so quite some time ago.)
>
> There is an argument somewhere in there about deploying things that
> aren't production ready at time of deployment. But that's a different
> story.
>
> > All I see instead is this constant drive towards new features, with
> > little to no signs that functionality that should be complete by now is
> > actually so.
>
> I can understand your point of view, but at the same time I'm assuming
> that the feature expansion is being done at the request of the paying
> customers they have, whose priorities and use cases may well be
> sufficiently different that the issues we are running into aren't as
> critical for them.
>
> > AFR is *the* key feature of GlusterFS in my mind - and the only point (I
> > feel) for using it. Yet it's still this unstable after two plus years of
> > development?
>
> It is the only feature of it that I am looking into using, too, but it
> is plausible that somebody with a large distributed server farm focused
> on performance rather than redundancy may see it differently.
>
> >>> I have been using GlusterFS since the v1.3.x days, and I have yet to
> >>> see a version since then that doesn't crash at least once a day from
> >>> just load on even the simplest configurations.
> >>
> >> I wouldn't say daily, but occasionally, I have seen lock-ups recently
> >> during multiple glusterfs resyncs (separate volumes) on the new/target
> >> machine. I have only seen it once, however, forcefully killing the
> >> processes fixed it and it didn't re-occur. I have a suspicion that this
> >> was related to the mounting order. I have seen weirdness happen when
> >> changing the server order cluster-wide, and when servers rejoin the
> >> cluster.
> >
> > Well, I see one to two crashes nightly, when I rotate logs or perform
> > backups that are stored on the GlusterFS exported drive. (It's hit and
> > miss which processes run to completion on the first go before the crash,
> > which should never be an issue with a reliable storage medium.)
>
> There's a strong argument there for implementing syslog based logging.
> How do you do log rotation, BTW? Do you have to issue a HUP? Or restart
> the glusterfsd process? As I said, I have seen issues with restarting
> server processes in different orders. Sometimes things will lock up and
> the glusterfsd process has to be killed and restarted. It seems to work
> when servers come up in priority order, but other orderings can be hit
> and miss.
>
> > The only common factor identifiable is higher-than-average I/O load.
> >
> > I don't run any performance translators, because they make the situation
> > much worse. It's just a straight AFR/posix-locks/dataspace/namespace
> > setup, as I've posted quite a few times before.
>
> Why do you use namespaces for straight AFR?
>
> > I've had to institute server scripting to restart GlusterFS and any
> > processes that touches replicated files (i.e. nearly everything running
> > on my servers) because of these crashes to try to minimise the downtime
> > to my clients.
>
> Sounds like a lot of effort and micro-downtime compared to a migration
> to something else. Have you explored other options like PeerFS, GFS and
> SeznamFS? Or NFS exports with failover rather than Gluster clients, with
> Gluster only server-to-server?
>
> >> Yes, that was bad, 2.0.2 is pretty good. Sure, there is still that
> >> annoying settle-time bug that consistently fails the first attempt to
> >> access the file system immediately after mounting (the time gap is
> >> pretty tight, but if you script it, it is 100% reproducible). But other
> >> than that I'm finding that all the other issues I had with it have been
> >> resolved.
> >
> > After two major data integrity bugs in two major releases in a row, I'm
> > taking very much a wait-and-see attitude with any and all GlusterFS
> > releases.
>
> My use-case is somewhat unusual because I'm working on shared-rootfs
> clusters, and I need WAN functionality which cripples solutions like
> DRBD+GFS. But for data-only storage, there are probably alternatives out
> there. I'm intending to implement SeznamFS for bulk data, for example,
> because its MySQL-like round-robin file replication distributes the
> bandwidth usage much more effectively (at the expense of having no
> locking capability and the replication ring being cut off if any one
> node fails). I'll probably stick with Gluster for /home for now because
> SeznamFS seemed to cause X and/or KDE to fail to start when /home was on
> SeznamFS.
>
> >> What exactly do you mean by "regression test"? Regression testing means
> >> putting in a test case to check for all the bugs that were previously
> >> discovered and fixed to make sure a further change doesn't re-introduce
> >> the bug. I haven't seen the test suite, so have no reason to doubt that
> >> there is regression testing being carried out for each release. Perhaps
> >> the developers can clarify the situation on the testing?
> >
> > I meant it in the same sense that you do. I have not seen any framework -
> > automated or otherwise - in the repository or release files to run
> through tests for previous and/or foreseeable bugs and corner cases.
>
> OK, I haven't actually checked. A "make test" feature listing all bugs
> by bugzilla ID as it goes through the testing process would go a long
> way toward providing some quality reassurance.
>
> > A test to compare cryptographic hashes of files before, after, and during
> > storage/transfer between GlusterFS clients and backends should surely
> > exist if there's any half-serious attempt at regression testing going on.
>
> One of the problems is that some tests in this case are impossible to
> carry out without having multiple nodes up and running, as a number of
> bugs have been arising in cases where nodes join/leave or cause race
> conditions. It would require a distributed test harness which would be
> difficult to implement so that they run on any client that builds the
> binaries. Just because the test harness doesn't ship with the sources
> doesn't mean it doesn't exist on a test rig the developers use.
>
> > Surely, though, if tests like these existed and were being used, after
> > the debacle with 2.0.0, they would have picked up at least the issue
> > reported in 2.0.1 before release?
>
> That depends. There are always going to be borderline or unusual use
> cases that wouldn't have been foreseen. For example, I tripped several
> issues with my usage of it for the root file system that would have been
> unlikely to arise for most people. The most odd one was the fact that
> glusterfsd wouldn't start without /tmp existing and being writable even
> though it doesn't seem to keep anything in there after startup. I only
> twigged what was happening when I was working on debugging it
> with Harha and for him the mounting worked when he mounted under /tmp,
> when I was mounting under /mnt. He thought it was something about /mnt
> having some kind of weird permissions issue, but then I twigged that I
> didn't actually have /tmp on my initrd bootstrap where this was being
> done on my setup. To this day I haven't seen an explanation of why /tmp
> is required and if it is a fuse requirement or gluster requirement or
> something else entirely.
>
> > That leads me to ask - where's the unit tests that are meant to exist,
> > according to http://www.gluster.org/docs/index.php/GlusterFS_QA? If they
> > exist, why (apparently) aren't tests like these still not part of them?
>
> As I explained before, you can't sensibly come up with QA tests for
> timing based issues and race conditions, because those will always be
> heisenbuggy to some extent. I'm not saying such tests shouldn't exist, and
> at least perform some hammering for extended periods known to
> trigger the known issues, but that only counts statistically - it won't
> provide conclusive evidence of absence of the bug.
>
> Gordan
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel