[Gluster-users] Replication not working on server hang

Stephan von Krawczynski skraw at ithnet.com
Sun Aug 30 11:23:25 UTC 2009


On Sun, 30 Aug 2009 01:00:13 -0700
Anand Avati <avati at gluster.com> wrote:

> > I'm wondering if there's some way for glusterfs to detect the flaws of the
> > underlying operating system.  I believe there are no bug-free file systems
> > in the universe, so I believe it is the job of the glusterfs developers to
> > specify which underlying filesystems are tested and supported.  It's not
> > good to simply say that glusterfs works on all real-world approximations
> > to an imaginary bug-free POSIX filesystem.
> 
> I would be genuinely interested to know about another project which is
> geared up to be resilient against kernel hangs so that we can borrow
> some ideas on how to reliably detect kernel soft lockups or syscall
> hangs. As far as I know, even mature projects like Apache have not
> bothered fixing such hangs (or even detecting this kind of underlying
> OS flaw).

Apache is not software whose primary purpose is to overcome hardware (and
software) issues that take filesystems offline.
You cannot compare two applications with totally different usage patterns.
And, to say it clearly: nobody expects you to _solve_ or fix a hang.
Users only expect glusterfs to _recognise_ a problem and shut down. It is far
better to shut down when there is no real problem than to carry on while
there is one and hang. The first leads at most to some extra work; the second
leads to an offline service. And that's exactly why we are all here: to
prevent an offline file service.
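A minimal sketch of the "recognise and shut down" idea, assuming a simple watchdog approach: run a probe call against the backend in a separate thread and treat it as hung if it has not returned within a deadline. `probe` and `brick_is_healthy` are hypothetical names for illustration, not glusterfs functions, and the 5-second default timeout is an arbitrary assumption. A thread blocked inside the kernel cannot be cancelled from user space, so the watchdog can only report the hang; what to do next (stop using the brick, or exit) is left to the caller.

```python
import os
import threading


def probe(op, timeout_s):
    """Run op() in a daemon thread and wait at most timeout_s seconds.

    Returns ("ok", value), ("error", exception) or ("hung", None).
    Hypothetical helper for illustration; not a glusterfs API.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = op()
            outcome["status"] = "ok"
        except Exception as exc:  # the call returned, but with an error
            outcome["value"] = exc
            outcome["status"] = "error"

    # daemon=True so a stuck worker does not keep the interpreter alive.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        # The call has not returned. It cannot be cancelled from here
        # (a thread stuck in a syscall stays stuck); we can only stop
        # relying on it.
        return "hung", None
    return outcome["status"], outcome["value"]


def brick_is_healthy(path, timeout_s=5.0):
    """Hypothetical health check: can the export directory be stat()ed in time?"""
    status, _ = probe(lambda: os.stat(path), timeout_s)
    return status == "ok"
```

A server process could, for example, call `brick_is_healthy(export_dir)` periodically (`export_dir` being a hypothetical path) and take the brick offline or exit when it returns False. That is the fail-fast behaviour described above: a false alarm costs some extra work, while a missed hang costs availability.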

> Avati

-- 
Regards,
Stephan