[Gluster-devel] Blocking client feature request
Geoff Kassel
gkassel at users.sourceforge.net
Sun Jan 18 05:38:36 UTC 2009
Hi,
> Do you have logs/cores which can help us?
I'll try to produce some for you soon. I've been busy trying to stabilise the
affected production systems during our high-demand period, so this will have
to wait until it's safe to incur a deliberate outage.
(The configuration I'm running now, while prone to fewer crashes, is not one
I intend to keep running long-term, as it uses only one daemon across two
client machines, without any performance translators. So I have to wait until
after peak time to try to debug the performance-enhanced one-daemon-per-server
configuration I normally run, which I want working again.)
Oh - one thing I have noticed is that post-upgrade from 1.3 TLA 646, there
have been a large number of (for want of a better word) 'unhealable' files -
files that I know were previously present on at least one dataspace block,
but are now only present in the namespace block.
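For reference, this is roughly how I've been spotting them - a quick script
that walks the namespace block and flags anything with no copy on any
dataspace block. The export paths are placeholders for my own layout, not
anything GlusterFS mandates:

#!/usr/bin/env python
# Sketch: list files visible in the namespace block but absent from
# every dataspace block. Export paths below are examples only.
import os

NAMESPACE = "/exports/namespace"                  # namespace export (example)
DATASPACES = ["/exports/ds1", "/exports/ds2"]     # dataspace exports (example)

for dirpath, dirnames, filenames in os.walk(NAMESPACE):
    rel = os.path.relpath(dirpath, NAMESPACE)
    for name in filenames:
        relpath = os.path.normpath(os.path.join(rel, name))
        # A healthy file should have its data on at least one dataspace.
        if not any(os.path.exists(os.path.join(ds, relpath))
                   for ds in DATASPACES):
            print("namespace-only: " + relpath)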
I mention this as there seems to be some correlation between deleting these
files and increasing the time between crashes. It doesn't seem to be as
clear-cut as 'self-heal is causing the crash', since processes accessing the
affected files through the GlusterFS export don't cause a crash right there
and then.

It just seems to increase the risk of a crash over time. Perhaps it's some
sort of resource leak in self-heal?
Anyway, hopefully the logs - when I can safely produce them - will reveal the
true cause.
> Given the fact that there is a reasonably high demand for it, I think
> we should be adding this support as an option in our protocol. There
> are a few challenges with the current design (like having stateful fd)
> which will need some trickery to accommodate them across reconnects.
> So it may not be implemented immediately, but maybe in 2.1.x or 2.2.x.
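If I've understood the stateful fd issue correctly, the client would need to
remember enough about each open fd (path, flags, offset, and presumably locks)
to replay the opens against the server after a reconnect. A very rough sketch
of my understanding - the class and the connection calls are hypothetical,
not actual protocol internals:

# Rough sketch of client-side fd state kept across reconnects.
# 'conn.open'/'conn.seek' are hypothetical stand-ins, not real
# GlusterFS APIs.
class FdTable(object):
    def __init__(self):
        self.state = {}      # client fd -> (path, flags, offset)
        self.server_fd = {}  # client fd -> current server-side handle

    def remember(self, fd, path, flags, offset=0):
        self.state[fd] = (path, flags, offset)

    def replay(self, conn):
        # On reconnect, re-open everything so the new server process
        # rebuilds its state; application-visible fds stay stable while
        # the server-side handles are swapped out underneath.
        for fd, (path, flags, offset) in self.state.items():
            handle = conn.open(path, flags)   # hypothetical call
            conn.seek(handle, offset)         # restore the file position
            self.server_fd[fd] = handle

Locks and writes in flight would presumably need similar replay or fencing,
which I imagine is where the trickery comes in.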
Thanks for considering this.
If I had a wish list for GlusterFS, this feature would be at the top of it.
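To make the desired semantics concrete, here's a sketch of what my services
currently have to approximate by hand at the application layer - the function
and its parameters are mine, purely illustrative:

import errno, time

def blocking_read(path, timeout=20.0, interval=0.5):
    # Keep retrying while the mount is reconnecting, instead of
    # surfacing ENOTCONN/EIO to the application straight away.
    deadline = time.time() + timeout
    while True:
        try:
            f = open(path, "rb")
            try:
                return f.read()
            finally:
                f.close()
        except IOError as e:
            if e.errno not in (errno.ENOTCONN, errno.EIO):
                raise                  # a real error, not a restart window
            if time.time() >= deadline:
                raise                  # daemon didn't come back in time
            time.sleep(interval)       # wait out the reconnect

If the client did this transparently below the VFS, a daemon restart would
look like a short stall of up to 20 seconds rather than a stream of I/O
errors that forces service restarts.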
Kind regards,
Geoff Kassel.
On Sat, 17 Jan 2009, Anand Avati wrote:
> > What I've realized is that a blocking GlusterFS client would solve this
> > negative visibility problem for me while I look again at the crash
> > issues. (I've just upgraded to the latest 1.4/2.0 TLA, so my experiences
> > are relevant to the majority again. Yes, I'm still getting crashes.)
>
> Do you have logs/cores which can help us?
>
> > That way, I'd just have to restart the GlusterFS daemon(s), and my
> > running services would block, but not have to be restarted. My clients
> > would see a lack of responsiveness for up to 20 seconds, not a five to
> > ten minute outage.
> >
> > Is there any possibility of this feature being added to GlusterFS?
>
> Given the fact that there is a reasonably high demand for it, I think
> we should be adding this support as an option in our protocol. There
> are a few challenges with the current design (like having stateful fd)
> which will need some trickery to accommodate them across reconnects.
> So it may not be implemented immediately, but maybe in 2.1.x or 2.2.x.
>
> avati