[Gluster-devel] Choice of Translator question

Gareth Bult gareth at encryptec.net
Thu Dec 27 18:59:04 UTC 2007


>I'm not sure.  It could very well depend on which version you are using, and where you read that.  I'm sure some features listed in the wiki are only implemented in the TLA releases until they put out the next point release.

Sure, however I've noticed that the documentation isn't kept in sync with the actual code very well.
It would be *really* nice to be pointed at a document that (for example) lists the changes between 1.3.7 and the TLA(s).
(so I could see whether I'd need to go to TLA to get a working config ..)

>Agreed, which is why I just showed the single file self-heal method, since in your case targeted self heal (maybe before a full filesystem self heal) might be more useful.

Sorry, I was mixing moans .. on the one hand there's no log, hence no automatic detection of out-of-date files (which means you need a manual scan), and on the other, doing a full self-heal on a large filesystem "can" be prohibitively "expensive" ...

I'm vaguely wondering whether it would be possible to have a "log" translator that wrote changes to a namespace volume, for quick recovery following a node restart (as an option, of course).
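Purely to sketch what I mean (to be clear, no such translator exists today; the "features/log" type name and its "log-to" option below are entirely made up, and the host/brick names are just examples), it might slot into the spec something like this:

# HYPOTHETICAL sketch only - "features/log" and "log-to" are invented
# to illustrate the idea; "afr0" is assumed to be defined further up
# the spec file.

volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.10        # example namespace server
  option remote-subvolume ns-brick
end-volume

volume log0
  type features/log                      # invented translator type
  option log-to ns                       # journal changed files to the namespace volume
  subvolumes afr0
end-volume

The idea being that after a restart, self-heal could replay just the journalled files rather than waiting for a full filesystem walk.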

>I would expect AFR over stripe to replicate the whole file on inconsistent AFR versions, but I would have thought stripe over AFR would work, as the AFR should only be seeing chunks of files.

Well .. it doesn't "seem" to, unless my config is horribly wrong in a way that still works and behaves normally, right up until it goes into self-heal mode ... (!)

>I don't see how the AFR could even be aware that the chunks belong to the same file, so how it would know to replicate all the chunks of a file is a bit of a mystery to me.  I will admit I haven't done much with the stripe translator though, so my understanding of its operation may be wrong.

Mmm, trouble is there's nothing definitive in the documentation either way .. I'm wondering whether it's a known critical omission, which is why it's not been documented (!) At the moment stripe is pretty useless without self-heal (i.e. AFR), and AFR is pretty useless without stripe for anyone with large files (which I'm guessing is why stripe was implemented after all the "stripe is bad" documentation). If the two don't play well together and a self-heal on a large file means a 1TB network data transfer, that would strike me as a show stopper.
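For reference, the two stackings I keep referring to look roughly like this in my test specs (brick names are examples, and I've left the usual stripe/AFR options out for brevity - these are two alternative layouts, not one spec):

# "stripe over AFR" - each AFR pair mirrors one stripe component
volume afr0
  type cluster/afr
  subvolumes brick-a0 brick-b0
end-volume

volume afr1
  type cluster/afr
  subvolumes brick-a1 brick-b1
end-volume

volume stripe0
  type cluster/stripe
  subvolumes afr0 afr1
end-volume

# "AFR over stripe" - two complete stripe sets mirrored against each other
volume stripe-a
  type cluster/stripe
  subvolumes brick-a0 brick-a1
end-volume

volume stripe-b
  type cluster/stripe
  subvolumes brick-b0 brick-b1
end-volume

volume mirror0
  type cluster/afr
  subvolumes stripe-a stripe-b
end-volume

In the first layout AFR only ever sees one component of a file; in the second, each AFR subvolume is a complete striped copy, which is presumably why a self-heal there drags the whole file across the network.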

>Do you mean that a change to a stripe replicates the entire file?

During "normal" operation, stripe updates seem to work fine.
The problem is they don't seem to know what to update on a self heal, and as a result an entire stripe is copied on a self heal. I guess if you could generate a config file with '000's of stripes, this might produce "Ok" results with regards to self-heal times, i.e. generate '000's of stripes .. but for GB's over two stripes is a nightmare .. restarting glusterfsd following a config change means GB's of copy when the self-heal kicks in ...

>Understood.  I'll have to actually try this when I have some time, instead of just doing some armchair theorizing.

Sure .. I think my tests were "proper" .. although I might try them on TLA just to make sure.

Just thinking logically for a second: for AFR to do chunk-level self-heal, there must be a chunk-level signature store somewhere.
... where would this be ?
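If that signature data lives in extended attributes on the backend files (an assumption on my part, I haven't dug through the source), then something like the following, run as root on a brick's export directory, ought to show it:

# Assumes the metadata is kept in trusted.* xattrs on the backend file;
# the path is just an example from my setup.
getfattr -d -m . -e hex /mnt/brick-a0/export/big-file.img

If nothing AFR-related shows up there, then presumably there's no chunk-level signature at all, which would explain the whole-file copy on self-heal.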

>Well, it depends on your goal.  I only suggested rsync for when a node was offline for quite a while, which meant a large number of stripe components would have needed to be updated, requiring a long sync time.  If it was a quick outage (glusterfs restart or system reboot), it wouldn't be needed.  Think of it as a jumpstart on the self-heal process without blocking.

Ok, goal #1 is not to have specific "rsync" configurations for each node (!)

>This, of course, was assuming that the stripe of AFR setup works.

Which I don't believe it does in this context ...

>Because I'm not a dev, and have no control over this.  ;)  Yes, I would like this feature as well, although I can imagine a couple of snags that can make it problematic to implement.

:)
It's one of those things .. if we crash a node once a month I can take a 50GB self-heal hit .. but we're going to be changing the configs daily (or potentially more frequently), hence it's unsustainable to run without it.

I guess it comes down to the developers' aims .. (a) a clustered fs for research, or (b) a clustered fs for day-to-day usage.

>Was this on AFR over stripe or stripe over AFR?

Logic told me it must be AFR over stripe, but I tried it both ways round ..

>The GlusterFS-provided fuse is supposed to have some better default values for certain variables relating to transfer block size, or some such, that optimize it for glusterfs, and it's probably what they test against, so it's what I've been using.

Sure, but I think that's a performance issue rather than a stability one (?)

Gareth.
