[Gluster-users] 2.0.3

Tue Jul 14 11:49:32 UTC 2009

On Tue, 14 Jul 2009 05:48:11 -0500 (CDT)
Vikas Gorur <vikas at gluster.com> wrote:

> 
> ----- "Stephan von Krawczynski" <skraw at ithnet.com> wrote:
> 
> > Maybe I should more precisely explain what I am doing, in case there
> > are related problems.
> > 
> > I am using a copy of the opensuse 11.1 DVD for that test. I copy the
> > whole DVD files into a directory called "suse" on the first server. The
> > directory "suse" is located on the top level of the gluster-exported "/p3". Then I
> > start "ls -lR" on one of two connected clients and watch it flow. In the
> > meantime you can see the directories and files being created on the second server.
> 
> > After some minutes the "ls" exits correctly. Then looking at the second
> > server's "suse" directory reveals it has an incorrect mtime, at least on the "suse"
> > dir. My personal opinion is that this derives from the files being added
> > _inside_ the directory during the healing process. Nevertheless glusterfs should
> > handle that case like rsync does, too.
> 
> > Another thing you can see in this example: if you repeat "ls -lR" you
> > will notice quite some directories inside the healed "suse" tree have wrong
> > mtime (from the healing) displayed during ls. This proves that mtime
> > displayed on the client does not always come from the first server. And it proves
> > that this really is a production no-go, because your mtimes are indeed flapping
> > after this healing process.
> 
> Ah, finally! Thanks for the detailed explanation. I wasn't reproducing the problem
> because I just tried it with empty directories, and not with directories which contained
> files inside them.
> 
> The problem in a nutshell is this:
> 
> Whenever replicate self-heal creates an entry (file or directory), it does not sync
> the parent directory's mtime. The fix is to sync it after every create.

OK, that sounds like significant overhead.
If your dir contains 10,000 files you sync the parent's mtime 10,000 times,
whereas the optimal solution would sync it only once (well, obviously).
So this cannot be the optimal strategy to solve the problem.
Additionally, one cannot be content with the data path during healing running
right through the client, because the data has to travel twice: server->client,
then client->server. If you are healing a huge amount of data, your healing client
is quite stressed by the healing, with hardly any (network) performance left
for its real job.
Sure, your implementation idea is straightforward for the case of
"healing-by-single-stating". But I heard on the list that you are planning
"auto-healing" anyway, and that would possibly be a good chance to implement
a simple user-space tool that does the healing in an optimized way.
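
To illustrate the "sync only once" point, here is a minimal sketch in Python.
It is not glusterfs code: two plain local directories stand in for the two
servers, and the name heal_tree is made up for this example. It creates all
missing entries first and restores each directory's mtime a single time
afterwards, bottom-up, instead of after every create.

```python
import os
import shutil


def heal_tree(src, dst):
    """Copy entries missing in dst from src.

    Returns the number of directory mtime syncs performed, which is one
    per directory regardless of how many files each directory contains.
    """
    syncs = 0
    os.makedirs(dst, exist_ok=True)
    with os.scandir(src) as entries:
        for entry in entries:
            target = os.path.join(dst, entry.name)
            if entry.is_dir(follow_symlinks=False):
                # Heal the subdirectory completely before moving on.
                syncs += heal_tree(entry.path, target)
            elif not os.path.exists(target):
                # copy2 preserves the file's own mtime.
                shutil.copy2(entry.path, target)
    # Every create above bumped dst's mtime, so restore it once, at the end.
    st = os.stat(src)
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
    return syncs + 1
```

For a directory with 10,000 files and no subdirectories this does one mtime
sync rather than 10,000. The price is that the directory shows a wrong mtime
while the healing is still running.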

> However, we've chosen to defer the fix, since there are some changes planned
> for 2.1 that will make it very easy to fix this (it is planned for all create
> operations to return stat info for the parent directory). Fixing it now, on the other hand,
> would involve significant work.
> 
> You can track the progress of this issue at:
> http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=137
> Vikas

That sounds like it will take quite some time. Are we talking about days, weeks,
months here?

-- 
Regards,
Stephan