[Gluster-users] glusterfs performance issues

Jeff Darcy jdarcy at redhat.com
Tue Jan 8 12:04:48 UTC 2013


On 1/7/13 8:06 PM, Stephan von Krawczynski wrote:
> Joe Julian <joe at julianfamily.org> wrote:
>> Your app wants to append to the file again. It calls stat on the file. 
>> Brick2 answers first stating that the file is 4k long. Your app seeks to 
>> 4k and writes. Now the data you wrote before is gone.
> 
> Forgive my ignorance, but it's obvious that this implementation of a stat on a
> replicating fs is shit. Of course a stat should await _all_ returning local
> stats and should choose the stat of the _latest_ file version and note that
> the file needs self heal.

Ignorance is fine, but your rudeness is (still) unwelcome.  If O_APPEND is set,
it is passed through, so we don't need a stat at all to ensure that data is
written at EOF.  If you actually do a stat/write combo without O_APPEND, as Joe
describes, then there's an inherent race between those two separate operations,
and neither doing the stat on all replicas nor anything else in POSIX (other
than locking) will avoid it.  Your "obvious" answer is wrong.
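
To make the distinction concrete, here is a minimal sketch of the two access
patterns in plain POSIX C (nothing GlusterFS-specific; error handling omitted
for brevity):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Safe: with O_APPEND the kernel (and GlusterFS, which passes the
     * flag through) positions every write at EOF atomically, so no
     * stat is needed. */
    static void append_safe(const char *path, const char *buf)
    {
        int fd = open(path, O_WRONLY | O_APPEND);
        write(fd, buf, strlen(buf));
        close(fd);
    }

    /* Racy: between stat() and pwrite(), another writer (or a replica
     * answering with a stale size, as in Joe's example) can move EOF,
     * so the computed offset is wrong and data gets overwritten.  No
     * server-side stat policy can close this window; only locking can. */
    static void append_racy(const char *path, const char *buf)
    {
        struct stat st;
        stat(path, &st);
        int fd = open(path, O_WRONLY);
        pwrite(fd, buf, strlen(buf), st.st_size);
        close(fd);
    }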

> self-heal is no answer to this question. The only valid answer is choosing the
> _latest_ file version, whether or not self-heal is necessary.

Timestamps are totally unreliable as a conflict resolution mechanism.  Even if
one were to accept the dependency on time synchronization, there's still the
possibility of drift as yet uncorrected by the synchronization protocol.  The
change logs used by self-heal are the *only* viable solution here.  If you want
to participate constructively, we could have a discussion about how those
change logs should be set and checked, and whether a brick should be allowed to
respond to requests for a file between coming up and completing at least one
self-heal check (Mario's example would be a good one to follow).  Insisting on
even less reliable methods, though, isn't going to help.
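
The change logs themselves are visible on the bricks, if anyone wants to look:
AFR stores them as trusted.afr.* extended attributes, which are counters of
pending operations rather than timestamps.  A rough sketch of reading one from
C (the file path and the key name below are illustrative; the real key embeds
your volume name and client index):

    #include <arpa/inet.h>   /* ntohl */
    #include <stdio.h>
    #include <stdint.h>
    #include <sys/xattr.h>   /* getxattr(2), Linux */

    /* Each trusted.afr.* key holds three big-endian 32-bit counters:
     * pending data, metadata, and entry operations against one peer.
     * Non-zero counters, not mtimes, are what mark a copy as stale. */
    int main(void)
    {
        const char *file = "/export/brick1/somefile";      /* hypothetical */
        const char *key  = "trusted.afr.myvol-client-1";   /* hypothetical */
        uint32_t pending[3];

        ssize_t n = getxattr(file, key, pending, sizeof(pending));
        if (n == (ssize_t) sizeof(pending))
            printf("pending: data=%u metadata=%u entry=%u\n",
                   ntohl(pending[0]), ntohl(pending[1]),
                   ntohl(pending[2]));
        return 0;
    }

A copy whose peers hold non-zero counters against it is stale no matter what
its mtime says, which is exactly why this beats timestamps.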


