[Gluster-devel] Wrong behavior on fsync of md-cache ?
Xavier Hernandez
xhernandez at datalab.es
Tue Nov 25 08:35:25 UTC 2014
On 11/25/2014 07:38 AM, Raghavendra Gowdappa wrote:
> ----- Original Message -----
>> From: "Xavier Hernandez" <xhernandez at datalab.es>
>> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>> Cc: "Gluster Devel" <gluster-devel at gluster.org>, "Emmanuel Dreyfus" <manu at netbsd.org>
>> Sent: Tuesday, November 25, 2014 12:49:03 AM
>> Subject: Re: Wrong behavior on fsync of md-cache ?
>>
>> I think the problem is here: the first thing wb_fsync()
>> checks is whether there's an error on the fd (wb_fd_err()). If there is,
>> the call is immediately unwound with that error. The error seems
>> to be set in wb_fulfill_cbk(). I don't know the internals of the write-behind
>> xlator, but this seems to be the problem.
>
> Yes, your analysis is correct. Once the error is hit, fsync is not
> queued behind unfulfilled writes. Whether this should be considered a bug
> is debatable: since one of the written-behind writes has already failed,
> fsync should return the error. I am not sure whether
> it should wait until we try to flush _all_ the writes that were written
> behind. Any suggestions on what the expected behaviour is here?
>
I think that it should wait for all pending writes. In the test case I
used, all pending writes fail the same way as the first one, but
in other situations it's possible for one write to fail (for example,
due to a damaged disk block) while the following writes succeed.
From the man page of fsync:

    fsync() transfers ("flushes") all modified in-core data of (i.e.,
    modified buffer cache pages for) the file referred to by the file
    descriptor fd to the disk device (or other permanent storage
    device) so that all changed information can be retrieved even after
    the system crashed or was rebooted. This includes writing through
    or flushing a disk cache if present. The call blocks until the
    device reports that the transfer has completed. It also flushes
    metadata information associated with the file (see stat(2)).

As I understand it, when fsync is received, all queued writes must be
sent to the device (regardless of whether a previous write has failed).
It also says that the call blocks until the device has finished all the
operations.
However, it's not clear to me how file consistency can be controlled,
because this allows some writes to succeed after a failed one. I assume
that this is the responsibility of the calling application, which
should issue fsyncs at critical points to guarantee consistency.
Anyway, there seems to be a difference between Linux and NetBSD,
because this test only fails on NetBSD. Is it possible that Linux's FUSE
implementation delays the fsync request until all pending writes have
been answered? That would explain why this problem has not shown up
until now. NetBSD seems to send fsync (probably as the first step of a
close() call) as soon as the first write fails.
Xavi