[Gluster-devel] Wrong behavior on fsync of md-cache ?

Tue Nov 25 06:38:20 UTC 2014

----- Original Message -----
> From: "Xavier Hernandez" <xhernandez at datalab.es>
> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>, "Emmanuel Dreyfus" <manu at netbsd.org>
> Sent: Tuesday, November 25, 2014 12:49:03 AM
> Subject: Re: Wrong behavior on fsync of md-cache ?
> 
> 
> 
> On 24.11.2014 18:53, Raghavendra Gowdappa wrote:
> 
> > ----- Original Message -----
> >> From: "Xavier Hernandez" <xhernandez at datalab.es [1]>
> >> To: "Gluster Devel" <gluster-devel at gluster.org [2]>,
> >>     "Raghavendra Gowdappa" <rgowdapp at redhat.com [3]>
> >> Cc: "Emmanuel Dreyfus" <manu at netbsd.org [4]>
> >> Sent: Monday, November 24, 2014 11:05:57 PM
> >> Subject: Wrong behavior on fsync of md-cache ?
> >>
> >> Hi,
> >>
> >> I have an issue in ec caused by what seems an incorrect behavior in
> >> md-cache, at least in NetBSD (on linux this doesn't seem to happen).
> >> The problem happens when multiple writes are sent in parallel and one
> >> of them fails with an error. After the error, an fsync is issued,
> >> before all pending writes are completed. The problem is that this
> >> fsync request is not propagated through the xlator stack: md-cache
> >> automatically answers it with the same error code returned by the
> >> last write, but it does not wait for all pending writes to finish.
> > 
> > Are you sure that fsync is short-circuited in md-cache? Looking at
> > mdc_fsync, I can see that fsync is wound down the xlator stack
> > unconditionally.
> 
> Well, I didn't look at the code. I assumed the problem was there
> because disabling md-cache (performance.stat-prefetch off) made it
> work. Sorry.
> 
> > write-behind flushes all pending writes before fsync is wound down
> > the xlator stack.
> 
> I think the problem is here: the first thing wb_fsync() checks is
> whether there's an error in the fd (wb_fd_err()). If that's the case,
> the call is immediately unwound with that error. The error seems to be
> set in wb_fulfill_cbk(). I don't know the internals of the write-behind
> xlator, but this seems to be the problem.

Yes, your analysis is correct. Once the error is hit, fsync is not queued behind the unfulfilled writes. Whether this counts as a bug is debatable. Since one of the written-behind writes has already failed, fsync should return that error. What I am not sure about is whether fsync should wait until we have tried to flush _all_ the writes that were written behind. Any suggestions on what the expected behaviour is here?
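
To make the two options concrete, here is a toy model of the question. It is only a sketch and not the write-behind code: the class, the `wait_for_pending` switch and the `run()` helper are hypothetical names made up for illustration, and the real xlator is asynchronous C. It assumes one write has already failed with EDQUOT while two later writes are still pending when fsync arrives.

```python
import errno
from collections import deque


class WriteBehindModel:
    """Toy model of a write-behind cache for a single fd (not GlusterFS code)."""

    def __init__(self, wait_for_pending):
        self.wait_for_pending = wait_for_pending  # hypothetical policy switch
        self.pending = deque()  # writes already acknowledged, not yet flushed
        self.fd_error = 0       # sticky error left by an earlier flushed write

    def write(self, name, result):
        # 'result' is the errno the backend reports when this write is flushed.
        self.pending.append((name, result))

    def fulfill_one(self):
        # Flush the oldest pending write; remember the first failure on the fd.
        name, result = self.pending.popleft()
        if result != 0 and self.fd_error == 0:
            self.fd_error = result
        return name

    def fsync(self):
        # Returns (errno, number of writes still pending when fsync answers).
        if self.fd_error and not self.wait_for_pending:
            # Behaviour discussed in this thread: answer at once with the error.
            return self.fd_error, len(self.pending)
        # Alternative: drain everything first, then report the error (if any).
        while self.pending:
            self.fulfill_one()
        return self.fd_error, len(self.pending)


def run(wait_for_pending):
    wb = WriteBehindModel(wait_for_pending)
    wb.write("w1", errno.EDQUOT)
    wb.write("w2", 0)
    wb.write("w3", 0)
    wb.fulfill_one()  # w1 fails; w2 and w3 are still pending
    return wb.fsync()
```

In this model `run(False)` answers fsync with EDQUOT while two writes are still outstanding, which is the situation Xavi describes. `run(True)` returns the same error, but only after nothing is left pending. The error seen by the application is identical in both cases; the difference is only whether the caller can assume that no write is still in flight once fsync has returned.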

> 
> I'm not sure why the problem disappeared when I disabled md-cache.
> Maybe I made a mistake and disabled write-behind instead. I'll check
> it again tomorrow.
> 
> > Are you sure fsync is sent by the kernel to glusterfs? Maybe, because
> > of stale stat information, the kernel never issues fsync? You can load
> > a debug/trace xlator just above io-stats and check whether you get the
> > fsync call (you can also dump fuse to glust[...]
>
> I've seen these lines in the log file:
>
> [2014-11-24 16:18:29.348552] T [fuse-bridge.c:2457:fuse_fsync_resume] 0-glusterfs-fuse: 395: FSYNC 0xbb242268
> [2014-11-24 16:18:29.348663] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 395: FSYNC() ERR => -1 (Disc quota exceeded)
>
> Th[...] in between. I assume this means that the kernel has sent the
> FSYNC request and someone has returned the EDQUOT error immediately (I
> log a message if FSYNC reaches ec).
> 
> Xavi
> 
> 
> 
> Links:
> ------
> [1] mailto:xhernandez at datalab.es
> [2] mailto:gluster-devel at gluster.org
> [3] mailto:rgowdapp at redhat.com
> [4] mailto:manu at netbsd.org
>