[Gluster-devel] Handling Failed flushes in write-behind

Raghavendra Gowdappa rgowdapp at redhat.com
Tue Sep 29 11:26:33 UTC 2015


+ gluster-devel

> 
> On Tuesday 29 September 2015 04:45 PM, Raghavendra Gowdappa wrote:
> > Hi All,
> >
> > Currently, on failure to flush the writeback cache, we mark the fd bad.
> > The rationale behind this is that since the application doesn't know which
> > of the cached writes failed, the fd is in a bad state and cannot
> > possibly do a meaningful/correct read. However, this approach (though
> > POSIX-compliant) is not acceptable for long-standing applications like
> > QEMU [1]. So, a two-part solution was decided:
> >
> > 1. No longer mark the fd bad on failures while flushing data to the backend
> > from the write-behind cache.
> > 2. Retry the writes.
> >
> > As far as 2 goes, the application can checkpoint by doing an fsync and, on
> > write failures, roll back to the last checkpoint and replay writes from
> > that checkpoint. Or, glusterfs can retry the writes on behalf of the
> > application. However, glusterfs retrying writes cannot be a complete
> > solution, as the error condition we've run into might never get resolved
> > (e.g., running out of space). So, glusterfs has to give up after some
> > time.
> >
> > It would be helpful if you could give your inputs on how other writeback
> > systems (e.g., kernel page-cache, NFS, Samba, Ceph, Lustre, etc.) behave
> > in this scenario and what would be a sane policy for glusterfs.
> >
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1200862
> >
> > regards,
> > Raghavendra
> >
> 
> 
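Regarding point 2 above, a rough application-side sketch of the
checkpoint/replay pattern might look like the C below. This is purely
illustrative, not GlusterFS code: the pending_write log, MAX_RETRIES and
the replay()/checkpoint() helpers are made-up names, and the policy
(fixed retry count, one-second backoff) is just one possible choice.

    /* Illustrative sketch: keep a log of writes issued since the last
     * successful fsync(); if fsync() fails, re-issue them a bounded
     * number of times before giving up and letting the caller roll back. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define MAX_RETRIES 5

    struct pending_write {
            off_t  offset;
            size_t len;
            char   buf[4096];
    };

    /* Re-issue every write cached since the previous checkpoint. */
    static int replay(int fd, const struct pending_write *log, int n)
    {
            for (int i = 0; i < n; i++) {
                    ssize_t ret = pwrite(fd, log[i].buf, log[i].len,
                                         log[i].offset);
                    if (ret < 0 || (size_t)ret != log[i].len)
                            return -1;
            }
            return 0;
    }

    /* Returns 0 once every write since the previous checkpoint is durable,
     * -1 if the error persists after MAX_RETRIES attempts. */
    int checkpoint(int fd, struct pending_write *log, int n)
    {
            for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
                    if (fsync(fd) == 0)
                            return 0;          /* checkpoint reached */
                    fprintf(stderr, "fsync failed (%s); replaying %d write(s)\n",
                            strerror(errno), n);
                    (void) replay(fd, log, n); /* re-issue cached writes */
                    sleep(1);                  /* simple backoff before retry */
            }
            return -1;                         /* give up; caller rolls back */
    }

If glusterfs were to retry on behalf of the application instead, it would
need a similarly bounded policy before giving up and surfacing the error.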

