[Gluster-devel] Barrier design issues wrt volume snapshot

Krishnan Parthasarathi kparthas at redhat.com
Fri Mar 7 10:24:58 UTC 2014


To summarise the proposed solutions to the open issues:

1) For the changelog xlator to have snapshot-consistent, consumable changelogs (i.e., those that are rotated), it needs to
block unlink/rename/rmdir FOPs in their call path, drain in-flight operations and ensure that the to-be-rotated changelog
has the corresponding changelog entries before acknowledging to glusterd that the brick is ready to be snapshotted.

2) The inconsistency problem in a pure distribute volume is to be solved in the server xlator's resolver code,
where operations on a given dentry are serialized. This is to be solved independently of the barrier xlator.
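
To make item 1 concrete, here is a minimal sketch of call-path barriering with a drain step. It is a toy model in Python, not the barrier or changelog xlator code: the class name, method names and the in-memory changelog list are all assumptions made for illustration.

```python
import threading

# FOPs that item 1 says must be held in the call path.
BARRIERED_FOPS = {"unlink", "rename", "rmdir"}


class CallPathBarrier:
    """Toy model (hypothetical names): hold barriered FOPs before they run
    and let the caller wait for the ones already in flight."""

    def __init__(self):
        self._cond = threading.Condition()
        self._enabled = False
        self._in_flight = 0
        self.changelog = []  # stands in for the changelog about to be rotated

    def run_fop(self, fop, path):
        barriered = fop in BARRIERED_FOPS
        with self._cond:
            # Call-path blocking: wait before the FOP reaches the backend.
            while barriered and self._enabled:
                self._cond.wait()
            if barriered:
                self._in_flight += 1
        # The real FOP would be wound down to the backend here.
        with self._cond:
            self.changelog.append((fop, path))
            if barriered:
                self._in_flight -= 1
                self._cond.notify_all()

    def enable_and_drain(self):
        """Enable the barrier, wait for in-flight barriered FOPs to be
        recorded, then hand back the changelog that is safe to rotate."""
        with self._cond:
            self._enabled = True
            while self._in_flight > 0:
                self._cond.wait()
            rotated, self.changelog = self.changelog, []
            return rotated

    def disable(self):
        with self._cond:
            self._enabled = False
            self._cond.notify_all()
```

In this model, the point at which enable_and_drain() returns corresponds to the brick acknowledging to glusterd that it is ready to be snapshotted: every unlink/rename/rmdir that started before the barrier is already in the rotated changelog, and later ones wait until disable() is called.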

Feel free to add things that the summary fails to capture. 

thanks, 
Krish 

----- Original Message -----

> ----- Original Message -----

> > From: "Anand Avati" <avati at gluster.org>
> 
> > To: "Vijay Bellur" <vbellur at redhat.com>
> 
> > Cc: "Krishnan Parthasarathi" <kparthas at redhat.com>, "Anand Avati"
> > <aavati at redhat.com>, "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Varun
> > Shastry" <vshastry at redhat.com>, "Pranith Kumar Karampuri"
> > <pkarampu at redhat.com>, "Venky Shankar" <vshankar at redhat.com>, "Kaushal M"
> > <kaushal at redhat.com>, "Rajesh Joseph" <rjoseph at redhat.com>, "Kotresh
> > Hiremath Ravishankar" <khiremat at redhat.com>, gluster-devel at nongnu.org
> 
> > Sent: Friday, March 7, 2014 12:21:54 AM
> 
> > Subject: Re: [Gluster-devel] Barrier design issues wrt volume snapshot
> 

> > On Thu, Mar 6, 2014 at 12:21 AM, Vijay Bellur < vbellur at redhat.com > wrote:
> 

> > > Adding gluster-devel.
> > 
> 

> > > On 03/06/2014 01:15 PM, Krishnan Parthasarathi wrote:
> > 
> 

> > > > All,
> > > 
> > 
> 

> > > > In recent discussions around design (and implementation) of the barrier
> > > > feature, a couple of things came to light.
> > > 
> > 
> 

> > > > 1) changelog xlator needs barrier xlator to block unlink and rename FOPs
> > > > in the call path. This is apart from the current list of FOPs that are
> > > > blocked in their callback path.
> > > 
> > 
> 
> > > > This is to make sure that the changelog has a bounded queue of unlink and
> > > > rename FOPs, from the time barriering is enabled, to be drained, committed
> > > > to the changelog file and published.
> > > 
> > 
> 

> > Why is this necessary?
> 

> FOPs that are still coming through after enabling barrier (assuming that
> barrier is done in the call path) would end up in a non-consumable
> changelog. For these operations, geo-rep would resort to FS crawl based on
> xtime which does not handle unlinks and renames.

> > > > 2) It is possible in a pure distribute volume that the following sequence
> > > > of FOPs could result in snapshots of bricks disagreeing on inode type for
> > > > a file or directory.
> > > 
> > 
> 

> > > > t1: snap b1
> > > > t2: unlink /a
> > > > t3: mkdir /a
> > > > t4: snap b2
> > > 
> > 
> 

> > > > where, b1 and b2 are bricks of a pure distribute volume V.
> > > 
> > 
> 

> > > > The above sequence can happen with the current barrier xlator design, since
> > > > we allow unlink FOPs to go through to the disk and only block their
> > > > acknowledgement to the application. This implies a concurrent mkdir on the
> > > > same name could succeed, since DHT doesn't serialize unlink and mkdir FOPs,
> > > > unlike AFR.
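
The t1..t4 sequence above can be replayed in a small toy model to see the disagreement. This is only a sketch: it assumes each brick can be modelled as a dict of path to inode type, that /a starts as a file stored on b1, and that a directory is created on every brick of the distribute volume. The helper names are made up for the example.

```python
import copy

# Toy model (assumption): brick name -> {path: inode type}.
# /a starts out as a regular file that lives on b1 only.
bricks = {"b1": {"/a": "file"}, "b2": {}}
snapshots = {}


def snap(brick):
    """Record a point-in-time copy of one brick."""
    snapshots[brick] = copy.deepcopy(bricks[brick])


def unlink(path):
    """Remove the entry from whichever brick holds it."""
    for contents in bricks.values():
        contents.pop(path, None)


def mkdir(path):
    """Create the directory on every brick (assumed distribute behaviour)."""
    for contents in bricks.values():
        contents[path] = "dir"


snap("b1")    # t1
unlink("/a")  # t2: reaches disk, only the acknowledgement is held back
mkdir("/a")   # t3: nothing serializes it against the unlink
snap("b2")    # t4

print(snapshots)  # {'b1': {'/a': 'file'}, 'b2': {'/a': 'dir'}}
```

The snapshot of b1 records /a as a file while the snapshot of b2 records it as a directory, which is the inode type disagreement described above.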
> > > 
> > 
> 

> > > > Avati,
> > > 
> > 
> 

> > > > I hear that you have a solution for problem 2). Could you please start the
> > > > discussion on this thread? It would help us to decide how to go about the
> > > > barrier xlator implementation.
> > > 
> > 
> 

> > The solution is really a long-pending implementation of dentry serialization
> > in the resolver of protocol server. Today we allow multiple FOPs that modify
> > the same dentry to happen in parallel. This results in hairy races
> > (including non-atomicity of rename) and has been kept open for a while now.
> > Implementing the dentry serialization in the resolver will "solve" 2 as a
> > side effect. Hence that is a better approach than making changes in the
> > barrier translator.
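
As a rough illustration of what dentry serialization means here, the sketch below keeps one lock per (parent, basename) pair so that two FOPs on the same dentry cannot overlap, while FOPs on different dentries still run in parallel. It is a Python toy, not the protocol server resolver: the class and method names are hypothetical, and the lock table is never pruned.

```python
import threading
from collections import defaultdict


class DentrySerializer:
    """Toy model (hypothetical names): serialize FOPs per dentry."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table
        self._locks = defaultdict(threading.Lock)   # (parent, name) -> lock

    def _lock_for(self, parent, name):
        with self._guard:
            return self._locks[(parent, name)]

    def run(self, parent, name, fop):
        """Run fop() while holding the lock for the dentry (parent, name)."""
        with self._lock_for(parent, name):
            return fop()
```

With this in place, an unlink and a mkdir of /a would both go through run("/", "a", ...) and execute one after the other. A rename touches two dentries, so a fuller version would presumably need to take both locks in a fixed order to avoid deadlock; that part is not shown.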
> 

> > Avati
> 