[Gluster-devel] Gluster Sharding and Geo-replication

Krutika Dhananjay kdhananj at redhat.com
Thu Sep 3 06:33:23 UTC 2015


----- Original Message -----

> From: "Venky Shankar" <vshankar at redhat.com>
> To: "Aravinda" <avishwan at redhat.com>
> Cc: "Shyam" <srangana at redhat.com>, "Krutika Dhananjay" <kdhananj at redhat.com>,
> "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Thursday, September 3, 2015 8:29:37 AM
> Subject: Re: [Gluster-devel] Gluster Sharding and Geo-replication

> On Wed, Sep 2, 2015 at 11:39 PM, Aravinda <avishwan at redhat.com> wrote:
> >
> > On 09/02/2015 11:13 PM, Shyam wrote:
> >>
> >> On 09/02/2015 10:47 AM, Krutika Dhananjay wrote:
> >>>
> >>>
> >>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> *From: *"Shyam" <srangana at redhat.com>
> >>> *To: *"Aravinda" <avishwan at redhat.com>, "Gluster Devel"
> >>> <gluster-devel at gluster.org>
> >>> *Sent: *Wednesday, September 2, 2015 8:09:55 PM
> >>> *Subject: *Re: [Gluster-devel] Gluster Sharding and Geo-replication
> >>>
> >>> On 09/02/2015 03:12 AM, Aravinda wrote:
> >>> > The Geo-replication and Sharding teams today discussed the approach
> >>> > to making Geo-replication shard-aware. Details are below.
> >>> >
> >>> > Participants: Aravinda, Kotresh, Krutika, Rahul Hinduja, Vijay
> >>> Bellur
> >>> >
> >>> > - Both Master and Slave Volumes should be Sharded Volumes with the
> >>> > same configuration.
> >>>
> >>> If I am not mistaken, geo-rep supports replicating to a non-gluster
> >>> local FS at the slave end. Is this correct? If so, would this
> >>> limitation not make that problematic?
> >>>
> >>> When you state *same configuration*, I assume you mean the sharding
> >>> configuration, not the volume graph, right?
> >>>
> >>> That is correct. The only requirement is for the slave to have the
> >>> shard translator (since something needs to present the aggregated view
> >>> of the file to the readers on the slave).
> >>> Also, the shard-block-size needs to be kept the same between master and
> >>> slave. The rest of the configuration (like the number of subvols of
> >>> DHT/AFR) can vary between master and slave.
> >>
> >>
> >> Do we need the shard block size to be the same? I assume the file
> >> carries an xattr that contains the size it is sharded with
> >> (trusted.glusterfs.shard.block-size), so if this is synced across,
> >> that should suffice. If so, what it would mean is that "a sharded
> >> volume needs a shard-capable slave to geo-rep to".
> >
> > Yes. The number of bricks and replica count can be different, but the
> > shard block size should be the same. Only the main (first) file will
> > have the xattr (trusted.glusterfs.shard.block-size); Geo-rep should
> > sync this xattr to the Slave as well. Only gsyncd can read/write the
> > sharded chunks; a sharded Slave Volume is required to understand these
> > chunks when they are read by non-gsyncd clients.
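
For quick reference, a minimal sketch (in Python, since gsyncd is Python) of how the block-size xattr mentioned above could be read and copied. The mount paths and the assumption that the value is stored as an 8-byte network-order integer are mine, not taken from this thread:

    import os
    import struct

    # Illustrative paths only; a real gsyncd worker would use its own
    # master/slave mounts (with the gsyncd client PID, so the internal
    # xattrs are visible to it).
    MASTER_FILE = "/mnt/master/f1"
    SLAVE_FILE = "/mnt/slave/f1"
    BLOCK_SIZE_XATTR = "trusted.glusterfs.shard.block-size"

    def get_shard_block_size(path):
        # Assumes the xattr holds a 64-bit value in network byte order.
        raw = os.getxattr(path, BLOCK_SIZE_XATTR)
        return struct.unpack(">Q", raw)[0]

    def sync_shard_block_size(master_file, slave_file):
        # Copy the xattr byte-for-byte from Master to Slave, as suggested above.
        raw = os.getxattr(master_file, BLOCK_SIZE_XATTR)
        os.setxattr(slave_file, BLOCK_SIZE_XATTR, raw)

    print("shard block size: %d bytes" % get_shard_block_size(MASTER_FILE))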

> Even if this works, I am very much in disagreement with this mechanism
> of synchronization (not that I have a working solution in my head as
> of now).

Hi Venky, 

It is not apparent to me what issues you see with approach 2. If you could lay them out here, it would be helpful in taking the discussion further.

-Krutika 

> >
> >>
> >>>
> >>> -Krutika
> >>>
> >>>
> >>>
> >>> > - In the Changelog, record changes related to Sharded files too,
> >>> > just like any regular files.
> >>> > - Sharding should allow Geo-rep to list/read/write Sharding-internal
> >>> > xattrs if the Client PID is gsyncd's (-1)
> >>> > - Sharding should allow read/write of Sharded files (that is, those
> >>> > in the .shards directory) if the Client PID is GSYNCD
> >>> > - Sharding should return the actual file instead of the aggregated
> >>> > content when the Main file is requested (Client PID GSYNCD)
> >>> >
> >>> > For example, a file f1 is created with GFID G1.
> >>> >
> >>> > When the file grows, it gets sharded into chunks (say, 5 chunks).
> >>> >
> >>> > f1 G1
> >>> > .shards/G1.1 G2
> >>> > .shards/G1.2 G3
> >>> > .shards/G1.3 G4
> >>> > .shards/G1.4 G5
> >>> >
> >>> > In Changelog, this is recorded as 5 different files as below
> >>> >
> >>> > CREATE G1 f1
> >>> > DATA G1
> >>> > META G1
> >>> > CREATE G2 PGS/G1.1
> >>> > DATA G2
> >>> > META G1
> >>> > CREATE G3 PGS/G1.2
> >>> > DATA G3
> >>> > META G1
> >>> > CREATE G4 PGS/G1.3
> >>> > DATA G4
> >>> > META G1
> >>> > CREATE G5 PGS/G1.4
> >>> > DATA G5
> >>> > META G1
> >>> >
> >>> > Where PGS is the GFID of the .shards directory.
> >>> >
> >>> > Geo-rep will create these files independently in the Slave Volume
> >>> > and sync the xattrs of G1. Data can be read fully only when all the
> >>> > chunks are synced to the Slave Volume; it can be read partially if
> >>> > the main/first file and only some of the chunks have synced to the
> >>> > Slave.
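
To make the chunk layout above concrete, here is a rough sketch of deriving the chunk paths a gsyncd-like worker would have to sync for a given main file. The 4 MB block size in the example call is only the usual default; nothing here is taken from an actual implementation:

    import math

    def shard_chunk_paths(gfid, file_size, block_size):
        # Chunk 0 is the main file itself; chunks 1..N-1 live under the
        # .shards directory and are named <gfid>.<index>, as in the
        # listing above (G1 -> .shards/G1.1 .. .shards/G1.4).
        if file_size <= block_size:
            return []          # file fits in one block, no extra chunks
        total_chunks = math.ceil(file_size / block_size)
        return [".shards/%s.%d" % (gfid, i) for i in range(1, total_chunks)]

    # A 20 MB file with a 4 MB shard block size yields 4 chunks besides
    # the main file, matching the 5-chunk example above.
    print(shard_chunk_paths("G1", 20 * 1024 * 1024, 4 * 1024 * 1024))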
> >>> >
> >>> > Please add anything I missed. Comments and suggestions welcome.
> >>> >
> >>> > regards
> >>> > Aravinda
> >>> >
> >>> > On 08/11/2015 04:36 PM, Aravinda wrote:
> >>> >> Hi,
> >>> >>
> >>> >> We are considering different approaches for adding support in
> >>> >> Geo-replication for Sharded Gluster Volumes [1]
> >>> >>
> >>> >> *Approach 1: Geo-rep: Sync full file*
> >>> >> - In the Changelog, record only the main file's details, in the
> >>> >> same brick where it is created
> >>> >> - Record a DATA entry in the Changelog whenever there is any
> >>> >> addition/change to the sharded file
> >>> >> - Geo-rep rsync will checksum the full file from the mount and
> >>> >> sync it as a new file
> >>> >> - Slave-side sharding is managed by the Slave Volume
> >>> >> *Approach 2: Geo-rep: Sync sharded files separately*
> >>> >> - Geo-rep rsync will do the checksum for the sharded files only
> >>> >> - Geo-rep syncs each sharded file independently as a new file
> >>> >> - [UNKNOWN] Sync the internal xattrs (file size and block count) of
> >>> >> the main sharded file to the Slave Volume to maintain the same state
> >>> >> as on the Master (see the sketch below this list)
> >>> >> - Sharding translator to allow file creation under the .shards dir
> >>> >> for gsyncd, that is, with the parent GFID being the .shards directory
> >>> >> - If sharded files are modified during a Geo-rep run, we may end up
> >>> >> with stale data on the Slave
> >>> >> - Files on the Slave Volume may not be readable unless all sharded
> >>> >> files have synced to the Slave (each brick on the Master syncs files
> >>> >> to the Slave independently)
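
For the [UNKNOWN] item above, a rough sketch of what syncing the main file's internal size/block-count state could look like. The xattr name used here is an assumption on my part (the thread only says "internal xattrs (file size and block count)"), so treat it as a placeholder:

    import os

    # Placeholder name: the real key(s) carrying the aggregated file size and
    # block count are whatever the shard translator maintains on the main file.
    FILE_SIZE_XATTR = "trusted.glusterfs.shard.file-size"

    def sync_main_file_size_xattrs(master_main_file, slave_main_file):
        # Copy the size/block-count xattr byte-for-byte so the Slave's main
        # file reports the same aggregated size as the Master's.
        try:
            value = os.getxattr(master_main_file, FILE_SIZE_XATTR)
        except OSError:
            return  # xattr absent, e.g. the file is smaller than one block
        os.setxattr(slave_main_file, FILE_SIZE_XATTR, value)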
> >>> >>
> >>> >> The first approach looks cleaner, but we have to analyze the rsync
> >>> >> checksum performance on big files (sharded in the backend, accessed
> >>> >> as one big file by rsync).
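
As a starting point for that analysis, a crude way to time a checksum-based sync of one large file from the master mount. The paths are placeholders and this is only an illustration, not how geo-rep actually invokes rsync:

    import subprocess
    import time

    SRC = "/mnt/master/bigfile"            # placeholder paths
    DST = "slavehost:/mnt/slave/bigfile"

    start = time.time()
    # -c forces rsync to checksum the whole file, which is the cost in
    # question; --inplace avoids rewriting the destination from scratch.
    subprocess.check_call(["rsync", "-c", "--inplace", SRC, DST])
    print("checksum-based sync took %.1f s" % (time.time() - start))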
> >>> >>
> >>> >> Let us know your thoughts. Thanks
> >>> >>
> >>> >> Ref:
> >>> >> [1]
> >>> >> http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator
> >>> >> --
> >>> >> regards
> >>> >> Aravinda
> >>>
> >
> > regards
> > Aravinda
> >