[Gluster-devel] Gluster Sharding and Geo-replication
Aravinda
avishwan at redhat.com
Wed Sep 2 18:09:23 UTC 2015
On 09/02/2015 11:13 PM, Shyam wrote:
> On 09/02/2015 10:47 AM, Krutika Dhananjay wrote:
>>
>>
>> ------------------------------------------------------------------------
>>
>> *From: *"Shyam" <srangana at redhat.com>
>> *To: *"Aravinda" <avishwan at redhat.com>, "Gluster Devel"
>> <gluster-devel at gluster.org>
>> *Sent: *Wednesday, September 2, 2015 8:09:55 PM
>> *Subject: *Re: [Gluster-devel] Gluster Sharding and Geo-replication
>>
>> On 09/02/2015 03:12 AM, Aravinda wrote:
>> > The Geo-replication and Sharding teams today discussed the approach
>> > to make Geo-replication Sharding-aware. Details are below.
>> >
>> > Participants: Aravinda, Kotresh, Krutika, Rahul Hinduja, Vijay Bellur
>> >
>> > - Both Master and Slave Volumes should be Sharded Volumes with the
>> >   same configuration.
>>
>> If I am not mistaken, geo-rep supports replicating to a non-gluster
>> local FS at the slave end. Is this correct? If so, would this
>> limitation not make that problematic?
>>
>> When you state *same configuration*, I assume you mean the sharding
>> configuration, not the volume graph, right?
>>
>> That is correct. The only requirement is for the slave to have the shard
>> translator (since someone needs to present an aggregated view of the file
>> to the readers on the slave).
>> Also, the shard-block-size needs to be kept the same between master and
>> slave. The rest of the configuration (like the number of subvols of
>> DHT/AFR) can vary across master and slave.
>
> Do we need to have the shard block size the same? I assume the
> file carries an xattr that records the size it is sharded with
> (trusted.glusterfs.shard.block-size), so if this is synced across, that
> would suffice. If this is true, what it would mean is that "a sharded
> volume needs a shard-supporting slave to geo-rep to".
Yes. The number of bricks and the replica count can be different, but the
shard block size should be the same. Only the first (main) file will have
the xattr (trusted.glusterfs.shard.block-size); Geo-rep should sync this
xattr to the Slave as well. Only gsyncd can read/write the sharded chunks,
so a sharded Slave Volume is required to interpret these chunks when
non-gsyncd clients read them.
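
To make that concrete, below is a minimal sketch of syncing this xattr
between gsyncd-style auxiliary mounts. The xattr name is the real one; the
mount paths and the helper itself are only illustrative (the actual sync
happens inside gsyncd):

    import os

    SHARD_BLOCK_SIZE_XATTR = "trusted.glusterfs.shard.block-size"

    def sync_shard_block_size(master_file, slave_file):
        """Copy the shard block-size xattr of the main file to its Slave copy.

        master_file/slave_file are paths to the same file on gsyncd-style
        mounts of the Master and Slave volumes (hypothetical locations).
        """
        try:
            value = os.getxattr(master_file, SHARD_BLOCK_SIZE_XATTR)
        except OSError:
            # Main file carries no block-size xattr, i.e. it is not sharded.
            return
        os.setxattr(slave_file, SHARD_BLOCK_SIZE_XATTR, value)

    # Example with hypothetical mount points:
    # sync_shard_block_size("/mnt/master/f1", "/mnt/slave/f1")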
>
>>
>> -Krutika
>>
>>
>>
>> > - In the Changelog, record changes related to sharded files too, just
>> >   like any regular file.
>> > - Sharding should allow Geo-rep to list/read/write Sharding-internal
>> >   xattrs if the Client PID is gsyncd's (-1)
>> > - Sharding should allow read/write of sharded chunk files (that is,
>> >   files in the .shards directory) if the Client PID is GSYNCD's
>> > - Sharding should return the actual file instead of the aggregated
>> >   content when the main file is requested (Client PID is GSYNCD's)
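
To summarise these gating rules in one place, a conceptual sketch; the
actual check would live inside the shard translator, and the constant and
function names here are only illustrative:

    GSYNCD_CLIENT_PID = -1  # client PID that gsyncd mounts identify with

    def shard_access_allowed(client_pid, path, internal_xattr_op=False):
        """Illustrative decision table for the rules listed above.

        This only mirrors the intended behaviour; it is not the shard
        translator implementation.
        """
        is_gsyncd = (client_pid == GSYNCD_CLIENT_PID)
        # Shard-internal xattrs and files under .shards are gsyncd-only.
        if internal_xattr_op or path.startswith("/.shards/"):
            return is_gsyncd
        # All other access (including the aggregated main file) is unchanged.
        return True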
>> >
>> > For example, a file f1 is created with GFID G1.
>> >
>> > When the file grows, it gets sharded into chunks (say, 5 chunks).
>> >
>> > f1 G1
>> > .shards/G1.1 G2
>> > .shards/G1.2 G3
>> > .shards/G1.3 G4
>> > .shards/G1.4 G5
>> >
>> > In Changelog, this is recorded as 5 different files as below
>> >
>> > CREATE G1 f1
>> > DATA G1
>> > META G1
>> > CREATE G2 PGS/G1.1
>> > DATA G2
>> > META G1
>> > CREATE G3 PGS/G1.2
>> > DATA G3
>> > META G1
>> > CREATE G4 PGS/G1.3
>> > DATA G4
>> > META G1
>> > CREATE G5 PGS/G1.4
>> > DATA G5
>> > META G1
>> >
>> > Where PGS is GFID of .shards directory.
>> >
>> > Geo-rep will create these files independently in the Slave Volume and
>> > sync the xattrs of G1. Data can be read fully only when all the chunks
>> > are synced to the Slave Volume; it can be read partially if the
>> > main/first file and only some of the chunks are synced to the Slave.
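
As a rough sketch of how these records could be grouped on the Geo-rep
side so that chunks are associated with their main file (the record layout
is simplified, and PGS and the helper name are placeholders taken from the
example above):

    # Simplified changelog records from the example above:
    # (op, gfid, extra), where extra is "<parent-gfid>/<basename>" for CREATE.
    RECORDS = [
        ("CREATE", "G1", "ROOT/f1"),
        ("DATA",   "G1", None),
        ("META",   "G1", None),
        ("CREATE", "G2", "PGS/G1.1"),
        ("DATA",   "G2", None),
        ("META",   "G1", None),
        ("CREATE", "G3", "PGS/G1.2"),
        ("DATA",   "G3", None),
        ("META",   "G1", None),
    ]

    SHARDS_DIR_GFID = "PGS"  # GFID of the .shards directory (placeholder)

    def group_chunks(records):
        """Map each main-file GFID to the list of its chunk entries."""
        chunks = {}
        for op, gfid, extra in records:
            if op != "CREATE" or extra is None:
                continue
            pgfid, _, name = extra.partition("/")
            if pgfid == SHARDS_DIR_GFID:
                # Chunk names are <main-gfid>.<index>, e.g. G1.1
                main_gfid, _, index = name.rpartition(".")
                chunks.setdefault(main_gfid, []).append((int(index), gfid))
        return chunks

    # group_chunks(RECORDS) -> {"G1": [(1, "G2"), (2, "G3")]}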
>> >
>> > Please add anything I missed. Comments and suggestions welcome.
>> >
>> > regards
>> > Aravinda
>> >
>> > On 08/11/2015 04:36 PM, Aravinda wrote:
>> >> Hi,
>> >>
>> >> We are considering different approaches to add support in
>> >> Geo-replication for Sharded Gluster Volumes [1].
>> >>
>> >> *Approach 1: Geo-rep: Sync full file*
>> >>   - In the Changelog, record only the main file's details, in the
>> >>     same brick where it is created
>> >>   - Record a DATA entry in the Changelog on any addition/change to
>> >>     the sharded file
>> >>   - Geo-rep rsync will checksum the full file from the mount and
>> >>     sync it as a new file
>> >>   - Slave-side sharding is managed by the Slave Volume
>> >> *Approach 2: Geo-rep: Sync sharded files separately*
>> >>   - Geo-rep rsync will checksum the sharded chunk files only
>> >>   - Geo-rep syncs each sharded file independently as a new file
>> >>   - [UNKNOWN] Sync the internal xattrs (file size and block count)
>> >>     on the main sharded file to the Slave Volume to maintain the
>> >>     same state as on the Master.
>> >>   - Sharding translator to allow file creation under the .shards
>> >>     directory for gsyncd, that is, where the parent GFID is the
>> >>     .shards directory
>> >>   - If sharded files are modified during a Geo-rep run, the Slave
>> >>     may end up with stale data.
>> >>   - Files on the Slave Volume may not be readable unless all sharded
>> >>     files are synced to the Slave (each brick in the Master syncs
>> >>     files to the Slave independently)
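
To check that last point (partial readability) on the Slave, a rough
sketch, assuming the chunk naming from the earlier example
(.shards/<main-gfid>.<index>, with the first block living in the main
file) and a hypothetical mount path:

    import os

    def missing_chunks(slave_mount, main_gfid, file_size, block_size):
        """Return indices of chunks not yet synced to the Slave.

        Illustrative only: assumes chunks are named
        <slave_mount>/.shards/<main_gfid>.<index> with index starting at 1
        (the first block is stored in the main file itself).
        """
        total_blocks = -(-file_size // block_size)  # ceiling division
        missing = []
        for index in range(1, total_blocks):
            chunk = os.path.join(slave_mount, ".shards",
                                 "%s.%d" % (main_gfid, index))
            if not os.path.exists(chunk):
                missing.append(index)
        return missing

    # missing_chunks("/mnt/slave", "G1", file_size=20 * 2**20,
    #                block_size=4 * 2**20) -> e.g. [3, 4]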
>> >>
>> >> The first approach looks cleaner, but we have to analyze the rsync
>> >> checksum performance on big files (sharded on the backend, accessed
>> >> as one big file by rsync).
>> >>
>> >> Let us know your thoughts. Thanks
>> >>
>> >> Ref:
>> >> [1] http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator
>> >> --
>> >> regards
>> >> Aravinda
>> >>
>> >>
regards
Aravinda