[Gluster-devel] Gluster Sharding and Geo-replication
Venky Shankar
vshankar at redhat.com
Thu Sep 3 02:59:37 UTC 2015
On Wed, Sep 2, 2015 at 11:39 PM, Aravinda <avishwan at redhat.com> wrote:
>
> On 09/02/2015 11:13 PM, Shyam wrote:
>>
>> On 09/02/2015 10:47 AM, Krutika Dhananjay wrote:
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *From: *"Shyam" <srangana at redhat.com>
>>> *To: *"Aravinda" <avishwan at redhat.com>, "Gluster Devel"
>>> <gluster-devel at gluster.org>
>>> *Sent: *Wednesday, September 2, 2015 8:09:55 PM
>>> *Subject: *Re: [Gluster-devel] Gluster Sharding and Geo-replication
>>>
>>> On 09/02/2015 03:12 AM, Aravinda wrote:
>>> > The Geo-replication and Sharding teams today discussed the approach
>>> > to making Geo-replication aware of Sharding. Details are below.
>>> >
>>> > Participants: Aravinda, Kotresh, Krutika, Rahul Hinduja, Vijay
>>> Bellur
>>> >
>>> > - Both Master and Slave Volumes should be Sharded Volumes with the
>>> >   same configuration.
>>>
>>> If I am not mistaken, geo-rep supports replicating to a non-gluster
>>> local FS at the slave end. Is this correct? If so, would this
>>> limitation not make that problematic?
>>>
>>> When you state *same configuration*, I assume you mean the sharding
>>> configuration, not the volume graph, right?
>>>
>>> That is correct. The only requirement is for the slave to have the
>>> shard translator (since someone needs to present the aggregated view
>>> of the file to readers on the slave).
>>> Also, the shard-block-size needs to be kept the same between master
>>> and slave. The rest of the configuration (like the number of subvols
>>> of DHT/AFR) can vary across master and slave.
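>>>
>>> A minimal sketch of the offset-to-chunk mapping this implies (block
>>> numbering follows the f1/G1 example later in this thread; the helper
>>> name is illustrative):
>>>
>>>     def shard_for_offset(gfid, offset, block_size):
>>>         # Block 0 lives in the main file; block N (N >= 1) lives in
>>>         # .shards/<GFID>.N. If master and slave used different block
>>>         # sizes, the same offset would map to different chunks.
>>>         block = offset // block_size
>>>         if block == 0:
>>>             return gfid
>>>         return ".shards/%s.%d" % (gfid, block)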
>>
>>
>> Do we need the shard block size to be the same? I assume the file
>> carries an xattr recording the size it was sharded with
>> (trusted.glusterfs.shard.block-size), so if that xattr is synced
>> across, it should suffice. If this is true, it would mean that "a
>> sharded volume needs a shard-supporting slave to geo-rep to".
>
> Yes. The number of bricks and the replica count can differ, but the
> shard block size should be the same. Only the main (first) file carries
> the xattr (trusted.glusterfs.shard.block-size), and Geo-rep should sync
> this xattr to the Slave as well. Only gsyncd can read/write the sharded
> chunks directly; a sharded Slave Volume is required so that non-gsyncd
> clients can read these chunks as one file.
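>
> A sketch of the check gsyncd would need (a simplification that assumes
> the xattr value is an 8-byte big-endian integer; the exact on-disk
> encoding should be confirmed against the shard translator):
>
>     import os, struct
>
>     XATTR = "trusted.glusterfs.shard.block-size"
>
>     def block_size(path):
>         # Read the shard block size recorded on the main file.
>         # Reading trusted.* xattrs needs privileged access.
>         return struct.unpack(">Q", os.getxattr(path, XATTR))[0]
>
>     def verify(master_file, slave_file):
>         # Master and slave must agree, else chunks will not line up.
>         return block_size(master_file) == block_size(slave_file)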
Even if this works, I am very much in disagreement with this mechanism
of synchronization (not that I have a working solution in my head as
of now).
>
>>
>>>
>>> -Krutika
>>>
>>>
>>>
>>> > - In the Changelog, record changes related to Sharded files too,
>>> >   just like any regular files.
>>> > - Sharding should allow Geo-rep to list/read/write Sharding's
>>> >   internal Xattrs if the Client PID is gsyncd's (-1)
>>> > - Sharding should allow read/write of Shard files (those in the
>>> >   .shards directory) if the Client PID is GSYNCD
>>> > - Sharding should return the actual file instead of the aggregated
>>> >   content when the Main file is requested (Client PID GSYNCD); a
>>> >   conceptual sketch of this gating follows the list.
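>>> >
>>> > A conceptual sketch of that gating (GSYNCD_PID = -1 as noted above;
>>> > the real check lives inside the shard translator, this is only
>>> > illustrative Python):
>>> >
>>> >     GSYNCD_PID = -1
>>> >
>>> >     def shard_allows(client_pid, path, wants_internal_xattr):
>>> >         # gsyncd gets raw access: shard chunks, internal xattrs,
>>> >         # and the un-aggregated main file.
>>> >         if client_pid == GSYNCD_PID:
>>> >             return True
>>> >         # All other clients see only the aggregated main file.
>>> >         return not (path.startswith(".shards/") or wants_internal_xattr)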
>>> >
>>> > For example, a file f1 is created with GFID G1.
>>> >
>>> > When the file grows, it gets sharded into chunks (say 5 chunks).
>>> >
>>> > f1 G1
>>> > .shards/G1.1 G2
>>> > .shards/G1.2 G3
>>> > .shards/G1.3 G4
>>> > .shards/G1.4 G5
>>> >
>>> > In the Changelog, this is recorded as 5 different files, as below:
>>> >
>>> > CREATE G1 f1
>>> > DATA G1
>>> > META G1
>>> > CREATE G2 PGS/G1.1
>>> > DATA G2
>>> > META G1
>>> > CREATE G3 PGS/G1.2
>>> > DATA G3
>>> > META G1
>>> > CREATE G4 PGS/G1.3
>>> > DATA G4
>>> > META G1
>>> > CREATE G5 PGS/G1.4
>>> > DATA G5
>>> > META G1
>>> >
>>> > where PGS is the GFID of the .shards directory.
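>>> >
>>> > A sketch of how gsyncd might fold these records back to one logical
>>> > file (records as listed above, in a simplified tuple form):
>>> >
>>> >     RECORDS = [
>>> >         ("CREATE", "G1", "f1"), ("DATA", "G1"), ("META", "G1"),
>>> >         ("CREATE", "G2", "PGS/G1.1"), ("DATA", "G2"), ("META", "G1"),
>>> >     ]
>>> >
>>> >     def group_by_main_gfid(records):
>>> >         # Chunk names embed the main file's GFID (PGS/<GFID>.<N>),
>>> >         # so every chunk record can be mapped back to its main file.
>>> >         chunk_of, groups = {}, {}
>>> >         for rec in records:
>>> >             op, gfid = rec[0], rec[1]
>>> >             if op == "CREATE" and rec[2].startswith("PGS/"):
>>> >                 chunk_of[gfid] = rec[2][4:].split(".")[0]
>>> >             main = chunk_of.get(gfid, gfid)
>>> >             groups.setdefault(main, []).append(rec)
>>> >         return groups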
>>> >
>>> > Geo-rep will create these files independently in the Slave Volume
>>> > and sync the Xattrs of G1. Data can be read fully only when all
>>> > the chunks are synced to the Slave Volume, and partially once the
>>> > main/first file and some of the chunks have synced to the Slave.
>>> >
>>> > Please add anything I missed. Comments & suggestions welcome.
>>> >
>>> > regards
>>> > Aravinda
>>> >
>>> > On 08/11/2015 04:36 PM, Aravinda wrote:
>>> >> Hi,
>>> >>
>>> >> We are considering different approaches to add support in
>>> >> Geo-replication for Sharded Gluster Volumes[1].
>>> >>
>>> >> *Approach 1: Geo-rep: Sync full file*
>>> >>     - In the Changelog, record only the main file's details, in
>>> >>       the same brick where it is created
>>> >>     - Record DATA in the Changelog on any addition/change to the
>>> >>       sharded file
>>> >>     - Geo-rep rsync will checksum the full file from the mount
>>> >>       and sync it as a new file
>>> >>     - Slave-side sharding is managed by the Slave Volume
>>> >> *Approach 2: Geo-rep: Sync sharded files separately*
>>> >>     - Geo-rep rsync will checksum the sharded files only
>>> >>     - Geo-rep syncs each sharded file independently as a new file
>>> >>       (see the sketch after this list)
>>> >>     - [UNKNOWN] Sync the internal xattrs (file size and block
>>> >>       count) on the main sharded file to the Slave Volume, to
>>> >>       maintain the same state as on the Master
>>> >>     - The Sharding translator should allow file creation under the
>>> >>       .shards dir for gsyncd, that is, with the .shards directory
>>> >>       as Parent GFID
>>> >>     - If sharded files are modified during a Geo-rep run, we may
>>> >>       end up with stale data on the Slave
>>> >>     - Files on the Slave Volume may not be readable until all
>>> >>       sharded files sync to the Slave (each brick in the Master
>>> >>       independently syncs files to the Slave)
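>>> >>
>>> >> An illustrative contrast of what each approach would hand to rsync
>>> >> (a minimal Python sketch; the helper name is hypothetical, and
>>> >> num_pieces counts the main file plus its chunks, 5 in the example
>>> >> earlier in this thread):
>>> >>
>>> >>     def files_to_sync(gfid, name, num_pieces, full_file=True):
>>> >>         if full_file:
>>> >>             # Approach 1: one logical file, read aggregated via
>>> >>             # the mount; rsync checksums the whole thing.
>>> >>             return [name]
>>> >>         # Approach 2: the main file plus each chunk under .shards,
>>> >>         # each checksummed and synced independently.
>>> >>         return [name] + [".shards/%s.%d" % (gfid, i)
>>> >>                          for i in range(1, num_pieces)]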
>>> >>
>>> >> The first approach looks cleaner, but we have to analyze rsync's
>>> >> checksum performance on big files (sharded in the backend,
>>> >> accessed as one big file from rsync).
>>> >>
>>> >> Let us know your thoughts. Thanks
>>> >>
>>> >> Ref:
>>> >> [1] http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator
>>> >> --
>>> >> regards
>>> >> Aravinda
>>> >>
>>> >>
>
> regards
> Aravinda
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel