[Gluster-devel] Gluster Sharding and Geo-replication
Venky Shankar
vshankar at redhat.com
Thu Sep 3 02:59:37 UTC 2015
On Wed, Sep 2, 2015 at 11:39 PM, Aravinda <avishwan at redhat.com> wrote:
>
> On 09/02/2015 11:13 PM, Shyam wrote:
>>
>> On 09/02/2015 10:47 AM, Krutika Dhananjay wrote:
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *From: *"Shyam" <srangana at redhat.com>
>>> *To: *"Aravinda" <avishwan at redhat.com>, "Gluster Devel"
>>> <gluster-devel at gluster.org>
>>> *Sent: *Wednesday, September 2, 2015 8:09:55 PM
>>> *Subject: *Re: [Gluster-devel] Gluster Sharding and Geo-replication
>>>
>>> On 09/02/2015 03:12 AM, Aravinda wrote:
>>> > The Geo-replication and Sharding teams today discussed the approach
>>> > to making Geo-replication aware of Sharding. Details are below.
>>> >
>>> > Participants: Aravinda, Kotresh, Krutika, Rahul Hinduja, Vijay
>>> Bellur
>>> >
>>> > - Both Master and Slave Volumes should be Sharded Volumes with the
>>> >   same configuration.
>>>
>>> If I am not mistaken, geo-rep supports replicating to a non-gluster
>>> local FS at the slave end. Is this correct? If so, would this
>>> limitation not make that problematic?
>>>
>>> When you state *same configuration*, I assume you mean the sharding
>>> configuration, not the volume graph, right?
>>>
>>> That is correct. The only requirement is for the slave to have the
>>> shard translator (since someone needs to present the aggregated view
>>> of the file to readers on the slave).
>>> Also, the shard-block-size needs to be kept the same between master
>>> and slave. The rest of the configuration (like the number of subvols
>>> of DHT/AFR) can vary across master and slave.
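>>>
>>> A minimal sketch of the offset-to-chunk mapping this implies (block
>>> numbering follows the f1/G1 example later in this thread; the helper
>>> name is illustrative):
>>>
>>>     def shard_for_offset(gfid, offset, block_size):
>>>         # Block 0 lives in the main file; block N (N >= 1) lives in
>>>         # .shards/<GFID>.N. If master and slave used different block
>>>         # sizes, the same offset would map to different chunks.
>>>         block = offset // block_size
>>>         if block == 0:
>>>             return gfid
>>>         return ".shards/%s.%d" % (gfid, block)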
>>
>>
>> Do we need the shard block size to be the same? I assume the file
>> carries an xattr recording the size it was sharded with
>> (trusted.glusterfs.shard.block-size), so if that xattr is synced
>> across, it should suffice. If this is true, it would mean that "a
>> sharded volume needs a shard-supporting slave to geo-rep to".
>
> Yes. The number of bricks and the replica count can differ, but the
> shard block size should be the same. Only the main (first) file carries
> the xattr (trusted.glusterfs.shard.block-size), and Geo-rep should sync
> this xattr to the Slave as well. Only gsyncd can read/write the sharded
> chunks directly; a sharded Slave Volume is required so that non-gsyncd
> clients can read these chunks as one file.
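>
> A sketch of the check gsyncd would need (a simplification that assumes
> the xattr value is an 8-byte big-endian integer; the exact on-disk
> encoding should be confirmed against the shard translator):
>
>     import os, struct
>
>     XATTR = "trusted.glusterfs.shard.block-size"
>
>     def block_size(path):
>         # Read the shard block size recorded on the main file.
>         # Reading trusted.* xattrs needs privileged access.
>         return struct.unpack(">Q", os.getxattr(path, XATTR))[0]
>
>     def verify(master_file, slave_file):
>         # Master and slave must agree, else chunks will not line up.
>         return block_size(master_file) == block_size(slave_file)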
Even if this works, I am very much in disagreement with this mechanism
of synchronization (not that I have a working solution in my head as
of now).
>
>>
>>>
>>> -Krutika
>>>
>>>
>>>
>>> > - In the Changelog, record changes related to Sharded files too,
>>> >   just like any regular files.
>>> > - Sharding should allow Geo-rep to list/read/write Sharding's
>>> >   internal Xattrs if the Client PID is gsyncd's (-1)
>>> > - Sharding should allow read/write of Shard files (those in the
>>> >   .shards directory) if the Client PID is GSYNCD
>>> > - Sharding should return the actual file instead of the aggregated
>>> >   content when the Main file is requested (Client PID GSYNCD); a
>>> >   conceptual sketch of this gating follows the list.
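>>> >
>>> > A conceptual sketch of that gating (GSYNCD_PID = -1 as noted above;
>>> > the real check lives inside the shard translator, this is only
>>> > illustrative Python):
>>> >
>>> >     GSYNCD_PID = -1
>>> >
>>> >     def shard_allows(client_pid, path, wants_internal_xattr):
>>> >         # gsyncd gets raw access: shard chunks, internal xattrs,
>>> >         # and the un-aggregated main file.
>>> >         if client_pid == GSYNCD_PID:
>>> >             return True
>>> >         # All other clients see only the aggregated main file.
>>> >         return not (path.startswith(".shards/") or wants_internal_xattr)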
>>> >
>>> > For example, a file f1 is created with GFID G1.
>>> >
>>> > When the file grows, it gets sharded into chunks (say 5 chunks).
>>> >
>>> > f1 G1
>>> > .shards/G1.1 G2
>>> > .shards/G1.2 G3
>>> > .shards/G1.3 G4
>>> > .shards/G1.4 G5
>>> >
>>> > In the Changelog, this is recorded as 5 different files, as below:
>>> >
>>> > CREATE G1 f1
>>> > DATA G1
>>> > META G1
>>> > CREATE G2 PGS/G1.1
>>> > DATA G2
>>> > META G1
>>> > CREATE G3 PGS/G1.2
>>> > DATA G3
>>> > META G1
>>> > CREATE G4 PGS/G1.3
>>> > DATA G4
>>> > META G1
>>> > CREATE G5 PGS/G1.4
>>> > DATA G5
>>> > META G1
>>> >
>>> > where PGS is the GFID of the .shards directory.
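>>> >
>>> > A sketch of how gsyncd might fold these records back to one logical
>>> > file (records as listed above, in a simplified tuple form):
>>> >
>>> >     RECORDS = [
>>> >         ("CREATE", "G1", "f1"), ("DATA", "G1"), ("META", "G1"),
>>> >         ("CREATE", "G2", "PGS/G1.1"), ("DATA", "G2"), ("META", "G1"),
>>> >     ]
>>> >
>>> >     def group_by_main_gfid(records):
>>> >         # Chunk names embed the main file's GFID (PGS/<GFID>.<N>),
>>> >         # so every chunk record can be mapped back to its main file.
>>> >         chunk_of, groups = {}, {}
>>> >         for rec in records:
>>> >             op, gfid = rec[0], rec[1]
>>> >             if op == "CREATE" and rec[2].startswith("PGS/"):
>>> >                 chunk_of[gfid] = rec[2][4:].split(".")[0]
>>> >             main = chunk_of.get(gfid, gfid)
>>> >             groups.setdefault(main, []).append(rec)
>>> >         return groups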
>>> >
>>> > Geo-rep will create these files independently in the Slave Volume
>>> > and sync the Xattrs of G1. Data can be read fully only when all
>>> > the chunks are synced to the Slave Volume, and partially once the
>>> > main/first file and some of the chunks have synced to the Slave.
>>> >
>>> > Please add anything I missed. Comments & suggestions welcome.
>>> >
>>> > regards
>>> > Aravinda
>>> >
>>> > On 08/11/2015 04:36 PM, Aravinda wrote:
>>> >> Hi,
>>> >>
>>> >> We are considering different approaches to add support in
>>> >> Geo-replication for Sharded Gluster Volumes[1].
>>> >>
>>> >> *Approach 1: Geo-rep: Sync full file*
>>> >>     - In the Changelog, record only the main file's details, in
>>> >>       the same brick where it is created
>>> >>     - Record DATA in the Changelog on any addition/change to the
>>> >>       sharded file
>>> >>     - Geo-rep rsync will checksum the full file from the mount
>>> >>       and sync it as a new file
>>> >>     - Slave-side sharding is managed by the Slave Volume
>>> >> *Approach 2: Geo-rep: Sync sharded files separately*
>>> >>     - Geo-rep rsync will checksum the sharded files only
>>> >>     - Geo-rep syncs each sharded file independently as a new file
>>> >>       (see the sketch after this list)
>>> >>     - [UNKNOWN] Sync the internal xattrs (file size and block
>>> >>       count) on the main sharded file to the Slave Volume, to
>>> >>       maintain the same state as on the Master
>>> >>     - The Sharding translator should allow file creation under the
>>> >>       .shards dir for gsyncd, that is, with the .shards directory
>>> >>       as Parent GFID
>>> >>     - If sharded files are modified during a Geo-rep run, we may
>>> >>       end up with stale data on the Slave
>>> >>     - Files on the Slave Volume may not be readable until all
>>> >>       sharded files sync to the Slave (each brick in the Master
>>> >>       independently syncs files to the Slave)
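>>> >>
>>> >> An illustrative contrast of what each approach would hand to rsync
>>> >> (a minimal Python sketch; the helper name is hypothetical, and
>>> >> num_pieces counts the main file plus its chunks, 5 in the example
>>> >> earlier in this thread):
>>> >>
>>> >>     def files_to_sync(gfid, name, num_pieces, full_file=True):
>>> >>         if full_file:
>>> >>             # Approach 1: one logical file, read aggregated via
>>> >>             # the mount; rsync checksums the whole thing.
>>> >>             return [name]
>>> >>         # Approach 2: the main file plus each chunk under .shards,
>>> >>         # each checksummed and synced independently.
>>> >>         return [name] + [".shards/%s.%d" % (gfid, i)
>>> >>                          for i in range(1, num_pieces)]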
>>> >>
>>> >> The first approach looks cleaner, but we have to analyze rsync's
>>> >> checksum performance on big files (sharded in the backend,
>>> >> accessed as one big file from rsync).
>>> >>
>>> >> Let us know your thoughts. Thanks
>>> >>
>>> >> Ref:
>>> >> [1] http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator
>>> >> --
>>> >> regards
>>> >> Aravinda
>>> >>
>>> >>
>
> regards
> Aravinda
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel