[Gluster-devel] Gluster Sharding and Geo-replication

Aravinda avishwan at redhat.com
Wed Sep 2 18:09:23 UTC 2015


On 09/02/2015 11:13 PM, Shyam wrote:
> On 09/02/2015 10:47 AM, Krutika Dhananjay wrote:
>>
>>
>> ------------------------------------------------------------------------
>>
>>     *From: *"Shyam" <srangana at redhat.com>
>>     *To: *"Aravinda" <avishwan at redhat.com>, "Gluster Devel"
>>     <gluster-devel at gluster.org>
>>     *Sent: *Wednesday, September 2, 2015 8:09:55 PM
>>     *Subject: *Re: [Gluster-devel] Gluster Sharding and Geo-replication
>>
>>     On 09/02/2015 03:12 AM, Aravinda wrote:
>>      > The Geo-replication and Sharding teams today discussed the approach
>>      > to make Geo-replication aware of Sharding. Details are as below.
>>      >
>>      > Participants: Aravinda, Kotresh, Krutika, Rahul Hinduja, Vijay Bellur
>>      >
>>      > - Both Master and Slave Volumes should be Sharded Volumes with the
>>      >   same configuration.
>>
>>     If I am not mistaken, geo-rep supports replicating to a non-gluster
>>     local FS at the slave end. Is this correct? If so, would this
>>     limitation not make that problematic?
>>
>>     When you state *same configuration*, I assume you mean the sharding
>>     configuration, not the volume graph, right?
>>
>> That is correct. The only requirement is for the slave to have the shard
>> translator (since someone needs to present an aggregated view of the file
>> to the READers on the slave).
>> Also, the shard-block-size needs to be kept the same between master and
>> slave. The rest of the configuration (like the number of subvols of
>> DHT/AFR) can vary across master and slave.
>
> Do we need to have the shard block size the same? I assume the file
> carries an xattr that contains the size it is sharded with
> (trusted.glusterfs.shard.block-size), so if this is synced across, it
> would do. If this is true, what it would mean is that "a sharded
> volume needs a shard-supporting slave to geo-rep to".
Yes. The number of bricks and the replica count can be different, but the 
shard block size should be the same. Only the first (main) file will have 
the xattr (trusted.glusterfs.shard.block-size); Geo-rep should sync this 
xattr to the Slave as well. Only gsyncd can read/write the sharded chunks 
directly, so a sharded Slave Volume is required to present these chunks 
as one file when they are read by non-gsyncd clients.
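
For illustration, here is a minimal sketch (not actual gsyncd code) of what
syncing that block-size xattr could look like, assuming plain POSIX xattr
calls on FUSE mounts, hypothetical /mnt/master and /mnt/slave paths, and a
mount that lets gsyncd read/write the internal xattr as discussed above:

    # Hypothetical sketch: copy the shard block-size xattr of the main file
    # from the master mount to the slave mount. Paths and error handling are
    # illustrative only; gsyncd's real sync path is different.
    import errno
    import os

    SHARD_BLOCK_SIZE_XATTR = "trusted.glusterfs.shard.block-size"

    def sync_shard_block_size(master_path, slave_path):
        """Copy the shard block-size xattr, if present, from master to slave."""
        try:
            value = os.getxattr(master_path, SHARD_BLOCK_SIZE_XATTR)
        except OSError as e:
            if e.errno == errno.ENODATA:  # file is not sharded (yet)
                return
            raise
        os.setxattr(slave_path, SHARD_BLOCK_SIZE_XATTR, value)

    # Example (hypothetical mounts):
    # sync_shard_block_size("/mnt/master/f1", "/mnt/slave/f1")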
>
>>
>> -Krutika
>>
>>
>>
>>      > - In Changelog, record changes related to Sharded files as well,
>>      >    just like any regular files.
>>      > - Sharding should allow Geo-rep to list/read/write Sharding internal
>>      >    Xattrs if the Client PID is gsyncd (-1)
>>      > - Sharding should allow read/write of Sharded files (that is, files
>>      >    in the .shards directory) if the Client PID is GSYNCD
>>      > - Sharding should return the actual file instead of returning the
>>      >    aggregated content when the Main file is requested (Client PID
>>      >    GSYNCD)
>>      >
>>      > For example, a file f1 is created with GFID G1.
>>      >
>>      > When the file grows, it gets sharded into chunks (say, 5 chunks).
>>      >
>>      >      f1   G1
>>      >      .shards/G1.1   G2
>>      >      .shards/G1.2   G3
>>      >      .shards/G1.3   G4
>>      >      .shards/G1.4   G5
>>      >
>>      > In Changelog, this is recorded as 5 different files as below
>>      >
>>      >      CREATE G1 f1
>>      >      DATA G1
>>      >      META G1
>>      >      CREATE G2 PGS/G1.1
>>      >      DATA G2
>>      >      META G1
>>      >      CREATE G3 PGS/G1.2
>>      >      DATA G3
>>      >      META G1
>>      >      CREATE G4 PGS/G1.3
>>      >      DATA G4
>>      >      META G1
>>      >      CREATE G5 PGS/G1.4
>>      >      DATA G5
>>      >      META G1
>>      >
>>      > Where PGS is the GFID of the .shards directory.
>>      >
>>      > Geo-rep will create these files independently in the Slave Volume
>>      > and sync the Xattrs of G1. Data can be read fully only when all the
>>      > chunks are synced to the Slave Volume; it can be read partially if
>>      > the main/first file and some of the chunks are synced to the Slave.
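
To make the record flow above concrete, a rough, hypothetical sketch of how a
consumer could group such simplified records by the main file's GFID (the
real changelog wire format and gsyncd's processing are different; this only
mirrors the example records shown above):

    # Hypothetical sketch: group the simplified changelog records by the
    # GFID of the main file. Chunk CREATEs (parent GFID == PGS, the .shards
    # directory) map each chunk GFID back to its main file; DATA/META records
    # are then attributed through that mapping.
    from collections import defaultdict

    PGS = "PGS"  # placeholder for the GFID of the .shards directory

    def group_by_main_file(records):
        chunk_to_main = {}          # chunk GFID -> main-file GFID
        groups = defaultdict(list)  # main-file GFID -> its records
        for rec in records:
            parts = rec.split()
            rtype, gfid = parts[0], parts[1]
            if rtype == "CREATE" and parts[2].startswith(PGS + "/"):
                # e.g. "CREATE G2 PGS/G1.1": chunk G2 belongs to main file G1
                main_gfid = parts[2].split("/", 1)[1].split(".")[0]
                chunk_to_main[gfid] = main_gfid
                groups[main_gfid].append(rec)
            else:
                # DATA/META, or CREATE of the main file itself
                groups[chunk_to_main.get(gfid, gfid)].append(rec)
        return groups

    records = [
        "CREATE G1 f1", "DATA G1", "META G1",
        "CREATE G2 PGS/G1.1", "DATA G2", "META G1",
        "CREATE G3 PGS/G1.2", "DATA G3", "META G1",
    ]
    print(group_by_main_file(records)["G1"])  # all nine records land under G1

Grouping like this would only matter for deciding when a file on the Slave
becomes fully readable; the sync of each record itself stays independent, as
noted above.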
>>      >
>>      > Please add if I missed anything. C & S Welcome.
>>      >
>>      > regards
>>      > Aravinda
>>      >
>>      > On 08/11/2015 04:36 PM, Aravinda wrote:
>>      >> Hi,
>>      >>
>>      >> We are considering different approaches to add support in
>>      >> Geo-replication for Sharded Gluster Volumes [1].
>>      >>
>>      >> *Approach 1: Geo-rep: Sync Full file*
>>      >>    - In Changelog, record only the main file details, in the same
>>      >>      brick where it is created
>>      >>    - Record DATA in Changelog whenever there is any addition/change
>>      >>      to the sharded file
>>      >>    - Geo-rep rsync will checksum the file as a full file from the
>>      >>      mount and sync it as a new file
>>      >>    - Slave-side sharding is managed by the Slave Volume
>>      >> *Approach 2: Geo-rep: Sync sharded file separately*
>>      >>    - Geo-rep rsync will do the checksum for sharded files only
>>      >>    - Geo-rep syncs each sharded file independently as a new file
>>      >>      (see the sketch after this list)
>>      >>    - [UNKNOWN] Sync internal xattrs (file size and block count) of
>>      >>      the main sharded file to the Slave Volume to maintain the same
>>      >>      state as in the Master.
>>      >>    - Sharding translator to allow file creation under the .shards
>>      >>      dir for gsyncd, that is, where the Parent GFID is the .shards
>>      >>      directory
>>      >>    - If sharded files are modified during a Geo-rep run, it may end
>>      >>      up with stale data in the Slave.
>>      >>    - Files on the Slave Volume may not be readable unless all
>>      >>      sharded files are synced to the Slave (each brick in the
>>      >>      Master independently syncs files to the slave)
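
For the "syncs each sharded file independently" item above, a rough sketch
with hypothetical mount paths; plain file copies stand in for rsync, and it
assumes gsyncd is allowed to create files under .shards on the slave as per
the earlier requirement:

    # Hypothetical sketch for Approach 2: copy the main file and its chunks
    # under .shards from a master mount to a slave mount. shutil.copy2 stands
    # in for rsync; real gsyncd entry/data syncing (GFIDs, xattrs, ordering)
    # is more involved.
    import os
    import shutil

    def shard_paths(main_gfid, num_chunks):
        """Relative paths of the chunk files for a file with GFID main_gfid."""
        return [os.path.join(".shards", "%s.%d" % (main_gfid, i))
                for i in range(1, num_chunks)]

    def sync_file_with_chunks(master_mnt, slave_mnt, main_path, main_gfid,
                              num_chunks):
        shutil.copy2(os.path.join(master_mnt, main_path),
                     os.path.join(slave_mnt, main_path))
        os.makedirs(os.path.join(slave_mnt, ".shards"), exist_ok=True)
        for rel in shard_paths(main_gfid, num_chunks):
            shutil.copy2(os.path.join(master_mnt, rel),
                         os.path.join(slave_mnt, rel))

    # Example (hypothetical): f1 with GFID G1, sharded into 5 chunks in total
    # sync_file_with_chunks("/mnt/master", "/mnt/slave", "f1", "G1", 5)

Until the loop finishes, the file on the slave is in the partially readable
state described earlier, which is why the stale-data point above matters.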
>>      >>
>>      >> The first approach looks cleaner, but we have to analyze the rsync
>>      >> checksum performance on big files (sharded on the backend, accessed
>>      >> as one big file by rsync).
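
To put rough numbers on that concern, a throwaway sketch along these lines
could compare checksumming the whole aggregated file (roughly what Approach 1
pays on any change) with checksumming one shard-sized chunk (roughly what
Approach 2 pays when a single chunk changes). MD5 is only a stand-in for
rsync's checksumming; the path and the 4 MB chunk size are hypothetical:

    # Throwaway benchmark sketch (not gsyncd code); see the lead-in above for
    # what the two timings are meant to approximate.
    import hashlib
    import time

    CHUNK = 4 * 1024 * 1024  # stand-in for shard-block-size

    def full_file_md5(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(CHUNK), b""):
                h.update(block)
        return h.hexdigest()

    def single_chunk_md5(path, offset=0):
        with open(path, "rb") as f:
            f.seek(offset)
            return hashlib.md5(f.read(CHUNK)).hexdigest()

    if __name__ == "__main__":
        path = "/mnt/master/bigfile"  # hypothetical mount path
        for fn in (full_file_md5, single_chunk_md5):
            start = time.time()
            fn(path)
            print(fn.__name__, round(time.time() - start, 3), "s")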
>>      >>
>>      >> Let us know your thoughts. Thanks
>>      >>
>>      >> Ref:
>>      >> [1]
>>      >>
>> http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator
>>      >> --
>>      >> regards
>>      >> Aravinda
>>      >>
>>      >>

regards
Aravinda