[Gluster-devel] Gluster Sharding and Geo-replication

Shyam srangana at redhat.com
Thu Sep 3 14:44:07 UTC 2015


On 09/03/2015 02:43 AM, Krutika Dhananjay wrote:
>
>
> ------------------------------------------------------------------------
>
>     *From: *"Shyam" <srangana at redhat.com>
>     *To: *"Krutika Dhananjay" <kdhananj at redhat.com>
>     *Cc: *"Aravinda" <avishwan at redhat.com>, "Gluster Devel"
>     <gluster-devel at gluster.org>
>     *Sent: *Wednesday, September 2, 2015 11:13:55 PM
>     *Subject: *Re: [Gluster-devel] Gluster Sharding and Geo-replication
>
>     On 09/02/2015 10:47 AM, Krutika Dhananjay wrote:
>      >
>      >
>      >
>     ------------------------------------------------------------------------
>      >
>      >     *From: *"Shyam" <srangana at redhat.com>
>      >     *To: *"Aravinda" <avishwan at redhat.com>, "Gluster Devel"
>      >     <gluster-devel at gluster.org>
>      >     *Sent: *Wednesday, September 2, 2015 8:09:55 PM
>      >     *Subject: *Re: [Gluster-devel] Gluster Sharding and
>      >     Geo-replication
>      >
>      >     On 09/02/2015 03:12 AM, Aravinda wrote:
>      >      > The Geo-replication and Sharding teams today discussed the
>      >      > approach for making Geo-replication shard-aware. Details are
>      >      > below.
>      >      >
>      >      > Participants: Aravinda, Kotresh, Krutika, Rahul Hinduja,
>      >      > Vijay Bellur
>      >      >
>      >      > - Both Master and Slave Volumes should be sharded volumes with
>      >      >   the same configuration.
>      >
>      >     If I am not mistaken, geo-rep supports replicating to a
>      >     non-gluster local FS at the slave end. Is this correct? If so,
>      >     would this limitation not make that problematic?
>      >
>      >     When you state *same configuration*, I assume you mean the
>      >     sharding configuration, not the volume graph, right?
>      >
>      > That is correct. The only requirement is for the slave to have the
>      > shard translator (since someone needs to present an aggregated view
>      > of the file to readers on the slave).
>      > Also, the shard-block-size needs to be kept the same between master
>      > and slave. The rest of the configuration (like the number of subvols
>      > of DHT/AFR) can vary across master and slave.
>
>     Do we need to have the shard block size the same? I assume the file
>     carries an xattr that records the size it is sharded with
>     (trusted.glusterfs.shard.block-size), so if this is synced across, it
>     would do. If this is true, what it would mean is that "a sharded
>     volume needs a shard-supporting slave to geo-rep to".
>
> Yep. I too feel it should probably not be necessary to enforce the same
> shard size everywhere, as long as the shard translator on the slave
> takes care not to further "shard" the individual shards gsyncd writes
> on the slave volume.
> This is especially true if different files/images/vdisks on the master
> volume are associated with different block sizes.
> This logic has to be built into the shard translator based on parameters
> (client-pid, parent directory of the file being written to).
> What this means is that the shard-block-size attribute on the slave
> would essentially be a don't-care parameter. I need to give all this
> some more thought though.

Understood, thanks.
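
For reference, the block size a file was sharded with can be read back
from the xattr discussed above. A minimal Python sketch, assuming a
Linux client and a hypothetical mount path; the 64-bit big-endian
encoding of the value is an assumption about how it is stored on disk:

    import os
    import struct

    def shard_block_size(path):
        # trusted.glusterfs.shard.block-size is the xattr discussed
        # above; assumed here to hold a 64-bit big-endian byte count.
        raw = os.getxattr(path, "trusted.glusterfs.shard.block-size")
        return struct.unpack(">Q", raw)[0]

    # Hypothetical usage on a master-volume mount (needs privileges
    # to read trusted.* xattrs):
    # print(shard_block_size("/mnt/master-vol/vm1.img"))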

>
> -Krutika
>
>      >
>      > -Krutika
>      >
>      >
>      >
>      >      > - In the Changelog, record changes related to sharded files
>      >      >   too, just like any regular file.
>      >      > - Sharding should allow Geo-rep to list/read/write sharding's
>      >      >   internal xattrs if the client PID is gsyncd's (-1).
>      >      > - Sharding should allow read/write of shard files (that is,
>      >      >   those in the .shards directory) if the client PID is
>      >      >   gsyncd's.
>      >      > - Sharding should return the actual file instead of the
>      >      >   aggregated content when the main file is requested and the
>      >      >   client PID is gsyncd's (a sketch of this gating follows
>      >      >   below).
>      >      >
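A minimal sketch of the gating described in the list above, as Python
pseudocode for the decision only (the real shard translator is C).
GSYNCD_PID = -1 comes from the list above; the fixed GFID shown for
/.shards is the one the shard xlator reserves, quoted here from memory:

    # Client PID that gsyncd mounts with, per the list above.
    GSYNCD_PID = -1

    # Fixed GFID the shard xlator assigns to the /.shards directory.
    DOT_SHARDS_GFID = "be318638-e8a0-4c6d-977d-7a937aa84806"

    def shard_should_passthrough(client_pid, parent_gfid):
        # Expose raw files (no aggregation, shard xattrs visible,
        # .shards accessible) only when the caller is gsyncd, or the
        # operation targets a shard piece directly under .shards.
        return client_pid == GSYNCD_PID or parent_gfid == DOT_SHARDS_GFID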
>      >      > For example, a file f1 is created with GFID G1.
>      >      >
>      >      > When the file grows, it gets sharded into chunks (say, 5 chunks).
>      >      >
>      >      >      f1   G1
>      >      >      .shards/G1.1   G2
>      >      >      .shards/G1.2   G3
>      >      >      .shards/G1.3   G4
>      >      >      .shards/G1.4   G5
>      >      >
>      >      > In the Changelog, this is recorded as 5 different files, as below:
>      >      >
>      >      >      CREATE G1 f1
>      >      >      DATA G1
>      >      >      META G1
>      >      >      CREATE G2 PGS/G1.1
>      >      >      DATA G2
>      >      >      META G1
>      >      >      CREATE G3 PGS/G1.2
>      >      >      DATA G3
>      >      >      META G1
>      >      >      CREATE G4 PGS/G1.3
>      >      >      DATA G4
>      >      >      META G1
>      >      >      CREATE G5 PGS/G1.4
>      >      >      DATA G5
>      >      >      META G1
>      >      >
>      >      > where PGS is the GFID of the .shards directory.
>      >      >
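For illustration, a sketch of how a consumer could walk the records
above, grouping them by the GFID they apply to. The textual form used
here is the simplified one from this example, not the actual changelog
encoding:

    from collections import defaultdict

    # The simplified records from the example above (G3..G5 elided).
    RECORDS = """\
    CREATE G1 f1
    DATA G1
    META G1
    CREATE G2 PGS/G1.1
    DATA G2
    META G1
    """

    def group_by_gfid(text):
        # Group records per GFID: each shard (G2..G5) can then be
        # synced as an independent file, while META updates accrue on
        # the main file G1.
        ops = defaultdict(list)
        for line in text.splitlines():
            parts = line.split()
            if not parts:          # skip blank lines
                continue
            op, gfid, *rest = parts
            ops[gfid].append((op, rest[0] if rest else None))
        return ops

    # group_by_gfid(RECORDS)["G2"] == [("CREATE", "PGS/G1.1"),
    #                                  ("DATA", None)]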
>      >      > Geo-rep will create these files independently in the Slave
>      >      > Volume and sync the xattrs of G1. Data can be read only when
>      >      > all the chunks are synced to the Slave Volume. Data can be
>      >      > read partially if the main/first file and some of the chunks
>      >      > have synced to the Slave.
>      >      >
>      >      > Please add if I missed anything. Comments and suggestions welcome.
>      >      >
>      >      > regards
>      >      > Aravinda
>      >      >
>      >      > On 08/11/2015 04:36 PM, Aravinda wrote:
>      >      >> Hi,
>      >      >>
>      >      >> We are considering different approaches for adding support in
>      >      >> Geo-replication for sharded Gluster volumes [1].
>      >      >>
>      >      >> *Approach 1: Geo-rep: Sync full file*
>      >      >>    - In the Changelog, record only the main file's details,
>      >      >> in the same brick where it is created
>      >      >>    - Record DATA in the Changelog whenever there is any
>      >      >> addition or change to the sharded file
>      >      >>    - Geo-rep rsync will checksum the full file from the mount
>      >      >> and sync it as a new file
>      >      >>    - Slave-side sharding is managed by the Slave Volume
>      >      >> *Approach 2: Geo-rep: Sync sharded files separately*
>      >      >>    - Geo-rep rsync will checksum the shard files only
>      >      >>    - Geo-rep syncs each shard file independently, as a new
>      >      >> file
>      >      >>    - [UNKNOWN] Sync the internal xattrs (file size and block
>      >      >> count) on the main sharded file to the Slave Volume, to
>      >      >> maintain the same state as on the Master
>      >      >>    - Sharding translator to allow file creation under the
>      >      >> .shards dir for gsyncd, that is, where the parent GFID is the
>      >      >> .shards directory
>      >      >>    - If shard files are modified during a Geo-rep run, we may
>      >      >> end up with stale data on the Slave
>      >      >>    - Files on the Slave Volume may not be readable unless all
>      >      >> shard files have synced to the Slave (each brick on the
>      >      >> Master independently syncs files to the Slave); a sketch of
>      >      >> this readability check follows below
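The readability condition in the last point can be checked
arithmetically: given the main file's size and shard block size (the
internal xattrs mentioned above), the set of expected shard files is
known. A sketch, assuming sizes in bytes and the shard naming from the
f1/G1 example earlier:

    import math

    def expected_shard_names(gfid, file_size, block_size):
        # Chunk 0 lives in the main file itself; chunks 1..n-1 live in
        # .shards/<gfid>.<i>, as in the earlier example.
        n_chunks = max(1, math.ceil(file_size / block_size))
        return {"%s.%d" % (gfid, i) for i in range(1, n_chunks)}

    def readable_on_slave(gfid, file_size, block_size, synced_names):
        # True once the main file plus every shard piece has synced.
        needed = expected_shard_names(gfid, file_size, block_size)
        return needed <= set(synced_names)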
>      >      >>
>      >      >> The first approach looks cleaner, but we have to analyze the
>      >      >> rsync checksum performance on big files (sharded in the
>      >      >> backend, accessed as one big file by rsync); a rough sketch
>      >      >> of that cost follows below.
>      >      >>
>      >      >> Let us know your thoughts. Thanks
>      >      >>
>      >      >> Ref:
>      >      >> [1]
>      >      >> http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator
>      >      >> --
>      >      >> regards
>      >      >> Aravinda
>      >      >>

