[Gluster-users] CHANGELOGs and new geo-replica sync taking forever

Wade Fitzpatrick wade.fitzpatrick at ladbrokes.com.au
Thu Nov 5 00:46:02 UTC 2015


I also had problems getting geo-replication working correctly and 
eventually gave it up due to project time constraints.

What version of gluster?
What is the topology of x, xx, and xxx/xxy/xxz?

I tried a 2x2 stripe-replica with geo-replication to a 2x1 stripe using 3.7.4.
Starting replication with 32 GB of small files already in place never
completed; it failed several times. Starting replication with an empty volume
and then filling it with a rate limit of 2000k/s managed to stay in sync until
the initial fill completed, but geo-replication could not keep up with the
rate of change under normal usage.
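
For what it's worth, the 2000k/s cap was just rsync's --bwlimit pushed through
the geo-replication session config; something along these lines (the volume
and host names below are placeholders for your own session, and --bwlimit is
in KiB/s):

# gluster volume geo-replication mastervol slavehost::slavevol config rsync-options "--bwlimit=2000"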

On 5/11/2015 3:30 AM, Brian Ericson wrote:
> tl;dr -- geo-replication of ~200,000 CHANGELOG files is killing me... Help!
>
> I have about 125G spread over just shy of 5000 files that I'm replicating
> with geo-replication to nodes around the world. The content is fairly stable
> and probably hasn't changed at all since I initially established the
> GlusterFS nodes/network, which looks as follows:
> x -> xx -> [xxx, xxy] (x geo-replicates to xx, xx geo-replicates to xxx/xxy)
>
> Latency & throughput are markedly different (x -> xx is the fastest; xx ->
> xxx the slowest, at about 1G/hour). That said, all nodes were synced within
> 5 days of setting up the network.
>
> I have since added another node, xxz, which is also geo-replicated from xx
> (xx -> xxz). Its latency/throughput is clearly better than xx -> xxx's, but
> over 5 days later, I'm still replicating CHANGELOGs and haven't gotten to
> any real content (the replicated volumes' mounted filesystems are empty).
>
> Starting with x, you can see I have a "reasonable" number of CHANGELOGs:
> x # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
> 186
>
> However, xxz's source is xx, and I've got a real problem with xx:
> xx # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
> 193450
>
> 5+ days into this, and I've hardly made a dent in this on xxz:
> xxz # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
> 43211
>
> On top of that, xx is generating new CHANGELOGs at a rate of ~6/minute (two
> volumes at ~3/minute each), so chasing CHANGELOGs is a (quickly) moving
> target.
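
That generation rate looks like the changelog rollover interval rather than
actual write activity: as far as I remember, the changelog translator rolls
the CHANGELOG file over every changelog.rollover-time seconds (default 15)
per brick on this release, idle or not, which is in the ballpark of the
~3/minute per volume you're seeing. Raising the interval should slow the
flood, at the cost of geo-replication noticing changes less promptly; a
sketch (xxvol is a placeholder for your volume name):

# gluster volume set xxvol changelog.rollover-time 60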
>
> And these files are small! The "I'm alive" file is 92 bytes long, and I've
> seen the rest average about 4k. Demonstrating latency/throughput, you can
> see that small files (for me) are a real killer:
> ### x -> xx (fastest route)
> # for i in 1 10 100 1000; do
>     file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"
>     size_k=$(( $( echo -n "$file" | wc -c ) / 1024 ))
>     elapsed=$( ( time for j in $( seq 1 $i ); do echo -n "$file" | ssh xx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )
>     echo "$i (${size_k}k): $elapsed"
>   done
> 1 (3984k): 0m4.777s
> 10 (398k): 0m10.737s
> 100 (39k): 0m53.286s
> 1000 (3k): 7m21.493s
>
> ### xx -> xxx (slowest route)
> # for i in 1 10 100 1000; do
>     file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"
>     size_k=$(( $( echo -n "$file" | wc -c ) / 1024 ))
>     elapsed=$( ( time for j in $( seq 1 $i ); do echo -n "$file" | ssh xxx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )
>     echo "$i (${size_k}k): $elapsed"
>   done
> 1 (3984k): 0m11.065s
> 10 (398k): 0m41.007s
> 100 (39k): 4m52.814s
> 1000 (3k): 39m23.009s
>
> ### xx -> xxz (the route I've added and am trying to sync)
> # for i in 1 10 100 1000; do
>     file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"
>     size_k=$(( $( echo -n "$file" | wc -c ) / 1024 ))
>     elapsed=$( ( time for j in $( seq 1 $i ); do echo -n "$file" | ssh xxz 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )
>     echo "$i (${size_k}k): $elapsed"
>   done
> 1 (3984k): 0m2.673s
> 10 (398k): 0m16.333s
> 100 (39k): 2m0.676s
> 1000 (3k): 17m28.265s
>
> What you're looking at is the cost of transferring a total of 4000k: 1
> transfer at 4000k, 10 at 400k, 100 at 40k, and 1000 at 4k. With 1 transfer
> taking under 3 seconds and 1000 transfers taking nearly 17.5 minutes over
> xx -> xxz for the same total payload, per-file overhead clearly dominates,
> and transferring almost 200,000 CHANGELOGs one at a time is a killer.
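
A big share of that per-file cost is SSH session setup rather than payload,
so one way to sanity-check these numbers is to rerun the loop with OpenSSH
connection multiplexing enabled, which reuses a single SSH session for every
file. A minimal client-side ~/.ssh/config sketch (the host alias xxz is
assumed):

Host xxz
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m

If the 1000-transfer case collapses toward the single-transfer case with
multiplexing on, connection setup is the bottleneck; if it doesn't, the cost
is in the per-file round trips themselves.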
>
> And 92-byte files don't improve this:
> ### x -> xx (fastest route)
> # file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100
> # elapsed=$( ( time for j in $( seq 1 $i ); do echo -n "$file" | ssh xx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )
> # echo "$i ($( echo -n "$file" | wc -c )): $elapsed"
> 100 (92): 0m34.164s
>
> ### xx -> xxx (slowest route)
> # file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100
> # elapsed=$( ( time for j in $( seq 1 $i ); do echo -n "$file" | ssh xxx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )
> # echo "$i ($( echo -n "$file" | wc -c )): $elapsed"
> 100 (92): 3m53.388s
>
> ### xx -> xxz (the route I've added and am trying to sync)
> # file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100
> # elapsed=$( ( time for j in $( seq 1 $i ); do echo -n "$file" | ssh xxz 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )
> # echo "$i ($( echo -n "$file" | wc -c )): $elapsed"
> 100 (92): 1m43.389s
>
> Questions...:
> o Why so many CHANGELOGs?
>
> o Why so slow (in 5 days, I've transferred 43211 CHANGELOGs, so
>   43211/5/24/60 = 6 implies a real transfer rate of about 6 CHANGELOG files
>   per minute, which brings me back to xx's generating new ones at about that
>   rate...)?
>
> o What can I do to "fix" this?
>
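On the last question: for small-file-heavy syncs, 3.7 can switch the
geo-replication transport from rsync to a tar-over-ssh pipeline, which
batches many small files into a single stream instead of paying per-file
overhead. I never got to verify it myself before giving up on
geo-replication, and the option name may differ by release (use_tarssh is
what I recall from the 3.7 docs), so treat this as a pointer rather than a
recipe:

# gluster volume geo-replication mastervol slavehost::slavevol config use_tarssh true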
