[Gluster-users] GlusterFS 3.5.3 - untar: very poor performance

Krutika Dhananjay kdhananj at redhat.com
Mon Jun 22 02:35:13 UTC 2015


Hi Geoffrey, 

1. Was self-heal also in progress while I/O was happening on the volume? 
2. Also, there seem to be quite a few fsyncs, which could have slowed things down a bit. Could you disable write-behind and gather the time stats one more time? That would rule out the possibility that out-of-order writes introduced by write-behind are increasing the number of fsyncs issued by the replication module. 
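
For reference, the checks would look something like this (<VOLNAME> is a placeholder for your volume's name):

gluster volume heal <VOLNAME> info                          # list entries pending or undergoing self-heal
gluster volume set <VOLNAME> performance.write-behind off   # disable write-behind before re-running the bench
gluster volume set <VOLNAME> performance.write-behind on    # restore it afterwards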

-Krutika 
----- Original Message -----

> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
> To: gluster-users at gluster.org
> Sent: Saturday, June 20, 2015 6:04:40 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance

> Re,

> For comparison, here is the output of the same script run on a distributed-only
> volume (2 of the 4 servers described previously, 2 bricks each):
> #######################################################
> ################ UNTAR time consumed ################
> #######################################################

> real 1m44.698s
> user 0m8.891s
> sys 0m8.353s

> #######################################################
> ################# DU time consumed ##################
> #######################################################

> 554M linux-4.1-rc6

> real 0m21.062s
> user 0m0.100s
> sys 0m1.040s

> #######################################################
> ################# FIND time consumed ################
> #######################################################

> 52663

> real 0m21.325s
> user 0m0.104s
> sys 0m1.054s

> #######################################################
> ################# GREP time consumed ################
> #######################################################

> 7952

> real 0m43.618s
> user 0m0.922s
> sys 0m3.626s

> #######################################################
> ################# TAR time consumed #################
> #######################################################

> real 0m50.577s
> user 0m29.745s
> sys 0m4.086s

> #######################################################
> ################# RM time consumed ##################
> #######################################################

> real 0m41.133s
> user 0m0.171s
> sys 0m2.522s

> The difference in performance is amazing!

> Geoffrey
> -----------------------------------------------
> Geoffrey Letessier

> IT manager & systems engineer
> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr

> On 20 June 2015 at 02:12, Geoffrey Letessier < geoffrey.letessier at cnrs.fr >
> wrote:

> > Dear all,

> > I just noticed that I/O operations on the main volume of my HPC cluster have
> > become impressively slow.

> > Doing some file operations on a compressed archive of the Linux kernel
> > sources (roughly 80MB, containing about 52,000 files), the untar operation
> > alone can take more than half an hour, as you can see below:

> > #######################################################
> > ################ UNTAR time consumed ################
> > #######################################################

> > real 32m42.967s
> > user 0m11.783s
> > sys 0m15.050s

> > #######################################################
> > ################# DU time consumed ##################
> > #######################################################

> > 557M linux-4.1-rc6

> > real 0m25.060s
> > user 0m0.068s
> > sys 0m0.344s

> > #######################################################
> > ################# FIND time consumed ################
> > #######################################################

> > 52663

> > real 0m25.687s
> > user 0m0.084s
> > sys 0m0.387s

> > #######################################################
> > ################# GREP time consumed ################
> > #######################################################

> > 7952

> > real 2m15.890s
> > user 0m0.887s
> > sys 0m2.777s

> > #######################################################
> > ################# TAR time consumed #################
> > #######################################################

> > real 1m5.551s
> > user 0m26.536s
> > sys 0m2.609s

> > #######################################################
> > ################# RM time consumed ##################
> > #######################################################

> > real 2m51.485s
> > user 0m0.167s
> > sys 0m1.663s

> > For information, this volume is a distributed-replicated one, composed of 4
> > servers with 2 bricks each. Each brick is a 12-drive RAID6 vdisk with good
> > native performance (around 1.2GB/s).
> 
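
> > For context, a 4-server, 2-bricks-per-server distributed-replicated layout
> > like this would typically be created with something like the following
> > (volume name, hostnames and brick paths are placeholders, not the real ones
> > from vol_info.txt):

> > gluster volume create homevol replica 2 \
> >     srv1:/export/raid6a/brick srv2:/export/raid6a/brick \
> >     srv3:/export/raid6a/brick srv4:/export/raid6a/brick \
> >     srv1:/export/raid6b/brick srv2:/export/raid6b/brick \
> >     srv3:/export/raid6b/brick srv4:/export/raid6b/brick

> > With replica 2, each consecutive pair of bricks forms one mirror, so the
> > eight bricks give four 2-way replica sets distributed across the servers.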

> > In comparison, when I use dd to generate a 100GB file on the same volume, my
> > write throughput is around 1GB/s (client side) and 500MB/s (server side)
> > because of replication:
> 
> > Client side:
> > [root at node056 ~]# ifstat -i ib0
> > ib0
> > KB/s in KB/s out
> > 3251.45 1.09e+06
> > 3139.80 1.05e+06
> > 3185.29 1.06e+06
> > 3293.84 1.09e+06
> > ...

> > Server side:
> > [root at lucifer ~]# ifstat -i ib0
> > ib0
> > KB/s in KB/s out
> > 561818.1 1746.42
> > 560020.3 1737.92
> > 526337.1 1648.20
> > 513972.7 1613.69
> > ...

> > dd command:
> > [root at node056 ~]# dd if=/dev/zero of=/home/root/test.dd bs=1M count=100000
> > 100000+0 records in
> > 100000+0 records out
> > 104857600000 bytes (105 GB) copied, 202.99 s, 517 MB/s
> 
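
> > (Note: dd can overstate the figure slightly if data is still in the page
> > cache when it exits; assuming the same test file, a variant that forces a
> > flush before reporting would be:

> > [root at node056 ~]# dd if=/dev/zero of=/home/root/test.dd bs=1M count=100000 conv=fdatasync

> > The numbers above are consistent anyway: with replica 2, the ~1GB/s leaving
> > the client is two copies of a ~500MB/s stream, matching the ~500MB/s arriving
> > at each of the two servers holding the file.)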

> > So this issue doesn't seem to come from the network (which is InfiniBand in
> > this case).
> 

> > You can find a set of files in the attachments:
> > - mybench.sh: the bench script (a rough sketch of it appears below)
> > - benches.txt: output of my "bench"
> > - profile.txt: gluster volume profile during the "bench"
> > - vol_status.txt: gluster volume status
> > - vol_info.txt: gluster volume info
> 
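
> > For readers without the attachment, a rough sketch of what a bench of this
> > kind does (the tarball name, grep pattern and working directory below are
> > placeholders; only the six timed steps are taken from the output above):

> > #!/bin/bash
> > # rough sketch of a mybench.sh-style benchmark; adjust names and paths
> > cd /path/on/gluster/volume || exit 1
> > echo "UNTAR"; time tar xf linux-4.1-rc6.tar.xz
> > echo "DU"; time du -sh linux-4.1-rc6
> > echo "FIND"; time find linux-4.1-rc6 | wc -l
> > echo "GREP"; time grep -r cpu_relax linux-4.1-rc6 | wc -l
> > echo "TAR"; time tar cf linux-4.1-rc6.tar linux-4.1-rc6
> > echo "RM"; time rm -rf linux-4.1-rc6 linux-4.1-rc6.tar

> > (The profile in profile.txt corresponds to the standard "gluster volume
> > profile <VOLNAME> start" / "gluster volume profile <VOLNAME> info" commands.)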

> > Can someone help me fix this? It's very critical, because this volume is on
> > an HPC cluster in production.
> 

> > Thanks in advance,
> 
> > Geoffrey
> > -----------------------------------------------
> > Geoffrey Letessier
> > IT manager & systems engineer
> > CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
> > Institut de Biologie Physico-Chimique
> > 13, rue Pierre et Marie Curie - 75005 Paris
> > Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr

> > <benches.txt>
> > <mybench.sh>
> > <profile.txt>
> > <vol_info.txt>
> > <vol_status.txt>
> 

> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

