[Gluster-devel] Df, ls, reads slow during I/O

Einar Gautun einar.gautun at statkart.no
Tue May 27 15:10:17 UTC 2008


On Mon, 2008-05-26 at 13:17 -0700, Jordan Mendler wrote:
> I have optimized my glusterfs system pretty well for writes, but I am having
> an issue where doing an ls is very slow, and during I/O operations df takes
> forever and often hangs on glusterfs. Below are some examples of the slow ls
> (at the time of the tests, there are no filesystem operations occurring other
> than the ls).
> 

There are several things to look at. A good response time for ls, or even
worse ls -la, is more of a desktop concern, but it matters for everyday
use here as well. You may have to sacrifice some speed. A couple of
figures from old HW, an AMD Athlon 2.0 GHz and an Intel 1.3 GHz in unify,
2.8 TB of space:

ls:

real    0m0.022s
user    0m0.000s
sys     0m0.000s

ls -la:
real    0m1.043s
user    0m0.000s
sys     0m0.000s

Not with AFR yet.

Some hints on the servers:

1: Timer frequency in the kernel. I use 250 Hz, which seems more responsive
than 100 Hz, together with "Preemption Model - Desktop" and "Preempt The Big
Kernel Lock" turned off.

2: Network: 2 switches, GigE. One for NTP, ssh, namespace and all other
traffic, one link per computer, 1500 MTU.
One switch with trunking/bonding, 9000 MTU, and a private IP range just for
the glusterfs storage traffic. That gave a much better response, especially
under load. These switches are Cisco, so try the different load balancing
schemes. Make sure the trunk hands out a new port for every host instead of
binding a host to the same port in the trunk every time. On Linux, use rr
(round-robin) mode in the bond.
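
A sketch of what I mean on the Linux side, using the standard bonding
driver (eth2/eth3 and the address are just examples, not my setup):

# load the bonding driver in round-robin mode with link monitoring
modprobe bonding mode=balance-rr miimon=100
# bring the bond up on the private storage range with jumbo frames
ifconfig bond0 10.0.1.10 netmask 255.255.255.0 mtu 9000 up
# enslave the two GigE ports that go to the storage switch
ifenslave bond0 eth2 eth3

Your distribution will have its own way to make this persistent
(ifcfg-bond0, /etc/network/interfaces and so on).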

Intel has good NICs with dual and quad ports, and check the kernel
Docs for the module options.
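
One example of the kind of module option I mean, for the e1000 driver (the
value here is only an illustration - see Documentation/networking/e1000.txt
in the kernel source for what the parameters actually do):

# /etc/modprobe.conf
options e1000 InterruptThrottleRate=3000,3000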

3: In /etc/sysctl.conf:
vm.dirty_background_ratio=20 (this is a percentage. Use something between
3-5, or 20% as I'm trying out for now; delaying the disk work gives better
use of the disk heads - we have small files)

vm.dirty_ratio=60 (also in %: the point where disk writes become synchronous
and SLOW. A lot of memory lets you raise this; 40-60 is good)

vm.overcommit_ratio=2 (running out of memory - no good)
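
Pulled together, that block of /etc/sysctl.conf looks roughly like this. I
read the last line as vm.overcommit_memory=2 (strict overcommit accounting),
since a ratio of 2% would leave almost no usable memory - adjust to your
own setup:

# /etc/sysctl.conf - VM tuning for lots of small files
vm.dirty_background_ratio = 20   # start background writeback at 20% dirty
vm.dirty_ratio = 60              # force synchronous (SLOW) writes at 60% dirty
vm.overcommit_memory = 2         # never promise more memory than we can back

Load it with sysctl -p (or reboot).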

4: We have 3ware PATA/SATA controllers. Some good adjustments in
/etc/rc.local:
echo 64 > /sys/block/sdX/queue/max_sectors_kb (these controllers work in
64 KB stripes, which also affects the cache)
echo 512 > /sys/block/sdX/queue/nr_requests

blockdev --setra 16384 /dev/sdX (test this value - the default is 256;
tuning it is worthwhile on laptops as well)

echo deadline > /sys/block/sdX/queue/scheduler (some say it's the best
elevator for 3ware use)
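
Put together in rc.local it could look like this (sdb and sdc are just
placeholders for whichever devices sit on the 3ware controller):

# /etc/rc.local - tune the disks on the 3ware controller
for dev in sdb sdc; do
    echo 64  > /sys/block/$dev/queue/max_sectors_kb   # match the 64 KB stripes
    echo 512 > /sys/block/$dev/queue/nr_requests      # deeper request queue
    echo deadline > /sys/block/$dev/queue/scheduler   # deadline elevator
    blockdev --setra 16384 /dev/$dev                  # big read-ahead (default 256)
done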

5: I only use io-threads on the servers, nothing extra on the clients.
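
From memory, that means the server spec looks roughly like this (the volume
names, directory and thread-count are just example values to experiment
with, and option names may differ slightly between releases):

volume posix
  type storage/posix
  option directory /data/export
end-volume

volume iothreads
  type performance/io-threads
  option thread-count 8
  subvolumes posix
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.iothreads.allow *
  subvolumes iothreads
end-volume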


Regards
Einar


> I am finding that during heavy reads problems are much worse, though they
> also can be bad during heavy writes.
> 
> Does anyone have any ideas of how I can track this problem down? It is the
> only thing preventing our system from being production-ready. The current
> hardware setup is 2 storage servers using ALU on a 16xSATA RAID6 (8 cores
> each w/ 8 GB RAM), and an AFR'ed namespace on the same hardware RAIDs. My
> configuration is available at http://genome.ucla.edu/~jordan/gluster/
> 
> Thanks so much,
> Jordan
> -----------------
> 
> Examples of slow ls during no FS operations:
> 
> [root at solexa solexa]# time ls solexa_datasets/
> 070403_SLXA-EAS20_0001_FC5063  071121_HWI-EAS172_200C0             docs
> 070405_SLXA-EAS20_0005_FC5065  071203_HWI-EAS172_200BA             genomes
> 070418_SLXA-EAS20_5256         071207_HWI-EAS172_2015A
> Instruments
> 070427_SLXA-EAS20_5152         071211_HWI-EAS172_200gh             L001.tar
> 070515_SLXA-EAS20_5612         071217_HWI-EAS172_14759             L007.tar
> 070523_SLXA-EAS20_5153         071221_HWI-EAS172_200M1
> ls_d_images.output
> 070529_SLXA-EAS20_5594         080108_HWI-EAS172_200G8
> ls_d_images.output.20071029
> 070604_SLXA-EAS20_5447         080116_HWI-EAS172_14758
> ls_d_images.output.20071029.truncated
> 070608_SLXA-EAS20_FC5459       080121_HWI-EAS172_2040F
> ls_d_images.output.2007103016.truncated
> 070612_SLXA-EAS20_5646         080125_HWI-EAS172_13471R
> ls_d_images.output.2007103017.truncated
> 070621_SLXA-EAS20_5590         080128_HWI-EAS172_201UD
> ls_d_images.output.20071030.truncated
> 070625_SLXA-EAS20_5701         080204_HWI-EAS172_201ET
> ls_d_images.output.200710311329.truncated
> 070629_SLXA-EAS20_5861         080206_SLXA-EAS20_203NA
> ls_d_images.output.200710311349.truncated
> 070703_SLXA-EAS20_5731         080215_HWI-EAS172_2009B
> ls_d_images.output.200710311350.nottruncatedbutmissingdoneitems
> 070709_SLXA-EAS20_5611         080226_HWI-EAS172_205A0
> ls_d_images.output.200710311749.truncated
> 070713_SLXA-EAS20_5863         080304_HWI-EAS172_204AY
> ls_d_images.output.200711010821.truncated
> 070717_SLXA-EAS20_5699         080307_HWI-EAS172_20ANM
> ls_d_images.output.200711010845.truncated
> 070723_SLXA-EAS20_5540         080311_HWI-EAS172_204KJ
> ls_d_images.output.200711051802.truncated
> 070727_SLXA-EAS20_4606         080314_HWI-EAS172_204MG
> ls_d_images.output.clustertemp1.200711011622
> 070731_SLXA-EAS20_4611         080320_HWI-EAS172_204KB
> redo.200711061339
> 070810_SLXA-EAS20_5866         080326_HWI-EAS172_204KP             Reports
> 070814_SLXA-EAS20_5697         080401_HWI-EAS172_20CLK
> rsync_to_storage.pl
> 070817_SLXA-EAS20_5603         080407_HWI-EAS172_204MN
> Runs_Comments
> 070821_SLXA-EAS20_11290        080411_HWI-EAS172_204K1
> Shawn_basecalls_U87
> 070827_SLXA-EAS20_11279        080418_HWI-EAS172_204T7
> solexatmp0.tarfile2dirname
> 070831_SLXA-EAS20_5406         080429_HWI-EAS172_204KD
> solexatmp0.tarfile2dirname.new
> 070925_SLXA-EAS20_11296        080505_HWI-EAS172_20DYC
> solexatmp1.tarfile2dirname
> 071002_SLXA-EAS20_4977         080513_HWI-EAS172_20DYG             testdir
> 071022_HWI-EAS172_11989        080515_HWI-EAS172_20DYG_RE
> testdir.tar
> 071026_HWI-EAS172_11989reseq   080516_HWI-EAS335_305T4             test.pl
> 071030_HWI-EAS172_14517        1
> U87.fasta.bz2
> 071105_HWI-EAS172_14515        4276.fasta
> verify_JMM.log
> 071109_HWI-EAS172_14034        -al
> verify_JMM.sh
> 071115_HWI-EAS172_14055        clustertemp1.tarfiles.200711051731
> 
> real    0m2.298s
> user    0m0.001s
> sys    0m0.001s
> 
> [root at solexa solexa]# time ls solexa_datasets/070817_SLXA-EAS20_5603/Data
> C2-36_Firecrest1.8.28_04-10-2007_solexa  default_offsets.txt
> 
> real    0m6.246s
> user    0m0.001s
> sys    0m0.002s
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
-- 
Einar Gautun                     einar.gautun at statkart.no

Statens kartverk            | Norwegian Mapping Authority
3507 Hønefoss               |    NO-3507 Hønefoss, Norway

Ph +47 32118372   Fax +47 32118101       Mob +47 92692662




