[Gluster-users] Gluster CPU Usage is at 100% | Replication throughput is at 300kb/sec on a 100 MBit Interface

Sat May 14 17:17:57 UTC 2011

Dear Gluster-Users,

I have been running GlusterFS 3.2.0 for seven days. I installed GlusterFS 
from source.

Currently running kernel:
Linux cluster-001 2.6.38.5-i686-2.6.38.5 #1 SMP Fri May 6 16:19:52
Intel(R) Pentium(R) 4 CPU 1.80GHz GenuineIntel GNU/Linux

The cluster is configured as follows:
> gluster> volume info
>
> Volume Name: users
> Type: Replicate
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: cluster-001:/mnt/data/users
> Brick2: cluster-002:/mnt/data/users
> Options Reconfigured:
> auth.allow: 10.200.0.*
>
> Volume Name: transformer
> Type: Replicate
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: cluster-001:/mnt/data/transformer
> Brick2: cluster-002:/mnt/data/transformer
> Options Reconfigured:
> auth.allow: 10.200.0.*

On the same machines, I mounted both volumes (transformer and users) 
with the native GlusterFS client:
mount -t glusterfs /etc/glusterd/vols/transformer/transformer-fuse.vol /mnt/glusterfs/transformer
mount -t glusterfs /etc/glusterd/vols/users/users-fuse.vol /mnt/glusterfs/users

Both volumes are mounted. After changing to /mnt/glusterfs/transformer, I 
ran a performance test with dd, writing a 5 MB file in 512-byte 
chunks:
> cluster-001 ~ # cd /mnt/glusterfs/transformer/
> cluster-001 transformer # date && dd if=/dev/zero of=test.bin bs=512 count=10000 && date
> Sat May 14 19:07:54 CEST 2011
> 10000+0 records in
> 10000+0 records out
> 5120000 bytes (5.1 MB) copied, 13.2596 s, 386 kB/s
> Sat May 14 19:08:07 CEST 2011
> cluster-001 transformer #

Storing 5 MB on the GlusterFS mount takes 13 seconds.
Via SCP, the file is copied at 11.2 MB per second.
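
As a cross-check of the numbers above, the dd output can be turned into an average throughput and an average time per write. This is only a rough sketch: the function names throughput_kbs and per_write_ms are made up for illustration, and it assumes awk is available.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of GlusterFS or dd).

# Average throughput in kB/s (1 kB = 1000 bytes, as dd reports it).
throughput_kbs() {
    # $1 = bytes copied, $2 = elapsed seconds
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000 }'
}

# Average time per write call in milliseconds.
per_write_ms() {
    # $1 = elapsed seconds, $2 = number of records (dd count)
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.3f\n", s * 1000 / n }'
}

# Figures taken from the dd run on the GlusterFS mount.
throughput_kbs 5120000 13.2596
per_write_ms 13.2596 10000
```

With the figures from the GlusterFS run, this works out to roughly 386 kB/s and about 1.3 ms per 512-byte write.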

The same test on the regular hard drive:
> cluster-001 transformer # cd /tmp
> cluster-001 tmp # date && dd if=/dev/zero of=test.bin bs=512 count=10000 && date
> Sat May 14 19:09:27 CEST 2011
> 10000+0 records in
> 10000+0 records out
> 5120000 bytes (5.1 MB) copied, 0.0615758 s, 83.1 MB/s
> Sat May 14 19:09:27 CEST 2011
> cluster-001 tmp #
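
Both runs above use bs=512, so each of the 10000 records is a separate write call. One way to check how much the block size matters would be to repeat the test with larger blocks while keeping the total near 5 MB. The following is only a sketch: run_dd_test and the TARGET_DIR variable are hypothetical names, and it assumes a dd that prints its summary as the last line of output, as GNU dd does.

```shell
#!/bin/sh
# Hypothetical helper: write roughly the same amount of data with a given
# block size, print the last line of dd's output, then remove the test file.
run_dd_test() {
    # $1 = target directory, $2 = block size in bytes
    dir="$1"
    bs="$2"
    total=5120000                 # same total as in the tests above
    count=$((total / bs))         # whole blocks only, so the total may be slightly lower
    dd if=/dev/zero of="$dir/test.bin" bs="$bs" count="$count" 2>&1 | tail -n 1
    rm -f "$dir/test.bin"
}

# TARGET_DIR is an assumed variable name; it defaults to /tmp here.
target="${TARGET_DIR:-/tmp}"
for bs in 512 4096 65536; do
    echo "bs=$bs:"
    run_dd_test "$target" "$bs"
done
```

Setting TARGET_DIR=/mnt/glusterfs/transformer would run the same comparison on the GlusterFS mount instead of /tmp.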

While I create the 5 MB file on the GlusterFS mount, the two GlusterFS 
processes (server and client) together consume up to 100% of the CPU:
> top - 19:10:39 up 4 days,  1:29,  2 users,  load average: 0.26, 0.11, 0.07
> Tasks:  56 total,   2 running,  54 sleeping,   0 stopped,   0 zombie
> Cpu(s): 54.8%us, 36.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  8.6%si,  0.0%st
> Mem:    505136k total,   472028k used,    33108k free,    38420k buffers
> Swap:  2000088k total,      128k used,  1999960k free,   312748k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  6550 root      20   0 86348  57m 2064 R 50.7 11.7   3:09.42 glusterfs
>  4314 root      20   0 56204  11m 1752 S 45.1  2.4   4:56.15 glusterfsd

This is the /proc/cpuinfo output:
> cluster-001 ~ # cat /proc/cpuinfo
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 15
> model           : 2
> model name      : Intel(R) Pentium(R) 4 CPU 1.80GHz
> stepping        : 4
> cpu MHz         : 1794.695
> cache size      : 512 KB
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 2
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm up pebs bts
> bogomips        : 3589.39
> clflush size    : 64
> cache_alignment : 128
> address sizes   : 36 bits physical, 32 bits virtual
> power management:

The FUSE kernel module is built as a loadable module and is not compiled 
directly into the kernel.

Any suggestions on how to solve this problem?

Kind regards from Freilassing,

Michael Rack