[Gluster-users] Problem After adding Bricks
Scott Lovell
scottlovell at gmail.com
Fri May 24 18:19:10 UTC 2013
Hello, I have run into some performance issues after adding bricks to
a 3.3.1 volume. Basically I am seeing very high CPU usage and
extremely degraded performance. I started a rebalance but stopped it
after a couple of days. The logs have a lot of split-brain entries as
well as "Non Blocking entrylks failed for" messages.
For some of the directories, an ls on the client shows multiple
entries for the same directory (ls output below). I am wondering if it
is just spinning trying to heal itself? I have been able to fix some
of these entries by removing gfid files, stat-ing the paths, etc.
(roughly as sketched below), however I feel I may just be making
matters worse.
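By "removing gfid files" I mean something along these lines, done on
whichever brick holds the copy I decide to discard (paths are
illustrative; as far as I understand, the .glusterfs entry is keyed by
the first two and next two characters of the gfid, and is a symlink
for a directory, a hardlink for a regular file):

# read the gfid of the copy to discard, directly on the brick
getfattr -n trusted.gfid -e hex /bricks/b01/ftp_scan/199268/mirror
# remove its .glusterfs entry (ab/cd = first two and next two characters of the gfid)
rm /bricks/b01/.glusterfs/ab/cd/<full-gfid>
# remove the bad copy itself from the brick
rm -rf /bricks/b01/ftp_scan/199268/mirror
# trigger self-heal by looking the path up through the fuse mount
stat /mnt/glusterfs/ftp_scan/199268/mirror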
So far the most permanent fix has been to rsync the files out of the
bricks, remove the directories, and copy everything back in through
the normal fuse mount (roughly as sketched below), but that will take
quite some time given the performance problems.
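Per affected directory, that workflow is roughly the following (the
staging path is just an example):

# pull a good copy off one of the bricks into a staging area
rsync -a fs01:/bricks/b01/ftp_scan/199268/ /staging/199268/
# remove the directory (and its .glusterfs entries) from every brick it
# appears on, then copy it back in through the fuse mount so the volume
# lays it out fresh
cp -a /staging/199268 /mnt/glusterfs/ftp_scan/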
Has anyone seen this behavior before, or does anyone know of any possible fixes?
Thanks in advance,
Scott
root at ftpscan02:~# ls -lrt /mnt/glusterfs/ftp_scan/199268/
ls: cannot access /mnt/glusterfs/ftp_scan/199268/mirror: Input/output error
ls: cannot access /mnt/glusterfs/ftp_scan/199268/mirror: Input/output error
ls: cannot access /mnt/glusterfs/ftp_scan/199268/mirror: Input/output error
ls: cannot access /mnt/glusterfs/ftp_scan/199268/mirror: Input/output error
ls: cannot access /mnt/glusterfs/ftp_scan/199268/mirror_trash: No data available
total 765
?????????? ? ? ? ? ? mirror_trash
?????????? ? ? ? ? ? mirror
?????????? ? ? ? ? ? mirror
?????????? ? ? ? ? ? mirror
?????????? ? ? ? ? ? mirror
-rw------- 1 torque torque 90287 May 18 20:47 cache
-rw------- 1 torque torque 667180 May 18 23:35 file_mapping
drwx------ 3 torque torque 8192 May 23 11:31 mirror_trash
drwx------ 3 torque torque 8192 May 23 11:31 mirror_trash
drwx------ 3 torque torque 8192 May 23 11:31 mirror_trash
Volume Name: gv01
Type: Distributed-Replicate
Volume ID: 03cf79bd-c5d8-467d-9f31-6c3c40dd94e2
Status: Started
Number of Bricks: 11 x 2 = 22
Transport-type: tcp
Bricks:
Brick1: fs01:/bricks/b01
Brick2: fs02:/bricks/b01
Brick3: fs01:/bricks/b02
Brick4: fs02:/bricks/b02
Brick5: fs01:/bricks/b03
Brick6: fs02:/bricks/b03
Brick7: fs01:/bricks/b04
Brick8: fs02:/bricks/b04
Brick9: fs01:/bricks/b05
Brick10: fs02:/bricks/b05
Brick11: fs01:/bricks/b06
Brick12: fs02:/bricks/b06
Brick13: fs01:/bricks/b07
Brick14: fs02:/bricks/b07
Brick15: fs01:/bricks/b08
Brick16: fs02:/bricks/b08
Brick17: fs01:/bricks/b09
Brick18: fs02:/bricks/b09
Brick19: fs01:/bricks/b10
Brick20: fs02:/bricks/b10
Brick21: fs01:/bricks/b11
Brick22: fs02:/bricks/b11
Options Reconfigured:
performance.quick-read: off
performance.io-cache: off
performance.stat-prefetch: off
performance.write-behind: off
performance.write-behind-window-size: 1MB
performance.flush-behind: off
nfs.disable: off
performance.cache-size: 16MB
performance.io-thread-count: 8
performance.cache-refresh-timeout: 10
diagnostics.client-log-level: ERROR
performance.read-ahead: on
cluster.data-self-heal: on
nfs.register-with-portmap: on
Top output:
top - 13:00:36 up 1 day, 16:02, 3 users, load average: 38.04, 37.83, 37.68
Tasks: 183 total, 2 running, 181 sleeping, 0 stopped, 0 zombie
Cpu0 : 35.8%us, 49.2%sy, 0.0%ni, 0.0%id, 5.4%wa, 0.0%hi, 9.7%si, 0.0%st
Cpu1 : 32.8%us, 59.5%sy, 0.0%ni, 1.7%id, 6.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 37.2%us, 55.1%sy, 0.0%ni, 1.7%id, 5.6%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu3 : 36.5%us, 56.4%sy, 0.0%ni, 2.0%id, 5.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 36.8%us, 54.2%sy, 0.0%ni, 1.0%id, 8.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 36.9%us, 53.0%sy, 0.0%ni, 2.0%id, 8.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 38.6%us, 54.0%sy, 0.0%ni, 1.3%id, 5.7%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu7 : 34.3%us, 59.3%sy, 0.0%ni, 2.4%id, 4.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 36.5%us, 55.7%sy, 0.0%ni, 3.4%id, 4.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 34.7%us, 59.2%sy, 0.0%ni, 0.7%id, 5.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 34.9%us, 58.2%sy, 0.0%ni, 1.7%id, 5.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 38.4%us, 55.0%sy, 0.0%ni, 1.0%id, 5.6%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16405712k total, 16310088k used, 95624k free, 12540824k buffers
Swap: 1999868k total, 9928k used, 1989940k free, 656604k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2460 root 20 0 391m 38m 1616 S 250 0.2 4160:51 glusterfsd
2436 root 20 0 392m 40m 1624 S 243 0.3 4280:26 glusterfsd
2442 root 20 0 391m 39m 1620 S 187 0.2 3933:46 glusterfsd
2454 root 20 0 391m 36m 1620 S 118 0.2 3870:23 glusterfsd
2448 root 20 0 391m 38m 1624 S 110 0.2 3720:50 glusterfsd
2472 root 20 0 393m 42m 1624 S 105 0.3 319:25.80 glusterfsd
2466 root 20 0 391m 37m 1556 R 51 0.2 3407:37 glusterfsd
2484 root 20 0 392m 40m 1560 S 10 0.3 268:51.71 glusterfsd
2490 root 20 0 392m 41m 1616 S 10 0.3 252:44.63 glusterfsd
2496 root 20 0 392m 41m 1544 S 10 0.3 262:26.80 glusterfsd
2478 root 20 0 392m 41m 1536 S 9 0.3 219:15.17 glusterfsd
2508 root 20 0 585m 365m 1364 S 5 2.3 46:20.31 glusterfs
3081 root 20 0 407m 101m 1676 S 2 0.6 239:36.76 glusterfs