[Gluster-users] severe performance io-cache bug with large cache in rc7 and git?
Erick Tryzelaar
idadesub at users.sourceforge.net
Thu Apr 9 22:03:28 UTC 2009
Good afternoon,
We're running into some pretty severe performance issues with gluster
when we're using io-cache on the server side. We're just doing some
basic performance measuring between one server and one client. Here's
our setup:
client:
####################################
volume client
  type protocol/client
  option transport-type tcp
  option remote-host server
  option remote-subvolume io-cache
end-volume
####################################
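The client side is mounted with the stock glusterfs binary, roughly as below (the volfile path here is just illustrative):

glusterfs -f /etc/glusterfs/client.vol /mnt/glusterfs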
server:
####################################
volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume posix-locks
  type features/locks
  subvolumes posix
end-volume

volume io-cache
  type performance/io-cache
  option cache-size 8GB
  subvolumes posix-locks
end-volume

volume server
  type protocol/server
  subvolumes io-cache
  option transport-type tcp
  option auth.addr.io-cache.allow *
end-volume
####################################
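The 4GB test file was seeded into the server's backing directory ahead of time with something along these lines (block size and count are approximate; the path comes from the posix volume above):

dd if=/dev/random of=/tmp/gluster/giant-4gb bs=1M count=4096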
We're just testing streaming a 4GB file filled from /dev/random through
gluster with "cat /mnt/glusterfs/giant-4gb | pv > /dev/null". The
two machines are otherwise completely idle. On the client side, the copy starts
out fine at about 112 MB/s, but once roughly 1.75GB has been transferred
the performance plummets:
110MB 0:00:01 [ 110MB/s]
223MB 0:00:02 [ 113MB/s]
336MB 0:00:03 [ 113MB/s]
448MB 0:00:04 [ 113MB/s]
533MB 0:00:05 [ 79MB/s]
637MB 0:00:06 [ 113MB/s]
750MB 0:00:07 [ 113MB/s]
863MB 0:00:08 [ 113MB/s]
975MB 0:00:09 [ 113MB/s]
1.06GB 0:00:10 [ 113MB/s]
1.17GB 0:00:11 [ 113MB/s]
1.28GB 0:00:12 [ 113MB/s]
1.39GB 0:00:13 [ 113MB/s]
1.5GB 0:00:14 [ 113MB/s]
1.61GB 0:00:15 [ 111MB/s]
1.72GB 0:00:16 [ 106MB/s]
1.81GB 0:00:17 [99.3MB/s]
1.9GB 0:00:18 [93.7MB/s]
1.99GB 0:00:19 [86.8MB/s]
2.07GB 0:00:20 [82.8MB/s]
2.11GB 0:00:21 [44.9MB/s]
2.19GB 0:00:22 [74.5MB/s]
2.25GB 0:00:23 [70.4MB/s]
2.32GB 0:00:24 [66.9MB/s]
2.38GB 0:00:25 [59.6MB/s]
2.43GB 0:00:26 [55.7MB/s]
2.48GB 0:00:27 [ 53MB/s]
2.53GB 0:00:28 [50.4MB/s]
2.58GB 0:00:29 [48.3MB/s]
2.63GB 0:00:30 [47.6MB/s]
2.67GB 0:00:31 [45.5MB/s]
2.71GB 0:00:32 [43.8MB/s]
2.76GB 0:00:33 [42.2MB/s]
2.8GB 0:00:34 [40.9MB/s]
2.83GB 0:00:35 [39.6MB/s]
2.87GB 0:00:36 [37.8MB/s]
2.91GB 0:00:37 [36.5MB/s]
2.94GB 0:00:38 [ 35MB/s]
2.98GB 0:00:39 [34.5MB/s]
3.01GB 0:00:40 [33.9MB/s]
3.04GB 0:00:41 [33.1MB/s]
3.07GB 0:00:42 [32.8MB/s]
3.1GB 0:00:43 [28.5MB/s]
3.12GB 0:00:44 [21.2MB/s]
On the server side, the output of dstat shows the network send rate
ramping up to wire speed and then steadily falling off, mirroring the client:
----total-cpu-usage---- -disk/dm-0----disk/sda- --net/eth1- ---paging-- ---system--
usr sys idl wai hiq siq|_read write _read write|_recv _send|__in_ _out_|_int_ _csw_
72 1 24 2 0 0| 0 0 : 0 0 | 0 0 | 0 0 | 0 0
0 0 100 0 0 0| 0 0 : 0 0 |3274B 5256B| 0 0 | 286 424
0 0 99 1 0 0| 0 76k: 0 76k|1738B 3378B| 0 0 | 283 422
0 0 100 0 0 0| 0 0 : 0 0 |1664B 3332B| 0 0 | 286 408
0 0 99 1 0 0| 0 0 : 0 144k|1728B 3390B| 0 0 | 278 408
0 0 100 0 0 0| 0 0 : 0 0 |1344B 2724B| 0 0 | 275 404
0 0 100 0 0 0| 0 0 : 0 0 |1280B 2840B| 0 0 | 283 406
1 1 90 8 0 0| 0 796k: 0 796k|1600B 3194B| 0 0 | 413 455
0 0 100 0 0 0| 0 0 : 0 0 |1992B 3943B| 0 0 | 282 393
0 1 99 0 0 0| 0 0 : 0 12k|1472B 2982B| 0 0 |3144 954
1 3 94 0 0 3| 0 20k: 0 20k| 201k 23.1M| 0 0 |14.8k 4039
1 3 94 0 0 3| 0 0 : 0 0 |1995k 235M| 0 0 |14.8k 4009
1 4 92 1 0 2| 0 88k: 0 88k| 991k 118M| 0 0 |14.7k 3967
2 4 92 1 0 2| 0 100k: 0 100k| 986k 117M| 0 0 |14.6k 4075
2 3 93 1 0 2| 0 0 : 0 48k| 989k 118M| 0 0 |14.6k 3821
3 3 92 0 0 2| 0 276k: 0 276k| 994k 118M| 0 0 |14.7k 3537
2 4 92 0 0 2| 0 0 : 0 0 | 996k 118M| 0 0 |14.7k 3168
3 3 91 0 0 2| 0 0 : 0 0 | 999k 118M| 0 0 |14.7k 2677
4 4 90 1 0 2| 0 56k: 0 56k| 999k 118M| 0 0 |14.7k 2246
5 3 88 1 1 2| 0 0 : 0 12k| 993k 118M| 0 0 |14.7k 2254
6 4 87 0 1 2| 0 0 : 0 0 | 995k 117M| 0 0 |14.8k 2182
8 3 87 0 0 3| 0 0 : 0 0 | 986k 118M| 0 0 |14.6k 2202
12 3 83 1 0 1| 0 180k: 0 176k| 989k 116M| 0 0 |14.7k 2175
10 3 80 4 0 3| 0 192k: 0 196k| 997k 116M| 0 0 |15.2k 2170
11 3 81 0 1 4| 0 0 : 0 28k|1009k 114M| 0 0 |15.8k 2145
15 2 81 0 0 2| 0 0 : 0 0 | 941k 105M| 0 0 |14.9k 2004
16 3 79 0 0 2| 0 0 : 0 0 | 897k 101M| 0 0 |14.5k 1655
16 3 78 0 0 3| 0 0 : 0 0 | 899k 93.5M| 0 0 |14.5k 1765
20 3 74 1 0 2| 0 100k: 0 100k| 842k 88.6M| 0 0 |13.9k 1576
19 3 75 1 0 3| 0 0 : 0 12k| 792k 83.8M| 0 0 |13.5k 722
22 2 75 0 0 1| 0 12k: 0 16k| 733k 77.7M| 0 0 |12.6k 423
22 2 75 0 0 2| 0 0 : 0 0 | 668k 71.5M| 0 0 |11.6k 403
23 1 75 0 0 1| 0 0 : 0 0 | 620k 66.5M| 0 0 |10.8k 379
22 2 74 1 0 1| 0 36k: 0 36k| 580k 62.3M| 0 0 |10.1k 429
22 2 75 0 0 2| 0 0 : 0 12k| 548k 58.8M| 0 0 |9578 377
22 2 75 0 0 2| 0 8192B: 0 228k| 515k 55.1M| 0 0 |9099 406
23 2 75 0 0 1| 0 0 : 0 0 | 492k 52.7M| 0 0 |8639 394
23 1 74 1 0 2| 0 60k: 0 60k| 469k 50.3M| 0 0 |8261 411
25 1 68 4 0 1| 0 340k: 0 340k| 464k 49.8M| 0 0 |8354 422
24 1 72 1 0 2| 0 0 : 0 12k| 451k 48.6M| 0 0 |8070 468
24 2 74 0 0 1| 0 32k: 0 32k| 429k 46.0M| 0 0 |7487 470
25 2 72 0 0 1| 0 0 : 0 0 | 405k 42.6M| 0 0 |7182 555
24 1 73 0 0 1| 0 0 : 0 0 | 399k 42.8M| 0 0 |7075 379
23 2 75 0 0 1| 0 176k: 0 176k| 380k 40.8M| 0 0 |6715 402
23 1 74 0 0 1| 0 0 : 0 40k| 366k 39.3M| 0 0 |6497 388
24 1 75 0 0 1| 0 0 : 0 0 | 356k 38.2M| 0 0 |6316 393
23 1 75 0 0 1| 0 0 : 0 0 | 345k 37.2M| 0 0 |6181 405
23 1 75 0 0 1| 0 0 : 0 0 | 337k 36.2M| 0 0 |6024 408
23 1 74 1 0 1| 0 60k: 0 60k| 332k 35.5M| 0 0 |5922 394
24 1 74 1 0 1| 0 0 : 0 12k| 323k 34.6M| 0 0 |5744 401
21 1 78 0 0 1| 0 156k: 0 156k| 278k 29.9M| 0 0 |5063 410
23 2 75 0 0 1| 0 0 : 0 0 | 309k 33.2M| 0 0 |5520 409
24 1 75 0 0 1| 0 0 : 0 0 | 303k 32.4M| 0 0 |5450 387
23 3 68 5 0 1| 0 356k: 0 356k| 255k 27.2M| 0 0 |4378 449
24 1 74 1 0 1| 0 0 : 0 12k| 202k 21.3M| 0 0 |3659 402
24 1 75 0 0 1| 0 0 : 0 0 | 193k 20.5M| 0 0 |3538 393
23 1 75 0 0 1| 0 0 : 0 0 | 190k 20.3M| 0 0 |3490 394
24 1 75 0 0 1| 0 0 : 0 0 | 188k 20.1M| 0 0 |3458 395
23 2 73 2 0 0| 0 116k: 0 116k| 185k 19.9M| 0 0 |3434 395
24 1 74 0 0 1| 0 0 : 0 12k| 184k 19.7M| 0 0 |3431 466
15 1 84 0 0 1| 0 0 : 0 0 | 162k 16.8M| 0 0 |2300 401
0 0 100 0 0 0| 0 0 : 0 0 | 842B 1974B| 0 0 | 267 388
Looking at the glusterfsd process itself, you can see that CPU usage holds
steady for a couple of seconds and then climbs until it pegs at
100%. Once it hits 100%, the transfer speed plummets:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11975 root 17 0 31336 1916 804 S 0 0.0 0:00.00 glusterfsd
11975 root 17 0 31336 1916 804 S 0 0.0 0:00.00 glusterfsd
11975 root 17 0 31336 1916 804 S 0 0.0 0:00.00 glusterfsd
11975 root 17 0 31336 1916 804 S 0 0.0 0:00.00 glusterfsd
11975 root 17 0 31336 1916 804 S 0 0.0 0:00.00 glusterfsd
11975 root 16 0 105m 77m 888 R 20 0.5 0:00.20 glusterfsd
11975 root 15 0 222m 193m 888 S 29 1.2 0:00.49 glusterfsd
11975 root 15 0 339m 310m 888 S 34 1.9 0:00.83 glusterfsd
11975 root 15 0 456m 427m 888 R 27 2.7 0:01.10 glusterfsd
11975 root 15 0 572m 544m 888 S 36 3.4 0:01.46 glusterfsd
11975 root 15 0 688m 659m 888 S 44 4.1 0:01.90 glusterfsd
11975 root 15 0 804m 776m 888 S 48 4.9 0:02.38 glusterfsd
11975 root 15 0 921m 893m 888 S 45 5.6 0:02.83 glusterfsd
11975 root 15 0 1035m 1.0g 888 S 52 6.3 0:03.35 glusterfsd
11975 root 15 0 1150m 1.1g 888 S 53 7.0 0:03.88 glusterfsd
11975 root 16 0 1263m 1.2g 888 R 61 7.7 0:04.49 glusterfsd
11975 root 15 0 1371m 1.3g 888 S 65 8.4 0:05.14 glusterfsd
11975 root 15 0 1474m 1.4g 888 S 73 9.1 0:05.87 glusterfsd
11975 root 15 0 1573m 1.5g 888 S 79 9.7 0:06.66 glusterfsd
11975 root 15 0 1666m 1.6g 888 S 84 10.3 0:07.50 glusterfsd
11975 root 16 0 1755m 1.7g 888 R 90 10.8 0:08.40 glusterfsd
11975 root 16 0 1839m 1.8g 888 R 94 11.3 0:09.34 glusterfsd
11975 root 17 0 1919m 1.8g 888 R 98 11.9 0:10.32 glusterfsd
11975 root 19 0 1994m 1.9g 888 R 100 12.3 0:11.32 glusterfsd
11975 root 21 0 2063m 2.0g 888 R 100 12.7 0:12.32 glusterfsd
11975 root 25 0 2125m 2.0g 888 R 101 13.1 0:13.33 glusterfsd
11975 root 25 0 2184m 2.1g 888 R 99 13.5 0:14.32 glusterfsd
11975 root 25 0 2238m 2.2g 888 R 100 13.9 0:15.32 glusterfsd
11975 root 25 0 2290m 2.2g 888 R 100 14.2 0:16.32 glusterfsd
11975 root 25 0 2340m 2.3g 888 R 100 14.5 0:17.32 glusterfsd
11975 root 25 0 2386m 2.3g 888 R 100 14.8 0:18.32 glusterfsd
11975 root 25 0 2431m 2.3g 888 R 101 15.1 0:19.33 glusterfsd
11975 root 25 0 2473m 2.4g 888 R 99 15.3 0:20.32 glusterfsd
11975 root 25 0 2514m 2.4g 888 R 101 15.6 0:21.33 glusterfsd
...
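For anyone who wants to watch the process the same way, batch-mode top pointed at the glusterfsd pid is enough, something like:

top -b -d 1 -p $(pidof glusterfsd) | grep glusterfsd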
This is repeatable. With the same glusterfs and glusterfsd (whose
cache is now full with the 4.1GB file), the pattern is exactly the
same: the copy holds at 112 MB/s for a couple of seconds, but once
around 1.5GB has been copied the glusterfsd CPU spikes to 100% and the
transfer speed plummets. In comparison, here is what happens when I disable
the cache (and just let the kernel manage the file caching) with these volume files:
client:
#############################
volume client
  type protocol/client
  option transport-type tcp
  option remote-host server
  option remote-subvolume posix-locks
end-volume
#############################
server:
#############################
volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume posix-locks
  type features/locks
  subvolumes posix
end-volume

volume server
  type protocol/server
  subvolumes posix-locks
  option transport-type tcp
  option auth.addr.posix-locks.allow *
end-volume
###############################
The transfer stays at 112 MB/s for the entire copy, and glusterfsd CPU
usage never goes above 30%. Does anyone know what's going on?
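For anyone trying to reproduce this on a box with less RAM, the only server-side change should be the cache-size line, e.g. (we haven't verified whether a smaller cache sidesteps the slowdown):

####################################
volume io-cache
  type performance/io-cache
  # 1GB instead of the 8GB above; untested whether this avoids the stall
  option cache-size 1GB
  subvolumes posix-locks
end-volume
####################################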