[Gluster-users] Caching differences in Gluster vs Local Storage

Jon Swanson jswanson at valuecommerce.co.jp
Fri Apr 2 07:10:30 UTC 2010


Hello,

First off, thanks again for providing gluster. Awesome project.

This is a n00bish question.  I thought that Gluster goes through the VFS 
like any other filesystem, which is where most of the filesystem caching 
takes place (somewhat simplified).
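
For what it's worth, a quick way to check how much the kernel page cache
can even get involved on the client side (the mountpoint and commands
below are illustrative, not copied from my session):

# Confirm the Gluster mount really is a FUSE mount
grep glusterfs /proc/mounts
# expect the filesystem type to show up as fuse.glusterfs

# See how the client process was started; if direct I/O is in effect for
# the FUSE mount, the kernel page cache is bypassed for it entirely
ps ax | grep '[g]lusterfs'

(I believe the relevant client switch on 3.0.x is --disable-direct-io-mode,
but glusterfs --help is the authority there.)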

I'm seeing a major difference in benchmarks when comparing small-ish 
files locally versus on Gluster: roughly a 40x gap in average write 
latency (about 8.7 ms vs 0.2 ms for random writes in the results below).

I don't really think this is a problem; I'm just looking for a better 
understanding.  Client and servers are all on GlusterFS 3.0.3, and the 
servers are two machines in a replicate setup.
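
To separate page-cache effects from the actual storage path, something
along these lines can be run in both the local directory and on the
Gluster mount (sizes and file names are just illustrative):

# Write ~512 MB and force it out to storage, so the write isn't just buffered
dd if=/dev/zero of=./ddtest bs=64k count=8192 conv=fsync

# Warm read: for local files this is likely served from the page cache
dd if=./ddtest of=/dev/null bs=64k

# Drop the page/dentry/inode caches, then read again cold
sync
echo 3 > /proc/sys/vm/drop_caches
dd if=./ddtest of=/dev/null bs=64k

If the cold local read falls back toward disk speed while the warm read
stays in the GB/s range, then the ~4 GB/s local read numbers below are
mostly the page cache talking.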

Client: CentOS 5.4 2.6.18-164.15.1.el5.centos
Servers: F12 2.6.32.9-70.fc12.x86_64

-----------------------------------------
(To Gluster)
[root at linuxdb1 tiobench-gluster.2]# tiotest -b 16384 -r 4096 -f 32 -t 16 -d .
Tiotest results for 16 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write         512 MBs |   16.7 s |  30.731 MB/s |   2.0 %  |  33.5 % |
| Random Write 1024 MBs |   38.9 s |  26.314 MB/s |   1.8 %  |  32.5 % |
| Read          512 MBs |    4.8 s | 107.145 MB/s |   4.0 %  | 221.4 % |
| Random Read  1024 MBs |    4.2 s | 241.220 MB/s |  11.6 %  | 543.4 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        7.747 ms |      240.730 ms |  0.00000 |   0.00000 |
| Random Write |        8.709 ms |     2425.524 ms |  0.00153 |   0.00000 |
| Read         |        2.009 ms |     1575.232 ms |  0.00000 |   0.00000 |
| Random Read  |        0.930 ms |      236.096 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        4.839 ms |     2425.524 ms |  0.00051 |   0.00000 |
`--------------+-----------------+-----------------+----------+-----------'



(To Local)
[root at linuxdb1 tiobench-gluster.2]# tiotest -b 16384 -r 4096 -f 32 -t 16 -d ~
Tiotest results for 16 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write         512 MBs |   35.7 s |  14.361 MB/s |   0.5 %  | 833.8 % |
| Random Write 1024 MBs |  100.6 s |  10.182 MB/s |   0.4 %  | 379.5 % |
| Read          512 MBs |    0.1 s | 4043.978 MB/s |  74.2 %  | 5832.1 % |
| Random Read  1024 MBs |    0.2 s | 4171.521 MB/s | 131.2 %  | 6425.0 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        0.846 ms |      154.874 ms |  0.00000 |   0.00000 |
| Random Write |        0.185 ms |      265.350 ms |  0.00000 |   0.00000 |
| Read         |        0.044 ms |       13.088 ms |  0.00000 |   0.00000 |
| Random Read  |        0.043 ms |       16.019 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        0.224 ms |      265.350 ms |  0.00000 |   0.00000 |
`--------------+-----------------+-----------------+----------+-----------'


---------------------------------------
Volume files. The machine in question is mounting the linuxdb1 volume. 
Any criticism of the way these files are set up is also extremely welcome.
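
For context, a volgen-style client volfile like the one below is normally
mounted by pointing the glusterfs client at the file and naming the top
volume of the stack you want; the invocation and mountpoint here are
illustrative, not my exact command:

glusterfs -f /etc/glusterfs/glusterfs.vol \
          --volume-name linuxdb1-statprefetch \
          /mnt/linuxdb1

Whichever volume is named there becomes the root of the client graph, so
mounting "linuxdb1" directly would leave write-behind, read-ahead,
io-cache and quick-read out of the picture, while "linuxdb1-statprefetch"
pulls in the whole performance stack.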

[root at x2100-gfs1 glusterfs]# cat glusterfsd.vol
## file auto generated by /usr/bin/glusterfs-volgen (export.vol)
# Cmd line:
# $ /usr/bin/glusterfs-volgen --name DBMirror --raid 1 x2100-gfs1:/data/gfs/DBMirror x2100-gfs2:/data/gfs/DBMirror

############# pdb Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# mounted by test environment. Currently setup as
# a replicate, or raid1, across x2100-gfs1 and x2100-gfs2
##################################################################

volume pdb-posix
   type storage/posix
   option directory /data/gfs/pdb
end-volume

volume pdb-locks
     type features/locks
     subvolumes pdb-posix
end-volume

volume pdb-iothreads
     type performance/io-threads
     option thread-count 8
     subvolumes pdb-locks
end-volume


############# linuxdb1 Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# mounted by linuxdb1  Currently configured as a
# replicate, or raid1, across x2100-gfs1 and x2100-gfs2
##################################################################
volume linuxdb1-posix
   type storage/posix
   option directory /data/gfs/linuxdb1
end-volume

volume linuxdb1-locks
     type features/locks
     subvolumes linuxdb1-posix
end-volume

volume linuxdb1-iothreads
     type performance/io-threads
     option thread-count 8
     subvolumes linuxdb1-locks
end-volume

############# vmmirror1 Mirror ###################################
# RAID 1
# TRANSPORT-TYPE tcp
# mounted by stuff (archtest01).  Currently configured as a
# replicate, or raid1, across x2100-gfs1 and x2100-gfs2
##################################################################
volume vmmirror1-posix
   type storage/posix
   option directory /data/gfs/vmmirror1
end-volume

volume vmmirror1-locks
     type features/locks
     subvolumes vmmirror1-posix
end-volume

volume vmmirror1-iothreads
     type performance/io-threads
     option thread-count 8
     subvolumes vmmirror1-locks
end-volume

############# GLOBAL SPECIFICATIONS ###############################
# TRANSPORT-TYPE tcp
# global options. Currently configured to export volumes linuxdb1
# and pdb.
##################################################################

volume server-tcp
     type protocol/server
     option transport-type tcp
     option auth.addr.pdb-iothreads.allow *
     option auth.addr.linuxdb1-iothreads.allow *
     option auth.addr.vmmirror1-iothreads.allow *
     option transport.socket.listen-port 6996
     option transport.socket.nodelay on
     subvolumes pdb-iothreads linuxdb1-iothreads vmmirror1-iothreads
end-volume

[root at x2100-gfs1 glusterfs]# cat glusterfs.vol
## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
# Cmd line:
# $ /usr/bin/glusterfs-volgen --name DBMirror --raid 1 x2100-gfs1:/data/gfs/DBMirror x2100-gfs2:/data/gfs/DBMirror

############# PDB Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# Intended for pdb test environment
# Volume-name: pdb
##############################################################
volume x2100-gfs1-pdb
     type protocol/client
     option transport-type tcp
     option remote-host x2100-gfs1
     option transport.socket.nodelay on
     option remote-port 6996
     option remote-subvolume pdb-iothreads
end-volume

volume x2100-gfs2-pdb
     type protocol/client
     option transport-type tcp
     option remote-host x2100-gfs2
     option transport.socket.nodelay on
     option remote-port 6996
     option remote-subvolume pdb-iothreads
end-volume

# Name of the volume as specified at mount time
volume pdb
     type cluster/replicate
     subvolumes x2100-gfs1-pdb x2100-gfs2-pdb
end-volume

volume pdb-writebehind
     type performance/write-behind
     option cache-size 4MB
     subvolumes pdb
end-volume

volume pdb-readahead
     type performance/read-ahead
     option page-count 4
     subvolumes pdb-writebehind
end-volume

volume pdb-iocache
     type performance/io-cache
     option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
     option cache-timeout 1
     subvolumes pdb-readahead
end-volume

volume pdb-quickread
     type performance/quick-read
     option cache-timeout 1
     option max-file-size 64kB
     subvolumes pdb-iocache
end-volume

volume pdb-statprefetch
     type performance/stat-prefetch
     subvolumes pdb-quickread
end-volume


############# linuxdb Mirror #####################################
# RAID 1
# TRANSPORT-TYPE tcp
# Intended for linuxdb1 to mount
# Volume-name: linuxdb1
##################################################################
volume x2100-gfs1-linuxdb1
     type protocol/client
     option transport-type tcp
     option remote-host x2100-gfs1
     option transport.socket.nodelay on
     option remote-port 6996
     option remote-subvolume linuxdb1-iothreads
end-volume

volume x2100-gfs2-linuxdb1
     type protocol/client
     option transport-type tcp
     option remote-host x2100-gfs2
     option transport.socket.nodelay on
     option transport.remote-port 6996
     option remote-subvolume linuxdb1-iothreads
end-volume

# Name of the volume as specified at mount time
volume linuxdb1
     type cluster/replicate
     subvolumes x2100-gfs1-linuxdb1 x2100-gfs2-linuxdb1
end-volume

volume linuxdb1-writebehind
     type performance/write-behind
     option cache-size 4MB
     subvolumes linuxdb1
end-volume

volume linuxdb1-readahead
     type performance/read-ahead
     option page-count 4
     subvolumes linuxdb1-writebehind
end-volume

volume linuxdb1-iocache
     type performance/io-cache
     option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
     option cache-timeout 1
     subvolumes linuxdb1-readahead
end-volume

volume linuxdb1-quickread
     type performance/quick-read
     option cache-timeout 1
     option max-file-size 64kB
     subvolumes linuxdb1-iocache
end-volume

volume linuxdb1-statprefetch
     type performance/stat-prefetch
     subvolumes linuxdb1-quickread
end-volume


############# Virtual Images Mirror ###############################
# RAID 1
# TRANSPORT-TYPE tcp
# Intended for vm testing servers to mount
# Volume-name: vmmirror1
##################################################################
volume x2100-gfs1-vmmirror1
     type protocol/client
     option transport-type tcp
     option remote-host x2100-gfs1
     option transport.socket.nodelay on
     option remote-port 6996
     option remote-subvolume vmmirror1-iothreads
end-volume

volume x2100-gfs2-vmmirror1
     type protocol/client
     option transport-type tcp
     option remote-host x2100-gfs2
     option transport.socket.nodelay on
     option remote-port 6996
     option remote-subvolume vmmirror1-iothreads
end-volume

# Name of the volume as specified at mount time
volume vmmirror1
     type cluster/replicate
     subvolumes x2100-gfs1-vmmirror1 x2100-gfs2-vmmirror1
end-volume

volume vmmirror1-writebehind
     type performance/write-behind
     option cache-size 4MB
     subvolumes vmmirror1
end-volume

volume vmmirror1-readahead
     type performance/read-ahead
     option page-count 4
     subvolumes vmmirror1-writebehind
end-volume

volume vmmirror1-iocache
     type performance/io-cache
     option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
     option cache-timeout 1
     subvolumes vmmirror1-readahead
end-volume

volume vmmirror1-quickread
     type performance/quick-read
     option cache-timeout 1
     option max-file-size 64kB
     subvolumes vmmirror1-iocache
end-volume

volume vmmirror1-statprefetch
     type performance/stat-prefetch
     subvolumes vmmirror1-quickread
end-volume
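
Side note on the io-cache sections above: the backticks on the cache-size
lines are only there to compute 20% of MemTotal in MB. If the volfile
parser turns out not to evaluate shell substitutions, the same value can
be computed once by hand and pasted in as a literal:

# same pipeline as in the volfile: 20% of physical memory, in MB
grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.
# prints roughly 1638 on an 8 GB box; that would become
# "option cache-size 1638MB" (the number is just an example)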


Thanks!

