[Gluster-users] Shared web hosting with GlusterFS and inotify

Emile Heitor emile.heitor at nbs-system.com
Wed Sep 15 14:58:49 UTC 2010


Hi list,

For the past couple of weeks, we have been experimenting with a web
hosting system based on GlusterFS in order to share customers' document
roots across several machines.

The hardware and software involved are:

Two servers, each with 2x Intel 5650 (i.e. 2x12 cores @ 2.6 GHz), 24 GB
DDR3 RAM, 146 GB SAS disks in RAID 1
Both servers run 64-bit Debian Lenny GNU/Linux with GlusterFS 3.0.5
The web server is Apache 2.2, the application is a huge PHP/MySQL monster.

For our first naive tests, we used the GlusterFS mountpoint as Apache's
DocumentRoot. In short, performance was catastrophic.
A single one of these servers, without GlusterFS, can handle about
170 pages per second (PPS) with 100 concurrent users (CU).
The same server, with Apache's DocumentRoot on a GlusterFS mountpoint,
drops to 5 PPS for 20 CU and simply stops responding at 40+.
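
The load-testing tool is not named above. As a rough sketch for anyone
wanting to reproduce this kind of measurement, and assuming ApacheBench
(ab) is used, the helper below pulls the requests-per-second figure out
of ab's report. The function name ab_rps and the example URL are
hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read ApacheBench output on stdin and print the
# mean "Requests per second" value.
ab_rps() {
    awk -F: '/^Requests per second/ { split($2, a, " "); print a[1] }'
}

# Example invocation (assumed URL; 100 concurrent users, 2000 requests):
#   ab -n 2000 -c 100 http://www.example.com/ | ab_rps
```

Running the same command against the local-disk and the GlusterFS
DocumentRoot gives two directly comparable numbers.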

We tried a lot of tips (quick-read, io-threads, io-cache, thread-count,
timeouts...) found on this very mailing list, on various websites, or
through our own experiments, but we never got better than 10 PPS with
20 users.

So we took another approach: instead of declaring the GlusterFS
mountpoint as the DocumentRoot, we declared the local storage. Of course,
without any further change, this would lead to inconsistencies whenever
Apache writes something (.htaccess, tmp file, log...). This is where
inotify comes in. Using "inotifywait" from inotify-tools, we have this
little script that watches the local DocumentRoot for modifications and
duplicates them to the GlusterFS share. The infinite loop is avoided by
an md5 comparison. Here is a very early proof of concept:

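#!/bin/sh

[ $# -lt 2 ] && echo "usage: $0 <source> <destination>" && exit 1

PATH=${PATH}:/bin:/sbin:/usr/bin:/usr/sbin; export PATH

SRC=$1
DST=$2

cd "${SRC}" || exit 1

# Sync one directory level only (-d: no recursion); --delete propagates
# removals. Kept as a string and run through eval so that ${srcdir} and
# ${dstdir} are expanded at call time.
RSYNC='rsync -dlptgoD --delete "${srcdir}" "${dstdir}/"'

# The exclude regex (editor swap files) is quoted so that the shell
# passes it to inotifywait untouched.
inotifywait -mr \
    --exclude '\..*\.sw.*' \
    -e close_write -e create -e delete_self -e delete . | \
    while read -r dir action file
    do
        srcdir="${SRC}/${dir}"
        dstdir="${DST}/${dir}"

        # skip directories that live on a tmpfs
        [ -d "${srcdir}" ] && \
            df -T "${srcdir}" | grep -q tmpfs && continue

        # debug
        echo "${dir} ${action} ${file}"

        case "${action}" in
        CLOSE_WRITE,CLOSE)
            # new file on the destination side: sync and move on
            if [ ! -f "${dstdir}/${file}" ]; then
                eval "${RSYNC}"
                continue
            fi

            # existing file: sync only if the content differs
            md5src="`md5sum \"${srcdir}/${file}\" | cut -d' ' -f1`"
            md5dst="`md5sum \"${dstdir}/${file}\" | cut -d' ' -f1`"
            [ "${md5src}" != "${md5dst}" ] && eval "${RSYNC}"
            ;;
        CREATE,ISDIR)
            [ ! -d "${dstdir}/${file}" ] && eval "${RSYNC}"
            ;;
        DELETE|DELETE,ISDIR)
            eval "${RSYNC}"
            ;;
        esac
    done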
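
The md5 comparison is what keeps a change from bouncing back and forth:
a file is only copied when its content actually differs on the other
side. Isolated from the script, the check could look like the sketch
below. The function name files_differ is hypothetical, and md5sum from
GNU coreutils is assumed to be available, as in the script above.

```shell
#!/bin/sh
# files_differ SRC DST
# Hypothetical helper: exits 0 when DST is missing or its content
# differs from SRC, and non-zero when both files have the same md5.
files_differ() {
    [ -f "$2" ] || return 0
    src_sum=$(md5sum < "$1" | cut -d' ' -f1)
    dst_sum=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$src_sum" != "$dst_sum" ]
}

# Usage inside the event loop would then be along the lines of:
#   files_differ "${srcdir}/${file}" "${dstdir}/${file}" && eval "${RSYNC}"
```

A possible alternative is "cmp -s", which compares the two files
directly and stops at the first differing byte instead of hashing both
files in full.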

As a GlusterFS mountpoint is for now practically unusable as an Apache
DocumentRoot for us (and yes, with htaccess disabled), I'd like to have
the list's point of view on this approach. Do you see any terrible glitch?

Thanks in advance,

-- 
Emile Heitor, Responsable d'Exploitation
---
www.nbs-system.com, 140 Bd Haussmann, 75008 Paris
Tel: 01.58.56.60.80 / Fax: 01.58.56.60.81




