[Gluster-users] Shared web hosting with GlusterFS and inotify

Emile Heitor emile.heitor at nbs-system.com
Wed Sep 15 13:18:06 UTC 2010


Hi list,

For the past couple of weeks, we have been experimenting with a web
hosting system based on GlusterFS, in order to share customers'
document roots between several machines.

The hardware and software involved are:

Two servers, each with 2x Intel 5650 (i.e. 2x12 cores @ 2.6 GHz), 24 GB
DDR3 RAM, 146 GB SAS disks in RAID 1
Both servers run 64-bit Debian Lenny GNU/Linux with GlusterFS 3.0.5
The web server is Apache 2.2; the application is a huge PHP/MySQL monster.

For our first, naive tests, we used the GlusterFS mountpoint as
Apache's DocumentRoot. In short, performance was catastrophic.
A single one of these servers, without GlusterFS, can handle about
170 pages per second (PPS) with 100 concurrent users (CU).
The same server, with Apache's DocumentRoot on a Gluster mountpoint,
drops to 5 PPS for 20 CU and simply stops responding at 40+ CU.

We tried many tips (quick-read, io-threads, io-cache, thread-count,
timeouts...) found on this very mailing list, on various websites, or
through our own experiments, but we never got better than 10 PPS with
20 users.

So we took another approach: instead of declaring the Gluster
mountpoint as the DocumentRoot, we declared the local storage. Of
course, without any further measures, this would lead to
inconsistencies whenever Apache writes something (.htaccess, tmp file,
log...). This is where inotify comes in. Using inotify-tools'
"inotifywait", we have a small script that watches the local
DocumentRoot for modifications and duplicates them to the GlusterFS
share. An infinite loop is avoided by an md5 comparison. Here is a
very early proof of concept:

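#!/bin/sh
# Mirror changes made under <source> (local docroot) to <destination>
# (GlusterFS mount), one directory level at a time.

[ $# -lt 2 ] && echo "usage: $0 <source> <destination>" && exit 1

PATH=${PATH}:/bin:/sbin:/usr/bin:/usr/sbin; export PATH

SRC=$1
DST=$2

cd "${SRC}" || exit 1

# no recursion: -d copies only the directory's own entries;
# single quotes defer expansion of srcdir/dstdir until eval time
RSYNC='rsync -dlptgoD --delete "${srcdir}" "${dstdir}/"'

# quote the regex so the shell neither globs it nor strips the backslashes
inotifywait -mr \
     --exclude '\..*\.sw.*' \
     -e close_write -e create -e delete_self -e delete . | \
     while read -r dir action file
     do
         srcdir="${SRC}/${dir}"
         dstdir="${DST}/${dir}"

         # skip directories living on tmpfs
         [ -d "${srcdir}" ] && \
         [ -n "`df -T \"${srcdir}\" | grep tmpfs`" ] \
             && continue

         # debug
         echo "${dir} ${action} ${file}"

         case "${action}" in
         CLOSE_WRITE,CLOSE)
             # new file: sync right away
             [ ! -f "${dstdir}/${file}" ] && eval "${RSYNC}" && continue

             # existing file: sync only if content differs (avoids loops)
             md5src="`md5sum \"${srcdir}/${file}\" | cut -d' ' -f1`"
             md5dst="`md5sum \"${dstdir}/${file}\" | cut -d' ' -f1`"
             [ "$md5src" != "$md5dst" ] && eval "${RSYNC}"
             ;;
         CREATE,ISDIR)
             [ ! -d "${dstdir}/${file}" ] && eval "${RSYNC}"
             ;;
         DELETE|DELETE,ISDIR)
             eval "${RSYNC}"
             ;;
         esac
     done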
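
The md5 check in the CLOSE_WRITE branch could also be factored into a
small helper. This is only a minimal sketch, assuming GNU md5sum is
available; the function name needs_sync is made up for illustration and
is not part of the script above.

```shell
#!/bin/sh
# needs_sync SRC_FILE DST_FILE  (hypothetical helper name)
# Succeeds (exit 0) when DST_FILE is missing or its content differs
# from SRC_FILE; fails (exit 1) when both files have the same md5.
needs_sync() {
    # a missing destination always needs a sync
    [ -f "$2" ] || return 0

    # read from stdin so md5sum prints no file name, keep the hash only
    src_sum=$(md5sum < "$1" | cut -d' ' -f1)
    dst_sum=$(md5sum < "$2" | cut -d' ' -f1)

    [ "$src_sum" != "$dst_sum" ]
}
```

Inside the loop it would probably be called as
needs_sync "${srcdir}/${file}" "${dstdir}/${file}" && eval "${RSYNC}"
which covers both the "new file" and the "changed file" cases in one
test.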

As of now, a Gluster mountpoint is barely usable as an Apache
DocumentRoot for us (and yes, with htaccess disabled), so I'd like to
have the list's point of view on this approach. Do you see any terrible
glitch?

Thanks in advance,

-- 
Emile Heitor, Responsable d'Exploitation
---
www.nbs-system.com, 140 Bd Haussmann, 75008 Paris
Tel: 01.58.56.60.80 / Fax: 01.58.56.60.81



