[Gluster-devel] Client side AFR race conditions?

gordan at bobich.net gordan at bobich.net
Tue May 6 17:04:18 UTC 2008


Hmm... So you are saying the problem is writing without locking?
Should writing to a file not involve an implicit lock, regardless of
flock?
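
To illustrate what I mean by explicit locking: as far as I can see, the
only way for applications to serialise competing writers today is to
wrap every write in an advisory lock themselves, e.g. something like
(the lock file name is just for illustration):

  # take an advisory lock through the mount before writing
  flock /mnt/.lock -c 'echo data > /mnt/somefile'

and even that only helps if every writer cooperates and if lock
requests are actually honoured consistently across both client mounts,
which I am not sure is the case. An implicit lock on the write path
would avoid having to rely on that.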

Gordan

On Tue, 6 May 2008, Martin Fick wrote:

> --- Anand Babu Periasamy <ab at gnu.org.in> wrote:
>
>> I really want to understand the issue and help you
>> out. We always have heated discussions even in our
>> labs. We only take it positively :) Your feedback is
>> very valuable to us.
>
> No prob!  I appreciate it.
>
>
>> Martin Fick wrote:
>> In other words, what prevents conflicts when
>> client A & B both write to the same file?  Could
>> A's write to subvolume A succeed before B's write
>> to subvolume A, and at the same time B's write to
>> subvolume B succeed before A's write to subvolume
>> B?
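>
> In concrete terms, the feared sequence is something like the
> following sketch (using the mount and export paths from my setup
> below; this is an illustration, not an actual trace):
>
>  # two clients write the same file through their own mounts
>  ( echo AAAA > /mnt/f  ) &   # client A
>  ( echo BBBB > /mnt2/f ) &   # client B
>  wait
>  # if the two writes reach the two subvolumes in opposite orders,
>  # /export/a keeps one value and /export/b keeps the other
>  diff /export/a/f /export/b/f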
>
>
> OK, just for giggles I created a test script to
> attempt to replicate this theoretical problem.  I was
> able to do so fairly easily, in fact more easily than
> I might have hoped!
>
> What I wrote is a simple shell script that attempts
> to write to a file at the same time from two different
> processes.  Since this is hard to time precisely from
> a shell script, I did not expect very good results; a
> C program would likely create far more contention.
> The script simply writes once from each process to a
> file, then increments the filename counter and starts
> over with the next file, so it should perform exactly
> one 20 character write from each process to each file.
> I have attached the test script (stress).  I run it
> with the following options:
>
>  ./stress /mnt -d /mnt2 -c 100
>
> This tells it to perform the test on 100 files and
> specifies the 2 different mount points.
>
>
> As for my glusterfs setup, I use two client afr mounts
> on the same machine, /mnt and /mnt2.  As noted above,
> the script is run so that each process points to a
> different client mount.  The server runs on the same
> machine and exports the two subvolumes from /export/a
> and /export/b; the configs are below.
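>
> (Both mounts use the same client spec shown below; something along
> the lines of
>
>  glusterfs -f client.vol /mnt
>  glusterfs -f client.vol /mnt2
>
> should reproduce them, with the spec file path being whatever you
> keep locally.)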
>
>
> Here are the split brain counts for a few successive
> runs of 100 file writes:
> 4,8,6,7,6,10,18,7,0,5,4,9,10,12.  Not good!  That
> means it is actually very easy to create this problem:
> single-digit percentages of the files on most runs,
> and occasionally more.
>
> I use this simple command to get those results:
>
>  diff -q /export/a /export/b | wc -l
>
> It compares the two backend subvolume directories.  I
> have looked at the files themselves, and yes, they are
> different: either lots of AAAs or lots of BBBs.
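>
> If it helps, something like this dumps each mismatching pair (it
> just parses the "Files X and Y differ" lines that diff -q prints):
>
>  diff -q /export/a /export/b | awk '{print $2, $4}' |
>  while read fa fb ; do
>    echo "== $(basename "$fa")" ; cat "$fa" "$fb"
>  done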
>
> I hope this helps you debug this race condition a
> little,
>
>
> -Martin
>
>
>
> I am using Debian, packages:
>
>  glusterfs client and server 1.3.8-0pre
>  fuse-utils and libfuse  2.7.2-glfs
>  linux kernel 2.6.17-1-vserver-686
>
>
> Server:
>
> volume a
>  type storage/posix
>  option directory /export/a
> end-volume
>
> volume b
>  type storage/posix
>  option directory /export/b
> end-volume
>
> volume afr
>  type cluster/afr
>  subvolumes a b
> end-volume
>
> volume server
>  type protocol/server
>  option transport-type tcp/server
>  option bind-address 192.168.1.75
>
>  subvolumes a b
>
>  option auth.ip.a.allow *
>  option auth.ip.b.allow *
> end-volume
>
>
> Client:
>
>
> volume ca
>  type protocol/client
>  option transport-type tcp/client
>  option remote-host 192.168.1.75
>  option remote-subvolume a
> end-volume
>
> volume cb
>  type protocol/client
>  option transport-type tcp/client
>  option remote-host 192.168.1.75
>  option remote-subvolume b
> end-volume
>
> volume afr
>  type cluster/afr
>  subvolumes ca cb
> end-volume
>
>
>
>
> stress:
>
>
> #!/bin/bash
> # stress: two writer processes take turns seeding and overwriting
> # the same numbered files through two different mounts,
> # synchronising via a pair of fifos so that both writes to each
> # file land as close together in time as possible.
>
> # kill the writers and remove the fifos on any signal
> trap clean HUP INT QUIT TERM
>
> clean()
> {
>  kill $A $B 2>/dev/null
>  wait $A $B
>  rm "$A2B" "$B2A" "$P" 2>/dev/null
>  exit 1
> }
>
> # writei dir val rec snd
> #   Lockstep writer: each index read from the rec fifo triggers a
> #   write of val to dir/<i> ("we follow"); then i is bumped, handed
> #   to the peer via the snd fifo, and dir/<i> is seeded for the
> #   next round ("we initiate").
> writei()
> {  # dir val rec snd
>  typeset  dir="$1" val="$2" rec="$3" snd="$4"
>  typeset -i i=0
>
>  while  read i < "$rec" ; do
>    [ ! -z "$SW_P" ]  && read  < "$P"   # -p: wait for an Enter keypress
>    echo "$val" > "$dir/$i"  # we follow
>    i=$(($i +1))
>
>    [ ! -z "$SW_S" ]  && sleep 1        # -s: slow the loop down
>
>    [ ! -z "$C"  -a  $i -gt $C ] && return
>    echo $i > "$snd"
>    [ ! -z "$C"  -a  $i -eq $C ] && return
>    echo "$val" > "$dir/$i"  # we initiate
>  done
> }
>
> D="$1"     # first mount point
> D2="$D"    # second mount point (overridden by -d)
>
> # fifos used to pass the current file index back and forth
> COM=/tmp/$(basename $0)
> A2B="$COM.a2b"
> B2A="$COM.b2a"
> P="$COM.prompt"
> mkfifo "$A2B" "$B2A"
>
> VA=AAAAAAAAAAAAAAAAAAAAA
> VB=BBBBBBBBBBBBBBBBBBBBB
>
> while [ $# -gt 0 ] ; do
>  case "$1" in
>    -p|--prompt) SW_P="-p" ; mkfifo "$P" ;;
>    -s|--sleep)  SW_S="-s" ;;
>
>    -m|--manual) shift ; M="$1" ;;
>    -d) shift ; D2="$1" ;;
>    -c|--count) shift ; C="$1" ;;
>    -a) shift ; VA="$1" ;;
>    -b) shift ; VB="$1" ;;
>  esac
> shift ; done
>
>
> # start the two writers, one per mount, cross-connected via the fifos
> writei "$D" "$VA" "$B2A$M" "$A2B" &
> A=$!
>
> writei "$D2" "$VB" "$A2B$M" "$B2A" &
> B=$!
>
> echo 0 > "$B2A"   # kick off the ping-pong with file index 0
>
> if [ ! -z "$SW_P" ] ; then
>  while true ; do  read ; echo > "$P" ; done
> fi
>
> wait $A $B
> clean
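>
> To repeat runs back to back, a loop along these lines works; it
> wipes the test files through one of the mounts (not the backend)
> between runs:
>
>  for run in 1 2 3 4 5 ; do
>    ./stress /mnt -d /mnt2 -c 100
>    diff -q /export/a /export/b | wc -l
>    rm -f /mnt/[0-9]*
>  done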
>
>
>
>




