[Gluster-users] 2.0_rc1 and replicate

Krishna Srinivas krishna at zresearch.com
Mon Feb 2 12:52:54 UTC 2009


replies inline:

On Mon, Feb 2, 2009 at 3:02 AM, Łukasz Mierzwa <l.mierzwa at gmail.com> wrote:
> Hi,
>
> I've tested replicate a little today and I've got these problems:
>
> 1. client-side replicate with 2 nodes, both acting as server and client
> 2. I start copying a big file to the gluster mount point
> 3. during the file copy I stop the gluster server on the second node

(I am assuming you are killing the glusterfsd process)

> 4. I got a piece of the file on the second node and the whole file on the first one
> 5. during or after the copy I start the halted server on the second node
> 6. I run ls -lh on the second node in the gluster mount point

(I am assuming you have started glusterfsd on server-2 before step 6)

> 7. the logs are telling me to delete one of the files and my second
> node keeps the incomplete copy

This can happen (as of now). But can you reproduce this behavior
every time? Do you see, in any of the tries, that it does not
complain about a conflict and heals by itself?

This is what happens during a write:
* Apply xattrs on server1 and server2: the xattr written on server1
means that a write is about to happen on server2, and the xattr on
server2 means that a write is about to happen on server1.
* Call write() to actually write the data.
* Depending on where the writes succeeded, unmark the xattrs. So if
server1 was down, we are left with an xattr on server2 saying that a
write is pending on server1; when server1 is brought back up we can
heal it easily.
* But if server1 went down during the write operation, both server1
and server2 will be left with pending xattrs saying that writes are
needed on the other server. This is the conflict situation you hit
when the server comes back up. We are going to fix this behavior by
adding another attribute which is unset after a successful write, so
the file with this xattr unset will be the latest. (Note that a
conflict can still arise if the last server that was up goes down
before this xattr is unset.)
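
A rough way to see this on the backend directories (the data-pending
name below is my assumption, extrapolated from the entry-pending
attribute you list; the paths come from your volfiles and the values
are only illustrative):

 # on server1, after server2 was taken down mid-copy:
 getfattr -n trusted.glusterfs.afr.data-pending -e hex /var/glusterfs/data/iso
 # a non-zero value means writes are still pending on the other server

 # on server2, if it went down while a write was in flight:
 getfattr -n trusted.glusterfs.afr.data-pending -e hex /var/glusterfs/data/iso
 # if this is also non-zero, both copies blame each other - this is
 # the conflict case above, and why the log asks you to remove one copy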

>
> I've checked whether I've got xattrs enabled with the attr command:
>
> # attr -l <gluster_data_dir>/iso
> Attribute "glusterfs.afr.entry-pending" has a 8 byte value for iso
> # attr -g glusterfs.afr.entry-pending <gluster_data_dir>/iso
> attr_get: No data available
> Could not get "glusterfs.afr.entry-pending" for iso

"attr" command was not working as expected on my machine. can you try
"getfattr" ?

 getfattr -n trusted.glusterfs.afr.entry-pending -e hex /path/to/file

>
> With client-side it seems that if I'm inside the gluster mount point
> and stop the server on the second node, after starting it again I
> need to re-enter this dir for self-heal to work; this does not seem
> to be needed with server-side.

There is a limitation in the current self-heal: self-heal is done
during the "lookup" of an entry, and a lookup call is issued whenever
an entry (directory/file) is accessed. If you are already inside a
directory, no lookup is issued on that directory, hence it does not
self-heal as long as you do not re-enter it.
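
For example, assuming the client mount point is /mnt/glusterfs (just
a placeholder), any access that reaches the directory from outside it
issues the lookup and triggers self-heal:

 # leave and re-enter the directory so a fresh lookup is issued on it:
 cd .. && cd -
 # ... or stat it by its full path from elsewhere:
 stat /mnt/glusterfs/iso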


> With server-side afr (both nodes afr'ing each other), when I stop the
> second node during a big file copy and after a while start it again,
> if I run 'ls -lh' on the gluster mount point I'm getting some
> self-heal race condition. Looking at dstat output it seems that
> running 'ls -lh' on one node makes that node self-heal the other with
> its current file content. I forgot about the log files so I don't
> know what they say about it; I will retest it tomorrow.

The replicate xlator will report a conflict in cases where it sees
one. However, if you have configured the "favorite-child" option, it
takes precedence during a conflict. As you have not configured this
option, you should not have seen this behavior, i.e.:
 "it seems that running 'ls -lh' on one node makes that node
self-heal the other with its current file content"

Can you confirm?
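
For completeness, if you did want one copy to win automatically on a
conflict, favorite-child goes on the replicate volume. A sketch based
on your client-side volfile (I am assuming the option takes the name
of the preferred subvolume):

 volume afr
     type cluster/replicate
     option favorite-child gluster_134
     subvolumes gluster_134 gluster_135
 end-volume

Be aware that this silently picks gluster_134's copy whenever the two
disagree, so the other copy's changes are discarded.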

> Using replicate slows down writing a lot: with client-side to
> 1-2MB/s (!) on a 1Gb link, with server-side to 10-12MB/s.

Are you using dd? Can you try with a bigger bs value, like 1MB?
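
Something like the following, with the paths being placeholders for
your mount point and a scratch file:

 # 1 GB written in 4 KB chunks:
 dd if=/dev/zero of=/mnt/glusterfs/ddtest bs=4k count=262144
 # 1 GB written in 1 MB chunks, as suggested above:
 dd if=/dev/zero of=/mnt/glusterfs/ddtest bs=1M count=1024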

Krishna

>
> ==============================================
> client-side replicate, server.vol:
>
> volume brick-data
>     type storage/posix
>     option directory /var/glusterfs/data/
> end-volume
>
> volume locks
>     type features/locks
>     subvolumes brick-data
> end-volume
>
> volume client
>     type performance/io-threads
>     option thread-count 4
>     subvolumes locks
> end-volume
>
> volume server
>     type protocol/server
>     option transport-type tcp
>     subvolumes client
>     option auth.login.client.allow gluster
>     option auth.login.gluster.password pass
> end-volume
>
> ========================================================================
> client-side replicate, client vol:
> volume gluster_134
>     type protocol/client
>     option transport-type tcp
>     option remote-host 172.16.110.134
>     option remote-subvolume client
>     option username gluster
>     option password pass
> end-volume
>
> volume gluster_135
>     type protocol/client
>     option transport-type tcp
>     option remote-host 172.16.110.135
>     option remote-subvolume client
>     option username gluster
>     option password pass
> end-volume
>
> volume afr
>     type cluster/replicate
>     subvolumes gluster_134 gluster_135
> end-volume
>
> volume readahead
>     type performance/read-ahead
>     option page-size 1MB
>     option page-count 2
>     subvolumes afr
> end-volume
>
> volume iocache
>     type performance/io-cache
>     option cache-size 64MB
>     option page-size 256KB
>     option page-count 2
>     subvolumes readahead
> end-volume
>
> volume writeback
>     type performance/write-behind
>     option aggregate-size 1MB
>     option window-size 2MB
>     option flush-behind off
>     subvolumes iocache
> end-volume
>
> volume iothreads
>     type performance/io-threads
>     option thread-count 4
>     subvolumes writeback
> end-volume
>
> =============================================================
> server-side replicate, server.vol:
>
> volume brick
>     type storage/posix
>     option directory /var/glusterfs/data/
> end-volume
>
> volume client_afr
>     type features/locks
>     subvolumes brick
> end-volume
>
> volume brick_134
>     type protocol/client
>     option transport-type tcp/client
>     option remote-host 172.16.110.134
>     option remote-subvolume client_afr
>     option username gluster
>     option password Tj72pAz44
> end-volume
>
> volume afr
>     type cluster/replicate
>     subvolumes client_afr brick_134
> end-volume
>
> volume client
>     type performance/io-threads
>     option thread-count 4
>     subvolumes afr
> end-volume
>
> volume server
>     type protocol/server
>     option transport-type tcp
>     subvolumes client_afr client
>     option auth.login.client.allow gluster
>     option auth.login.client_afr.allow gluster
>     option auth.login.gluster.password pass
> end-volume
>
> ================================================================
> server-side replicate, client.vol:
> volume gluster_134
>     type protocol/client
>     option transport-type tcp
>     option remote-host 172.16.110.134
>     option remote-subvolume client
>     option username gluster
>     option password Tj72pAz44
> end-volume
>
> volume gluster_135
>     type protocol/client
>     option transport-type tcp
>     option remote-host 172.16.110.135
>     option remote-subvolume client
>     option username gluster
>     option password Tj72pAz44
> end-volume
>
> volume ha
>     type cluster/ha
>     subvolumes gluster_135 gluster_134
> end-volume
>
> volume readahead
>     type performance/read-ahead
>     option page-size 1MB
>     option page-count 2
>     subvolumes ha
> end-volume
>
> volume iocache
>     type performance/io-cache
>     option cache-size 64MB
>     option page-size 256KB
>     option page-count 2
>     subvolumes readahead
> end-volume
>
> volume writeback
>     type performance/write-behind
>     option aggregate-size 1MB
>     option window-size 2MB
>     option flush-behind off
>     subvolumes iocache
> end-volume
>
> volume iothreads
>     type performance/io-threads
>     option thread-count 4
>     subvolumes writeback
> end-volume
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
>
>



