[Gluster-users] How to reconcile two servers content after a crash ?

Paul Rolland rol at as2917.net
Mon Mar 14 18:40:49 UTC 2011


Hello,

We have a configuration involving two servers and two clients.
One of the two servers crashed, and now that it has been restarted, we'd
like the data that was updated on the surviving server while it was down
to be copied/reconciled onto the restored machine.

But nothing seems to happen, even though gluster is running on all the machines.
 
Server 1 - crashed and restored, not up to date:
root      1920  0.0  0.0  25368   920 ?        Ssl  07:54   0:00 /usr/sbin/glusterfsd -f /etc/glusterfs/glusterfsd.vol -l /var/log/glusterfs/server.log

Server 2 - always up, updated by the clients:
root     28069  0.6  0.2  31708  9468 ?        Ssl   2010 1003:56 /usr/sbin/glusterfsd -f /etc/glusterfs/glusterfsd.vol -l /var/log/glusterfs/server.log

Client 1 and Client 2 :
root     13767  2.1  3.7 526464 306944 ?       Ssl   2010 3166:18 /usr/sbin/glusterfs --log-level=NORMAL --volfile=/etc/glusterfs/glusterfs-client.vol /data

Clients access gluster through the mount set up via fstab:
/etc/glusterfs/glusterfs-client.vol /data      glusterfs   defaults    0 0

Version information: all machines are running the same version:
[root@nasdash-01 d]# /usr/sbin/glusterfsd -V
glusterfs 2.0.9 built on Jan 25 2010 15:59:44
Repository revision: v2.0.9


And the configuration files are:

  -- %< -- snip -- %< -- snip -- %< -- snip -- %< -- snip -- %< -- snip --

Client configuration file. Please note that the favorite-child is the
server that crashed and is no longer up to date.

# glusterfs client volfile
# 
# as per the suggestion of the gluster devs, the AFR translator
#  has been moved to the clients under 1.4.x

volume nasdash-01
  type protocol/client
  option transport-type tcp
  option remote-host 10.0.0.11
  option remote-subvolume prod-ds-iothreads
  option transport-timeout 10
end-volume

volume nasdash-02
  type protocol/client
  option transport-type tcp
  option remote-host 10.0.0.12
  option remote-subvolume prod-ds-iothreads
  option transport-timeout 10
end-volume

volume nasdash-afr
  type cluster/replicate                        # afr renamed to replicate
  subvolumes nasdash-01 nasdash-02
  option favorite-child nasdash-01
end-volume

volume write-behind
  type performance/write-behind
  option flush-behind on
  subvolumes nasdash-afr
end-volume

  -- %< -- snip -- %< -- snip -- %< -- snip -- %< -- snip -- %< -- snip --
Server configuration file:
# glusterfs server volfile
# 
# as per the suggestion of the gluster devs, the AFR translator
#  has been moved to the clients under 1.4.x

# dataspace
volume prod-ds
  type storage/posix
  option directory /opt/nas-dash
end-volume

# posix locks for prod-ds
volume prod-ds-locks
  type features/posix-locks
  subvolumes prod-ds
end-volume

# threaded IO performance translator
volume prod-ds-iothreads
  type performance/io-threads
  option thread-count 4
  #option cache-size 64MB
  subvolumes prod-ds-locks
end-volume

# server declaration
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes prod-ds-iothreads
  option auth.addr.prod-ds-iothreads.allow 10.0.0.*,127.0.0.1
end-volume

  -- %< -- snip -- %< -- snip -- %< -- snip -- %< -- snip -- %< -- snip --

I'm wondering what the expected procedure is to correctly reconcile the
two servers.

I've been thinking of running rsync, but that doesn't seem to be the
best approach, since this synchronisation should be handled by gluster itself.
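The one gluster-native approach I've come across (assuming self-heal in
2.0.x replicate works as documented, i.e. a file is repaired when it is
looked up) would be to force a lookup of every entry by stat-ing the
whole tree from a client mount, something like:

```shell
#!/bin/sh
# Sketch: force replicate (AFR) self-heal by looking up every entry
# from a client mount.  replicate is said to repair a file on lookup,
# so stat-ing the whole tree should reconcile both servers.
# MNT defaults to the /data mount from fstab above.
MNT="${1:-/data}"
find "$MNT" -noleaf -print0 | xargs -0 stat >/dev/null
echo "self-heal walk of $MNT complete"
```

This would have to be run on one of the clients, not against the
servers' backend directories, since self-heal only happens through the
translator stack. Can anyone confirm this is the intended procedure?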

Best regards,
Paul
