[Gluster-users] Problem with Gluster Geo Replication, status faulty

Kaushik BV kaushikbv at gluster.com
Thu Aug 4 06:42:40 UTC 2011


The geo-replication module forks two gsyncd processes: one on the *Master
node* (where geo-replication start was invoked) and another on the *Slave
node*. The two gsyncd processes communicate with each other through a pair of
pipes, either on the same machine or over an SSH tunnel (depending on where
the *Slave node* resides).

The above log basically means that the communication channel between the
Master and the Slave is failing.
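
As a quick sanity check (a rough sketch; the exact process listing varies by
version and setup), you can look for the gsyncd processes on each side:

               #ps aux | grep -i gsyncd

On a healthy session there is a gsyncd process on the Master node and a
corresponding one on the Slave node (plus the ssh process carrying the tunnel
when the Slave is remote).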

Here are a few reasons why this may occur:

1) The SSH tunnel between the *Master & Slave gsyncd* is broken:

        The prerequisite for geo-replication between the Master and the Slave
to work is a password-less SSH setup between them, either as described in
http://www.gluster.com/community/documentation/index.php/Gluster_3.2:_Setting_Up_the_Environment_for_Geo-replication
or as it is done normally.

Verify that password-less SSH is working fine.
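
For example, from the Master node, a quick check (this assumes the pem key
location used in the setup document above; adjust the user, host, and key
path to your environment) is:

               #ssh -i /var/lib/glusterd/geo-replication/secret.pem root@<Slave-Host>

If this prompts for a password or fails to connect, the gsyncd SSH tunnel
will not come up either.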

2)  The *Master gsyncd* could *not* *spawn* the *Slave gsyncd* session
successfully:

           a) This could be due to the SSH tunnel not being set up as desired,
 or
           b) The Master gsyncd spawns the Slave gsyncd process by locating
the gsyncd executable on the slave at a predefined location; if the gsyncd
executable is not found at the expected location, the above scenario might
occur.  Execute the following on the Master node to see where the Master
gsyncd expects the gsyncd executable on the Slave node:
                   #gluster volume geo-replication <Master-Volume>
<Slave-URI> config remote-gsyncd

The output would be a location similar to:

                   /usr/local/libexec/glusterfs/gsyncd

Verify on the Slave node that the above output is a valid executable. If not,
configure remote-gsyncd to point to the appropriate location by executing the
following command on the Master node:

                  #gluster volume geo-replication <Master-Volume>
<Slave-URI> config remote-gsyncd  <new_location>
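
For example, on the Slave node (the path below is just the sample output
shown above; use whatever location your setup reports):

                  #ls -l /usr/local/libexec/glusterfs/gsyncd

If that path does not exist, locate the gsyncd shipped with your glusterfs
installation (for instance with: find /usr -name gsyncd) and pass that path
as <new_location> in the config command above.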


          c) The *Slave gsyncd* process *died* unexpectedly after being
spawned:

                    - If the Slave is a plain directory, gsyncd expects the
directory to have been created already. Verify that the directory exists
(see the example checks after these items).

                   - If the Slave is a Gluster Volume, verify that the *Slave
volume is started*.

                   - If the Slave is a Gluster Volume, gsyncd does a FUSE
mount of the Slave Volume; the *Slave gsyncd* *dies* if it cannot mount the
Slave Volume (for example because the *fuse* module is not loaded).
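
A few example checks for these cases, run on the Slave node (the volume name
and directory path are placeholders for your own setup):

                   If the Slave is a plain directory, make sure it exists:
                   #test -d /path/to/slave-dir || mkdir -p /path/to/slave-dir

                   If the Slave is a Gluster Volume, confirm it is started:
                   #gluster volume info <Slave-Volume>
                   #gluster volume start <Slave-Volume>

                   Confirm that the fuse kernel module is available:
                   #lsmod | grep fuse
                   #modprobe fuse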

                   For all the above scenarios, looking at the Slave gsyncd
log as well as the auxiliary gluster logs might throw more light on the
issue. To locate the respective logs, refer to the *Locating Log Files*
section in
http://gluster.com/community/documentation/index.php/Gluster_3.2:_Troubleshooting_Geo-replication

              If the problem persists, run the geo-replication session at the
DEBUG log level by executing the following command, and post all of the above
logs so that the problem can be root-caused:

               #gluster volume geo-replication <Master-Volume> <Slave-URI>
config log-level DEBUG
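
After switching to DEBUG, restarting the session and checking its status
usually produces the most useful logs (the master-side geo-replication logs
are typically found under /var/log/glusterfs/geo-replication/; see the
*Locating Log Files* section referenced above):

               #gluster volume geo-replication <Master-Volume> <Slave-URI> stop
               #gluster volume geo-replication <Master-Volume> <Slave-URI> start
               #gluster volume geo-replication <Master-Volume> <Slave-URI> status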

A lot of log improvements have been done in the devel branch; the logs will
be more sanitized in future releases.

Regards,
Kaushik