[Gluster-users] Problem with Gluster Geo Replication, status faulty

John Mark Walker jwalker at gluster.com
Thu Aug 4 16:45:09 UTC 2011


Thanks for writing this up, Kaushik. Can you add this to a post on community.gluster.org?

Thanks,
JM


________________________________
From: gluster-users-bounces at gluster.org [gluster-users-bounces at gluster.org] on behalf of Kaushik BV [kaushikbv at gluster.com]
Sent: Wednesday, August 03, 2011 11:42 PM
To: Dantas; zhourong.miao at flixlab.com
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Problem with Gluster Geo Replication, status faulty

The geo-replication module forks two gsyncd processes: one on the Master node (where geo-replication start was invoked) and another on the Slave node.
The two gsyncd processes communicate with each other through a pair of pipes, either on the same machine or through an SSH tunnel (depending on where the Slave node resides).

The above log basically means that the communication channel between the Master and the Slave is failing.
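As a quick first check, it is worth confirming that a gsyncd process is actually running on both ends (the slave host name below is just a placeholder):

                   On the Master node:
                   #ps ax | grep gsyncd

                   On the Slave node (or over SSH from the Master):
                   #ssh root@<Slave-Host> 'ps ax | grep gsyncd'

If no gsyncd shows up on one of the two ends, that end of the channel is the one to look at first.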

Here are a few reasons why this may occur:

1) The SSH tunnel between the Master and Slave gsyncd is broken:

        The prerequisite for geo-replication between the Master and Slave to work is a passwordless SSH setup between them, either as described in http://www.gluster.com/community/documentation/index.php/Gluster_3.2:_Setting_Up_the_Environment_for_Geo-replication or as done in the usual way.

Verify that password-less SSH is working.
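For example, a rough check from the Master node (the slave host name is a placeholder; add -i /var/lib/glusterd/geo-replication/secret.pem if the key from the linked setup guide is used):

                   #ssh root@<Slave-Host> 'echo ssh-ok'

If this prints ssh-ok without prompting for a password, the SSH prerequisite is in place.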

2)  The Master gsyncd could not spawn the Slave gsyncd session successfully:

           a) This could be due to the SSH tunnel not being set up as desired, or
           b) The Master gsyncd spawns the Slave gsyncd process by locating the gsyncd executable on the Slave at a predefined location. If the gsyncd executable is not found at that location, the above scenario might occur. Execute the following on the Master node to see where the Master gsyncd expects the gsyncd executable on the Slave node:
                   #gluster volume geo-replication <Master-Volume> <Slave-URI> config remote-gsyncd

The output would be a location similar to:

                   /usr/local/libexec/glusterfs/gsyncd

Verify on the Slave node that the above output is a valid executable. If not, configure remote-gsyncd to point to the appropriate location by executing the following command on the Master node:

                  #gluster volume geo-replication <Master-Volume> <Slave-URI> config remote-gsyncd  <new_location>
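
A quick way to confirm the configured path from the Master node itself (the slave host name is a placeholder, and the path shown is the example from above):

                  #ssh root@<Slave-Host> 'ls -l /usr/local/libexec/glusterfs/gsyncd'

The file should exist and be executable; a common cause of a mismatch is a source install on one side and a packaged install (which often places gsyncd under /usr/libexec) on the other, which is exactly when remote-gsyncd needs to be reconfigured.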


          c) The Slave gsyncd process died unexpectedly after being spawned:

                    - If the Slave is a plain directory, gsyncd expects the directory to already exist. Verify that the directory has been created.

                   -  If the Slave is a Gluster volume, verify that the Slave volume is started.

                   -  If the Slave is a Gluster volume, gsyncd does a FUSE mount of the Slave volume; if the mount fails (for example, because the fuse module is not loaded), the Slave gsyncd dies. A few quick checks for these cases are sketched below.
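
                   As rough checks for the three cases above (volume name and directory path are placeholders), the following can be run on the Slave node:

                   #ls -ld <Slave-Directory>                (plain-directory Slave: it must already exist)
                   #gluster volume info <Slave-Volume>      (the Status line should read Started)
                   #lsmod | grep fuse                       (the fuse kernel module should be loaded)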

                   For all the above scenarios, looking at the Slave gsyncd log as well as the auxiliary gluster log might throw more light on the issue. To locate the respective logs, refer to the Locating Log Files section in http://gluster.com/community/documentation/index.php/Gluster_3.2:_Troubleshooting_Geo-replication

              If the problem persists, run the geo-replication session at DEBUG log level by executing the following command and post all the above logs:

               #gluster volume geo-replication <Master-Volume> <Slave-URI> config log-level DEBUG

 This will help root-cause the problem. A lot of log improvements have been made in the devel branch, so future releases will have more sanitized logs.
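
Once the DEBUG logs have been collected, the session status can be re-checked from the Master node, and the log level can be set back (INFO is assumed here to be the default):

               #gluster volume geo-replication <Master-Volume> <Slave-URI> status
               #gluster volume geo-replication <Master-Volume> <Slave-URI> config log-level INFO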

Regards,
Kaushik

