[Gluster-users] Issues with geo-rep

Carl Chenet chaica at ohmytux.com
Thu Jul 7 17:56:46 UTC 2011


On 07/07/2011 15:25, Kaushik BV wrote:
> Hi Chaica,
>
> This primarily means that the RPC communication between the master
> gsyncd module and slave gsyncd module is broken. This can happen for
> various reasons. Check that all the prerequisites are satisfied:
>
> - FUSE is installed on the machine, since the Geo-replication module
> mounts the GlusterFS volume using FUSE to sync data.
> - If the Slave is a volume, check that the volume is started.
> - If the Slave is a plain directory, check that the directory has
> already been created with the desired permissions (not applicable in
> your case).
> - If GlusterFS 3.2 is not installed in the default location (on the
> Master) but under a custom prefix, configure the *gluster-command*
> so that it points to the exact location.
> - If GlusterFS 3.2 is not installed in the default location (on the
> Slave) but under a custom prefix, configure the
> *remote-gsyncd-command* so that it points to the exact place where
> gsyncd is located.
> - Locate the slave log and see if it shows any anomalies.
> - Passwordless SSH is set up properly between the host and the remote
> machine (not applicable in your case).
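
Two of the items above (FUSE being available, and the binaries being found) can be checked quickly on each machine. A minimal sketch, assuming the FUSE device is /dev/fuse and that glusterfs and ssh are expected in PATH; the function name and defaults are mine, not part of GlusterFS:

```python
import os
import shutil


def check_prerequisites(dev_fuse="/dev/fuse", binaries=("glusterfs", "ssh")):
    """Return a list of problems found; an empty list means these basic checks passed."""
    problems = []
    # Geo-replication mounts the volume through FUSE, so the device must exist.
    if not os.path.exists(dev_fuse):
        problems.append("FUSE device %s is missing" % dev_fuse)
    # A binary installed under a custom prefix will typically not be in PATH.
    for name in binaries:
        if shutil.which(name) is None:
            problems.append("%s not found in PATH" % name)
    return problems


if __name__ == "__main__":
    for line in check_prerequisites() or ["basic prerequisites look fine"]:
        print(line)
```

This does not check whether the volume is started or whether the configured command paths are right; those still need the gluster CLI.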

OK, the situation has evolved slightly. I now have a slave log and a
clearer error message on the master:


[2011-07-07 19:53:16.258866] I [monitor(monitor):42:monitor] Monitor: 
------------------------------------------------------------
[2011-07-07 19:53:16.259073] I [monitor(monitor):43:monitor] Monitor: 
starting gsyncd worker
[2011-07-07 19:53:16.332720] I [gsyncd:286:main_i] <top>: syncing: 
gluster://localhost:test-volume -> ssh://192.168.1.32::test-volume
[2011-07-07 19:53:16.343554] D [repce:131:push] RepceClient: call 
6302:140305661662976:1310061196.34 __repce_version__() ...
[2011-07-07 19:53:20.931523] D [repce:141:__call__] RepceClient: call 
6302:140305661662976:1310061196.34 __repce_version__ -> 1.0
[2011-07-07 19:53:20.932172] D [repce:131:push] RepceClient: call 
6302:140305661662976:1310061200.93 version() ...
[2011-07-07 19:53:20.933662] D [repce:141:__call__] RepceClient: call 
6302:140305661662976:1310061200.93 version -> 1.0
[2011-07-07 19:53:20.933861] D [repce:131:push] RepceClient: call 
6302:140305661662976:1310061200.93 pid() ...
[2011-07-07 19:53:20.934525] D [repce:141:__call__] RepceClient: call 
6302:140305661662976:1310061200.93 pid -> 10075
[2011-07-07 19:53:20.957355] E [syncdutils:131:log_raise_exception] 
<top>: FAIL:
Traceback (most recent call last):
   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line 
102, in main
     main_i()
   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line 
293, in main_i
     local.connect()
   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", 
line 379, in connect
     raise RuntimeError("command failed: " + " ".join(argv))
RuntimeError: command failed: /usr/sbin/glusterfs --xlator-option 
*-dht.assert-no-child-down=true -L DEBUG -l 
/var/log/glusterfs/geo-replication/test-volume/ssh%3A%2F%2Froot%40192.168.1.32%3Agluster%3A%2F%2F127.0.0.1%3Atest-volume.gluster.log 
-s localhost --volfile-id test-volume --client-pid=-1 
/tmp/gsyncd-aux-mount-hy6T_w
[2011-07-07 19:53:20.960621] D [monitor(monitor):58:monitor] Monitor: 
worker seems to be connected (?? racy check)
[2011-07-07 19:53:21.962501] D [monitor(monitor):62:monitor] Monitor: 
worker died in startup phase
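
The failing glusterfs command writes its own log to the file given with -l, and that file name is percent-encoded. A small sketch to decode it, so it is easier to see which master/slave pair the log belongs to (the encoded string is copied from the traceback above; nothing else is assumed):

```python
from urllib.parse import unquote

# File name taken verbatim from the "-l" argument of the failed command.
encoded = ("ssh%3A%2F%2Froot%40192.168.1.32%3Agluster%3A%2F%2F"
           "127.0.0.1%3Atest-volume.gluster.log")

decoded = unquote(encoded)
print(decoded)
# prints: ssh://root@192.168.1.32:gluster://127.0.0.1:test-volume.gluster.log
```

The file itself must still be opened under its encoded name in /var/log/glusterfs/geo-replication/test-volume/.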

The glusterfs command launched by gsyncd returns exit code 255, which
I believe means the command was terminated by a signal. In the slave
log I have:

[2011-07-07 19:54:49.571549] I [fuse-bridge.c:3218:fuse_thread_proc] 
0-fuse: unmounting /tmp/gsyncd-aux-mount-z2Q2Hg
[2011-07-07 19:54:49.572459] W [glusterfsd.c:712:cleanup_and_exit] 
(-->/lib/libc.so.6(clone+0x6d) [0x7f2c8998b02d] 
(-->/lib/libpthread.so.0(+0x68ba) [0x7f2c89c238ba] 
(-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xc5) [0x7f2c8a8f51b5]))) 
0-: received signum (15), shutting down
[2011-07-07 19:54:51.280207] W [write-behind.c:3029:init] 
0-test-volume-write-behind: disabling write-behind for first 0 bytes
[2011-07-07 19:54:51.291669] I [client.c:1935:notify] 
0-test-volume-client-0: parent translators are ready, attempting connect 
on transport
[2011-07-07 19:54:51.292329] I [client.c:1935:notify] 
0-test-volume-client-1: parent translators are ready, attempting connect 
on transport
[2011-07-07 19:55:38.582926] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 
0-test-volume-client-0: changing port to 24009 (from 0)
[2011-07-07 19:55:38.583456] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 
0-test-volume-client-1: changing port to 24009 (from 0)
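
Regarding the exit code 255 mentioned earlier, one way to tell an ordinary exit status from a death by signal is to run a command through Python's subprocess module, which reports a child killed by signal N as return code -N. This is only a sketch with stand-in shell commands; I have not checked how gsyncd itself reads the status, and the describe() helper is hypothetical:

```python
import signal
import subprocess


def describe(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by that signal.
        return "killed by signal %s" % signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    # Stand-in commands: one exits with 255, the other sends itself SIGTERM.
    plain = subprocess.run(["sh", "-c", "exit 255"])
    killed = subprocess.run(["sh", "-c", "kill -TERM $$"])
    print(describe(plain.returncode))
    print(describe(killed.returncode))
```

If the real command shows up as a plain 255 here, the signal theory would be less likely and the mount log would be the next place to look.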

Bye,
Carl Chenet


