[Gluster-users] Issue with geo-replication and nfs auth

Csaba Henk csaba at gluster.com
Fri May 13 09:42:20 UTC 2011


On 2011-05-12, Cedric Lagneau <cedric.lagneau at openwide.fr> wrote:
> My initial problem on the testing platform is not solved: the glusterd geo-replication command stops working after about one day.
>
> On Master:
> # cat ssh%3A%2F%2Froot%40slave.mydomain.com%3Afile%3A%2F%2F%2Fdata%2Ftest2.log
> [2011-05-12 10:50:53.451495] I [monitor(monitor):19:set_state] Monitor: new state: starting...
> [2011-05-12 10:50:53.465759] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
> [2011-05-12 10:50:53.466232] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> [2011-05-12 10:50:53.596132] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:test2 -> ssh://slave.mydomain.com:/data/test2
> [2011-05-12 10:50:53.641566] D [repce:131:push] RepceClient: call 1879:140148091115264:1305190253.64 __repce_version__() ...
> [2011-05-12 10:50:53.751271] E [syncdutils:131:log_raise_exception] <top>: FAIL: 
> Traceback (most recent call last):
>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
>     tf(*aa)
>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 118, in listen
>     rid, exc, res = recv(self.inf)
>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 42, in recv
>     return pickle.load(inf)
> EOFError
> [2011-05-12 10:50:53.759484] D [monitor(monitor):57:monitor] Monitor: worker got connected in 0 sec, waiting 59 more to make sure it's fine
> [2011-05-12 10:51:53.535005] I [monitor(monitor):19:set_state] Monitor: new state: faulty
>
> There is no test2-gluster.log.
>
> On Slave:
> no log (even in debug mode), and no /usr/bin/python /usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py process is running.
>
>
> tcpdump on the slave shows some ssh traffic with the master server when I start geo-replication.
>
> strace of glusterd on the master while a geo-replication session starts and goes faulty:

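The EOFError in your master log just means that pickle.load() hit end-of-stream: the ssh channel closed before the remote gsyncd wrote any response, i.e. the slave-side process died during startup. A minimal illustration of that failure mode (plain Python, not the gsyncd code itself):

    >>> import pickle, io
    >>> pickle.load(io.BytesIO(b""))   # peer closed the stream without writing anything
    Traceback (most recent call last):
      ...
    EOFError
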
It would be more interesting to strace the execution of the remote gsyncd. That can be accomplished by smuggling strace into the remote-gsyncd command:

# gluster volume geo-replication test2 ssh://slave.mydomain.com:/data/test2 config remote-gsyncd \
    "strace -f -s512 -o /tmp/gsyncd-test2.slog `gluster volume geo-replication test2 ssh://slave.mydomain.com:/data/test2 config remote-gsyncd`"

From that we can read out why the remote gsyncd invocation/initialization fails.
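
For a first pass, the exec of the remote command and any failing syscalls usually tell the story, e.g. (the grep patterns here are just a suggestion, not gsyncd-specific):

# grep -E 'execve|ENOENT|EACCES' /tmp/gsyncd-test2.slog | tail -20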

Csaba
