[Gluster-users] GlusterFS geo-replication progress question
Alexander Iliev
ailiev+gluster at mamul.org
Thu Apr 2 00:08:36 UTC 2020
Hi all,
I have a running geo-replication session between two clusters and I'm
trying to figure out the current progress of the replication and, if
possible, how much longer it will take.
It has been running for quite a while now (> 1 month), but both the
hardware of the nodes and the link between the two clusters aren't that
great (e.g., the volumes are backed by rotating disks), and the volume
is fairly large (roughly 30 TB), so I'm not really sure how long it is
supposed to take under normal conditions.
Several bricks in the volume (same brick size and physical layout in
both clusters) are now showing a Changelog Crawl status and a recent
LAST_SYNCED date in the `gluster volume geo-replication status detail`
command output, which seems to be the desired state for all bricks. The
remaining bricks, though, are in Hybrid Crawl state and have been stuck
there for as long as I can tell.
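For what it's worth, a quick way to tally bricks per crawl state from the
`status detail` output could look like the sketch below. The sample lines
are made up for illustration (real output has more columns, e.g. ENTRY,
DATA, META, FAILURES), but the crawl-state strings themselves ("Changelog
Crawl", "Hybrid Crawl", "History Crawl") are the ones gluster reports.

```python
from collections import Counter

# Crawl states gluster geo-replication reports per brick.
CRAWL_STATES = ("Changelog Crawl", "Hybrid Crawl", "History Crawl")

def crawl_summary(status_output: str) -> Counter:
    """Count how many bricks are in each crawl state."""
    counts = Counter()
    for line in status_output.splitlines():
        for state in CRAWL_STATES:
            if state in line:
                counts[state] += 1
    return counts

# Illustrative (fake) excerpt of `status detail` output:
sample = """\
node1 vol1 /data/brick1 root x.x.x.x::vol1 slave1 Active Changelog Crawl 2020-03-31 10:00:01
node2 vol1 /data/brick2 root x.x.x.x::vol1 slave2 Active Hybrid Crawl N/A
node3 vol1 /data/brick3 root x.x.x.x::vol1 slave3 Active Hybrid Crawl N/A
"""
print(crawl_summary(sample))
```

Bricks that have finished the initial sync move to Changelog Crawl, so a
shrinking Hybrid Crawl count is at least a crude progress indicator.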
So I suppose my questions are: how can I tell whether the replication
session is somehow broken, and if it's not, is there a way for me to
find out the progress and the ETA of the replication?
In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are
some errors like:
[2020-03-31 11:48:47.81269] E [syncdutils(worker
/data/gfs/store1/8/brick):822:errlog] Popen: command returned error
cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
-S /tmp/gsyncd-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock
x.x.x.x
/nonexistent/gsyncd slave <vol> x.x.x.x::<vol> --master-node x.x.x.x
--master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick
<brick_path> --local-node x.x.x.x
2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout
120 --slave-log-level INFO --slave-gluster-log-level INFO
--slave-gluster-command-dir /usr/sbin error=1
[2020-03-31 11:48:47.81617] E [syncdutils(worker
<brick_path>):826:logerr] Popen: ssh> failed with ValueError.
[2020-03-31 11:48:47.390397] I [repce(agent
<brick_path>):97:service_loop] RepceServer: terminating on reaching EOF.
In the brick logs I see entries like:
[2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend storage
I don't know whether these errors are critical; from the rest of the
logs it looks like data is still traveling between the clusters.
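Regarding the "extended attribute not supported by the backend storage"
message: a quick sanity check on a brick node could be to probe whether
the backend filesystem accepts extended attributes at all. The sketch
below only tests user.* xattrs (gluster itself uses trusted.* xattrs,
which require root), so it is an approximation, and the path is a
placeholder.

```python
import os
import tempfile

def supports_user_xattrs(directory: str) -> bool:
    """Try to set and read back a user.* xattr on a temp file in `directory`."""
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        try:
            os.setxattr(f.name, "user.xattr_probe", b"1")
            return os.getxattr(f.name, "user.xattr_probe") == b"1"
        except OSError:
            # EOPNOTSUPP and friends: backend does not support xattrs
            return False

# e.g. run against the filesystem backing a brick (placeholder path):
print(supports_user_xattrs("/tmp"))
```

If this returns False on the filesystem backing a brick, that would point
at a real backend problem rather than log noise.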
Any help will be greatly appreciated. Thank you in advance!
Best regards,
--
alexander iliev