[Gluster-users] GlusterFS geo-replication progress question

Mon Apr 6 22:25:33 UTC 2020

Hi Alexander,

Answers inline below:

On Thu, Apr 2, 2020 at 1:08 AM Alexander Iliev <ailiev+gluster at mamul.org> wrote:
>
> Hi all,
>
> I have a running geo-replication session between two clusters and I'm
> trying to figure out what is the current progress of the replication and
> possibly how much longer it will take.
>
> It has been running for quite a while now (> 1 month), but the thing is
> that both the hardware of the nodes and the link between the two
> clusters aren't that great (e.g., the volumes are backed by rotating
> disks) and the volume is somewhat sizeable (30-ish TB) and given these
> details I'm not really sure how long it is supposed to take normally.
>
> I have several bricks in the volume (same brick size and physical layout
> in both clusters) that are now showing up with a Changelog Crawl status
> and with a recent LAST_SYNCED date in the `gluster volume
> geo-replication status detail` command output which seems to be the
> desired state for all bricks. The rest of the bricks though are in
> Hybrid Crawl state and have been in that state forever.
>
> So I suppose my questions are - how can I tell if the replication
> session is somehow broken and if it's not, then is there a way for me
> to find out the progress and the ETA of the replication?
>
Please see the status section of the geo-replication documentation [1],
which covers this. In Hybrid Crawl we currently keep no accounting
information, so it cannot report how long the remaining data will take
to sync.
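
Since there is no built-in progress figure, one rough indicator is how many bricks have already moved from Hybrid Crawl to Changelog Crawl. The following is a minimal sketch, not an official tool: it assumes you have saved the output of the status detail command to a string, and that each brick row contains its crawl status name literally. The function name `count_crawl_states` is made up for illustration, and "History Crawl" is included on the assumption that some bricks may report it.

```python
from collections import Counter

# Crawl status names as they are expected to appear in each brick row.
# "Changelog Crawl" and "Hybrid Crawl" come from the thread above;
# "History Crawl" is an assumed additional state.
CRAWL_STATES = ("Changelog Crawl", "History Crawl", "Hybrid Crawl")


def count_crawl_states(status_output: str) -> Counter:
    """Count brick rows per crawl state in saved status-detail output."""
    counts = Counter()
    for line in status_output.splitlines():
        for state in CRAWL_STATES:
            if state in line:
                counts[state] += 1
                break  # count each row at most once
    return counts
```

Running this periodically on fresh status output and comparing the counts shows whether bricks are still leaving Hybrid Crawl. It does not give an ETA.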

> In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are
> some errors like:
>
> [2020-03-31 11:48:47.81269] E [syncdutils(worker
> /data/gfs/store1/8/brick):822:errlog] Popen: command returned error
> cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
> -S /tmp/gsyncd-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x
> /nonexistent/gsyncd slave <vol> x.x.x.x::<vol> --master-node x.x.x.x
> --master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick
> <brick_path> --local-node x.x.x.x
> 2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout
> 120 --slave-log-level INFO --slave-gluster-log-level INFO
> --slave-gluster-command-dir /usr/sbin    error=1
> [2020-03-31 11:48:47.81617] E [syncdutils(worker
> <brick_path>):826:logerr] Popen: ssh> failed with ValueError.
> [2020-03-31 11:48:47.390397] I [repce(agent
> <brick_path>):97:service_loop] RepceServer: terminating on reaching EOF.
>

If you see this error at regular intervals, please check your SSH
connection; it may have broken.
If possible, please share the full traceback from both master and slave
so we can debug the issue.
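
To check whether the errors recur at a regular interval, something like the sketch below may help. It assumes the log text has one entry per line (unwrapped, unlike the quoted excerpt above) and that error-level entries start with a bracketed timestamp followed by ` E [`, as in that excerpt. The function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches the start of an error-level ("E") entry as seen in the quoted log,
# e.g. "[2020-03-31 11:48:47.81269] E [syncdutils(...)".
ERROR_LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\] E \[")


def error_times(log_text: str) -> list:
    """Return the timestamps (to the second) of all error-level entries."""
    times = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def error_gaps_seconds(log_text: str) -> list:
    """Return gaps in seconds between consecutive distinct error timestamps."""
    times = sorted(set(error_times(log_text)))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

If the gaps are roughly constant, the worker is probably failing and restarting in a loop, which would point at the SSH connection rather than a one-off glitch.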

> In the brick logs I see stuff like:
>
> [2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
>
> I don't know if these are critical, from the rest of the logs it looks
> like data is traveling between the clusters.
>
> Any help will be greatly appreciated. Thank you in advance!
>
> Best regards,
> --
> alexander iliev
>
[1]. https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#status

/sunny