[Gluster-users] GlusterFS geo-replication progress question

Sun Apr 19 18:09:14 UTC 2020

Thanks, Sunny.

alexander iliev

On 4/7/20 12:25 AM, Sunny Kumar wrote:
> Hi Alexander,
> 
> Answers inline below:
> 
> On Thu, Apr 2, 2020 at 1:08 AM Alexander Iliev <ailiev+gluster at mamul.org> wrote:
>>
>> Hi all,
>>
>> I have a running geo-replication session between two clusters and I'm
>> trying to figure out what is the current progress of the replication and
>> possibly how much longer it will take.
>>
>> It has been running for quite a while now (> 1 month), but both the
>> hardware of the nodes and the link between the two clusters aren't that
>> great (e.g., the volumes are backed by rotating disks) and the volume is
>> somewhat sizeable (30-ish TB), so I'm not really sure how long it is
>> supposed to take normally.
>>
>> I have several bricks in the volume (same brick size and physical layout
>> in both clusters) that are now showing up with a Changelog Crawl status
>> and with a recent LAST_SYNCED date in the `gluster volume
>> geo-replication status detail` command output which seems to be the
>> desired state for all bricks. The rest of the bricks though are in
>> Hybrid Crawl state and have been in that state forever.
>>
>> So I suppose my questions are: how can I tell if the replication
>> session is somehow broken, and if it's not, is there a way for me
>> to find out the progress and the ETA of the replication?
>>
> Please go through this section [1], which covers this.
> At present, Hybrid Crawl does not keep any accounting information,
> such as how much time it will take to sync the data.
> 
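Since Hybrid Crawl reports no ETA, one rough way to follow progress is to count how many bricks have already moved on to Changelog Crawl. The following is only a minimal sketch: it assumes the text of the `status detail` command has been captured into a string, and that each brick row contains its crawl state verbatim ("Changelog Crawl", "Hybrid Crawl" or "History Crawl"). The function name `summarize_crawl_status` is hypothetical and not part of Gluster.

```python
from collections import Counter

# Crawl states as they appear in the CRAWL STATUS column (assumed spelling).
CRAWL_STATES = ("Changelog Crawl", "Hybrid Crawl", "History Crawl")


def summarize_crawl_status(status_output: str) -> Counter:
    """Count brick rows per crawl state in captured `status detail` text."""
    counts = Counter()
    for line in status_output.splitlines():
        for state in CRAWL_STATES:
            # The match is case-sensitive, so the upper-case header row
            # ("CRAWL STATUS") is never counted as a brick.
            if state in line:
                counts[state] += 1
                break
    return counts
```

Comparing the counts from runs taken a few days apart gives a coarse trend (bricks leaving Hybrid Crawl), not an actual ETA.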
>> In /var/log/glusterfs/geo-replication/$session_dir/gsyncd.log there are
>> some errors like:
>>
>> [2020-03-31 11:48:47.81269] E [syncdutils(worker
>> /data/gfs/store1/8/brick):822:errlog] Popen: command returned error
>> cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
>> /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
>> -S /tmp/gsyncd-aux-ssh-6aDWmc/206c4b2c3eb782ea2cf49ab5142bd68b.sock x.x.x.x
>> /nonexistent/gsyncd slave <vol> x.x.x.x::<vol> --master-node x.x.x.x
>> --master-node-id 9476b8bb-d7ee-489a-b083-875805343e67 --master-brick
>> <brick_path> --local-node x.x.x.x
>> 2 --local-node-id 426b564d-35d9-4291-980e-795903e9a386 --slave-timeout
>> 120 --slave-log-level INFO --slave-gluster-log-level INFO
>> --slave-gluster-command-dir /usr/sbin    error=1
>> [2020-03-31 11:48:47.81617] E [syncdutils(worker
>> <brick_path>):826:logerr] Popen: ssh> failed with ValueError.
>> [2020-03-31 11:48:47.390397] I [repce(agent
>> <brick_path>):97:service_loop] RepceServer: terminating on reaching EOF.
>>
> 
> If you are seeing this error at regular intervals, please check
> your ssh connection; it might be broken.
> If possible, please share the full traceback from both master and slave
> to debug the issue.
> 
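To check whether the error really recurs at a regular interval, the error entries in gsyncd.log can be bucketed by hour. This is a sketch only: it assumes every entry starts with a timestamp and level in the form shown in the excerpt above ("[2020-03-31 11:48:47.81269] E ["), and the helper name `errors_per_hour` is hypothetical.

```python
import re
from collections import Counter

# Start of an error entry, e.g. "[2020-03-31 11:48:47.81269] E [syncdutils(...".
# Captures the date and the hour; only level "E" entries match.
ERROR_LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2}\.\d+\] E \[")


def errors_per_hour(log_lines):
    """Return a Counter mapping 'YYYY-MM-DD HH' to the number of error entries."""
    counts = Counter()
    for line in log_lines:
        match = ERROR_LINE.match(line)
        if match:
            counts[f"{match.group(1)} {match.group(2)}"] += 1
    return counts
```

It can be fed an open log file directly, for example `errors_per_hour(open(path))`, where `path` points at the session's gsyncd.log. Evenly spaced buckets with similar counts would suggest a worker that keeps restarting rather than a one-off failure.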
>> In the brick logs I see stuff like:
>>
>> [2020-03-29 07:49:05.338947] E [fuse-bridge.c:4167:fuse_xattr_cbk]
>> 0-glusterfs-fuse: extended attribute not supported by the backend storage
>>
>> I don't know if these are critical; from the rest of the logs it looks
>> like data is traveling between the clusters.
>>
>> Any help will be greatly appreciated. Thank you in advance!
>>
>> Best regards,
>> --
>> alexander iliev
>>
> [1]. https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#status
> 
> /sunny
>