[Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect "Files Pending"

David Gibbons david.c.gibbons at gmail.com
Wed May 6 18:48:41 UTC 2015


Is it reasonable for me to just remove all of the XSYNC-CHANGELOG
files to make it start over with a full sync?
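Before removing anything, it may help to enumerate what is there. A minimal sketch, assuming the working-directory layout shown in the quoted log below (each brick gets its own hashed subdirectory with an xsync/ folder; the path is from this cluster's logs and may differ on other installs):

```shell
# List (but do not delete) the per-brick xsync changelogs under a
# geo-rep working directory. Pass the working dir as $1.
list_xsync_changelogs() {
    # every brick gets its own <hash>/xsync/ subdirectory
    find "$1" -type f -name 'XSYNC-CHANGELOG.*'
}

# usage (working dir taken from the log excerpt below):
# list_xsync_changelogs /usr/local/var/run/gluster/shares
```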

I just want to figure out how to get it to pick up again. Is it better
to remove and re-create the geo-rep session?
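For reference, a hedged sketch of what tearing down and re-creating the session would look like with the standard geo-replication CLI (SLAVEHOST is a placeholder, since the slave address is redacted in the logs; verify against your setup before running):

```shell
# Placeholder names; substitute your actual master volume and slave.
MASTERVOL=shares
SLAVE=SLAVEHOST::bkpshares

gluster volume geo-replication $MASTERVOL $SLAVE stop
gluster volume geo-replication $MASTERVOL $SLAVE delete
gluster volume geo-replication $MASTERVOL $SLAVE create push-pem
gluster volume geo-replication $MASTERVOL $SLAVE start
```

My understanding (not verified on 3.5.3) is that deleting the session does not touch already-synced data on the slave, and a re-created session starts over with a fresh hybrid crawl.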

Thanks,
Dave

On Tue, May 5, 2015 at 9:27 AM, David Gibbons <david.c.gibbons at gmail.com> wrote:
> I caught one of the nodes transitioning into faulty mode, log output is
> below.
>
>>
>>  In master nodes, look for log messages. Let us know if you feel any issue
>> in log messages. (/var/log/glusterfs/geo-replication/)
>
> When one of the nodes drops into "faulty", which happens periodically, this
> is the type of output that appears in the log:
>
> [root at gfs-a-1 ~]# tail
> /usr/local/var/log/glusterfs/geo-replication/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares.log
> [2015-05-05 09:22:58.140913] W
> [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync:
> .gfid/065c09f9-4502-4a2c-81fa-5e8fcaf22712 [errcode: 23]
> [2015-05-05 09:22:58.152951] W
> [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync:
> .gfid/28a237a4-4346-48c5-bd1c-713273f591c7 [errcode: 23]
> [2015-05-05 09:22:58.327603] W
> [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync:
> .gfid/5755db3e-e9d8-42d2-b415-890842b086ae [errcode: 23]
> [2015-05-05 09:22:58.336714] W
> [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync:
> .gfid/0b7fc219-1e31-4e66-865f-5ae1c26d5e54 [errcode: 23]
> [2015-05-05 09:22:58.360308] W
> [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync:
> .gfid/955cd0e4-dd06-4db6-9391-34dbf72c9b06 [errcode: 23]
> [2015-05-05 09:22:58.367522] W
> [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync:
> .gfid/1d455725-c3e1-4111-92e5-335610d3f513 [errcode: 23]
> [2015-05-05 09:22:58.368226] W
> [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync:
> .gfid/7ce881ae-3491-4e21-b38b-0a27fb620c74 [errcode: 23]
> [2015-05-05 09:22:58.368959] W
> [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync:
> .gfid/056732c1-1537-4925-a30c-b905c110a5b2 [errcode: 23]
> [2015-05-05 09:22:58.369635] W
> [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync:
> .gfid/8c58d6c5-9975-43c6-8f4c-2a92337f7350 [errcode: 23]
> [2015-05-05 09:22:58.369790] W
> [master(/mnt/a-1-shares-brick-2/brick):877:process] _GMaster: incomplete
> sync, retrying changelogs: XSYNC-CHANGELOG.1430830891
>
> When the node is in "active" mode, I get a lot of log output that resembles
> this:
> [2015-05-05 09:23:54.735502] W
> [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete
> sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
> [2015-05-05 09:23:55.449265] W
> [master(/mnt/a-1-shares-brick-3/brick):250:regjob] <top>: Rsync:
> .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
> [2015-05-05 09:23:55.449491] W
> [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete
> sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
> [2015-05-05 09:23:56.277033] W
> [master(/mnt/a-1-shares-brick-3/brick):250:regjob] <top>: Rsync:
> .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
> [2015-05-05 09:23:56.277259] W
> [master(/mnt/a-1-shares-brick-3/brick):860:process] _GMaster: changelogs
> XSYNC-CHANGELOG.1430832227 could not be processed - moving on...
> [2015-05-05 09:23:56.294038] W
> [master(/mnt/a-1-shares-brick-3/brick):862:process] _GMaster: SKIPPED GFID =
> [2015-05-05 09:23:56.381592] I
> [master(/mnt/a-1-shares-brick-3/brick):1130:crawl] _GMaster: finished hybrid
> crawl syncing
> [2015-05-05 09:24:24.404884] I
> [master(/mnt/a-1-shares-brick-4/brick):445:crawlwrap] _GMaster: 1 crawls, 1
> turns
> [2015-05-05 09:24:24.437452] I
> [master(/mnt/a-1-shares-brick-4/brick):1124:crawl] _GMaster: starting hybrid
> crawl...
> [2015-05-05 09:24:24.588865] I
> [master(/mnt/a-1-shares-brick-1/brick):1133:crawl] _GMaster: processing
> xsync changelog
> /usr/local/var/run/gluster/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares/9d9a72f468c582609e97e8929e58b9ff/xsync/XSYNC-CHANGELOG.1430832135
>
> This raises a couple of questions for me:
>
> Do the errcode: 23 entries refer to files that have been deleted or
> renamed since the changelog was created?
> Is it correct/expected for a node to drop into "faulty" and then recover
> to "active" periodically?
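On the first question: the [errcode: N] values in those Rsync lines are rsync's own exit statuses. Per the rsync(1) man page, 23 means "partial transfer due to error", which is consistent with some source files failing to transfer, e.g. because they were deleted or renamed after the changelog was recorded. A small lookup helper as a sketch (the descriptions are taken from rsync(1); the function name is hypothetical):

```shell
# Translate the [errcode: N] values from the geo-rep log into rsync's
# documented exit statuses (see the EXIT VALUES section of rsync(1)).
describe_rsync_exit() {
    case "$1" in
        0)  echo "success" ;;
        23) echo "partial transfer due to error" ;;
        24) echo "partial transfer due to vanished source files" ;;
        *)  echo "other error (see rsync(1) EXIT VALUES)" ;;
    esac
}

describe_rsync_exit 23   # partial transfer due to error
```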
>
> Thank you again for your assistance!
> Dave
