[Gluster-users] Geo-rep failing initial sync
Saravanakumar Arumugam
sarumuga at redhat.com
Mon Oct 19 09:07:14 UTC 2015
Hi Wade,
There seems to be an issue syncing the existing data in the volume
using the Xsync crawl.
(To give some background: when geo-rep is started, it first does a
filesystem crawl (Xsync) to sync all existing data to the slave, and
the session then switches to CHANGELOG mode.)
We are looking into this.

Is there any specific reason for choosing a Stripe volume? That
combination has not been extensively tested with geo-rep.
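
In the meantime, you can watch the session's change detection mode; it
should report xsync during the initial crawl and changelog afterwards.
A sketch, assuming the session from your status output
(static -> ssh://gluster-b1::static); exact syntax may vary slightly
by version:

# current change detection mode (xsync while crawling, changelog after)
gluster volume geo-replication static ssh://gluster-b1::static config change_detector

# session state, including the CRAWL STATUS column
gluster volume geo-replication static ssh://gluster-b1::static status detail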
Thanks,
Saravana
On 10/19/2015 08:24 AM, Wade Fitzpatrick wrote:
> The relevant portions of the log appear to be as follows. Everything
> seemed fairly normal (though quite slow) until
>
> [2015-10-08 15:31:26.471216] I
> [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
> hybrid crawl syncing, stime: (1444278018, 482251)
> [2015-10-08 15:31:34.39248] I
> [syncdutils(/data/gluster1/static/brick1):220:finalize] <top>: exiting.
> [2015-10-08 15:31:34.40934] I [repce(agent):92:service_loop]
> RepceServer: terminating on reaching EOF.
> [2015-10-08 15:31:34.41220] I [syncdutils(agent):220:finalize] <top>:
> exiting.
> [2015-10-08 15:31:35.615353] I [monitor(monitor):362:distribute]
> <top>: slave bricks: [{'host': 'palace', 'dir':
> '/data/gluster1/static/brick1'}, {'host': 'madonna', 'dir'
> : '/data/gluster1/static/brick2'}]
> [2015-10-08 15:31:35.616558] I [monitor(monitor):383:distribute]
> <top>: worker specs: [('/data/gluster1/static/brick1',
> 'ssh://root@palace:gluster://localhost:static', 1)]
> [2015-10-08 15:31:35.748434] I [monitor(monitor):221:monitor] Monitor:
> ------------------------------------------------------------
> [2015-10-08 15:31:35.748775] I [monitor(monitor):222:monitor] Monitor:
> starting gsyncd worker
> [2015-10-08 15:31:35.837651] I [changelogagent(agent):75:__init__]
> ChangelogAgent: Agent listining...
> [2015-10-08 15:31:35.841150] I
> [gsyncd(/data/gluster1/static/brick1):649:main_i] <top>: syncing:
> gluster://localhost:static -> ssh://root@palace:gluster://localhost:static
> [2015-10-08 15:31:38.543379] I
> [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>:
> setting up xsync change detection mode
> [2015-10-08 15:31:38.543802] I
> [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using
> 'tar over ssh' as the sync engine
> [2015-10-08 15:31:38.544673] I
> [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>:
> setting up xsync change detection mode
> [2015-10-08 15:31:38.544924] I
> [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using
> 'tar over ssh' as the sync engine
> [2015-10-08 15:31:38.546163] I
> [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>:
> setting up xsync change detection mode
> [2015-10-08 15:31:38.546406] I
> [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using
> 'tar over ssh' as the sync engine
> [2015-10-08 15:31:38.548989] I
> [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync
> temp directory:
> /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
> [2015-10-08 15:31:38.549267] I
> [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync
> temp directory:
> /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
> [2015-10-08 15:31:38.549467] I
> [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync
> temp directory:
> /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
> [2015-10-08 15:31:38.549632] I
> [resource(/data/gluster1/static/brick1):1432:service_loop] GLUSTER:
> Register time: 1444278698
> [2015-10-08 15:31:38.582277] I
> [master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary
> master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
> [2015-10-08 15:31:38.584099] I
> [master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl
> interval: 60 seconds
> [2015-10-08 15:31:38.587405] I
> [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
> hybrid crawl..., stime: (1444278018, 482251)
> [2015-10-08 15:31:38.588735] I
> [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
> hybrid crawl syncing, stime: (1444278018, 482251)
> [2015-10-08 15:31:38.590116] I
> [master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary
> master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
> [2015-10-08 15:31:38.591582] I
> [master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl
> interval: 60 seconds
> [2015-10-08 15:31:38.593844] I
> [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
> hybrid crawl..., stime: (1444278018, 482251)
> [2015-10-08 15:31:38.594832] I
> [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
> hybrid crawl syncing, stime: (1444278018, 482251)
> [2015-10-08 15:32:38.641908] I
> [master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1
> crawls, 0 turns
> [2015-10-08 15:32:38.644370] I
> [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
> hybrid crawl..., stime: (1444278018, 482251)
> [2015-10-08 15:32:39.646733] I
> [master(/data/gluster1/static/brick1):1252:crawl] _GMaster: processing
> xsync changelog
> /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync/XSYNC-CHANGELOG.1444278758
> [2015-10-08 15:32:40.857084] W
> [master(/data/gluster1/static/brick1):803:log_failures] _GMaster:
> ENTRY FAILED: ({'uid': 0, 'gfid':
> 'fc446c88-a5b7-468b-ac52-25b4225fe0cf', 'gid': 0, 'mode': 33188,
> 'entry':
> '.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-1.html',
> 'op': 'MKNOD'}, 17, '02489235-13c5-4232-8d6d-c7843bc5249b')
> [2015-10-08 15:32:40.858580] W
> [master(/data/gluster1/static/brick1):803:log_failures] _GMaster:
> ENTRY FAILED: ({'uid': 0, 'gfid':
> 'e08813c5-055a-4354-94ec-f1b41a14b2a4', 'gid': 0, 'mode': 33188,
> 'entry':
> '.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-2.html',
> 'op': 'MKNOD'}, 17, '0abae047-5816-4199-8203-fa8b974dfef5')
>
> ...
>
> [2015-10-08 15:33:38.236779] W
> [master(/data/gluster1/static/brick1):803:log_failures] _GMaster:
> ENTRY FAILED: ({'uid': 1000, 'gfid':
> 'a41a2ac7-8fec-46bd-a4cc-8d8794e5ee39', 'gid
> ': 1000, 'mode': 33206, 'entry':
> '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/1PYhnxMyMMcQo8ukuyMsqq.png',
> 'op': 'MKNOD'}, 17, 'e047db7d-f96c-496f-8a83-5db8e41859ca')
> [2015-10-08 15:33:38.237443] W
> [master(/data/gluster1/static/brick1):803:log_failures] _GMaster:
> ENTRY FAILED: ({'uid': 1000, 'gfid':
> '507f77db-0dc0-4d7f-9eb3-8f56b3e01765', 'gid
> ': 1000, 'mode': 33206, 'entry':
> '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/17H7rpUIXGEQemM0wCoy6c.png',
> 'op': 'MKNOD'}, 17, 'ee7fa964-fc92-4008-b38a-e790fbbb1285')
> [2015-10-08 15:33:38.238053] W
> [master(/data/gluster1/static/brick1):803:log_failures] _GMaster:
> ENTRY FAILED: ({'uid': 1000, 'gfid':
> '6c495557-6808-4ff9-98de-39afbbeeac82', 'gid
> ': 1000, 'mode': 33206, 'entry':
> '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/3T3VvUQH44my0Eosiieeok.png',
> 'op': 'MKNOD'}, 17, 'cc6a75c4-0817-497e-912b-4442fd19db83')
> [2015-10-08 15:33:43.615427] W
> [master(/data/gluster1/static/brick1):1010:process] _GMaster:
> changelogs XSYNC-CHANGELOG.1444278758 could not be processed - moving
> on...
> [2015-10-08 15:33:43.616425] W
> [master(/data/gluster1/static/brick1):1014:process] _GMaster: SKIPPED
> GFID =
> 6c495557-6808-4ff9-98de-39afbbeeac82,16f94158-2f27-421b-9981-94d4197b2b3b,53d01d46-5724-4c77-846f-aacea7a3a447,9fbb536b-b7c6-41e1-8593-43e8a42b3fbe,1923ceff-d9a4-449e-b1c6-ce37c54d242c,3206332f-ed48-48d7-ad3f-cb82fbda0695,7696c570-edd5-481e-8cdc-3e...[truncated]
>
>
> That type of entry repeats until
>
> [2015-10-09 11:12:22.590574] I
> [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
> hybrid crawl syncing, stime: (1444349280, 617969)
> [2015-10-09 11:13:22.650285] I
> [master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1
> crawls, 1 turns
> [2015-10-09 11:13:22.653459] I
> [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
> hybrid crawl..., stime: (1444349280, 617969)
> [2015-10-09 11:13:22.670430] W
> [master(/data/gluster1/static/brick1):1366:Xcrawl] _GMaster: irregular
> xtime for
> ./racesoap/nominations/processed/.processed.2015-10-13.T.Ballina.V1.nomination.1444346457.247.Thj1Ly:
> ENOENT
>
> and then there were no more logs until 2015-10-13.
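>
> For reference: errno 17 in those ENTRY FAILED tuples is EEXIST, and
> the mode values decode to ordinary regular files. A quick check, on
> any host with Python:
>
> python -c 'import errno, stat; print(errno.errorcode[17], stat.filemode(33188), stat.filemode(33206))'
> # prints: EEXIST -rw-r--r-- -rw-rw-rw-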
>
> Thanks,
> Wade.
>
> On 16/10/2015 4:33 pm, Aravinda wrote:
>> Oh ok. I overlooked the status output. Please share the
>> geo-replication logs from "james" and "hilton" nodes.
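>>
>> They live under the glusterfs log directory on each master node; a
>> sketch of where to look (the exact session subdirectory and file
>> names vary by version):
>>
>> # on james and hilton
>> ls /var/log/glusterfs/geo-replication/static/
>> grep -E 'ENTRY FAILED|OSError|Traceback' \
>>     /var/log/glusterfs/geo-replication/static/*.log | tail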
>>
>> regards
>> Aravinda
>>
>> On 10/15/2015 05:55 PM, Wade Fitzpatrick wrote:
>>> Well, I'm kind of worried about the 3 million failures listed in the
>>> FAILURES column, the timestamp showing that syncing "stalled" two
>>> days ago, and the fact that only half of the files have been
>>> transferred to the remote volume.
>>>
>>> On 15/10/2015 9:27 pm, Aravinda wrote:
>>>> Status looks good. Two master bricks are Active and participating
>>>> in syncing. Please let us know the issue you are observing.
>>>> regards
>>>> Aravinda
>>>> On 10/15/2015 11:40 AM, Wade Fitzpatrick wrote:
>>>>> I have twice now tried to configure geo-replication of our
>>>>> Stripe-Replicate volume to a remote Stripe volume but it always
>>>>> seems to have issues.
>>>>>
>>>>> root at james:~# gluster volume info
>>>>>
>>>>> Volume Name: gluster_shared_storage
>>>>> Type: Replicate
>>>>> Volume ID: 5f446a10-651b-4ce0-a46b-69871f498dbc
>>>>> Status: Started
>>>>> Number of Bricks: 1 x 4 = 4
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: james:/data/gluster1/geo-rep-meta/brick
>>>>> Brick2: cupid:/data/gluster1/geo-rep-meta/brick
>>>>> Brick3: hilton:/data/gluster1/geo-rep-meta/brick
>>>>> Brick4: present:/data/gluster1/geo-rep-meta/brick
>>>>> Options Reconfigured:
>>>>> performance.readdir-ahead: on
>>>>>
>>>>> Volume Name: static
>>>>> Type: Striped-Replicate
>>>>> Volume ID: 3f9f810d-a988-4914-a5ca-5bd7b251a273
>>>>> Status: Started
>>>>> Number of Bricks: 1 x 2 x 2 = 4
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: james:/data/gluster1/static/brick1
>>>>> Brick2: cupid:/data/gluster1/static/brick2
>>>>> Brick3: hilton:/data/gluster1/static/brick3
>>>>> Brick4: present:/data/gluster1/static/brick4
>>>>> Options Reconfigured:
>>>>> auth.allow: 10.x.*
>>>>> features.scrub: Active
>>>>> features.bitrot: on
>>>>> performance.readdir-ahead: on
>>>>> geo-replication.indexing: on
>>>>> geo-replication.ignore-pid-check: on
>>>>> changelog.changelog: on
>>>>>
>>>>> root at palace:~# gluster volume info
>>>>>
>>>>> Volume Name: static
>>>>> Type: Stripe
>>>>> Volume ID: 3de935db-329b-4876-9ca4-a0f8d5f184c3
>>>>> Status: Started
>>>>> Number of Bricks: 1 x 2 = 2
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: palace:/data/gluster1/static/brick1
>>>>> Brick2: madonna:/data/gluster1/static/brick2
>>>>> Options Reconfigured:
>>>>> features.scrub: Active
>>>>> features.bitrot: on
>>>>> performance.readdir-ahead: on
>>>>>
>>>>> root at james:~# gluster vol geo-rep static ssh://gluster-b1::static
>>>>> status detail
>>>>>
>>>>> (In every row: MASTER VOL = static, SLAVE USER = root, SLAVE =
>>>>> ssh://gluster-b1::static, and all three CHECKPOINT columns are N/A.)
>>>>>
>>>>> MASTER NODE    MASTER BRICK                    SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES
>>>>> --------------------------------------------------------------------------------------------------------------------------------------------------
>>>>> james          /data/gluster1/static/brick1    palace        Active     Changelog Crawl    2015-10-13 14:23:20    0        0       0       1952064
>>>>> hilton         /data/gluster1/static/brick3    palace        Active     Changelog Crawl    N/A                    0        0       0       1008035
>>>>> present        /data/gluster1/static/brick4    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A
>>>>> cupid          /data/gluster1/static/brick2    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A
>>>>>
>>>>>
>>>>> So just to clarify, data is striped over bricks 1 and 3; bricks 2
>>>>> and 4 are the replica.
>>>>>
>>>>> Can someone help me diagnose the problem and find a solution?
>>>>>
>>>>> Thanks in advance,
>>>>> Wade.
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
>