[Gluster-users] Geo-rep failing initial sync

Saravanakumar Arumugam sarumuga@redhat.com
Mon Oct 19 09:07:14 UTC 2015


Hi Wade,

There seems to be some issue in syncing the existing data in the volume 
using the Xsync crawl.
(To give some background: when geo-rep is started, it first performs a 
filesystem crawl (Xsync) to sync all existing data to the slave, and 
then the session switches to CHANGELOG mode.)
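
You can check which change detection mode a session is currently using 
from the CRAWL STATUS column of "status detail" ("Hybrid Crawl" means 
Xsync, "Changelog Crawl" means changelog), or by querying the session 
config. The change_detector key below is from memory for this release, 
so please verify it against the full "config" output on your setup:

# gluster volume geo-replication static ssh://gluster-b1::static config change_detector
changelog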

We are looking into this.
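
One pointer for reading the failures in the meantime: in each ENTRY 
FAILED record below, the integer after the tuple is the errno reported 
for the entry operation. 17 is EEXIST on Linux, i.e. the entry already 
exists on the slave, which you can confirm with a one-liner such as:

# python -c 'import errno, os; print errno.errorcode[17], os.strerror(17)'
EEXIST File exists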

Is there any specific reason for using a Stripe volume? Stripe volumes 
have not been extensively tested with geo-rep.
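
Also, if you want to inspect any of the SKIPPED GFIDs on a master 
brick: every GFID is hardlinked under the brick's .glusterfs directory 
(the first two and next two hex characters of the GFID form the 
subdirectories), so something like the following should map a GFID back 
to its real path. The brick path here is only an example taken from 
your volume info:

# find /data/gluster1/static/brick1 -samefile \
    /data/gluster1/static/brick1/.glusterfs/fc/44/fc446c88-a5b7-468b-ac52-25b4225fe0cf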

Thanks,
Saravana

On 10/19/2015 08:24 AM, Wade Fitzpatrick wrote:
> The relevant portions of the log appear to be as follows. Everything 
> seemed fairly normal (though quite slow) until
>
> [2015-10-08 15:31:26.471216] I 
> [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished 
> hybrid crawl syncing, stime: (1444278018, 482251)
> [2015-10-08 15:31:34.39248] I 
> [syncdutils(/data/gluster1/static/brick1):220:finalize] <top>: exiting.
> [2015-10-08 15:31:34.40934] I [repce(agent):92:service_loop] 
> RepceServer: terminating on reaching EOF.
> [2015-10-08 15:31:34.41220] I [syncdutils(agent):220:finalize] <top>: 
> exiting.
> [2015-10-08 15:31:35.615353] I [monitor(monitor):362:distribute] 
> <top>: slave bricks: [{'host': 'palace', 'dir': '/data/gluster1/static/brick1'}, 
> {'host': 'madonna', 'dir': '/data/gluster1/static/brick2'}]
> [2015-10-08 15:31:35.616558] I [monitor(monitor):383:distribute] 
> <top>: worker specs: [('/data/gluster1/static/brick1', 
> 'ssh://root@palace:gluster://localhost:static', 1)]
> [2015-10-08 15:31:35.748434] I [monitor(monitor):221:monitor] Monitor: 
> ------------------------------------------------------------
> [2015-10-08 15:31:35.748775] I [monitor(monitor):222:monitor] Monitor: 
> starting gsyncd worker
> [2015-10-08 15:31:35.837651] I [changelogagent(agent):75:__init__] 
> ChangelogAgent: Agent listining...
> [2015-10-08 15:31:35.841150] I 
> [gsyncd(/data/gluster1/static/brick1):649:main_i] <top>: syncing: 
> gluster://localhost:static -> ssh://root@palace:gluster://localhost:static
> [2015-10-08 15:31:38.543379] I 
> [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: 
> setting up xsync change detection mode
> [2015-10-08 15:31:38.543802] I 
> [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 
> 'tar over ssh' as the sync engine
> [2015-10-08 15:31:38.544673] I 
> [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: 
> setting up xsync change detection mode
> [2015-10-08 15:31:38.544924] I 
> [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 
> 'tar over ssh' as the sync engine
> [2015-10-08 15:31:38.546163] I 
> [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: 
> setting up xsync change detection mode
> [2015-10-08 15:31:38.546406] I 
> [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 
> 'tar over ssh' as the sync engine
> [2015-10-08 15:31:38.548989] I 
> [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync 
> temp directory: 
> /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
> [2015-10-08 15:31:38.549267] I 
> [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync 
> temp directory: 
> /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
> [2015-10-08 15:31:38.549467] I 
> [master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync 
> temp directory: 
> /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
> [2015-10-08 15:31:38.549632] I 
> [resource(/data/gluster1/static/brick1):1432:service_loop] GLUSTER: 
> Register time: 1444278698
> [2015-10-08 15:31:38.582277] I 
> [master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary 
> master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
> [2015-10-08 15:31:38.584099] I 
> [master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl 
> interval: 60 seconds
> [2015-10-08 15:31:38.587405] I 
> [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting 
> hybrid crawl..., stime: (1444278018, 482251)
> [2015-10-08 15:31:38.588735] I 
> [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished 
> hybrid crawl syncing, stime: (1444278018, 482251)
> [2015-10-08 15:31:38.590116] I 
> [master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary 
> master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
> [2015-10-08 15:31:38.591582] I 
> [master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl 
> interval: 60 seconds
> [2015-10-08 15:31:38.593844] I 
> [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting 
> hybrid crawl..., stime: (1444278018, 482251)
> [2015-10-08 15:31:38.594832] I 
> [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished 
> hybrid crawl syncing, stime: (1444278018, 482251)
> [2015-10-08 15:32:38.641908] I 
> [master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1 
> crawls, 0 turns
> [2015-10-08 15:32:38.644370] I 
> [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting 
> hybrid crawl..., stime: (1444278018, 482251)
> [2015-10-08 15:32:39.646733] I 
> [master(/data/gluster1/static/brick1):1252:crawl] _GMaster: processing 
> xsync changelog 
> /var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync/XSYNC-CHANGELOG.1444278758
> [2015-10-08 15:32:40.857084] W 
> [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: 
> ENTRY FAILED: ({'uid': 0, 'gfid': 
> 'fc446c88-a5b7-468b-ac52-25b4225fe0cf', 'gid': 0, 'mode': 33188, 
> 'entry': 
> '.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-1.html', 
> 'op': 'MKNOD'}, 17, '02489235-13c5-4232-8d6d-c7843bc5249b')
> [2015-10-08 15:32:40.858580] W 
> [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: 
> ENTRY FAILED: ({'uid': 0, 'gfid': 
> 'e08813c5-055a-4354-94ec-f1b41a14b2a4', 'gid': 0, 'mode': 33188, 
> 'entry': 
> '.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-2.html', 
> 'op': 'MKNOD'}, 17, '0abae047-5816-4199-8203-fa8b974dfef5')
>
> ...
>
> [2015-10-08 15:33:38.236779] W 
> [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: 
> ENTRY FAILED: ({'uid': 1000, 'gfid': 'a41a2ac7-8fec-46bd-a4cc-8d8794e5ee39', 
> 'gid': 1000, 'mode': 33206, 'entry': 
> '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/1PYhnxMyMMcQo8ukuyMsqq.png', 
> 'op': 'MKNOD'}, 17, 'e047db7d-f96c-496f-8a83-5db8e41859ca')
> [2015-10-08 15:33:38.237443] W 
> [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: 
> ENTRY FAILED: ({'uid': 1000, 'gfid': '507f77db-0dc0-4d7f-9eb3-8f56b3e01765', 
> 'gid': 1000, 'mode': 33206, 'entry': 
> '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/17H7rpUIXGEQemM0wCoy6c.png', 
> 'op': 'MKNOD'}, 17, 'ee7fa964-fc92-4008-b38a-e790fbbb1285')
> [2015-10-08 15:33:38.238053] W 
> [master(/data/gluster1/static/brick1):803:log_failures] _GMaster: 
> ENTRY FAILED: ({'uid': 1000, 'gfid': '6c495557-6808-4ff9-98de-39afbbeeac82', 
> 'gid': 1000, 'mode': 33206, 'entry': 
> '.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/3T3VvUQH44my0Eosiieeok.png', 
> 'op': 'MKNOD'}, 17, 'cc6a75c4-0817-497e-912b-4442fd19db83')
> [2015-10-08 15:33:43.615427] W 
> [master(/data/gluster1/static/brick1):1010:process] _GMaster: 
> changelogs XSYNC-CHANGELOG.1444278758 could not be processed - moving 
> on...
> [2015-10-08 15:33:43.616425] W 
> [master(/data/gluster1/static/brick1):1014:process] _GMaster: SKIPPED 
> GFID = 
> 6c495557-6808-4ff9-98de-39afbbeeac82,16f94158-2f27-421b-9981-94d4197b2b3b,53d01d46-5724-4c77-846f-aacea7a3a447,9fbb536b-b7c6-41e1-8593-43e8a42b3fbe,1923ceff-d9a4-449e-b1c6-ce37c54d242c,3206332f-ed48-48d7-ad3f-cb82fbda0695,7696c570-edd5-481e-8cdc-3e...[truncated]
>
>
> That type of entry repeats until
>
> [2015-10-09 11:12:22.590574] I 
> [master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished 
> hybrid crawl syncing, stime: (1444349280, 617969)
> [2015-10-09 11:13:22.650285] I 
> [master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1 
> crawls, 1 turns
> [2015-10-09 11:13:22.653459] I 
> [master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting 
> hybrid crawl..., stime: (1444349280, 617969)
> [2015-10-09 11:13:22.670430] W 
> [master(/data/gluster1/static/brick1):1366:Xcrawl] _GMaster: irregular 
> xtime for 
> ./racesoap/nominations/processed/.processed.2015-10-13.T.Ballina.V1.nomination.1444346457.247.Thj1Ly: 
> ENOENT
>
> and then there were no more logs until 2015-10-13.
>
> Thanks,
> Wade.
>
> On 16/10/2015 4:33 pm, Aravinda wrote:
>> Oh ok. I overlooked the status output. Please share the 
>> geo-replication logs from "james" and "hilton" nodes.
>>
>> regards
>> Aravinda
>>
>> On 10/15/2015 05:55 PM, Wade Fitzpatrick wrote:
>>> Well I'm kind of worried about the 3 million failures listed in the 
>>> FAILURES column, the timestamp showing that syncing "stalled" 2 days 
>>> ago and the fact that only half of the files have been transferred 
>>> to the remote volume.
>>>
>>> On 15/10/2015 9:27 pm, Aravinda wrote:
>>>> Status looks good. Two master bricks are Active and participating 
>>>> in syncing. Please let us know the issue you are observing.
>>>> regards
>>>> Aravinda
>>>> On 10/15/2015 11:40 AM, Wade Fitzpatrick wrote:
>>>>> I have twice now tried to configure geo-replication of our 
>>>>> Stripe-Replicate volume to a remote Stripe volume but it always 
>>>>> seems to have issues.
>>>>>
>>>>> root@james:~# gluster volume info
>>>>>
>>>>> Volume Name: gluster_shared_storage
>>>>> Type: Replicate
>>>>> Volume ID: 5f446a10-651b-4ce0-a46b-69871f498dbc
>>>>> Status: Started
>>>>> Number of Bricks: 1 x 4 = 4
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: james:/data/gluster1/geo-rep-meta/brick
>>>>> Brick2: cupid:/data/gluster1/geo-rep-meta/brick
>>>>> Brick3: hilton:/data/gluster1/geo-rep-meta/brick
>>>>> Brick4: present:/data/gluster1/geo-rep-meta/brick
>>>>> Options Reconfigured:
>>>>> performance.readdir-ahead: on
>>>>>
>>>>> Volume Name: static
>>>>> Type: Striped-Replicate
>>>>> Volume ID: 3f9f810d-a988-4914-a5ca-5bd7b251a273
>>>>> Status: Started
>>>>> Number of Bricks: 1 x 2 x 2 = 4
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: james:/data/gluster1/static/brick1
>>>>> Brick2: cupid:/data/gluster1/static/brick2
>>>>> Brick3: hilton:/data/gluster1/static/brick3
>>>>> Brick4: present:/data/gluster1/static/brick4
>>>>> Options Reconfigured:
>>>>> auth.allow: 10.x.*
>>>>> features.scrub: Active
>>>>> features.bitrot: on
>>>>> performance.readdir-ahead: on
>>>>> geo-replication.indexing: on
>>>>> geo-replication.ignore-pid-check: on
>>>>> changelog.changelog: on
>>>>>
>>>>> root@palace:~# gluster volume info
>>>>>
>>>>> Volume Name: static
>>>>> Type: Stripe
>>>>> Volume ID: 3de935db-329b-4876-9ca4-a0f8d5f184c3
>>>>> Status: Started
>>>>> Number of Bricks: 1 x 2 = 2
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: palace:/data/gluster1/static/brick1
>>>>> Brick2: madonna:/data/gluster1/static/brick2
>>>>> Options Reconfigured:
>>>>> features.scrub: Active
>>>>> features.bitrot: on
>>>>> performance.readdir-ahead: on
>>>>>
>>>>> root@james:~# gluster vol geo-rep static ssh://gluster-b1::static status detail
>>>>>
>>>>> MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
>>>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>> james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    2015-10-13 14:23:20    0        0       0       1952064     N/A                N/A                     N/A
>>>>> hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    N/A                    0        0       0       1008035     N/A                N/A                     N/A
>>>>> present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
>>>>> cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
>>>>>
>>>>>
>>>>> So just to clarify, data is striped over bricks 1 and 3; bricks 2 
>>>>> and 4 are the replica.
>>>>>
>>>>> Can someone help me diagnose the problem and find a solution?
>>>>>
>>>>> Thanks in advance,
>>>>> Wade.
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
>


