[Gluster-users] Geo-rep failing initial sync

Wade Fitzpatrick wade.fitzpatrick at ladbrokes.com.au
Fri Oct 16 03:17:33 UTC 2015


I now have a situation similar to 
https://bugzilla.redhat.com/show_bug.cgi?id=1202649, but I can't report 
it there: when I try to register, I never receive the confirmation email 
for my account.

Stopping and starting geo-replication has no effect; in fact, the status 
command now reports no session at all.

root@james:~# gluster volume geo-replication static ssh://gluster-b1::static status
No active geo-replication sessions between static and ssh://gluster-b1::static
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static stop
Stopping geo-replication session between static & ssh://gluster-b1::static has been successful
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static status
No active geo-replication sessions between static and ssh://gluster-b1::static
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static start
Starting geo-replication session between static & ssh://gluster-b1::static has been successful
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static status
No active geo-replication sessions between static and ssh://gluster-b1::static
root@james:~# gluster volume geo-replication static ssh://gluster-b1::static status
No active geo-replication sessions between static and ssh://gluster-b1::static
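
For anyone wanting to repeat the stop/start/status cycle above without retyping it, here is a minimal sketch. It only wraps the exact command form shown in the transcript; the helper names georep_cmd and run_sequence and the injectable runner parameter are hypothetical, not part of gluster.

```python
import subprocess


def georep_cmd(master_vol, slave_url, action):
    """Build the argv for one geo-replication command, as typed above."""
    return ["gluster", "volume", "geo-replication", master_vol, slave_url, action]


def run_sequence(master_vol, slave_url,
                 actions=("stop", "start", "status"),
                 runner=subprocess.run):
    """Run each action in order and collect (action, returncode, stdout).

    `runner` defaults to subprocess.run; it is a parameter only so the
    sequence can be exercised without a gluster installation.
    """
    results = []
    for action in actions:
        cmd = georep_cmd(master_vol, slave_url, action)
        proc = runner(cmd, capture_output=True, text=True)
        results.append((action, proc.returncode, proc.stdout.strip()))
    return results
```

On a master node this could be called as run_sequence("static", "ssh://gluster-b1::static"); it must run as a user allowed to invoke the gluster CLI.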


This is what is reported in 
/var/log/glusterfs/geo-replication/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.log

[2015-10-16 12:31:35.679045] I [monitor(monitor):222:monitor] Monitor: starting gsyncd worker
[2015-10-16 12:31:41.453392] I [monitor(monitor):282:monitor] Monitor: worker(/data/gluster1/static/brick1) died in startup phase
[2015-10-16 12:31:51.595781] I [monitor(monitor):221:monitor] Monitor: ------------------------------------------------------------
[2015-10-16 12:31:51.596124] I [monitor(monitor):222:monitor] Monitor: starting gsyncd worker
[2015-10-16 12:31:51.680993] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
[2015-10-16 12:31:51.684289] I [gsyncd(/data/gluster1/static/brick1):649:main_i] <top>: syncing: gluster://localhost:static -> ssh://root@palace:gluster://localhost:static
[2015-10-16 12:31:54.378592] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-10-16 12:31:54.379020] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-16 12:31:54.379853] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up changelog change detection mode
[2015-10-16 12:31:54.380121] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-16 12:31:54.381195] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2015-10-16 12:31:54.381473] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-16 12:31:56.395081] E [repce(agent):117:worker] <top>: call failed:
Traceback (most recent call last):
   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 113, in worker
     res = getattr(self.obj, rmeth)(*in_data[2:])
   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/changelogagent.py", line 41, in register
     return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)
   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_register
     cls.raise_changelog_err()
   File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
     raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 111] Connection refused
[2015-10-16 12:31:56.396080] E [repce(/data/gluster1/static/brick1):207:__call__] RepceClient: call 4297:140312069371648:1444959114.39 (register) failed on peer with ChangelogException
[2015-10-16 12:31:56.396344] E [resource(/data/gluster1/static/brick1):1428:service_loop] GLUSTER: Changelog register failed, [Errno 111] Connection refused
[2015-10-16 12:31:56.396723] I [syncdutils(/data/gluster1/static/brick1):220:finalize] <top>: exiting.
[2015-10-16 12:31:56.398370] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-10-16 12:31:56.398675] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-10-16 12:31:57.381922] I [monitor(monitor):282:monitor] Monitor: worker(/data/gluster1/static/brick1) died in startup phase
[2015-10-16 12:32:01.250627] I [gsyncd(/data/gluster1/static/brick1):649:main_i] <top>: syncing: gluster://localhost:static -> ssh://root@palace:gluster://localhost:static
[2015-10-16 12:32:01.252258] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
[2015-10-16 12:32:03.950707] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-10-16 12:32:03.951102] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-16 12:32:03.952385] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up changelog change detection mode
[2015-10-16 12:32:03.952636] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
[2015-10-16 12:32:03.953428] I [master(/data/gluster1/static/brick1):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2015-10-16 12:32:03.953665] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine
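
To pick the failures out of a log like the one above, a small filter for error-level lines can help. This is a sketch only: the line layout ([timestamp] LEVEL [module(context):line:function] message) is inferred from this excerpt rather than from any gsyncd specification, and error_entries is a hypothetical helper name.

```python
import re

# Layout inferred from the excerpt above (an assumption, not a documented format):
# [timestamp] LEVEL [module(context):lineno:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def error_entries(lines):
    """Return (timestamp, source, message) for every E-level log line.

    Lines that do not match the layout (e.g. traceback lines) are skipped.
    """
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") == "E":
            entries.append(
                (match.group("ts"), match.group("src"), match.group("msg"))
            )
    return entries


# A few lines copied from the log above.
SAMPLE = [
    "[2015-10-16 12:31:54.381473] I [master(/data/gluster1/static/brick1):401:__init__] _GMaster: using 'tar over ssh' as the sync engine",
    "[2015-10-16 12:31:56.395081] E [repce(agent):117:worker] <top>: call failed:",
    "Traceback (most recent call last):",
    "[2015-10-16 12:31:56.396344] E [resource(/data/gluster1/static/brick1):1428:service_loop] GLUSTER: Changelog register failed, [Errno 111] Connection refused",
    "[2015-10-16 12:31:56.396723] I [syncdutils(/data/gluster1/static/brick1):220:finalize] <top>: exiting.",
]

for ts, src, msg in error_entries(SAMPLE):
    print(ts, src, msg)
```

Run against the full log file, this narrows the output to the repce and resource errors, which is where the "Changelog register failed" message appears.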


Also, these are the contents of the status files:

/var/lib/glusterd/geo-replication/static_gluster-b1_static/brick_%2Fdata%2Fgluster1%2Fstatic%2Fbrick1.status:
{"checkpoint_time": 0, "last_synced": 1444950684, "checkpoint_completed": "No", "meta": 0, "failures": 1952064, "entry": 0, "slave_node": "N/A", "data": 0, "worker_status": "Faulty", "crawl_status": "N/A", "checkpoint_completion_time": 0}

/var/lib/glusterd/geo-replication/static_gluster-b1_static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic.status:
Started

/var/lib/glusterd/geo-replication/static_gluster-b1_static/ssh%3A%2F%2Froot%40madonna%3Agluster%3A%2F%2F127.0.0.1%3Astatic.status:
Started
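
The status files above are easier to read once the file names are percent-decoded and the JSON is summarised. The following is a minimal sketch; it assumes that last_synced is a Unix timestamp in seconds (which is not stated anywhere in this thread), and summarize and decode_name are hypothetical helper names.

```python
import json
from datetime import datetime, timezone
from urllib.parse import unquote

# Contents of the brick status file, copied from above.
STATUS_TEXT = (
    '{"checkpoint_time": 0, "last_synced": 1444950684, '
    '"checkpoint_completed": "No", "meta": 0, "failures": 1952064, '
    '"entry": 0, "slave_node": "N/A", "data": 0, "worker_status": "Faulty", '
    '"crawl_status": "N/A", "checkpoint_completion_time": 0}'
)


def summarize(status_json):
    """Extract the fields most relevant to diagnosing a faulty worker."""
    status = json.loads(status_json)
    # Assumption: last_synced is seconds since the Unix epoch.
    last = datetime.fromtimestamp(status["last_synced"], tz=timezone.utc)
    return {
        "worker_status": status["worker_status"],
        "failures": status["failures"],
        "last_synced_utc": last.isoformat(),
    }


def decode_name(encoded):
    """Percent-decode a status or log file name into a readable form."""
    return unquote(encoded)


print(summarize(STATUS_TEXT))
print(decode_name("brick_%2Fdata%2Fgluster1%2Fstatic%2Fbrick1.status"))
```

Decoding shows that the session status files are named after the slave URLs (root@palace and root@madonna), while the brick status file is named after the master brick path.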



On 15/10/2015 10:25 pm, Wade Fitzpatrick wrote:
> Well I'm kind of worried about the 3 million failures listed in the 
> FAILURES column, the timestamp showing that syncing "stalled" 2 days 
> ago and the fact that only half of the files have been transferred to 
> the remote volume.
>
> On 15/10/2015 9:27 pm, Aravinda wrote:
>> Status looks good. Two master bricks are Active and participating in 
>> syncing. Please let us know the issue you are observing.
>> regards
>> Aravinda
>> On 10/15/2015 11:40 AM, Wade Fitzpatrick wrote:
>>> I have twice now tried to configure geo-replication of our 
>>> Stripe-Replicate volume to a remote Stripe volume but it always 
>>> seems to have issues.
>>>
>>> root@james:~# gluster volume info
>>>
>>> Volume Name: gluster_shared_storage
>>> Type: Replicate
>>> Volume ID: 5f446a10-651b-4ce0-a46b-69871f498dbc
>>> Status: Started
>>> Number of Bricks: 1 x 4 = 4
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: james:/data/gluster1/geo-rep-meta/brick
>>> Brick2: cupid:/data/gluster1/geo-rep-meta/brick
>>> Brick3: hilton:/data/gluster1/geo-rep-meta/brick
>>> Brick4: present:/data/gluster1/geo-rep-meta/brick
>>> Options Reconfigured:
>>> performance.readdir-ahead: on
>>>
>>> Volume Name: static
>>> Type: Striped-Replicate
>>> Volume ID: 3f9f810d-a988-4914-a5ca-5bd7b251a273
>>> Status: Started
>>> Number of Bricks: 1 x 2 x 2 = 4
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: james:/data/gluster1/static/brick1
>>> Brick2: cupid:/data/gluster1/static/brick2
>>> Brick3: hilton:/data/gluster1/static/brick3
>>> Brick4: present:/data/gluster1/static/brick4
>>> Options Reconfigured:
>>> auth.allow: 10.x.*
>>> features.scrub: Active
>>> features.bitrot: on
>>> performance.readdir-ahead: on
>>> geo-replication.indexing: on
>>> geo-replication.ignore-pid-check: on
>>> changelog.changelog: on
>>>
>>> root@palace:~# gluster volume info
>>>
>>> Volume Name: static
>>> Type: Stripe
>>> Volume ID: 3de935db-329b-4876-9ca4-a0f8d5f184c3
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: palace:/data/gluster1/static/brick1
>>> Brick2: madonna:/data/gluster1/static/brick2
>>> Options Reconfigured:
>>> features.scrub: Active
>>> features.bitrot: on
>>> performance.readdir-ahead: on
>>>
>>> root@james:~# gluster vol geo-rep static ssh://gluster-b1::static status detail
>>>
>>> MASTER NODE  MASTER VOL  MASTER BRICK                  SLAVE USER  SLAVE                     SLAVE NODE  STATUS   CRAWL STATUS     LAST_SYNCED          ENTRY  DATA  META  FAILURES  CHECKPOINT TIME  CHECKPOINT COMPLETED  CHECKPOINT COMPLETION TIME
>>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> james        static      /data/gluster1/static/brick1  root        ssh://gluster-b1::static  10.37.1.11  Active   Changelog Crawl  2015-10-13 14:23:20  0      0     0     1952064   N/A              N/A                   N/A
>>> hilton       static      /data/gluster1/static/brick3  root        ssh://gluster-b1::static  10.37.1.11  Active   Changelog Crawl  N/A                  0      0     0     1008035   N/A              N/A                   N/A
>>> present      static      /data/gluster1/static/brick4  root        ssh://gluster-b1::static  10.37.1.12  Passive  N/A              N/A                  N/A    N/A   N/A   N/A       N/A              N/A                   N/A
>>> cupid        static      /data/gluster1/static/brick2  root        ssh://gluster-b1::static  10.37.1.12  Passive  N/A              N/A                  N/A    N/A   N/A   N/A       N/A              N/A                   N/A
>>>
>>>
>>> So just to clarify, data is striped over bricks 1 and 3; bricks 2 
>>> and 4 are the replica.
>>>
>>> Can someone help me diagnose the problem and find a solution?
>>>
>>> Thanks in advance,
>>> Wade.
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>
