[Gluster-users] Geo-replication gets faulty after first sync of files. GlusterFS 3.7.4
Wade Fitzpatrick
wade.fitzpatrick at ladbrokes.com.au
Wed Oct 21 03:56:30 UTC 2015
That is almost exactly what I am seeing too. It seems to be a problem
with geo-replicating a Stripe and/or Replicate volume.
On 21/10/2015 4:02 am, Родион Скрябин wrote:
> I have a long time war with geo-replications. First two battles were won:
> 1. geo-replication on a cluster with glusterfs-3.4.1 (Centos 6.6, xfs:
> server1-server2 => geo-replica at east-server1 )
> 2. One volume named infra-geo geo-replicated well on a 2nd cluster
> with glusterfs-3.7.4 (Centos 7.1, ext4: server3-server4 => geo-replica
> at east-server2)
>
> The last battle lasts for two months, and I need your help to win it :)
>
> Georeplication of second volume, named storage-2 getting faulty after
> initial sync of data. It could sync all files (620GB last time), and
> became falty. Restart of every and each gluster service won't affect
> the result, status is the same - faulty.
> The second volume named storage-2 geo-replicated also on a 2nd cluster
> with glusterfs-3.7.4 (Centos 7.1, ext4: server3-server4 => geo-replica
> at east-server2)
>
> The noticeble things:
> (date of creating geo-replica: '2015-10-16 16:03')
> 1.
> /var/log/glusterfs/geo-replication/storage-2/ssh%3A%2F%2Froot%40172.16.10.43%3Agluster%3A%2F%2F127.0.0.1%3Astorage-2.%2Fmnt%2Fsda4%2Fbrick-storage-3-changes.log-20151018
> is ordianry and has length of 8.8 MB
>
> The current changelog started at '2015-10-18 17:43:17.117287' has
> 9.7GB, 34909550 lines of:
> E [mem-pool.c:417:mem_get0]
> (-->/lib64/libglusterfs.so.0(recursive_rmdir+0x192) [0x7f885735cbb2]
> -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f885733f55f]
> -->/lib64/libglusterfs.so.0(mem_get0+0x81) [0x7f8857371751] )
> 0-mem-pool: invalid argument [Invalid argument]
>
> Only 22MB, has something more uniq:
> [2015-10-19 16:42:42.652110] I
> [gf-changelog.c:542:gf_changelog_register_generic] 0-gfchangelog:
> Registering brick: /mnt/sda4/brick-storage-3 [notify filter: 1]
> [2015-10-19 16:42:44.106591] T [rpcsvc.c:2298:rpcsvc_init]
> 0-rpc-service: rx pool: 8
> [2015-10-19 16:42:44.106643] T
> [rpcsvc-auth.c:119:rpcsvc_auth_init_auth] 0-rpc-service:
> Authentication enabled: AUTH_GLUSTERFS
> [2015-10-19 16:42:44.106648] T
> [rpcsvc-auth.c:119:rpcsvc_auth_init_auth] 0-rpc-service:
> Authentication enabled: AUTH_GLUSTERFS-v2
> [2015-10-19 16:42:44.106656] T
> [rpcsvc-auth.c:119:rpcsvc_auth_init_auth] 0-rpc-service:
> Authentication enabled: AUTH_UNIX
> [2015-10-19 16:42:44.106661] T
> [rpcsvc-auth.c:119:rpcsvc_auth_init_auth] 0-rpc-service:
> Authentication enabled: AUTH_NULL
> [2015-10-19 16:42:44.106664] D [rpcsvc.c:2317:rpcsvc_init]
> 0-rpc-service: RPC service inited.
> [2015-10-19 16:42:44.106673] D [rpcsvc.c:1874:rpcsvc_program_register]
> 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver:
> 1, Port: 0
> [2015-10-19 16:42:44.106687] D
> [rpc-transport.c:288:rpc_transport_load] 0-rpc-transport: attempt to
> load file /usr/lib64/glusterfs/3.7.4/rpc-transport/socket.so
> [2015-10-19 16:42:44.107068] D [socket.c:3794:socket_init]
> 0-socket.gfchangelog: disabling nodelay
> [2015-10-19 16:42:44.107076] D [socket.c:3845:socket_init]
> 0-socket.gfchangelog: Configued transport.tcp-user-timeout=0
> [2015-10-19 16:42:44.107081] D [socket.c:3928:socket_init]
> 0-socket.gfchangelog: SSL support on the I/O path is NOT enabled
> [2015-10-19 16:42:44.107084] D [socket.c:3931:socket_init]
> 0-socket.gfchangelog: SSL support for glusterd is NOT enabled
> [2015-10-19 16:42:44.107093] D [socket.c:3948:socket_init]
> 0-socket.gfchangelog: using system polling thread
> [2015-10-19 16:42:44.107159] D [rpcsvc.c:1874:rpcsvc_program_register]
> 0-rpc-service: New program registered: LIBGFCHANGELOG REBORP, Num:
> 1886350951, Ver: 1, Port: 0
> [2015-10-19 16:42:44.107182] D
> [rpc-clnt.c:989:rpc_clnt_connection_init] 0-gfchangelog: defaulting
> frame-timeout to 30mins
> [2015-10-19 16:42:44.107192] D
> [rpc-clnt.c:1003:rpc_clnt_connection_init] 0-gfchangelog: disable
> ping-timeout
> [2015-10-19 16:42:44.107200] D
> [rpc-transport.c:288:rpc_transport_load] 0-rpc-transport: attempt to
> load file /usr/lib64/glusterfs/3.7.4/rpc-transport/socket.so
> [2015-10-19 16:42:44.107233] D [socket.c:3794:socket_init]
> 0-gfchangelog: disabling nodelay
> [2015-10-19 16:42:44.107238] D [socket.c:3845:socket_init]
> 0-gfchangelog: Configued transport.tcp-user-timeout=0
> [2015-10-19 16:42:44.107242] D [socket.c:3928:socket_init]
> 0-gfchangelog: SSL support on the I/O path is NOT enabled
> [2015-10-19 16:42:44.107248] D [socket.c:3931:socket_init]
> 0-gfchangelog: SSL support for glusterd is NOT enabled
> [2015-10-19 16:42:44.107255] D [socket.c:3948:socket_init]
> 0-gfchangelog: using system polling thread
> [2015-10-19 16:42:44.107263] T [rpc-clnt.c:418:rpc_clnt_reconnect]
> 0-gfchangelog: attempting reconnect
> [2015-10-19 16:42:44.107267] T [socket.c:2887:socket_connect]
> 0-gfchangelog: connecting 0x7f2dbc08c7e0, state=0 gen=0 sock=-1
> [2015-10-19 16:42:44.107272] T
> [name.c:295:af_unix_client_get_remote_sockaddr] 0-gfchangelog: using
> connect-path
> /var/run/gluster/changelog-535626c1c4cea4957a54c920f955b1ac.sock
> [2015-10-19 16:42:44.107284] T [name.c:111:af_unix_client_bind]
> 0-gfchangelog: bind-path not specified for unix socket, letting
> connect to assign default value
> [2015-10-19 16:42:46.107494] T [rpc-clnt.c:1406:rpc_clnt_record]
> 0-gfchangelog: Auth Info: pid: 0, uid: 0, gid: 0, owner:
> [2015-10-19 16:42:46.107543] T
> [rpc-clnt.c:1263:rpc_clnt_record_build_header] 0-rpc-clnt: Request
> fraglen 500, payload: 436, rpc hdr: 64
> [2015-10-19 16:42:46.107589] T [rpc-clnt.c:1600:rpc_clnt_submit]
> 0-rpc-clnt: submitted request (XID: 0x1 Program: LIBGFCHANGELOG,
> ProgVers: 1, Proc: 1) to rpc-transport (gfchangelog)
> [2015-10-19 16:42:46.107618] D
> [rpc-clnt-ping.c:281:rpc_clnt_start_ping] 0-gfchangelog: ping timeout
> is 0, returning
> [2015-10-19 16:42:46.109489] T [rpc-clnt.c:663:rpc_clnt_reply_init]
> 0-gfchangelog: received rpc message (RPC XID: 0x1 Program:
> LIBGFCHANGELOG, ProgVers: 1, Proc: 1) from rpc-transport (gfchangelog)
> [2015-10-19 16:42:46.146891] I
> [gf-history-changelog.c:760:gf_changelog_extract_min_max]
> 0-gfchangelog: MIN: 1439390689, MAX: 1445272962, TOTAL CHANGELOGS: 391739
> [2015-10-19 16:42:46.147039] I
> [gf-history-changelog.c:907:gf_history_changelog] 0-gfchangelog:
> FINAL: from: 1445023075, to: 1445272962, changes: 16643
> [2015-10-19 16:42:46.148604] D
> [gf-history-changelog.c:298:gf_history_changelog_scan] 0-gfchangelog:
> hist_done 1, is_last_scan: 0
>
>
>
> 2.
> /var/log/glusterfs/geo-replication/storage-2/ssh%3A%2F%2Froot%40172.16.10.43%3Agluster%3A%2F%2F127.0.0.1%3Astorage-2.log
>
> [2015-10-18 10:42:44.890366] W
> [master(/mnt/sda4/brick-storage-3):1041:process] _GMaster: incomplete
> sync, retrying changelogs: CHANGELOG.1445022009 CHANGELOG.1445022024
> CHANGELOG.1445022039 CHANGELOG.1445022054 CHANGELOG.1445022069 CHAN
> GELOG.1445022084 CHANGELOG.1445022099 CHANGELOG.1445022114
> CHANGELOG.1445022129 CHANGELOG.1445022144 CHANGELOG.1445022159
> CHANGELOG.1445022174 CHANGELOG.1445022189 CHANGELOG.1445022204
> CHANGELOG.1445022219 CHANGELOG.1445022234 CHANGELOG.
> 1445022249 CHANGELOG.1445022264 CHANGELOG.1445022279
> CHANGELOG.1445022294 CHANGELOG.1445022309 CHANGELOG.1445022324
> CHANGELOG.1445022339 CHANGELOG.1445022354 CHANGELOG.1445022369
> CHANGELOG.1445022384 CHANGELOG.1445022399 CHANGELOG.144502
> 2414 CHANGELOG.1445022429 CHANGELOG.1445022444 CHANGELOG.1445022459
> CHANGELOG.1445022474 CHANGELOG.1445022489 CHANGELOG.1445022504
> CHANGELOG.1445022519 CHANGELOG.1445022534 CHANGELOG.1445022549
> CHANGELOG.1445022564 CHANGELOG.1445022579 C
> HANGELOG.1445022594 CHANGELOG.1445022609 CHANGELOG.1445022624
> CHANGELOG.1445022639 CHANGELOG.1445022654 CHANGELOG.1445022669
> CHANGELOG.1445022684 CHANGELOG.1445022699 CHANGELOG.1445022714
> CHANGELOG.1445022729 CHANGELOG.1445022744 CHANGEL
> OG.1445022759 CHANGELOG.1445022774 CHANGELOG.1445022789
> CHANGELOG.1445022805 CHANGELOG.1445022820 CHANGELOG.1445022835
> CHANGELOG.1445022850 CHANGELOG.1445022865 CHANGELOG.1445022880
> CHANGELOG.1445022895 CHANGELOG.1445022910 CHANGELOG.144
> 5022925 CHANGELOG.1445022940 CHANGELOG.1445022955 CHANGELOG.1445022970
> CHANGELOG.1445022985 CHANGELOG.1445023000 CHANGELOG.1445023015
> CHANGELOG.1445023030 CHANGELOG.1445023045 CHANGELOG.1445023060
> CHANGELOG.1445023075 CHANGELOG.144502309
> 0 CHANGELOG.1445023105 CHANGELOG.1445023120 CHANGELOG.1445023135
> CHANGELOG.1445023150 CHANGELOG.1445023165 CHANGELOG.1445023180
> CHANGELOG.1445023195 CHANGELOG.1445023210 CHANGELOG.1445023225
> CHANGELOG.1445023240 CHANGELOG.1445023255 CHAN
> GELOG.1445023270 CHANGELOG.1445023285 CHANGELOG.1445023300
> CHANGELOG.1445023315 CHANGELOG.1445023330 CHANGELOG.1445023345
> CHANGELOG.1445023360 CHANGELOG.1445023375 CHANGELOG.1445023390
> CHANGELOG.1445023405 CHANGELOG.1445023420 CHANGELOG.
> 1445023435 CHANGELOG.1445023450 CHANGELOG.1445023465
> CHANGELOG.1445023480 CHANGELOG.1445023495 CHANGELOG.1445023510
> CHANGELOG.1445023525 CHANGELOG.1445023540 CHANGELOG.1445023555
> CHANGELOG.1445023570 CHANGELOG.1445023585 CHANGELOG.144502
> 3600 CHANGELOG.1445023615 CHANGELOG.1445023630 CHANGELOG.1445023645
> CHANGELOG.1445023660 CHANGELOG.1445023675 CHANGELOG.1445023690
> CHANGELOG.1445023705 CHANGELOG.1445023720 CHANGELOG.1445023735
> CHANGELOG.1445023750 CHANGELOG.1445023765 C
> HANGELOG.1445023780 CHANGELOG.1445023796 CHANGELOG.1445023811
> CHANGELOG.1445023826 CHANGELOG.1445023841 CHANGELOG.1445023856
> CHANGELOG.1445023871 CHANGELOG.1445023886 CHANGELOG.1445023901
> CHANGELOG.1445023916 CHANGELOG.1445023931 CHANGEL
> OG.1445023946 CHANGELOG.1445023961 CHANGELOG.1445023976
> CHANGELOG.1445023991 CHANGELOG.1445024006 CHANGELOG.1445024021
> CHANGELOG.1445024036 CHANGELOG.1445024051 CHANGELOG.1445024066
> CHANGELOG.1445024081 CHANGELOG.1445024096 CHANGELOG.144
> 5024111 CHANGELOG.1445024126 CHANGELOG.1445024141 CHANGELOG.1445024156
> CHANGELOG.1445024171 CHANGELOG.1445024186 CHANGELOG.1445024201
> CHANGELOG.1445024216 CHANGELOG.1445024231 CHANGELOG.1445024246
> CHANGELOG.1445024261 CHANGELOG.144502427
> 6 CHANGELOG.1445024291 CHANGELOG.1445024306 CHANGELOG.1445024321
> CHANGELOG.1445024336 CHANGELOG.1445024351 CHANGELOG.1445024366
> CHANGELOG.1445024381 CHANGELOG.1445024396 CHANGELOG.1445024411
> [2015-10-18 10:43:02.808736] E
> [repce(/mnt/sda4/brick-storage-3):207:__call__] RepceClient: call
> 24010:140490734180160:1445190182.72 (entry_ops) failed on peer with
> OSError
> [2015-10-18 10:43:02.809122] E
> [syncdutils(/mnt/sda4/brick-storage-3):276:log_raise_exception] <top>:
> FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165,
> in main
> main_i()
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659,
> in main_i
> local.service_loop(*[r for r in [remote] if r])
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
> 1451, in service_loop
> g2.crawlwrap()
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 591,
> in crawlwrap
> self.crawl()
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
> 1106, in crawl
> self.changelogs_batch_process(changes)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
> 1081, in changelogs_batch_process
> self.process(batch)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 959,
> in process
> self.process_change(change, done, retry)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 914,
> in process_change
> failures = self.slave.server.entry_ops(entries)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226,
> in __call__
> return self.ins(self.meth, *a)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208,
> in __call__
> raise res
> OSError: [Errno 16] Device or resource busy
> [2015-10-18 10:43:02.811510] I
> [syncdutils(/mnt/sda4/brick-storage-3):220:finalize] <top>: exiting.
> [2015-10-18 10:43:02.819149] I [repce(agent):92:service_loop]
> RepceServer: terminating on reaching EOF.
> [2015-10-18 10:43:02.819616] I [syncdutils(agent):220:finalize] <top>:
> exiting.
>
> Since that I get the repeating error:
> [2015-10-18 10:43:13.73946] I [monitor(monitor):221:monitor] Monitor:
> ------------------------------------------------------------
> [2015-10-18 10:43:13.74423] I [monitor(monitor):222:monitor] Monitor:
> starting gsyncd worker
> [2015-10-18 10:43:13.177963] I [changelogagent(agent):75:__init__]
> ChangelogAgent: Agent listining...
> [2015-10-18 10:43:13.178881] I
> [gsyncd(/mnt/sda4/brick-storage-3):649:main_i] <top>: syncing:
> gluster://localhost:storage-2 ->
> ssh://root@east-storage2:gluster://localhost:storage-2
> [2015-10-18 10:43:17.93168] I
> [master(/mnt/sda4/brick-storage-3):83:gmaster_builder] <top>: setting
> up xsync change detection mode
> [2015-10-18 10:43:17.93669] I
> [master(/mnt/sda4/brick-storage-3):404:__init__] _GMaster: using
> 'rsync' as the sync engine
> [2015-10-18 10:43:17.95079] I
> [master(/mnt/sda4/brick-storage-3):83:gmaster_builder] <top>: setting
> up changelog change detection mode
> [2015-10-18 10:43:17.95390] I
> [master(/mnt/sda4/brick-storage-3):404:__init__] _GMaster: using
> 'rsync' as the sync engine
> [2015-10-18 10:43:17.96730] I
> [master(/mnt/sda4/brick-storage-3):83:gmaster_builder] <top>: setting
> up changeloghistory change detection mode
> [2015-10-18 10:43:17.97048] I
> [master(/mnt/sda4/brick-storage-3):404:__init__] _GMaster: using
> 'rsync' as the sync engine
> [2015-10-18 10:43:19.915657] I
> [master(/mnt/sda4/brick-storage-3):1220:register] _GMaster: xsync temp
> directory:
> /var/lib/misc/glusterfsd/storage-2/ssh%3A%2F%2Froot%40172.16.10.43%3Agluster%3A%2F%2F127.0.0.1%3Astorage-2/535626c1c4cea4957a
> 54c920f955b1ac/xsync
> [2015-10-18 10:43:19.916011] I
> [resource(/mnt/sda4/brick-storage-3):1432:service_loop] GLUSTER:
> Register time: 1445190199
> [2015-10-18 10:43:19.935817] I
> [master(/mnt/sda4/brick-storage-3):530:crawlwrap] _GMaster: primary
> master with volume id 054d6225-39fc-40f5-9604-5ee4b6dcd8b4 ...
> [2015-10-18 10:43:19.938358] I
> [master(/mnt/sda4/brick-storage-3):539:crawlwrap] _GMaster: crawl
> interval: 1 seconds
> [2015-10-18 10:43:19.942023] I
> [master(/mnt/sda4/brick-storage-3):1135:crawl] _GMaster: starting
> history crawl... turns: 1, stime: (1445021993, 0), etime: 1445190199
> [2015-10-18 10:43:20.975270] I
> [master(/mnt/sda4/brick-storage-3):1164:crawl] _GMaster: slave's time:
> (1445021993, 0)
> [2015-10-18 10:43:38.50978] E
> [repce(/mnt/sda4/brick-storage-3):207:__call__] RepceClient: call
> 13364:140337327839040:1445190217.96 (entry_ops) failed on peer with
> OSError
> [2015-10-18 10:43:38.51417] E
> [syncdutils(/mnt/sda4/brick-storage-3):276:log_raise_exception] <top>:
> FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165,
> in main
> main_i()
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659,
> in main_i
> local.service_loop(*[r for r in [remote] if r])
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
> 1438, in service_loop
> g3.crawlwrap(oneshot=True)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 591,
> in crawlwrap
> self.crawl()
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
> 1173, in crawl
> self.changelogs_batch_process(changes)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
> 1081, in changelogs_batch_process
> self.process(batch)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 959,
> in process
> self.process_change(change, done, retry)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 914,
> in process_change
> failures = self.slave.server.entry_ops(entries)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226,
> in __call__
> return self.ins(self.meth, *a)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208,
> in __call__
> raise res
> OSError: [Errno 16] Device or resource busy
> [2015-10-18 10:43:38.53669] I
> [syncdutils(/mnt/sda4/brick-storage-3):220:finalize] <top>: exiting.
> [2015-10-18 10:43:38.56446] I [repce(agent):92:service_loop]
> RepceServer: terminating on reaching EOF.
> [2015-10-18 10:43:38.56844] I [syncdutils(agent):220:finalize] <top>:
> exiting.
>
> Any suggestions would be very appreciated, thanks
>
> With regards to all community,
> --
> Rodion
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20151021/382b31d5/attachment.html>
More information about the Gluster-users
mailing list