[Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync

Tue May 26 18:13:44 UTC 2015

So, changelog is still active, but I noticed that some files were missing.

So I'm running rsync -avn between the two volumes (master and slave) to list the differences, then touching the missing files so they get synced again (hoping geo-rep will do the rest).

One question: can I set the slave volume read-only? If somebody changes a file on the slave, it is no longer synced (changes and deletes are not, but renames are still synced between master and slave).

Would setting the slave volume read-only have an impact on the geo-replication process?

Thanks again.

--
Cyril Peponnet

On May 25, 2015, at 12:43 AM, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:

Hi Cyril,

Answers inline

Thanks and Regards,
Kotresh H R

----- Original Message -----
From: "Cyril N PEPONNET (Cyril)" <cyril.peponnet at alcatel-lucent.com>
To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
Cc: "gluster-users" <gluster-users at gluster.org>
Sent: Friday, May 22, 2015 9:34:47 PM
Subject: Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync

One last question, correct me if I’m wrong.

When you start a geo-rep process, it starts with xsync, aka hybrid crawl
(sending files every 60s, with the file window set to 8192 files per batch).

When the crawl is done, it should switch to the changelog detector and
propagate changes to the slaves dynamically.

1/ During the hybrid crawl, if we delete files from the master (and they were
already transferred to the slave), the xsync process will not delete them from
the slave (and we can't change that, as the option is hardcoded).
When it switches to changelog, will it remove the folders and files on the
slave that no longer exist on the master?

 You are right, xsync does not sync the deletion of files that have already been synced.
 After xsync, when it switches to changelog, it does not delete the entries on the slave
 that no longer exist on the master. Changelog can only propagate deletes that happen
 after the switch to changelog.
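
 Since deletes that happened before the switch to changelog stay on the slave, one way
 to find them is to compare the two volumes yourself. Below is a minimal sketch, assuming
 both volumes are mounted locally through FUSE; the mount points /mnt/master_vol and
 /mnt/slave_vol and the function names are hypothetical, and this is not something
 geo-rep provides. It only lists candidates; it deletes nothing.

```python
import os


def relative_entries(root):
    """Return the set of file and directory paths under root, relative to root."""
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip GlusterFS internal metadata in case a brick path is passed by mistake.
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            entries.add(os.path.relpath(full, root))
    return entries


def stale_on_slave(master_mount, slave_mount):
    """Entries that exist on the slave mount but no longer on the master mount."""
    return sorted(relative_entries(slave_mount) - relative_entries(master_mount))


if __name__ == "__main__":
    # Hypothetical FUSE mount points; adjust to your setup.
    for path in stale_on_slave("/mnt/master_vol", "/mnt/slave_vol"):
        print(path)
```

 Review the list before removing anything from the slave by hand.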

2/ With changelog, if I add a 10GB file and then a 1KB file, will the
changelog process queue them (waiting for the 10GB file to be sent), or are
the transfers done in threads?
(e.g. I add a 10GB file and delete it after 1 min; what will happen?)

  Changelog records the operations that happened on the master, and geo-replication replays
  them on the slave volume. Geo-replication syncs files in two phases.

  1. Phase-1: Create entries through RPC (0-byte files on the slave, keeping the same gfid as on the master)
  2. Phase-2: Sync data through rsync/tar_over_ssh (multi-threaded)

  With that in mind, Phase-1 happens serially and Phase-2 happens in parallel.
  The zero-byte entries for the 10GB and 1KB files get created on the slave serially, and their
  data is synced in parallel. Another thing to remember: geo-rep only tries to sync data to a file
  after the zero-byte file for it has been created.
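
  The two phases can be pictured with a toy model. This is only a sketch of the ordering
  described above, not gsyncd's actual code: the replay function, the dict standing in for
  the slave volume, and the worker count are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def replay(changes, slave, workers=4):
    """Replay (name, data) pairs onto `slave`, a dict standing in for the slave volume.

    Returns the order in which entries were created in Phase 1.
    """
    created = []

    # Phase 1: create zero-byte entries one at a time, in changelog order.
    for name, _data in changes:
        slave[name] = b""
        created.append(name)

    # Phase 2: copy data in parallel. Every entry already exists from Phase 1,
    # so a small file does not have to wait for a large one to finish.
    def sync_data(item):
        name, data = item
        assert name in slave  # data is only synced to an existing entry
        slave[name] = data
        return name

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(sync_data, changes))

    return created


if __name__ == "__main__":
    slave_volume = {}
    order = replay([("big.bin", b"x" * 1024), ("small.txt", b"y")], slave_volume)
    print(order)
```

  In this model the 1KB file is not queued behind the 10GB file for its data, only for
  the creation of its zero-byte entry.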

In the latest release, 3.7, the xsync crawl is minimized by a feature called history crawl, introduced in 3.6.
So the chances of missing deletes/renames are lower.

Thanks.

--
Cyril Peponnet

On May 21, 2015, at 10:22 PM, Kotresh Hiremath Ravishankar
<khiremat at redhat.com> wrote:

Great, I hope that works. Let's see.

Thanks and Regards,
Kotresh H R

----- Original Message -----
From: "Cyril N PEPONNET (Cyril)" <cyril.peponnet at alcatel-lucent.com>
To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
Cc: "gluster-users" <gluster-users at gluster.org>
Sent: Friday, May 22, 2015 5:31:13 AM
Subject: Re: [Gluster-users] Geo-Replication - Changelog socket is not
present - Falling back to xsync

Thanks to JoeJulian / Kaushal I managed to re-enable the changelog option,
and the socket is now present.

For the record, I had some clients running the RHS gluster-fuse client while our
nodes are running the glusterfs release, and the op-versions are not "compatible".

Now I have to wait for the initial crawl to see if it switches to changelog
detector mode.

Thanks Kotresh
--
Cyril Peponnet

On May 21, 2015, at 8:39 AM, Cyril Peponnet
<cyril.peponnet at alcatel-lucent.com> wrote:

Hi,

Unfortunately,

# gluster vol set usr_global changelog.changelog off
volume set: failed: Staging failed on mvdcgluster01.us.alcatel-lucent.com.
Error: One or more connected clients cannot support the feature being set.
These clients need to be upgraded or disconnected before running this command again

I don't really know why. Some clients are using 3.6 as the fuse client;
others are running 3.5.2.

Any advice?

--
Cyril Peponnet

On May 20, 2015, at 5:17 AM, Kotresh Hiremath Ravishankar
<khiremat at redhat.com> wrote:

Hi Cyril,

From the brick logs, it seems the changelog-notifier thread got killed
for some reason, as notify is failing with EPIPE.

Try the following. It should probably help:
1. Stop geo-replication.
2. Disable changelog: gluster vol set <master-vol-name> changelog.changelog off
3. Enable changelog: gluster vol set <master-vol-name> changelog.changelog on
4. Start geo-replication.
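
The four steps above can be scripted. Here is a minimal sketch that only builds and
prints the commands by default. The helper names are hypothetical, and the session
syntax (gluster volume geo-replication <master-vol> <slave-host>::<slave-vol> stop|start)
is an assumption about how the session was created; check it against your own
geo-replication status output before running anything for real.

```python
import subprocess


def changelog_reset_commands(master_vol, slave_host, slave_vol):
    """Build the stop / changelog off / changelog on / start sequence as argv lists."""
    # Assumed session syntax: <master-vol> <slave-host>::<slave-vol>
    session = ["gluster", "volume", "geo-replication",
               master_vol, f"{slave_host}::{slave_vol}"]
    return [
        session + ["stop"],
        ["gluster", "volume", "set", master_vol, "changelog.changelog", "off"],
        ["gluster", "volume", "set", master_vol, "changelog.changelog", "on"],
        session + ["start"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder names; replace with your own volume and slave host.
    run(changelog_reset_commands("master-vol", "slave-host", "slave-vol"))
```

Stopping at the first failure matters here: if the "off" step is rejected, you do not
want geo-replication restarted as if the reset had happened.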

Let me know if it works.

Thanks and Regards,
Kotresh H R

----- Original Message -----
From: "Cyril N PEPONNET (Cyril)" <cyril.peponnet at alcatel-lucent.com>
To: "gluster-users" <gluster-users at gluster.org>
Sent: Tuesday, May 19, 2015 3:16:22 AM
Subject: [Gluster-users] Geo-Replication - Changelog socket is not
present - Falling back to xsync

Hi Gluster Community,

I have a 3-node setup at location A and a 2-node setup at location B.

All running 3.5.2 under CentOS 7.

I have one volume that I sync through the geo-replication process.

So far so good, the first step of geo-replication (hybrid crawl) is done.

Now I'd like to use the changelog detector in order to delete files on the
slave when they are gone from the master.

But it always falls back to the xsync mechanism (even when I force it using
config changelog_detector changelog):

[2015-05-18 12:29:49.543922] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2015-05-18 12:29:49.544018] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2015-05-18 12:29:49.614002] I [gsyncd(/export/raid/vol):532:main_i] <top>: syncing: gluster://localhost:vol -> ssh://root@x.x.x.x:gluster://localhost:vol
[2015-05-18 12:29:54.696532] I [master(/export/raid/vol):58:gmaster_builder] <top>: setting up xsync change detection mode
[2015-05-18 12:29:54.696888] I [master(/export/raid/vol):357:__init__] _GMaster: using 'rsync' as the sync engine
[2015-05-18 12:29:54.697930] I [master(/export/raid/vol):58:gmaster_builder] <top>: setting up changelog change detection mode
[2015-05-18 12:29:54.698160] I [master(/export/raid/vol):357:__init__] _GMaster: using 'rsync' as the sync engine
[2015-05-18 12:29:54.699239] I [master(/export/raid/vol):1104:register] _GMaster: xsync temp directory: /var/run/gluster/vol/ssh%3A%2F%2Froot%40x.x.x.x%3Agluster%3A%2F%2F127.0.0.1%3Avol/ce749a38ba30d4171cd674ec00ab24f9/xsync
[2015-05-18 12:30:04.707216] I [master(/export/raid/vol):682:fallback_xsync] _GMaster: falling back to xsync mode
[2015-05-18 12:30:04.742422] I [syncdutils(/export/raid/vol):192:finalize] <top>: exiting.
[2015-05-18 12:30:05.708123] I [monitor(monitor):157:monitor] Monitor: worker(/export/raid/vol) died in startup phase
[2015-05-18 12:30:05.708369] I [monitor(monitor):81:set_state] Monitor: new state: faulty
[201

After some Python debugging and stack trace printing, I figured out that:

/var/run/gluster/vol/ssh%3A%2F%2Froot%40x.x.x.x%3Agluster%3A%2F%2F127.0.0.1%3Avol/ce749a38ba30d4171cd674ec00ab24f9/changes.log

[2015-05-18 19:41:24.511423] I [gf-changelog.c:179:gf_changelog_notification_init] 0-glusterfs: connecting to changelog socket: /var/run/gluster/changelog-ce749a38ba30d4171cd674ec00ab24f9.sock (brick: /export/raid/vol)
[2015-05-18 19:41:24.511445] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 1/5...
[2015-05-18 19:41:26.511556] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 2/5...
[2015-05-18 19:41:28.511670] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 3/5...
[2015-05-18 19:41:30.511790] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 4/5...
[2015-05-18 19:41:32.511890] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 5/5...
[2015-05-18 19:41:34.512016] E [gf-changelog.c:204:gf_changelog_notification_init] 0-glusterfs: could not connect to changelog socket! bailing out...

/var/run/gluster/changelog-ce749a38ba30d4171cd674ec00ab24f9.sock doesn't exist. So
https://github.com/gluster/glusterfs/blob/release-3.5/xlators/features/changelog/lib/src/gf-changelog.c#L431
is failing because
https://github.com/gluster/glusterfs/blob/release-3.5/xlators/features/changelog/lib/src/gf-changelog.c#L153
cannot open the socket file.
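
For anyone wanting to check the same thing without reading the C code, here is a small
sketch that tests whether the socket path exists as a Unix socket and then tries to
connect to it. The five attempts two seconds apart mirror what the log above shows; the
function names are my own, and this is not how libgfchangelog itself is implemented.

```python
import os
import socket
import stat
import time


def is_unix_socket(path):
    """True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing file, permission problem, etc.
        return False


def try_connect(path, attempts=5, delay=2.0):
    """Try to connect to the socket, retrying like the 'attempt n/5' log lines."""
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return True
        except OSError:
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts:
                time.sleep(delay)
        finally:
            sock.close()
    return False


if __name__ == "__main__":
    # Socket path taken from the log above.
    path = "/var/run/gluster/changelog-ce749a38ba30d4171cd674ec00ab24f9.sock"
    print("is a socket:", is_unix_socket(path))
    print("connect ok:", try_connect(path))
```

If the first check is already false, the brick side never created (or has removed) the
socket, and retrying the connection cannot help.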

And I can't find any error related to changelog in the log files, except in the
brick logs on node 2 (site A):

bricks/export-raid-vol.log-20150517:[2015-05-14 17:06:52.636908] E [changelog-helpers.c:168:changelog_rollover_changelog] 0-vol-changelog: Failed to send file name to notify thread (reason: Broken pipe)
bricks/export-raid-vol.log-20150517:[2015-05-14 17:06:52.636949] E [changelog-helpers.c:280:changelog_handle_change] 0-vol-changelog: Problem rolling over changelog(s)
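
To pull this kind of entry out of a brick log without grepping by hand, a short sketch
follows. It assumes each log record sits on a single line (the mail client wrapped the
ones quoted here) and that records follow the "[timestamp] LEVEL [source] message"
layout visible above; the regex and the function name are mine.

```python
import re

# Matches "[timestamp] LEVEL [file:line:function] message", optionally preceded
# by a "filename:" prefix such as the one grep adds.
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<source>[^\]]+)\]\s+"
    r"(?P<message>.*)"
)


def changelog_errors(lines):
    """Return (timestamp, message) for error-level records from changelog sources."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("level") == "E" and "changelog" in match.group("source"):
            hits.append((match.group("ts"), match.group("message")))
    return hits


if __name__ == "__main__":
    sample = [
        "bricks/export-raid-vol.log-20150517:[2015-05-14 17:06:52.636908] E "
        "[changelog-helpers.c:168:changelog_rollover_changelog] 0-vol-changelog: "
        "Failed to send file name to notify thread (reason: Broken pipe)",
    ]
    for ts, message in changelog_errors(sample):
        print(ts, message)
```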

gluster vol status is all fine, and the changelog options are enabled in the
vol file:

volume vol-changelog
type features/changelog
option changelog on
option changelog-dir /export/raid/vol/.glusterfs/changelogs
option changelog-brick /export/raid/vol
subvolumes vol-posix
end-volume

Any help will be appreciated :)

Oh, btw, it's hard to stop / restart the volume as I have around 4k clients
connected.

Thanks !

--
Cyril Peponnet

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users