[Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

Kotresh Hiremath Ravishankar khiremat at redhat.com
Mon Sep 3 05:58:16 UTC 2018


Hi Marcus,

Geo-rep had few important fixes in 4.1.3. Is it possible to upgrade and
check whether the issue is still seen?

Thanks,
Kotresh HR

On Sat, Sep 1, 2018 at 5:08 PM, Marcus Pedersén <marcus.pedersen at slu.se>
wrote:

> Hi again,
>
> I found another problem on the other master node.
>
> The node toggles Active/Faulty and it is the same error over and over
> again.
>
>
> [2018-09-01 11:23:02.94080] E [repce(worker /urd-gds/gluster):197:__call__]
> RepceClient: call failed    call=1226:139955262510912:1535800981.24
> method=entry_ops        error=GsyncdError
> [2018-09-01 11:23:02.94214] E [syncdutils(worker /urd-gds/gluster):300:log_raise_exception]
> <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-09-01 11:23:02.106194] I [repce(agent /urd-gds/gluster):80:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-09-01 11:23:02.124444] I [gsyncdstatus(monitor):244:set_worker_status]
> GeorepStatus: Worker Status Change status=Faulty
>
>
> I have also found a python error as well, I have only seen this once
> though.
>
>
> [2018-09-01 11:16:45.907660] I [master(worker
> /urd-gds/gluster):1536:crawl] _GMaster: slave's time
> stime=(1524101534, 0)
> [2018-09-01 11:16:47.364109] E [syncdutils(worker
> /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
> 362, in twrap
>     tf(*aargs)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1939,
> in syncjob
>     po = self.sync_engine(pb, self.log_err)
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1442,
> in rsync
>     rconf.ssh_ctl_args + \
> AttributeError: 'NoneType' object has no attribute 'split'
> [2018-09-01 11:16:47.384531] I [repce(agent /urd-gds/gluster):80:service_loop]
> RepceServer: terminating on reaching EOF.
> [2018-09-01 11:16:48.362987] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase     brick=/urd-gds/gluster
> [2018-09-01 11:16:48.370701] I [gsyncdstatus(monitor):244:set_worker_status]
> GeorepStatus: Worker Status Change status=Faulty
> [2018-09-01 11:16:58.390548] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
>
>
> I attach the logs as well.
>
>
> Many thanks!
>
>
> Best regards
>
> Marcus Pedersén
>
>
>
>
> ------------------------------
> *Från:* gluster-users-bounces at gluster.org <gluster-users-bounces@
> gluster.org> för Marcus Pedersén <marcus.pedersen at slu.se>
> *Skickat:* den 31 augusti 2018 16:09
> *Till:* khiremat at redhat.com
>
> *Kopia:* gluster-users at gluster.org
> *Ämne:* Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
>
> I realy appologize, third try to make mail smaller.
>
>
> /Marcus
>
>
> ------------------------------
> *Från:* Marcus Pedersén
> *Skickat:* den 31 augusti 2018 16:03
> *Till:* Kotresh Hiremath Ravishankar
> *Kopia:* gluster-users at gluster.org
> *Ämne:* SV: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
>
> Sorry, resend due to too large mail.
>
>
> /Marcus
> ------------------------------
> *Från:* Marcus Pedersén
> *Skickat:* den 31 augusti 2018 15:19
> *Till:* Kotresh Hiremath Ravishankar
> *Kopia:* gluster-users at gluster.org
> *Ämne:* SV: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
>
> Hi Kotresh,
>
> Please find attached logs, only logs from today.
>
> The python error was repeated over and over again until I disabled selinux.
>
> After that the node bacame active again.
>
> The return code 23 seems to be repeated over and over again.
>
>
> rsync version 3.1.2
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus
>
>
> ------------------------------
> *Från:* Kotresh Hiremath Ravishankar <khiremat at redhat.com>
> *Skickat:* den 31 augusti 2018 11:09
> *Till:* Marcus Pedersén
> *Kopia:* gluster-users at gluster.org
> *Ämne:* Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does
> not work Now: Upgraded to 4.1.3 geo node Faulty
>
> Hi Marcus,
>
> Could you attach full logs? Is the same trace back happening repeatedly?
> It will be helpful you attach the corresponding mount log as well.
> What's the rsync version, you are using?
>
> Thanks,
> Kotresh HR
>
> On Fri, Aug 31, 2018 at 12:16 PM, Marcus Pedersén <marcus.pedersen at slu.se>
> wrote:
>
>> Hi all,
>>
>> I had problems with stopping sync after upgrade to 4.1.2.
>>
>> I upgraded to 4.1.3 and it ran fine for one day, but now one of the
>> master nodes shows faulty.
>>
>> Most of the sync jobs have return code 23, how do I resolve this?
>>
>> I see messages like:
>>
>> _GMaster: Sucessfully fixed all entry ops with gfid mismatch
>>
>> Will this resolve error code 23?
>>
>> There is also a python error.
>>
>> The python error was a selinux problem, turning off selinux made node go
>> to active again.
>>
>> See log below.
>>
>>
>> CentOS 7, installed through SIG Gluster (OS updated to latest at the same
>> time)
>>
>> Master cluster: 2 x (2 + 1) distributed, replicated
>>
>> Client cluster: 1 x (2 + 1) replicated
>>
>>
>> Many thanks in advance!
>>
>>
>> Best regards
>>
>> Marcus Pedersén
>>
>>
>>
>> gsyncd.log from Faulty node:
>>
>> [2018-08-31 06:25:51.375267] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.8099 num_files=57    job=3   return_code=23
>> [2018-08-31 06:25:51.465895] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0904 num_files=3     job=3   return_code=23
>> [2018-08-31 06:25:52.562107] E [repce(worker
>> /urd-gds/gluster):197:__call__] RepceClient: call failed
>> call=30069:139655665837888:1535696752.35        method=entry_ops
>> error=OSError
>> [2018-08-31 06:25:52.562346] E [syncdutils(worker
>> /urd-gds/gluster):332:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311,
>> in main
>>     func(args)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72,
>> in subcmd_worker
>>     local.service_loop(remote)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
>> 1288, in service_loop
>>     g3.crawlwrap(oneshot=True)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615,
>> in crawlwrap
>>     self.crawl()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545,
>> in crawl
>>     self.changelogs_batch_process(changes)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445,
>> in changelogs_batch_process
>>     self.process(batch)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280,
>> in process
>>     self.process_change(change, done, retry)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179,
>> in process_change
>>     failures = self.slave.server.entry_ops(entries)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in
>> __call__
>>     return self.ins(self.meth, *a)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in
>> __call__
>>     raise res
>> OSError: [Errno 13] Permission denied
>> [2018-08-31 06:25:52.578367] I [repce(agent /urd-gds/gluster):80:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2018-08-31 06:25:53.558765] I [monitor(monitor):279:monitor] Monitor:
>> worker died in startup phase     brick=/urd-gds/gluster
>> [2018-08-31 06:25:53.569777] I [gsyncdstatus(monitor):244:set_worker_status]
>> GeorepStatus: Worker Status Change status=Faulty
>> [2018-08-31 06:26:03.593161] I [monitor(monitor):158:monitor] Monitor:
>> starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
>> [2018-08-31 06:26:03.636452] I [gsyncd(agent /urd-gds/gluster):297:main]
>> <top>: Using session config file       path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-08-31 06:26:03.636810] I [gsyncd(worker /urd-gds/gluster):297:main]
>> <top>: Using session config file      path=/var/lib/glusterd/geo-rep
>> lication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
>> [2018-08-31 06:26:03.637486] I [changelogagent(agent
>> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
>> [2018-08-31 06:26:03.650330] I [resource(worker
>> /urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection
>> between master and slave...
>> [2018-08-31 06:26:05.296473] I [resource(worker
>> /urd-gds/gluster):1424:connect_remote] SSH: SSH connection between
>> master and slave established.        duration=1.6457
>> [2018-08-31 06:26:05.297904] I [resource(worker
>> /urd-gds/gluster):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>> [2018-08-31 06:26:06.396939] I [resource(worker
>> /urd-gds/gluster):1119:connect] GLUSTER: Mounted gluster volume
>> duration=1.0985
>> [2018-08-31 06:26:06.397691] I [subcmds(worker
>> /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful.
>> Acknowledging back to monitor
>> [2018-08-31 06:26:16.815566] I [master(worker
>> /urd-gds/gluster):1593:register] _GMaster: Working dir
>> path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo
>> -001_urd-gds-volume/urd-gds-gluster
>> [2018-08-31 06:26:16.816423] I [resource(worker
>> /urd-gds/gluster):1282:service_loop] GLUSTER: Register time
>> time=1535696776
>> [2018-08-31 06:26:16.888772] I [gsyncdstatus(worker
>> /urd-gds/gluster):277:set_active] GeorepStatus: Worker Status
>> Change        status=Active
>> [2018-08-31 06:26:16.892049] I [gsyncdstatus(worker
>> /urd-gds/gluster):249:set_worker_crawl_status] GeorepStatus: Crawl
>> Status Change    status=History Crawl
>> [2018-08-31 06:26:16.892703] I [master(worker
>> /urd-gds/gluster):1507:crawl] _GMaster: starting history crawl    turns=1
>> stime=(1525739167, 0)   entry_stime=(1525740143, 0)     etime=1535696776
>> [2018-08-31 06:26:17.914803] I [master(worker
>> /urd-gds/gluster):1536:crawl] _GMaster: slave's time
>> stime=(1525739167, 0)
>> [2018-08-31 06:26:18.521718] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1063 num_files=17    job=3   return_code=23
>> [2018-08-31 06:26:19.260137] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.3441 num_files=34    job=1   return_code=23
>> [2018-08-31 06:26:19.615191] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0923 num_files=7     job=3   return_code=23
>> [2018-08-31 06:26:19.891227] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1302 num_files=12    job=1   return_code=23
>> [2018-08-31 06:26:19.922700] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.5024 num_files=50    job=2   return_code=23
>>
>> [2018-08-31 06:26:21.639342] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=1.5233 num_files=5     job=3   return_code=23
>> [2018-08-31 06:26:22.12726] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1191 num_files=7     job=1   return_code=23
>> [2018-08-31 06:26:22.86136] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0731 num_files=4     job=1   return_code=23
>> [2018-08-31 06:26:22.503290] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0779 num_files=15    job=2   return_code=23
>> [2018-08-31 06:26:23.214704] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0738 num_files=9     job=3   return_code=23
>> [2018-08-31 06:26:23.251876] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.2478 num_files=33    job=2   return_code=23
>> [2018-08-31 06:26:23.802699] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0873 num_files=9     job=3   return_code=23
>> [2018-08-31 06:26:23.828176] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0758 num_files=3     job=2   return_code=23
>> [2018-08-31 06:26:23.854063] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.2662 num_files=34    job=1   return_code=23
>> [2018-08-31 06:26:24.403228] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0997 num_files=30    job=3   return_code=23
>> [2018-08-31 06:26:25.526] I [master(worker /urd-gds/gluster):1944:syncjob]
>> Syncer: Sync Time Taken      duration=0.0965 num_files=8     job=3
>> return_code=23
>> [2018-08-31 06:26:25.438527] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0832 num_files=9     job=1   return_code=23
>> [2018-08-31 06:26:25.447256] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.6180 num_files=86    job=2   return_code=23
>> [2018-08-31 06:26:25.571913] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0706 num_files=2     job=3   return_code=0
>> [2018-08-31 06:26:27.21325] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0814 num_files=1     job=1   return_code=23
>> [2018-08-31 06:26:27.615520] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0933 num_files=13    job=1   return_code=23
>> [2018-08-31 06:26:27.668323] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.2190 num_files=95    job=2   return_code=23
>> [2018-08-31 06:26:27.740139] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0716 num_files=11    job=2   return_code=23
>> [2018-08-31 06:26:28.191068] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1167 num_files=38    job=3   return_code=23
>> [2018-08-31 06:26:28.268213] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0768 num_files=7     job=3   return_code=23
>> [2018-08-31 06:26:28.317909] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0770 num_files=4     job=2   return_code=23
>> [2018-08-31 06:26:28.710064] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0932 num_files=23    job=1   return_code=23
>> [2018-08-31 06:26:28.907250] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0886 num_files=26    job=2   return_code=23
>> [2018-08-31 06:26:28.976679] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0692 num_files=4     job=2   return_code=23
>> [2018-08-31 06:26:29.55774] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0788 num_files=9     job=2   return_code=23
>> [2018-08-31 06:26:29.295576] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0847 num_files=16    job=1   return_code=23
>> [2018-08-31 06:26:29.665076] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1087 num_files=25    job=2   return_code=23
>> [2018-08-31 06:26:30.277998] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.1122 num_files=40    job=2   return_code=23
>> [2018-08-31 06:26:31.153105] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.3822 num_files=74    job=3   return_code=23
>> [2018-08-31 06:26:31.227639] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0743 num_files=18    job=3   return_code=23
>> [2018-08-31 06:26:31.302660] I [master(worker
>> /urd-gds/gluster):1944:syncjob] Syncer: Sync Time Taken
>> duration=0.0748 num_files=18    job=3   return_code=23
>>
>>
>> ---
>> När du skickar e-post till SLU så innebär detta att SLU behandlar dina
>> personuppgifter. För att läsa mer om hur detta går till, klicka här
>> <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
>> E-mailing SLU will result in SLU processing your personal data. For more
>> information on how this is done, click here
>> <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> --
> Thanks and Regards,
> Kotresh H R
>
> ---
> När du skickar e-post till SLU så innebär detta att SLU behandlar dina
> personuppgifter. För att läsa mer om hur detta går till, klicka här
> <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
> E-mailing SLU will result in SLU processing your personal data. For more
> information on how this is done, click here
> <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
>
> ---
> När du skickar e-post till SLU så innebär detta att SLU behandlar dina
> personuppgifter. För att läsa mer om hur detta går till, klicka här
> <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
> E-mailing SLU will result in SLU processing your personal data. For more
> information on how this is done, click here
> <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
>



-- 
Thanks and Regards,
Kotresh H R
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180903/d98b5d10/attachment.html>


More information about the Gluster-users mailing list