[Bugs] [Bug 1364422] [libgfchangelog]: If changelogs are not available for the requested time range, no distinguished error

bugzilla at redhat.com bugzilla at redhat.com
Fri Mar 10 11:08:04 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1364422

Rahul Hinduja <rhinduja at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ON_QA                       |VERIFIED



--- Comment #16 from Rahul Hinduja <rhinduja at redhat.com> ---
Thanks to kotresh in helping this race condition verification. With 3.1.3 it
crashes and remains faulty. With 3.2.0 it goes to hybrid and continuous. Moving
this bug to verified state. 

On 3.1.3

Set stime to less than start time available as follows.

[root at dhcp42-195 ~]# getfattr -d -m . -e hex /rhs/brick3/b3/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick3/b3/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.d820a210-d7a7-4b6c-807f-be7e5d5fbad6.5c513245-edca-436e-a98f-9367f1105a6a.stime=0x59d24c220008dc36
trusted.glusterfs.d820a210-d7a7-4b6c-807f-be7e5d5fbad6.xtime=0x58c24c2c000acca9
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.volume-id=0xd820a210d7a74b6c807fbe7e5d5fbad6

[root at dhcp42-195 ~]# getfattr -d -m . -e hex /rhs/brick3/b3/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick3/b3/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.d820a210-d7a7-4b6c-807f-be7e5d5fbad6.5c513245-edca-436e-a98f-9367f1105a6a.stime=0x59d24c220008dc36
trusted.glusterfs.d820a210-d7a7-4b6c-807f-be7e5d5fbad6.xtime=0x58c24c2c000acca9
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.volume-id=0xd820a210d7a74b6c807fbe7e5d5fbad6

[root at dhcp42-195 ~]# less
/rhs/brick3/b3/.glusterfs/changelogs/htime/HTIME.1489086377

1489086392    <<<< start time available

[root at dhcp42-195 ~]# python
Python 2.7.5 (default, Oct 11 2015, 17:47:16)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xattr
>>> import os
>>> xattr.set("/rhs/brick3/b3/", "trusted.glusterfs.d820a210-d7a7-4b6c-807f-be7e5d5fbad6.5c513245-edca-436e-a98f-9367f1105a6a.stime", struct.pack("!II", 1489086300, 0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'struct' is not defined
>>> import struct
>>> xattr.set("/rhs/brick3/b3/", "trusted.glusterfs.d820a210-d7a7-4b6c-807f-be7e5d5fbad6.5c513245-edca-436e-a98f-9367f1105a6a.stime", struct.pack("!II", 1489086300, 0))
>>>
KeyboardInterrupt
>>> quit
Use quit() or Ctrl-D (i.e. EOF) to exit
>>>
[root at dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE            
 SLAVE NODE      STATUS    CRAWL STATUS       LAST_SYNCED                  
------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.195    r             /rhs/brick3/b3    root          10.70.43.124::s  
 N/A             Faulty    N/A                N/A                          
10.70.42.195    r             /rhs/brick2/b2    root          10.70.43.124::s  
 10.70.43.124    Active    Changelog Crawl    2017-03-10 01:48:15          
[root at dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s stop
Stopping geo-replication session between r & 10.70.43.124::s has been
successful
[root at dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s start
Starting geo-replication session between r & 10.70.43.124::s has been
successful
[root at dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE            
 SLAVE NODE    STATUS             CRAWL STATUS    LAST_SYNCED          
--------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.195    r             /rhs/brick3/b3    root          10.70.43.124::s  
 N/A           Initializing...    N/A             N/A                  
10.70.42.195    r             /rhs/brick2/b2    root          10.70.43.124::s  
 N/A           Initializing...    N/A             N/A                  
[root at dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE            
 SLAVE NODE      STATUS    CRAWL STATUS     LAST_SYNCED                  
----------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.195    r             /rhs/brick3/b3    root          10.70.43.124::s  
 10.70.43.124    Active    History Crawl    2017-03-10 01:48:12          
10.70.42.195    r             /rhs/brick2/b2    root          10.70.43.124::s  
 10.70.43.124    Active    History Crawl    2017-03-10 01:48:15          
[root at dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE            
 SLAVE NODE      STATUS    CRAWL STATUS     LAST_SYNCED                  
----------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.195    r             /rhs/brick3/b3    root          10.70.43.124::s  
 N/A             Faulty    N/A              N/A                          
10.70.42.195    r             /rhs/brick2/b2    root          10.70.43.124::s  
 10.70.43.124    Active    History Crawl    2017-03-10 01:48:15          
[root at dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE            
 SLAVE NODE      STATUS    CRAWL STATUS     LAST_SYNCED                  
----------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.195    r             /rhs/brick3/b3    root          10.70.43.124::s  
 N/A             Faulty    N/A              N/A                          
10.70.42.195    r             /rhs/brick2/b2    root          10.70.43.124::s  
 10.70.43.124    Active    History Crawl    2017-03-10 01:48:15          
[root at dhcp42-195 ~]# less
/var/log/glusterfs/geo-replication/r/ssh%3A%2F%2Froot%4010.70.43.124%3Agluster%3A%2F%2F127.0.0.1%3As.log
[root at dhcp42-195 ~]#

It crashes with following traceback:
[2017-03-10 07:40:44.911781] I [monitor(monitor):343:monitor] Monitor:
worker(/rhs/brick3/b3) died in startup phase
[2017-03-10 07:40:55.161108] I [monitor(monitor):266:monitor] Monitor:
------------------------------------------------------------
[2017-03-10 07:40:55.161722] I [monitor(monitor):267:monitor] Monitor: starting
gsyncd worker
[2017-03-10 07:40:55.412802] I [gsyncd(/rhs/brick3/b3):710:main_i] <top>:
syncing: gluster://localhost:r -> ssh://root@10.70.43.124:gluster://localhost:s
[2017-03-10 07:40:55.417292] I [changelogagent(agent):73:__init__]
ChangelogAgent: Agent listining...
[2017-03-10 07:40:58.478969] I [master(/rhs/brick3/b3):83:gmaster_builder]
<top>: setting up xsync change detection mode
[2017-03-10 07:40:58.479608] I [master(/rhs/brick3/b3):367:__init__] _GMaster:
using 'rsync' as the sync engine
[2017-03-10 07:40:58.481811] I [master(/rhs/brick3/b3):83:gmaster_builder]
<top>: setting up changelog change detection mode
[2017-03-10 07:40:58.482242] I [master(/rhs/brick3/b3):367:__init__] _GMaster:
using 'rsync' as the sync engine
[2017-03-10 07:40:58.484389] I [master(/rhs/brick3/b3):83:gmaster_builder]
<top>: setting up changeloghistory change detection mode
[2017-03-10 07:40:58.484896] I [master(/rhs/brick3/b3):367:__init__] _GMaster:
using 'rsync' as the sync engine
[2017-03-10 07:41:00.517904] I [master(/rhs/brick3/b3):1249:register] _GMaster:
xsync temp directory:
/var/lib/misc/glusterfsd/r/ssh%3A%2F%2Froot%4010.70.43.124%3Agluster%3A%2F%2F127.0.0.1%3As/c4853a41d2b23f970bf0b4e046255cd5/xsync
[2017-03-10 07:41:00.518466] I [resource(/rhs/brick3/b3):1491:service_loop]
GLUSTER: Register time: 1489131660
[2017-03-10 07:41:00.547487] I [master(/rhs/brick3/b3):510:crawlwrap] _GMaster:
primary master with volume id d820a210-d7a7-4b6c-807f-be7e5d5fbad6 ...
[2017-03-10 07:41:00.553791] I [master(/rhs/brick3/b3):519:crawlwrap] _GMaster:
crawl interval: 1 seconds
[2017-03-10 07:41:00.562241] I [master(/rhs/brick3/b3):1163:crawl] _GMaster:
starting history crawl... turns: 1, stime: (1489086300, 0), etime: 1489131660
[2017-03-10 07:41:00.563826] E [repce(agent):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 54,
in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 100,
in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27,
in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2017-03-10 07:41:00.565355] E [repce(/rhs/brick3/b3):207:__call__]
RepceClient: call 11457:140177899616064:1489131660.56 (history) failed on peer
with ChangelogException
[2017-03-10 07:41:00.565733] E [resource(/rhs/brick3/b3):1506:service_loop]
GLUSTER: Changelog History Crawl failed, [Errno 2] No such file or directory
[2017-03-10 07:41:00.566288] I [syncdutils(/rhs/brick3/b3):220:finalize] <top>:
exiting.
[2017-03-10 07:41:00.571651] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2017-03-10 07:41:00.572334] I [syncdutils(agent):220:finalize] <top>: exiting.
[2017-03-10 07:41:01.483324] I [monitor(monitor):343:monitor] Monitor:
worker(/rhs/brick3/b3) died in startup phase
[2017-03-10 07:41:11.734217] I [monitor(monitor):266:monitor] Monitor:
------------------------------------------------------------
[2017-03-10 07:41:11.734749] I [monitor(monitor):267:monitor] Monitor: starting
gsyncd worker
[2017-03-10 07:41:11.922350] I [gsyncd(/rhs/brick3/b3):710:main_i] <top>:
syncing: gluster://localhost:r -> ssh://root@10.70.43.124:gluster://localhost:s
[2017-03-10 07:41:11.922379] I [changelogagent(agent):73:__init__]
ChangelogAgent: Agent listining...
[2017-03-10 07:41:14.980911] I [master(/rhs/brick3/b3):83:gmaster_builder]
<top>: setting up xsync change detection mode
[2017-03-10 07:41:14.981561] I [master(/rhs/brick3/b3):367:__init__] _GMaster:
using 'rsync' as the sync engine
[2017-03-10 07:41:14.983578] I [master(/rhs/brick3/b3):83:gmaster_builder]
<top>: setting up changelog change detection mode
[2017-03-10 07:41:14.983989] I [master(/rhs/brick3/b3):367:__init__] _GMaster:
using 'rsync' as the sync engine
[2017-03-10 07:41:14.985780] I [master(/rhs/brick3/b3):83:gmaster_builder]
<top>: setting up changeloghistory change detection mode
[2017-03-10 07:41:14.986200] I [master(/rhs/brick3/b3):367:__init__] _GMaster:
using 'rsync' as the sync engine

======================================================================================================================================================
On 3.2:  1489129736 is the start time available. Set it to 1489129600. It does
not crash it goes to hybrid crawl instead.

[root at dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs stop
Stopping geo-replication session between ra & 10.70.41.211::rs has been
successful
[root at dhcp42-7 ~]# python
Python 2.7.5 (default, Aug  2 2016, 04:20:16)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xattr
>>> import os
>>> import struct
>>> xattr.set("/rhs/brick1/b1", "trusted.glusterfs.04979d53-79f1-47a9-9eb8-8b19c6765368.39db963b-d234-4409-8b06-bb5bcea75911.stime", struct.pack("!II", 1489129600, 0))
>>>
[root at dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs status

MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE             
 SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED          
------------------------------------------------------------------------------------------------------------------------------------
10.70.42.7     ra            /rhs/brick1/b1    root          10.70.41.211::rs  
 N/A           Stopped    N/A             N/A                  
10.70.42.7     ra            /rhs/brick2/b2    root          10.70.41.211::rs  
 N/A           Stopped    N/A             N/A      

[root at dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs start
Starting geo-replication session between ra & 10.70.41.211::rs has been
successful

[root at dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs status
^[[A
MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE             
 SLAVE NODE    STATUS             CRAWL STATUS    LAST_SYNCED          
--------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.7     ra            /rhs/brick1/b1    root          10.70.41.211::rs  
 N/A           Initializing...    N/A             N/A                  
10.70.42.7     ra            /rhs/brick2/b2    root          10.70.41.211::rs  
 N/A           Initializing...    N/A             N/A                  
[root at dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs status

[root at dhcp42-7 ~]#
[root at dhcp42-7 ~]#
[root at dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs status

MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE             
 SLAVE NODE      STATUS    CRAWL STATUS     LAST_SYNCED                  
----------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.7     ra            /rhs/brick1/b1    root          10.70.41.211::rs  
 10.70.41.211    Active    Hybrid Crawl     2017-03-10 07:09:25          
10.70.42.7     ra            /rhs/brick2/b2    root          10.70.41.211::rs  
 10.70.41.211    Active    History Crawl    2017-03-10 07:09:25          
[root at dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs status

MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE             
 SLAVE NODE      STATUS    CRAWL STATUS       LAST_SYNCED                  
------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.7     ra            /rhs/brick1/b1    root          10.70.41.211::rs  
 10.70.41.211    Active    Changelog Crawl    2017-03-10 07:09:13          
10.70.42.7     ra            /rhs/brick2/b2    root          10.70.41.211::rs  
 10.70.41.211    Active    History Crawl      2017-03-10 07:09:25  


Does not crash:
[2017-03-10 07:44:56.887825] I [gsyncd(/rhs/brick1/b1):747:main_i] <top>:
syncing: gluster://localhost:ra ->
ssh://root@10.70.41.211:gluster://localhost:rs
[2017-03-10 07:45:08.826819] I [master(/rhs/brick2/b2):83:gmaster_builder]
<top>: setting up xsync change detection mode
[2017-03-10 07:45:08.827837] I [master(/rhs/brick2/b2):369:__init__] _GMaster:
using 'rsync' as the sync engine
[2017-03-10 07:45:08.827426] I [master(/rhs/brick1/b1):83:gmaster_builder]
<top>: setting up xsync change detection mode
[2017-03-10 07:45:08.829051] I [master(/rhs/brick1/b1):369:__init__] _GMaster:
using 'rsync' as the sync engine
[2017-03-10 07:45:08.831597] I [master(/rhs/brick2/b2):83:gmaster_builder]
<top>: setting up changelog change detection mode
[2017-03-10 07:45:08.832722] I [master(/rhs/brick2/b2):369:__init__] _GMaster:
using 'rsync' as the sync engine
[2017-03-10 07:45:08.836221] I [master(/rhs/brick1/b1):83:gmaster_builder]
<top>: setting up changelog change detection mode
[2017-03-10 07:45:08.837248] I [master(/rhs/brick1/b1):369:__init__] _GMaster:
using 'rsync' as the sync engine
[2017-03-10 07:45:08.838807] I [master(/rhs/brick2/b2):83:gmaster_builder]
<top>: setting up changeloghistory change detection mode
[2017-03-10 07:45:08.839443] I [master(/rhs/brick2/b2):369:__init__] _GMaster:
using 'rsync' as the sync engine
[2017-03-10 07:45:08.839672] I [master(/rhs/brick1/b1):83:gmaster_builder]
<top>: setting up changeloghistory change detection mode
[2017-03-10 07:45:08.840241] I [master(/rhs/brick1/b1):369:__init__] _GMaster:
using 'rsync' as the sync engine
[2017-03-10 07:45:10.901714] I [master(/rhs/brick2/b2):1253:register] _GMaster:
xsync temp directory:
/var/lib/misc/glusterfsd/ra/ssh%3A%2F%2Froot%4010.70.41.211%3Agluster%3A%2F%2F127.0.0.1%3Ars/764586b145d7206a154a778f64bd2f50/xsync
[2017-03-10 07:45:10.902595] I [resource(/rhs/brick2/b2):1533:service_loop]
GLUSTER: Register time: 1489131910
[2017-03-10 07:45:10.919552] I [master(/rhs/brick2/b2):512:crawlwrap] _GMaster:
primary master with volume id 04979d53-79f1-47a9-9eb8-8b19c6765368 ...
[2017-03-10 07:45:10.919812] I [master(/rhs/brick1/b1):1253:register] _GMaster:
xsync temp directory:
/var/lib/misc/glusterfsd/ra/ssh%3A%2F%2Froot%4010.70.41.211%3Agluster%3A%2F%2F127.0.0.1%3Ars/c19b89ac45352ab8c894d210d136dd56/xsync
[2017-03-10 07:45:10.920842] I [resource(/rhs/brick1/b1):1533:service_loop]
GLUSTER: Register time: 1489131910
[2017-03-10 07:45:10.930277] I [master(/rhs/brick2/b2):521:crawlwrap] _GMaster:
crawl interval: 1 seconds
[2017-03-10 07:45:10.930512] I [master(/rhs/brick1/b1):512:crawlwrap] _GMaster:
primary master with volume id 04979d53-79f1-47a9-9eb8-8b19c6765368 ...
[2017-03-10 07:45:10.955785] I [master(/rhs/brick1/b1):521:crawlwrap] _GMaster:
crawl interval: 1 seconds
[2017-03-10 07:45:10.962408] I [master(/rhs/brick2/b2):1167:crawl] _GMaster:
starting history crawl... turns: 1, stime: (1489129765, 0), etime: 1489131910
[2017-03-10 07:45:10.972481] I [master(/rhs/brick1/b1):1167:crawl] _GMaster:
starting history crawl... turns: 1, stime: (-1, 0), etime: 1489131910
[2017-03-10 07:45:10.972976] I [master(/rhs/brick1/b1):1170:crawl] _GMaster:
stime not available, abandoning history crawl
[2017-03-10 07:45:10.973434] I [resource(/rhs/brick1/b1):1548:service_loop]
GLUSTER: No stime available, using xsync crawl
[2017-03-10 07:45:10.977698] I [master(/rhs/brick1/b1):512:crawlwrap] _GMaster:
primary master with volume id 04979d53-79f1-47a9-9eb8-8b19c6765368 ...
[2017-03-10 07:45:10.984404] I [master(/rhs/brick1/b1):521:crawlwrap] _GMaster:
crawl interval: 60 seconds
[2017-03-10 07:45:10.992049] I [master(/rhs/brick1/b1):1275:crawl] _GMaster:
starting hybrid crawl..., stime: (-1, 0)
[2017-03-10 07:45:11.968451] I [master(/rhs/brick2/b2):1196:crawl] _GMaster:
slave's time: (1489129765, 0)
[2017-03-10 07:45:12.15741] I [master(/rhs/brick1/b1):1285:crawl] _GMaster:
processing xsync changelog
/var/lib/misc/glusterfsd/ra/ssh%3A%2F%2Froot%4010.70.41.211%3Agluster%3A%2F%2F127.0.0.1%3Ars/c19b89ac45352ab8c894d210d136dd56/xsync/XSYNC-CHANGELOG.1489131910
[2017-03-10 07:45:13.115169] I [master(/rhs/brick1/b1):1282:crawl] _GMaster:
finished hybrid crawl syncing, stime: (1489129753, 210362)
[2017-03-10 07:45:13.119362] I [master(/rhs/brick1/b1):512:crawlwrap] _GMaster:
primary master with volume id 04979d53-79f1-47a9-9eb8-8b19c6765368 ...
[2017-03-10 07:45:13.126692] I [master(/rhs/brick1/b1):521:crawlwrap] _GMaster:
crawl interval: 3 seconds
[2017-03-10 07:45:13.235455] I [master(/rhs/brick2/b2):1210:crawl] _GMaster:
finished history crawl syncing, endtime: 1489131896, stime: (1489129765, 0)
[2017-03-10 07:45:25.250497] I [master(/rhs/brick2/b2):1167:crawl] _GMaster:
starting history crawl... turns: 2, stime: (1489129765, 0), etime: 1489131925
[2017-03-10 07:45:25.256480] I [master(/rhs/brick2/b2):1210:crawl] _GMaster:
finished history crawl syncing, endtime: 1489131911, stime: (1489129765, 0)
[2017-03-10 07:45:25.261801] I [master(/rhs/brick2/b2):512:crawlwrap] _GMaster:
primary master with volume id 04979d53-79f1-47a9-9eb8-8b19c6765368 ...

-- 
You are receiving this mail because:
You are on the CC list for the bug.
Unsubscribe from this bug https://bugzilla.redhat.com/token.cgi?t=3O89N7ieFe&a=cc_unsubscribe


More information about the Bugs mailing list