[Bugs] [Bug 1622076] [geo-rep]: geo-rep reverse sync in FO/FB can accidentally delete the content at original master in case of gfid conflict in 3.4.0 without explicit user rmdir

bugzilla at redhat.com bugzilla at redhat.com
Fri Aug 24 11:51:31 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1622076



--- Comment #1 from Kotresh HR <khiremat at redhat.com> ---

Description of problem:
=======================

The new enhancement that automatically resolves gfid conflicts between
Master and Slave results in data being deleted at Master if there exists a gfid
conflict from Slave in the FB scenario.

Without the feature, geo-rep used to go faulty and log messages asking for
admin intervention.

Following are detailed scenarios: 

Scenario 1:
+++++++++++

The file is synced to slave, then appended at master (while geo-rep is
stopped), and then appended again by a write at slave.

With feature => The geo-rep reverse sync from slave=>master will overwrite
the data that was originally written on Master.

Without feature => The content didn't change at Master. It didn't sync from
slave. Theoretically it should have, but the HYBRID crawl seems not to have
picked this up and picked up directories before it. The workers keep crashing
because of directory mismatches.


Scenario 2:
+++++++++++

The file is synced to slave and geo-rep goes down. A new file is then created
with the same name at slave (it has a different gfid).

With feature => The geo-rep reverse sync from slave=>master will
overwrite the file that was originally present at master.

Without feature => Geo-rep logs an error and skips syncing the file. Following
are the errors:

[2018-08-24 07:57:26.793184] E [master(/rhs/brick1/b1):782:log_failures]
_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
'32040ac3-7437-4df4-a238-cb0d6e43cf89', 'gid': 0, 'mode': 33188, 'entry':
'.gfid/00000000-0000-0000-0000-000000000001/rahul', 'op': 'MKNOD'}, 17,
'0e3c52a4-03c9-4c1b-938f-2017b04a6c34')
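
To make the record above easier to read, here is a minimal Python sketch that
unwraps it and splits it into its three parts. The function name
parse_entry_failure is hypothetical and is not part of gsyncd. Reading the
third element as the gfid already present on the receiving side is an
assumption based on the scenario description. The value 17 is the standard
errno EEXIST.

```python
import ast
import errno
import re

# Log text copied from the report above; the mail wrapped it across lines.
RAW_LOG = """[2018-08-24 07:57:26.793184] E [master(/rhs/brick1/b1):782:log_failures]
_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
'32040ac3-7437-4df4-a238-cb0d6e43cf89', 'gid': 0, 'mode': 33188, 'entry':
'.gfid/00000000-0000-0000-0000-000000000001/rahul', 'op': 'MKNOD'}, 17,
'0e3c52a4-03c9-4c1b-938f-2017b04a6c34')"""


def parse_entry_failure(raw):
    """Return (entry_dict, errno_value, other_gfid) from one ENTRY FAILED record.

    Hypothetical helper for reading logs; not a gsyncd function.
    """
    joined = " ".join(raw.split())  # undo the line wrapping
    match = re.search(r"ENTRY FAILED: (\(.*\))\s*$", joined)
    if match is None:
        raise ValueError("not an ENTRY FAILED record")
    # The payload is a Python tuple literal, so literal_eval can read it safely.
    entry, err, other_gfid = ast.literal_eval(match.group(1))
    return entry, err, other_gfid


entry, err, other_gfid = parse_entry_failure(RAW_LOG)
print(entry["op"], entry["entry"].rsplit("/", 1)[-1])
print(errno.errorcode[err])
# Assumption: a differing third element indicates the gfid conflict.
print("gfid differs:", entry["gfid"] != other_gfid)
```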

Scenario 3:
+++++++++++

The directory (containing some files) is synced to slave and geo-rep goes
down. A new directory is then created with the same name (it has a different
gfid) and different content.

With feature => The geo-rep reverse sync from slave=>master will overwrite
the content which was present at master.

Without feature => Worker crashes and remains faulty. It doesn't do an explicit
rmdir; it warns the user instead.

[2018-08-24 07:57:26.794929] E [master(/rhs/brick2/b3):782:log_failures]
_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
'86f710c9-1e2f-4c76-8695-475dc639236b', 'gid': 0, 'mode': 16877, 'entry':
'.gfid/00000000-0000-0000-0000-000000000001/NEW_DIR', 'op': 'MKDIR'}, 17,
'61a2eee2-05c6-4a7f-93d8-2b1e3e179896')
[2018-08-24 07:57:26.795558] E
[syncdutils(/rhs/brick2/b3):280:log_raise_exception] <top>: The above directory
failed to sync. Please fix it to proceed further.
[2018-08-24 07:57:26.796639] E [master(/rhs/brick1/b1):782:log_failures]
_GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
'86f710c9-1e2f-4c76-8695-475dc639236b', 'gid': 0, 'mode': 16877, 'entry':
'.gfid/00000000-0000-0000-0000-000000000001/NEW_DIR', 'op': 'MKDIR'}, 17,
'61a2eee2-05c6-4a7f-93d8-2b1e3e179896')
[2018-08-24 07:57:26.797055] E
[syncdutils(/rhs/brick1/b1):280:log_raise_exception] <top>: The above directory
failed to sync. Please fix it to proceed further.
[2018-08-24 07:57:26.822345] I [syncdutils(/rhs/brick2/b3):253:finalize] <top>:
exiting.
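
The 'mode' values in the records above are numeric st_mode values. As a small
aid for reading them, this sketch decodes the two values from this report with
Python's standard stat module. The helper name describe_mode is hypothetical.

```python
import stat

# 'mode' values taken from the ENTRY FAILED records in this report.
MODES = {"MKNOD": 33188, "MKDIR": 16877}


def describe_mode(mode):
    """Return (kind, permission string) for a numeric st_mode value."""
    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    return kind, stat.filemode(mode)


for op, mode in MODES.items():
    print(op, oct(mode), *describe_mode(mode))
```

So the MKNOD record refers to a regular file with permissions 0644 and the
MKDIR records refer to a directory with permissions 0755.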


Version-Release number of selected component (if applicable):
=============================================================

mainline


How reproducible:
=================

Always

Additional info:
================

1. FO / FB scenarios are rare and are done only as disaster recovery.
2. The intention of the enhancement is to help the user by providing auto
resolution and better usability.
3. This, however, could execute "rm -rf" or "rmdir" via code, which was not run
by the user.
4. If there exists content with the same name but a different gfid which was
written at the actual master, then that would be accidentally deleted.

Solution:
=========

1. Everyone who needs to do FO/FB has to follow the right steps mentioned in
the admin guide. They are not commonly used and hence require a reference.
2. Explain these scenarios as a "NOTE / WARNING / Expectation" in the admin
guide for user awareness.
3. Create a config option which can disable the auto gfid resolution.
4. Provide a note in the admin guide to use the config option (from 3) if the
user chooses to handle the conflicting directories manually as a precautionary
measure before deleting (from 2).
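
As an illustration of point 3, the following is a rough sketch of how a config
option could gate the auto resolution. It is not the geo-rep implementation.
The Conflict class, the plan_action function, the auto_resolve flag and the
action strings are all hypothetical names chosen for this example. The two
gfids are taken from the Scenario 3 log.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    name: str           # entry name that exists on both sides
    incoming_gfid: str  # gfid of the entry being synced
    existing_gfid: str  # gfid already present at the destination
    is_dir: bool


def plan_action(conflict, auto_resolve):
    """Decide what a sync worker would do for one name/gfid conflict.

    auto_resolve is a hypothetical stand-in for the proposed config option.
    """
    if conflict.incoming_gfid == conflict.existing_gfid:
        return "sync"  # same object on both sides, nothing to resolve
    if auto_resolve:
        # "With feature" behaviour: the destination content is replaced.
        return "remove-existing-and-recreate"
    # "Without feature" behaviour: leave the data alone and ask the admin.
    return "log-error-and-wait-for-admin"


new_dir = Conflict(
    name="NEW_DIR",
    incoming_gfid="86f710c9-1e2f-4c76-8695-475dc639236b",
    existing_gfid="61a2eee2-05c6-4a7f-93d8-2b1e3e179896",
    is_dir=True,
)
print(plan_action(new_dir, auto_resolve=True))
print(plan_action(new_dir, auto_resolve=False))
```

With the option off, nothing is removed by code, so the admin can inspect the
conflicting directories before deciding what to delete.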
