[Gluster-users] [Centos7x64] Geo-replication problem glusterfs 3.7.0-2

wodel youchi wodel.youchi at gmail.com
Mon May 25 12:25:24 UTC 2015


Hi, and thanks for your replies.

For Kotresh : No, I am not using tar ssh for my geo-replication.

For Aravinda: I had to recreate my slave volume from scratch and restart the
geo-replication.

If I have thousands of files with this problem, do I have to apply the fix
to all of them? Is there an easier way?
Can checkpoints help me in this situation?
And, more importantly, what can cause this problem?

I am syncing containers that hold a lot of small files; would using tar
over ssh be more suitable?


PS: I tried to execute this command on the Master

bash generate-gfid-file.sh localhost:data2 $PWD/get-gfid.sh /tmp/master_gfid_file.txt

but I got errors with files that have a blank (space) in their names,
for example: Admin Guide.pdf

The script sees two files, "Admin" and "Guide.pdf", and get-gfid.sh then
returns "no such file or directory" errors.
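
For illustration, here is a minimal sketch of the word-splitting problem and
a space-safe alternative. The loops below are only a hypothetical workaround,
not the actual generate-gfid-file.sh code, and /mnt/data2 is a placeholder
mount point of the master volume:

    # unquoted expansion splits "Admin Guide.pdf" into two words
    for f in $(find /mnt/data2 -type f); do bash get-gfid.sh "$f"; done

    # null-delimited find keeps file names with spaces intact
    find /mnt/data2 -type f -print0 | while IFS= read -r -d '' f; do
        bash get-gfid.sh "$f" >> /tmp/master_gfid_file.txt
    done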

thanks.


2015-05-25 7:00 GMT+01:00 Aravinda <avishwan at redhat.com>:

> Looks like this is a GFID conflict issue, not the tarssh issue.
>
> _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
> 'e529a399-756d-4cb1-9779-0af2822a0d94', 'gid': 0, 'mode': 33152, 'entry':
> '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.mdb', 'op': 'CREATE'}, 2)
>
>     Data: {'uid': 0,
>            'gfid': 'e529a399-756d-4cb1-9779-0af2822a0d94',
>            'gid': 0,
>            'mode': 33152,
>            'entry': '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.mdb',
>            'op': 'CREATE'}
>
>     and Error: 2
>
> During creation of "main.mdb" the RPC failed with error number 2, i.e., ENOENT.
> This error occurs when the parent directory does not exist or exists with a
> different GFID.
> In this case the parent GFID "874799ef-df75-437b-bc8f-3fcd58b54789" does not
> exist on the slave.
>
>
> To fix the issue,
> -----------------
> 1. Find the parent directory of "main.mdb".
> 2. Get the GFID of that directory on the master, using getfattr.
> 3. Check the GFID of the same directory on the slave (to confirm the GFIDs
>    are different).
> 4. Delete that directory on the slave.
> 5. Set the virtual xattr for that directory and all the files inside it:
>     setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <DIR>
>     setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <file-path>
>
>
> Geo-rep will recreate the directory with the proper GFID and start syncing.
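>
> For illustration only: assuming the master brick path /mnt/brick2/brick seen
> in the logs, a master mount point /mnt/data2, a hypothetical parent directory
> "containers/db" and a hypothetical slave brick path /mnt/slavebrick/brick,
> steps 2, 3 and 5 above could look like this:
>
>     # on the master brick: read the directory's GFID
>     getfattr -n trusted.gfid -e hex /mnt/brick2/brick/containers/db
>
>     # on the slave brick: compare with the GFID of the same directory
>     getfattr -n trusted.gfid -e hex /mnt/slavebrick/brick/containers/db
>
>     # on the master mount point, after deleting the directory on the slave:
>     # retrigger sync for the directory and every file inside it
>     setfattr -n glusterfs.geo-rep.trigger-sync -v "1" /mnt/data2/containers/db
>     find /mnt/data2/containers/db -type f -print0 | \
>         xargs -0 -n1 setfattr -n glusterfs.geo-rep.trigger-sync -v "1"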
>
> Let us know if you need any help.
>
> --
> regards
> Aravinda
>
>
>
>
> On 05/25/2015 10:54 AM, Kotresh Hiremath Ravishankar wrote:
>
>> Hi Wodel,
>>
>> Is the sync mode tar over ssh (i.e., config use_tarssh is true)?
>> If yes, there is a known issue with it and a patch is already up in master.
>>
>> It can be resolved in either of two ways.
>>
>> 1. If the required sync mode is tar over ssh, just disable sync_xattrs,
>>    which is true by default:
>>
>>      gluster vol geo-rep <master-vol> <slave-host>::<slave-vol> config
>> sync_xattrs false
>>
>> 2. If the sync mode can be changed to rsync, please do:
>>
>>      gluster vol geo-rep <master-vol> <slave-host>::<slave-vol> config use_tarssh false
>>
>> NOTE: rsync supports syncing of ACLs and xattrs, whereas tar over ssh does not.
>>       In 3.7.0-2, tar over ssh should be used with sync_xattrs set to false.
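>>
>> To check which mode is currently in effect, the existing settings can be
>> listed (a sketch using the same placeholders as above):
>>
>>      gluster vol geo-rep <master-vol> <slave-host>::<slave-vol> config
>>      gluster vol geo-rep <master-vol> <slave-host>::<slave-vol> config use_tarssh
>>      gluster vol geo-rep <master-vol> <slave-host>::<slave-vol> config sync_xattrs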
>>
>> Hope this helps.
>>
>> Thanks and Regards,
>> Kotresh H R
>>
>> ----- Original Message -----
>>
>>> From: "wodel youchi" <wodel.youchi at gmail.com>
>>> To: "gluster-users" <gluster-users at gluster.org>
>>> Sent: Sunday, May 24, 2015 3:31:38 AM
>>> Subject: [Gluster-users] [Centos7x64] Geo-replication problem glusterfs
>>> 3.7.0-2
>>>
>>> Hi,
>>>
>>> I have two gluster servers in replicated mode as masters,
>>> and one server as the geo-replication slave.
>>>
>>> I've updated my glusterfs installation to 3.7.0-2 on all three servers.
>>>
>>> I've recreated my slave volumes.
>>> I've started the geo-replication; it worked for a while, and now I have some
>>> problems:
>>>
>>> 1- Files/directories are not deleted on the slave.
>>> 2- New files/directories are not synced to the slave.
>>>
>>> I have these lines on the active master:
>>>
>>> [2015-05-23 06:21:17.156939] W
>>> [master(/mnt/brick2/brick):792:log_failures]
>>> _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
>>> 'e529a399-756d-4cb1-9779-0af2822a0d94', 'gid': 0, 'mode': 33152, 'entry':
>>> '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.mdb', 'op': 'CREATE'},
>>> 2)
>>> [2015-05-23 06:21:17.158066] W
>>> [master(/mnt/brick2/brick):792:log_failures]
>>> _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
>>> 'b4bffa4c-2e88-4b60-9f6a-c665c4d9f7ed', 'gid': 0, 'mode': 33152, 'entry':
>>> '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.hdb', 'op': 'CREATE'},
>>> 2)
>>> [2015-05-23 06:21:17.159154] W
>>> [master(/mnt/brick2/brick):792:log_failures]
>>> _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
>>> '9920cdee-6b87-4408-834b-4389f5d451fe', 'gid': 0, 'mode': 33152, 'entry':
>>> '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.db', 'op': 'CREATE'}, 2)
>>> [2015-05-23 06:21:17.160242] W
>>> [master(/mnt/brick2/brick):792:log_failures]
>>> _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
>>> '307756d2-d924-456f-b090-10d3ff9caccb', 'gid': 0, 'mode': 33152, 'entry':
>>> '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.ndb', 'op': 'CREATE'},
>>> 2)
>>> [2015-05-23 06:21:17.161283] W
>>> [master(/mnt/brick2/brick):792:log_failures]
>>> _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
>>> '69ebb4cb-1157-434b-a6e9-386bea81fc1d', 'gid': 0, 'mode': 33152, 'entry':
>>> '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/COPYING', 'op': 'CREATE'}, 2)
>>> [2015-05-23 06:21:17.162368] W
>>> [master(/mnt/brick2/brick):792:log_failures]
>>> _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
>>> '7d132fda-fc82-4ad8-8b6c-66009999650c', 'gid': 0, 'mode': 33152, 'entry':
>>> '.gfid/f6f2582e-0c5c-4cba-943a-6d5f64baf340/daily.cld', 'op': 'CREATE'},
>>> 2)
>>> [2015-05-23 06:21:17.163718] W
>>> [master(/mnt/brick2/brick):792:log_failures]
>>> _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
>>> 'd8a0303e-ba45-4e45-a8fd-17994c34687b', 'gid': 0, 'mode': 16832, 'entry':
>>>
>>> '.gfid/f6f2582e-0c5c-4cba-943a-6d5f64baf340/clamav-54acc14b44e696e1cfb4a75ecc395fe0',
>>> 'op': 'MKDIR'}, 2)
>>> [2015-05-23 06:21:17.165102] W
>>> [master(/mnt/brick2/brick):792:log_failures]
>>> _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
>>> '49d42bf6-3146-42bd-bc29-e704927d6133', 'gid': 0, 'mode': 16832, 'entry':
>>>
>>> '.gfid/f6f2582e-0c5c-4cba-943a-6d5f64baf340/clamav-debec3aa6afe64bffaee8d099e76f3d4',
>>> 'op': 'MKDIR'}, 2)
>>> [2015-05-23 06:21:17.166147] W
>>> [master(/mnt/brick2/brick):792:log_failures]
>>> _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid':
>>> '1ddb93ae-3717-4347-910f-607afa67cdb0', 'gid': 0, 'mode': 33152, 'entry':
>>>
>>> '.gfid/49d42bf6-3146-42bd-bc29-e704927d6133/clamav-704a1e9a3e2c97ccac127632d7c6b8e4',
>>> 'op': 'CREATE'}, 2)
>>>
>>>
>>> On the slave, I have a lot of lines like this:
>>>
>>> [2015-05-22 07:53:57.071999] W [fuse-bridge.c:1970:fuse_create_cbk]
>>> 0-glusterfs-fuse: 25833: /.gfid/03a5a40b-c521-47ac-a4e3-916a6df42689 =>
>>> -1
>>> (Operation not permitted)
>>>
>>>
>>> On the active master, I have 3.7 GB of XSYNC-CHANGELOG.xxxxxxx files in
>>>
>>> /var/lib/misc/glusterfsd/data2/ssh%3A%2F%2Froot%4010.10.10.10%3Agluster%3A%2F%2F127.0.0.1%3Aslavedata2/e55761a256af4acfe9b4a419be62462a/xsync
>>>
>>> I don't know if this is normal.
>>>
>>> any idea?
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>
>

