[Gluster-users] Debugging georeplication failures

Aravinda avishwan at redhat.com
Wed Nov 25 05:34:33 UTC 2015


One more thing,

You need not worry too much about the SKIPPED_GFIDs list. When an entry 
operation fails, Geo-rep is unable to create that entry and the subsequent 
rsync fails for that file, but every GFID in the same batch is then logged 
as a failure. That is misleading: rsync does a partial sync, skipping the 
failed GFIDs and syncing the rest of the files. I am working on fixing the 
logging issue.

regards
Aravinda

On 11/25/2015 10:51 AM, Aravinda wrote:
> Hi,
>
> This looks like a GFID conflict on the Slave: the same filename exists on 
> the Slave with a different GFID, left undeleted perhaps because of an 
> unlink failure or some other failure.
> We need to identify the cause of the GFID conflict. Please share the 
> workload details, or share the changelogs from the brick backend 
> (/data/media/.glusterfs/changelogs).
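>
> If it is easier, the whole changelog directory can simply be archived and 
> shared, something along these lines (brick path taken from your logs, 
> output path is just an example):
>
>     tar czf /tmp/media-changelogs.tar.gz /data/media/.glusterfs/changelogs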
>
> "ENTRY FAILED" shows file exists error but shows different GFID
>
> [2015-11-20 11:40:14.93090] W [master(/data/media):803:log_failures]
> _GMaster: ENTRY FAILED: ({'uid': 33, 'gfid':
> '31d66429-c700-4a10-bb32-35e1b36a479f', 'gid': 33, 'mode': 33206, 
> 'entry':
> '.gfid/b1dc6c6d-dac7-4da9-9577-4614942a72a0/official-nightmare-before-christmas-vampire-teddy-girls-dress-body-web.jpg', 
>
> 'op': 'CREATE'},*17, 'df0e67f5-f2ce-45c3-b4f1-224aa3059ec7'*)
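>
> To confirm the conflict, the GFID of the file already present on the Slave 
> can be read directly from the Slave brick and compared with the one above, 
> for example (the Slave brick path and parent directory are placeholders):
>
>     getfattr -n trusted.gfid -e hex \
>         <slave-brick>/<parent-dir>/official-nightmare-before-christmas-vampire-teddy-girls-dress-body-web.jpg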
>
> There also appear to be split-brain issues on the Slave. Refer to this 
> document to resolve them:
>
> https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md 
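>
> As a rough sketch of the steps from that document (assuming the Slave 
> volume "media" is replicated; pick the brick that holds the good copy):
>
>     gluster volume heal media info split-brain
>     gluster volume heal media split-brain source-brick <HOSTNAME>:<BRICKPATH> <FILE>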
>
>
> regards
> Aravinda
>
> On 11/25/2015 03:08 AM, Audrius Butkevicius wrote:
>> So the version of rsync is 3.1.0, but the bug mentioned only applies to
>> large files, whereas in my case the files are less than 1 MB.
>>
>> I've started digging through the logs and found a bunch of these on the
>> slave:
>>
>> [2015-11-20 11:40:46.730805] W [fuse-bridge.c:1978:fuse_create_cbk]
>> 0-glusterfs-fuse: 1882288: 
>> /.gfid/31d66429-c700-4a10-bb32-35e1b36a479f =>
>> -1 (Operation not permitted)
>> [2015-11-20 12:39:59.269844] W [fuse-bridge.c:1978:fuse_create_cbk]
>> 0-glusterfs-fuse: 1918306: 
>> /.gfid/6802a0c6-1f62-4213-a70d-7b46d9ff8f3a =>
>> -1 (Operation not permitted)
>>
>> So something funky was happening for an hour 4 days ago. Given the 
>> volume
>> is on EBS, maybe there was some glitch there.
>>
>> I can also find the corresponding failures on the master:
>>
>> [2015-11-20 11:40:14.93090] W [master(/data/media):803:log_failures]
>> _GMaster: ENTRY FAILED: ({'uid': 33, 'gfid':
>> '31d66429-c700-4a10-bb32-35e1b36a479f', 'gid': 33, 'mode': 33206, 
>> 'entry':
>> '.gfid/b1dc6c6d-dac7-4da9-9577-4614942a72a0/official-nightmare-before-christmas-vampire-teddy-girls-dress-body-web.jpg', 
>>
>> 'op': 'CREATE'}, 17, 'df0e67f5-f2ce-45c3-b4f1-224aa3059ec7')
>> [2015-11-20 11:40:14.265054] W [master(/data/media):803:log_failures]
>> _GMaster: META FAILED: ({'go':
>> '.gfid/31d66429-c700-4a10-bb32-35e1b36a479f', 'stat': {'atime':
>> 1448019600.232466, 'gid': 33, 'mtime': 1448019600.316466, 'mode': 33279,
>> 'uid': 33}, 'op': 'META'}, 2)
>>
>> If I grep for SKIPPED GFID I get the following:
>>
>> [2015-11-20 11:40:40.704817] W [master(/data/media):1014:process] 
>> _GMaster:
>> SKIPPED GFID =
>> 192632af-28c5-4e03-a62d-458fe7f3b5f9,7ea8d7a8-524b-4dd0-b97a-dc7d3481f341,204f6112-0e8d-4f6d-855b-bf10f9c63b62,7e626e8f-edad-4f39-a6c6-547a1da34aa1,1f0d0208-1962-4eb1-91d4-cf7ed297d8e3,95d389c4-3258-4ca0-8fc4-26b8427b1eaf,425cedc6-6343-4326-8540-996d2d56dc9c,5955928b-2b8f-4cc9-a336-3eac4382789b,8932efcd-ba90-46ec-84c8-5e9e51cc84e9,2530275d-5f03-4143-9abf-d07cc79bf80a,73574466-86f3-4ab2-b5da-c31ac28c27c1,776e5e8f-5c6a-46b1-ad54-733e157d2097,008a69f3-217c-4dbc-a469-5a5bc8ecd589,dca8d8d9-03cf-4793-92e4-bfcfddd262f6,c85b7a29-73af-4f44-a07e-a44082d7a93a,6c1f56d6-4ea6-4910-9677-ea33edd35d28,0ea56588-87fa-4355-9403-e311525454fc,c8ce76c9-e21d-46ce-a2b5-14dfd0070f64,db9e6484-0e5e-4f6e-815b-3c2b273deee5,35d10752-43b5-4398-be5f-17cb9de73a6b,396e5faf-74a1-4849-97e3-009dbfb22836,d148e7d5-c2f3-4d06-8cd6-8588e6aac196,404d20c5-1c6c-4aad-98be-2c23930173b3,f1fae11c-db8e-4cd5-8e47-a3870316f89c,d8daa413-e57f-44fb-b907-b1a497f2dcfa,5f6ee8c2-84fb-432e-95cd-e428ab256e83,6bf54dcd-c3b4-4187-a390-eca841e46570,335c07ca-d339-4d3a-aa88-3b5753d24fbf,8fdbac00-6628-4f22-8fb4-b7a6524cae49,31d66429-c700-4a10-bb32-35e1b36a479f 
>>
>> [2015-11-20 11:41:35.907850] W [master(/data/media):1014:process] 
>> _GMaster:
>> SKIPPED GFID =
>> 03069c7f-8eaa-45b0-92ed-50cb648cd912,788f5ed1-923e-4b86-9696-2a6de07ebb2e,43d12b40-b6e2-43c4-8883-85e89dc81321 
>>
>> [2015-11-20 12:11:55.492068] W [master(/data/media):1014:process] 
>> _GMaster:
>> SKIPPED GFID =
>> eb02369f-7ca8-480a-b00c-768964410ed8,17045ac9-27dd-4bf9-9f90-d7b146070dd5,265e3d9c-1657-45cb-bbf6-db439eb18ccf,553c420f-b3cc-47f2-8d5f-cfc2ffdd1a92 
>>
>> [2015-11-20 12:12:53.372432] W [master(/data/media):1014:process] 
>> _GMaster:
>> SKIPPED GFID =
>> 66c5878e-8c00-4f7d-a3ad-4adec84a5e22,f4dc086d-9c2b-449c-9e31-bbae9ebcdea7,f99317b2-72e8-49e3-b676-647abad508b1 
>>
>> [2015-11-20 12:37:55.773813] W [master(/data/media):1014:process] 
>> _GMaster:
>> SKIPPED GFID =
>> 4af54f1c-e8e1-4915-9328-a458d5d35d5d,acbe1f12-87e8-4192-b864-d90030269bba,7d27a795-da63-4742-9e91-abd8fa543612,8d4e642d-fd40-44d6-8419-8d3459df7ce3 
>>
>> [2015-11-20 12:39:28.852575] W [master(/data/media):1014:process] 
>> _GMaster:
>> SKIPPED GFID =
>> d90dc121-02e7-4a79-bc03-1bd8fddd9f48,54bb563f-ab44-4e91-a46b-764a122ce7fa,088141de-7545-40f9-b776-751738a89740,2dab3faf-4a6c-407a-88cd-cddef6f55299,d887806f-23b4-4389-a4dc-f9027702a2df,fc5a9bc8-ea62-4677-baed-16510541373a,33136ad2-c5b4-448c-991d-1e72fefef021,cf3e2675-e41b-4782-9478-91773eb0a4aa,6412d878-e0f1-4700-84df-05f4af35962f,ec3cf6e1-7f27-4650-b978-8a5a7f620389,d3651bb9-cd2d-4c5f-93e6-fe4fb1cdf5db,ecb0415e-1524-40f4-870e-1fd0f8371b1d,a118aaae-bd3e-4b19-a0e0-891aa9edb09a,7642d3f3-f1e5-4aca-bcfe-bdb3c44779a9,2e29f3f8-c460-48eb-9db5-b281b67cc2bf,e61db54b-3979-488a-8789-a5d0615c5a97,4212d840-9c22-4d9e-b61b-5e35271dfe80,dad1c60b-9da6-4e57-b014-daa1aca73ce3,93699a3d-40b8-4bbd-b78f-aabf965df57f,4fad7468-91f2-4deb-aaf7-6401068c9e6d,c9738295-46cc-4fe7-b359-dc94f5815ce9,91853c5c-4877-4c9e-9481-c86368942f78,59deed8e-d3d0-4ab7-854e-53a8dd455de0,20b86c13-7df1-4d13-bac1-7d628a00d6ce,b7b86a2d-7963-41a4-a423-14e25d1e78c4,3c17d7fe-bb7f-489c-a525-5c8b7bb93c3e,e230d207-7c68-4983-a958-f2dcfc1ce694,fa8bf3c0-abae-446c-83c5-45ef8bcaa4b8,14089102-8106-45d9-a3f1-d1446b568f4e,6802a0c6-1f62-4213-a70d-7b46d9ff8f3a,0a253bbc-ef98-4da0-951f-e17c5a7f5858,ef054b76-986b-4a89-b8e6-b4988221aaa2,48c0a153-708c-44ee-b186-cf255936a02b,fa2646a6-807c-4e9d-8f2b-a9cdf2674e0c,1ed4a563-4f6a-4b5a-9866-89025fe7afd5,0f293cf7-bc32-4f8a-87d5-388a4bffb4af,f4126726-667b-451d-8214-a18bb3f468cd,e23dc8b3-da1c-4d18-aec9-22e0aa174d81,40b9f10d-7304-4c0b-8498-bef23b305d03,15c25d1e-2a62-495e-887f-14d0cb0527b1,67371804-9084-4801-b664-44e88bea8ac3,4750fa3f-d1a4-4472-b10d-3f75d0b451dc 
>>
>> [2015-11-23 09:18:10.43391] W [master(/data/media):1014:process] 
>> _GMaster:
>> SKIPPED GFID =
>> 228843f3-62f0-4687-b5eb-6d1e21257ad0,b0078359-fbf0-4709-8f40-8383a11d7875,60cff4d5-8b5d-4f7f-8bc1-27081a011458,bedb6ac4-208d-47e1-812c-5547c84ab841,da6810d9-4883-45e1-b73e-55a7ff17b5e7,e03b5c03-b25c-49ba-86f0-8a709a9c2658,053673a0-c1cc-4057-83fa-f97740cb5d4f,dbd6ea84-8f24-4a47-ac41-22c3fd788ecf,43caa3e7-ca04-47ab-b950-105606b313a4,62d8b1d0-fc89-4fb1-a41a-957dcb34d325,4e8fe1fa-60cd-47fa-bad6-f617c312f53b,6c3d6cf3-62ae-4ab8-9dc3-7815552401fe,f79be814-7e78-4985-bcdd-688da23d1808,c4186455-0f06-4b5d-89be-3c5ccbdeb6f0,f9c4ccdb-2337-479d-845d-ee4d85b69ece,bcd14726-1bab-4d97-8915-ec8bbe8faf8c,cca82341-a430-4a59-a900-1af66dcf7bb8,b7043a8e-4286-4831-91ec-c146e40bc6be,995ffeb6-a906-4078-88c6-404a2b38aad4,227f9987-5057-4133-848a-2b22aca5dde1,90b35242-32db-4570-8070-cf9dd49322a5,c6863c8f-1914-4a2d-814b-6e5853134faf,e2d19b1a-fc07-441c-b110-ca816b46fc40,9a3d0c0b-7d84-416f-9f3e-21b32a11ba1d,d8163f6b-8c40-418c-9c06-b3743af24e4e,522d7247-a75b-4af9-acb2-52a99eeced89,4b56ea9d-413a-4e24-b44e-433f7603ad6d 
>>
>>
>> There are also the following lines on the master, which might have some
>> impact:
>>
>> E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done]
>> 0-media-replicate-0: Failing READ on gfid
>> abdc7d5e-9187-4916-ae83-a8b615e32a17: split-brain observed. 
>> [Input/output
>> error]
>>
>> E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done]
>> 0-media-replicate-0: Failing GETXATTR on gfid
>> abdc7d5e-9187-4916-ae83-a8b615e32a17: split-brain observed. 
>> [Input/output
>> error]
>>
>> E [mem-pool.c:417:mem_get0]
>> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x809a2) 
>> [0x7f79e436b9a2]
>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg+0x79f)
>> [0x7f79e430cb1f]
>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get0+0x81)
>> [0x7f79e433e4a1] ) 0-mem-pool: invalid argument [Invalid argument]
>>
>> E [mem-pool.c:417:mem_get0]
>> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(recursive_rmdir+0x192)
>> [0x7f79e4329b32]
>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg+0x79f)
>> [0x7f79e430cb1f]
>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get0+0x81)
>> [0x7f79e433e4a1] ) 0-mem-pool: invalid argument [Invalid argument]
>>
>> E [resource(/data/media):222:errlog] Popen: command "ssh
>> -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
>> /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S
>> /tmp/gsyncd-aux-ssh-dpY5cI/8216bb7da58a00926f369bb7ac8c7e03.sock
>> root at us-west-gluster.server.com 
>> /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd
>> --session-owner 6922055e-49a1-4afd-a3a0-a47960d6ba54 -N --listen 
>> --timeout
>> 120 gluster://localhost:media" returned with 143, saying:
>> E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18
>> 21:57:19.772896] I [cli.c:721:main] 0-cli: Started running
>> /usr/sbin/gluster with version 3.7.5
>> E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18
>> 21:57:19.772955] I [cli.c:608:cli_rpc_init] 0-cli: Connecting to remote
>> glusterd at localhost
>> E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18
>> 21:57:19.871930] I [MSGID: 101190]
>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 1
>> E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18
>> 21:57:19.872018] I [socket.c:2355:socket_event_handler] 0-transport:
>> disconnecting now
>> E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18
>> 21:57:19.872898] I [cli-rpc-ops.c:6348:gf_cli_getwd_cbk] 0-cli: Received
>> resp to getwd
>> E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18
>> 21:57:19.872963] I [input.c:36:cli_batch] 0-: Exiting with: 0
>>
>> Status detail shows the following:
>>
>> root at eu-gluster-1:/var/log/glusterfs/geo-replication/media# gluster volume
>> geo-replication media root at us-west-gluster.websitewebsitewebs.com::media status detail
>>
>> MASTER NODE: eu-gluster-1.websitewebsitewebs.com    MASTER VOL: media    MASTER BRICK: /data/media    SLAVE USER: root
>> SLAVE: us-west-gluster.websitewebsitewebs.com::media    SLAVE NODE: us-west-gluster.websitewebsitewebs.com
>> STATUS: Active    CRAWL STATUS: Changelog Crawl    LAST_SYNCED: 2015-11-24 20:59:25
>> ENTRY: 0    DATA: 0    META: 0    FAILURES: 633
>> CHECKPOINT TIME: N/A    CHECKPOINT COMPLETED: N/A    CHECKPOINT COMPLETION TIME: N/A
>>
>> MASTER NODE: eu-gluster-2.websitewebsitewebs.com    MASTER VOL: media    MASTER BRICK: /data/media    SLAVE USER: root
>> SLAVE: us-west-gluster.websitewebsitewebs.com::media    SLAVE NODE: us-west-gluster.websitewebsitewebs.com
>> STATUS: Passive    CRAWL STATUS: N/A    LAST_SYNCED: N/A
>> ENTRY: N/A    DATA: N/A    META: N/A    FAILURES: N/A
>> CHECKPOINT TIME: N/A    CHECKPOINT COMPLETED: N/A    CHECKPOINT COMPLETION TIME: N/A
>>
>>
>>
>>
>> What is the right way to retry failed items?
>> Can I get a list of them somehow, so that I could touch them in the hope
>> of fixing this?
>> I also wonder why it does not retry the items automatically.
>>
>>
>> On Tue, Nov 24, 2015 at 6:11 AM, Venky Shankar <vshankar at redhat.com> 
>> wrote:
>>
>>> On Tue, Nov 24, 2015 at 1:23 AM, Audrius Butkevicius
>>> <audrius.butkevicius at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I've got a geo-replicated gluster volume, with a few hundred thousand
>>>> images, which get generated on demand.
>>>>
>>>> I started getting replication failures in the status detail view, but
>>>> it's not obvious to me where to find the actual errors or how to
>>>> actually fix them.
>>> Chris here[1] mentioned a bug in rsync (thanks!). Could that be
>>> the issue here?
>>>
>>> Mind checking the rsync version used?
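>>>
>>> For example:
>>>
>>>     rsync --version | head -1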
>>>
>>> [1]:
>>> http://www.gluster.org/pipermail/gluster-users/2015-November/024423.html 
>>>
>>>
>>>> The docs seem to be secretive about this as well. It seems if I tear the
>>>> geo-replication down and do a force create from scratch, it goes back in
>>>> sync again, but as the files get generated, it starts getting failures
>>>> again at some point.
>>>>
>>>> Can someone provide me with information on how to check which files 
>>>> are
>>>> causing failures, and what are the actual failures? Or point me to the
>>>> relevant part in the docs?
>>>>
>>>> Version 3.7.5-ubuntu1~trusty1
>>>>
>>>> Related SO question:
>>>>
>>>> http://stackoverflow.com/questions/33839056/gluster-geo-replication-debugging-failures
>>>>
>>>> Thanks,
>>>>
>>>> Audrius.
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
