[Gluster-users] __Geo-replication status is getting Faulty after few seconds
Diego Zuccato
diego.zuccato at unibo.it
Thu Feb 8 13:37:09 UTC 2024
That '1' is the hard-link count. A count of 1 means there's no
corresponding file in the regular file structure (outside .glusterfs).
IIUC it shouldn't happen, but it does (quite often). *Probably* it's
safe to just delete it, but wait for advice from more competent users.
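
To check this on a brick before touching anything, here is a minimal
Python sketch. It assumes the usual brick layout, where each regular
file under .glusterfs is a hard link to the named file elsewhere on the
same brick (so the second column of `ls -l` is normally 2). The function
names are illustrative and not part of any Gluster tool. It is
read-only and deletes nothing.

```python
import os


def link_count(path):
    """Return the hard-link count (the second column of `ls -l`)."""
    return os.lstat(path).st_nlink


def find_named_counterparts(brick_root, gfid_path):
    """Return paths outside .glusterfs that share an inode with gfid_path."""
    target = os.lstat(gfid_path)
    matches = []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Skip the top-level .glusterfs directory: we want the named file.
        if os.path.abspath(dirpath) == os.path.abspath(brick_root):
            dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            st = os.lstat(candidate)
            # Same device and inode means it is another name for the same file.
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                matches.append(candidate)
    return matches
```

A link count of 1 together with an empty list from
find_named_counterparts() would be consistent with the file having no
name outside .glusterfs. Walking a large brick this way can take a long
time, so treat it as a one-off diagnostic.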
Diego
On 08/02/2024 13:42, Anant Saraswat wrote:
> Hi Everyone,
>
> I was getting the error "OSError: [Errno 107] Transport endpoint is not
> connected: '.gfid/d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8'" in the gsyncd
> log on the primary master node, so I searched for this file and found
> it in the brick, under the .glusterfs folder on the master1 node.
>
> Path on master1 -
> /opt/tier1data2019/brick/.glusterfs/d5/3f/d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8
>
> [root@master1 ~]# ls -lrt /opt/tier1data2019/brick/.glusterfs/d5/3f/
>
> -rw-r--r-- 2 root root   15996 Dec 14 10:10 d53feba6-dc8b-4645-a86c-befabd0e5069
> -rw-r--r-- 2 root root  343111 Dec 18 10:55 d53fed32-b47a-48bf-889e-140c69b04479
> -rw-r--r-- 2 root root 5060531 Dec 29 15:29 d53f184d-91e8-4bc1-b6e7-bb5f27ef8b41
> -rw-r--r-- 2 root root 2149782 Jan 12 13:25 d53ffee5-fa66-4493-8bdf-f2093b3f6ce7
> -rw-r--r-- 2 root root 1913460 Jan 18 10:40 d53f799b-0e87-4800-a3cd-fac9e1a30b54
> -rw-r--r-- 2 root root   62940 Jan 22 09:35 d53fb9d4-8c64-4a83-b968-bbbfb9af4224
> -rw-r--r-- 1 root root  174592 Jan 22 15:06 d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8
> -rw-r--r-- 2 root root    5633 Jan 26 08:36 d53f6bf6-9aac-476c-b8c5-0569fc8d5116
> -rw-r--r-- 2 root root  801740 Feb  8 11:40 d53f71f8-e88b-4ece-b66e-228c2b08d6c8
>
> Now I have noticed two things:
>
> First, this file is only present on the primary master node (master1)
> and doesn't exist on master2 and master3 nodes.
>
> Second, this file has different file attributes than other files in the
> folder. If you check the second column of the above output, every file
> has "2", but this file has "1".
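
Spotting the odd entry by eye works for one directory; the same check
can be sketched in Python. This assumes the layout visible in the path
above, .glusterfs/<first two hex chars>/<next two>/<gfid>, and the
helper names are illustrative rather than part of any Gluster tool.

```python
import os
import stat


def gfid_backend_path(brick_root, gfid):
    """Build the .glusterfs path for a GFID: .glusterfs/<aa>/<bb>/<gfid>."""
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def single_link_entries(directory):
    """Return sorted names of regular files in `directory` with link count 1."""
    result = []
    for entry in os.scandir(directory):
        st = entry.stat(follow_symlinks=False)
        # Only regular files are of interest here; skip symlinks and dirs.
        if stat.S_ISREG(st.st_mode) and st.st_nlink == 1:
            result.append(entry.name)
    return sorted(result)
```

Run against the d5/3f directory listed above, single_link_entries()
would be expected to report only the entry whose second column is 1.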
>
> Now, can someone please guide me why this file has "1" and what I should
> do next? Is it safe to copy this file to the remaining two master nodes,
> or should I delete it from master1?
>
> Many thanks,
> Anant
>
> ------------------------------------------------------------------------
> *From:* Gluster-users <gluster-users-bounces at gluster.org> on behalf of
> Anant Saraswat <anant.saraswat at techblue.co.uk>
> *Sent:* 08 February 2024 12:01 AM
> *To:* Aravinda <aravinda at kadalu.tech>
> *Cc:* gluster-users at gluster.org <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] __Geo-replication status is getting
> Faulty after few seconds
>
> Hi Aravinda,
>
> I have checked the rsync version, and it's the same on the primary and
> secondary nodes: rsync version 3.1.3, protocol version 31, on all
> servers. It's very strange. We have not made any changes that we are
> aware of, and this geo-replication had been working fine for the last
> 5 years. It has suddenly stopped, and we are unable to find the root
> cause.
>
>
> I have checked with tcpdump and I can see that the master node sends
> an RST to the secondary node when geo-replication connects, but we do
> not see any RST when we ssh as root from the master to the secondary
> node ourselves. This makes me think that geo-replication is able to
> connect to the secondary node, but then something goes wrong, it
> resets the connection, and this repeats in a loop.
>
>
> I have also enabled geo-replication debug logs and I am getting this
> error in the master node gsyncd logs.
>
>
> [2024-02-07 22:37:36.820978] D [repce(worker
> /opt/tier1data2019/brick):195:push] RepceClient: call
> 2563661:140414778891136:1707345456.8209238 entry_ops([{'op': 'CREATE',
> 'skip_entry': False, 'gfid': '3d57e1e4-7bd2-44f6-a6d1-d628208b3697',
> 'entry':
> '.gfid/9a39167c-6c28-470a-b699-11eeaaff8edd/app_docmerge8795785720233840105.docx', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'UNLINK', 'skip_entry': False, 'gfid': '3d57e1e4-7bd2-44f6-a6d1-d628208b3697', 'entry': '.gfid/9a39167c-6c28-470a-b699-11eeaaff8edd/app_docmerge8795785720233840105.docx'}, {'op': 'CREATE', 'skip_entry': False, 'gfid': '7bd35f91-1408-476d-869a-9936f2d94afc', 'entry': '.gfid/9a39167c-6c28-470a-b699-11eeaaff8edd/0c3fb22f-0fbe-4445-845b-9d94d84a9888', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'CREATE', 'skip_entry': False, 'gfid': '3837018c-2f5e-43d4-ab58-0ed8b7456e73', 'entry': '.gfid/861afb81-386a-4b5b-af37-cef63a55a436/26fcd7e7-2c8c-4dcb-96f2-2c8a0d79f3d4', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'CREATE', 'skip_entry': False, 'gfid': 'db311b10-b1e2-4b84-adea-a6746214aeda', 'entry': '.gfid/861afb81-386a-4b5b-af37-cef63a55a436/0526d0da-1f36-4203-8563-7e23aacf6237', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'CREATE', 'skip_entry': False, 'gfid': '9bbb253a-226a-44b1-a968-7cfa76cf9463', 'entry': '.gfid/e861ff10-696a-4b03-9716-39d9e7dd08d7/app_docmergeLLRenewalLetterDocusign_1_22_15_1_18_153.doc', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'UNLINK', 'skip_entry': False, 'gfid': '9bbb253a-226a-44b1-a968-7cfa76cf9463', 'entry': '.gfid/e861ff10-696a-4b03-9716-39d9e7dd08d7/app_docmergeLLRenewalLetterDocusign_1_22_15_1_18_153.doc'}, {'op': 'CREATE', 'skip_entry': False, 'gfid': 'f62d0c65-6ede-48ff-b9bf-c44a33e5e023', 'entry': '.gfid/e861ff10-696a-4b03-9716-39d9e7dd08d7/85530794-c15f-44d4-8660-87a14c2c9c8c', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'CREATE', 'skip_entry': False, 'gfid': 'fd3d0af6-8ef5-4b76-bb47-0bc508df0ed0', 'entry': '.gfid/e861ff10-696a-4b03-9716-39d9e7dd08d7/app_docmergeMOA_1_22_15_1_20_501.doc', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'UNLINK', 'skip_entry': False, 'gfid': 'fd3d0af6-8ef5-4b76-bb47-0bc508df0ed0', 'entry': '.gfid/e861ff10-696a-4b03-9716-39d9e7dd08d7/app_docmergeMOA_1_22_15_1_20_501.doc'}, {'op': 'CREATE', 
'skip_entry': False, 'gfid': 'e93c5771-9676-40d4-90cd-f0586ec05dd9', 'entry': '.gfid/e861ff10-696a-4b03-9716-39d9e7dd08d7/cc372667-3b77-468f-bac6-671d4eb069e9', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'CREATE', 'skip_entry': False, 'gfid': '02045f44-68ff-4a35-a843-08939afc46a4', 'entry': '.gfid/e861ff10-696a-4b03-9716-39d9e7dd08d7/app_docmergeTTRenewalLetterASTNoFee-2022_1_22_15_1_19_530.doc', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'UNLINK', 'skip_entry': False, 'gfid': '02045f44-68ff-4a35-a843-08939afc46a4', 'entry': '.gfid/e861ff10-696a-4b03-9716-39d9e7dd08d7/app_docmergeTTRenewalLetterASTNoFee-2022_1_22_15_1_19_530.doc'}, {'op': 'CREATE', 'skip_entry': False, 'gfid': '6f5766c9-2dc3-4636-9041-9cf4ac64d26b', 'entry': '.gfid/e861ff10-696a-4b03-9716-39d9e7dd08d7/556a0e3c-510d-4396-8f32-335aafec1314', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'UNLINK', 'skip_entry': False, 'gfid': 'f78561f0-c9f2-4192-a82a-8368e0ad8b2b', 'entry': '.gfid/ec161c2e-bb32-4639-a7b2-9be961221d86/app_1705935977525.tmp'}, {'op': 'CREATE', 'skip_entry': False, 'gfid': 'd1e33edb-523e-41c1-a021-8bd3a5a2c7c0', 'entry': '.gfid/e861ff10-696a-4b03-9716-39d9e7dd08d7/c655e3e5-9d4c-43d7-9171-949f01612e6d', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'CREATE', 'skip_entry': False, 'gfid': 'b6f44b28-c2bf-4e70-b953-1c559ded7835', 'entry': '.gfid/9a39167c-6c28-470a-b699-11eeaaff8edd/app_docmerge7370453767656401681.docx', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'UNLINK', 'skip_entry': False, 'gfid': 'b6f44b28-c2bf-4e70-b953-1c559ded7835', 'entry': '.gfid/9a39167c-6c28-470a-b699-11eeaaff8edd/app_docmerge7370453767656401681.docx'}, {'op': 'CREATE', 'skip_entry': False, 'gfid': '2d845d9e-7a49-4200-a100-759fe831ba0e', 'entry': '.gfid/9a39167c-6c28-470a-b699-11eeaaff8edd/84d47d84-5749-4a19-8f73-293078d17c63', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'CREATE', 'skip_entry': False, 'gfid': '44554c17-21aa-427a-b796-7ecec6af2570', 'entry': 
'.gfid/9a39167c-6c28-470a-b699-11eeaaff8edd/app_docmerge8634804987715893755.docx', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'CREATE', 'skip_entry': False, 'gfid': '652bf5d7-3b7a-41d8-aa4f-e52296034821', 'entry': '.gfid/9a39167c-6c28-470a-b699-11eeaaff8edd/91a25682-69ea-4edc-9250-d6c7aac56853', 'mode': 33188, 'uid': 0, 'gid': 0}, {'op': 'UNLINK', 'skip_entry': False, 'gfid': '44554c17-21aa-427a-b796-7ecec6af2570', 'entry': '.gfid/9a39167c-6c28-470a-b699-11eeaaff8edd/app_docmerge8634804987715893755.docx'}, {'op': 'CREATE', 'skip_entry': False, 'gfid': '04720811-b90e-42b7-a5d1-656afd92e245', 'entry': '.gfid/9a39167c-6c28-470a-b699-11eeaaff8edd/a66cbc42-61dc-4896-bb69-c715f1a820db', 'mode': 33188, 'uid': 0, 'gid': 0}],) ...
>
> [2024-02-07 22:37:36.909606] D [repce(worker
> /opt/tier1data2019/brick):215:__call__] RepceClient: call
> 2563661:140414778891136:1707345456.8209238 entry_ops -> []
> [2024-02-07 22:37:36.911032] D [master(worker
> /opt/tier1data2019/brick):317:a_syncdata] _GMaster: files
> [{files={'.gfid/652bf5d7-3b7a-41d8-aa4f-e52296034821',
> '.gfid/2d845d9e-7a49-4200-a100-759fe831ba0e',
> '.gfid/3837018c-2f5e-43d4-ab58-0ed8b7456e73',
> '.gfid/e93c5771-9676-40d4-90cd-f0586ec05dd9',
> '.gfid/f62d0c65-6ede-48ff-b9bf-c44a33e5e023',
> '.gfid/7bd35f91-1408-476d-869a-9936f2d94afc',
> '.gfid/04720811-b90e-42b7-a5d1-656afd92e245',
> '.gfid/6f5766c9-2dc3-4636-9041-9cf4ac64d26b',
> '.gfid/db311b10-b1e2-4b84-adea-a6746214aeda',
> '.gfid/d1e33edb-523e-41c1-a021-8bd3a5a2c7c0'}}]
> [2024-02-07 22:37:36.911089] D [master(worker
> /opt/tier1data2019/brick):320:a_syncdata] _GMaster: candidate for
> syncing [{file=.gfid/652bf5d7-3b7a-41d8-aa4f-e52296034821}]
> [2024-02-07 22:37:36.911133] D [master(worker
> /opt/tier1data2019/brick):320:a_syncdata] _GMaster: candidate for
> syncing [{file=.gfid/2d845d9e-7a49-4200-a100-759fe831ba0e}]
> [2024-02-07 22:37:36.911169] D [master(worker
> /opt/tier1data2019/brick):320:a_syncdata] _GMaster: candidate for
> syncing [{file=.gfid/3837018c-2f5e-43d4-ab58-0ed8b7456e73}]
> [2024-02-07 22:37:36.911202] D [master(worker
> /opt/tier1data2019/brick):320:a_syncdata] _GMaster: candidate for
> syncing [{file=.gfid/e93c5771-9676-40d4-90cd-f0586ec05dd9}]
> [2024-02-07 22:37:36.911235] D [master(worker
> /opt/tier1data2019/brick):320:a_syncdata] _GMaster: candidate for
> syncing [{file=.gfid/f62d0c65-6ede-48ff-b9bf-c44a33e5e023}]
> [2024-02-07 22:37:36.911268] D [master(worker
> /opt/tier1data2019/brick):320:a_syncdata] _GMaster: candidate for
> syncing [{file=.gfid/7bd35f91-1408-476d-869a-9936f2d94afc}]
> [2024-02-07 22:37:36.911301] D [master(worker
> /opt/tier1data2019/brick):320:a_syncdata] _GMaster: candidate for
> syncing [{file=.gfid/04720811-b90e-42b7-a5d1-656afd92e245}]
> [2024-02-07 22:37:36.911333] D [master(worker
> /opt/tier1data2019/brick):320:a_syncdata] _GMaster: candidate for
> syncing [{file=.gfid/6f5766c9-2dc3-4636-9041-9cf4ac64d26b}]
> [2024-02-07 22:37:36.911366] D [master(worker
> /opt/tier1data2019/brick):320:a_syncdata] _GMaster: candidate for
> syncing [{file=.gfid/db311b10-b1e2-4b84-adea-a6746214aeda}]
> [2024-02-07 22:37:36.911398] D [master(worker
> /opt/tier1data2019/brick):320:a_syncdata] _GMaster: candidate for
> syncing [{file=.gfid/d1e33edb-523e-41c1-a021-8bd3a5a2c7c0}]
> [2024-02-07 22:37:36.911439] D [master(worker
> /opt/tier1data2019/brick):1344:process] _GMaster: processing change
> [{changelog=/var/lib/misc/gluster/gsyncd/tier1data_drtier1data_drtier1data/opt-tier1data2019-brick/.history/.processing/CHANGELOG.1705936007}]
> [2024-02-07 22:37:36.915193] E [syncdutils(worker
> /opt/tier1data2019/brick):346:log_raise_exception] <top>: Gluster Mount
> process exited [{error=ENOTCONN}]
> [2024-02-07 22:37:36.915252] E [syncdutils(worker
> /opt/tier1data2019/brick):363:log_raise_exception] <top>: FULL EXCEPTION
> TRACE:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 317,
> in main
> func(args)
> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 86,
> in subcmd_worker
> local.service_loop(remote)
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
> 1298, in service_loop
> g3.crawlwrap(oneshot=True)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 604,
> in crawlwrap
> self.crawl()
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1614,
> in crawl
> self.changelogs_batch_process(changes)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1510,
> in changelogs_batch_process
> self.process(batch)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1345,
> in process
> self.process_change(change, done, retry)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1071,
> in process_change
> st = lstat(pt)
> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
> 589, in lstat
> return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE, EBUSY])
> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
> 571, in errno_wrap
> return call(*arg)
> OSError: [Errno 107] Transport endpoint is not connected:
> '.gfid/d53fad8f-84e9-4b24-9eb0-ccbcbdc4baa8'
> [2024-02-07 22:37:37.344426] I [monitor(monitor):228:monitor] Monitor:
> worker died in startup phase [{brick=/opt/tier1data2019/brick}]
> [2024-02-07 22:37:37.346601] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
> Status Change [{status=Faulty}]
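
To pull the failing GFID out of a log like the one above without
reading the whole trace, a small Python sketch follows. It assumes the
OSError line has the shape shown in the traceback, and that lines may
be mail-wrapped and carry leading "> " quote markers. The function name
is illustrative.

```python
import re

# Matches e.g.:
#   OSError: [Errno 107] Transport endpoint is not connected: '.gfid/<uuid>'
_OSERROR_RE = re.compile(
    r"OSError: \[Errno (?P<errno>\d+)\] (?P<message>.+?):\s*"
    r"'\.gfid/(?P<gfid>[0-9a-f-]{36})'"
)


def failing_gfids(log_text):
    """Return (errno, message, gfid) tuples for OSError lines in log_text."""
    # Strip mail quote markers and rejoin wrapped lines so the pattern
    # can match an error that was split across two lines.
    lines = [re.sub(r"^(> ?)+", "", line) for line in log_text.splitlines()]
    flat = " ".join(line.strip() for line in lines)
    return [
        (int(m["errno"]), m["message"], m["gfid"])
        for m in _OSERROR_RE.finditer(flat)
    ]
```

The GFID it returns is the one to look up under .glusterfs on the
brick.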
>
>
> Thanks,
> Anant
>
> ------------------------------------------------------------------------
> *From:* Aravinda <aravinda at kadalu.tech>
> *Sent:* 07 February 2024 2:54 PM
> *To:* Anant Saraswat <anant.saraswat at techblue.co.uk>
> *Cc:* Strahil Nikolov <hunter86_bg at yahoo.com>; gluster-users at gluster.org
> <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] __Geo-replication status is getting
> Faulty after few seconds
>
> It will keep track of the last sync time if you change to a non-root
> user. But I don't think the issue is related to root vs non-root user.
>
> Even in non-root Geo-rep, the Primary volume is mounted as root. Only
> on the secondary node does it use the Glusterd mountbroker to allow
> mounting the Secondary volume as a non-privileged user.
>
> Check the rsync version in Primary and secondary nodes. Please fix the
> versions if not matching.
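
One way to compare versions across nodes is to parse the first line of
`rsync --version` from each node. The Python sketch below assumes that
line looks like "rsync  version 3.1.3  protocol version 31". The
function names are illustrative, and how the output is collected from
each node (for example over ssh) is left open.

```python
import re

# First line of `rsync --version`, e.g.
#   rsync  version 3.1.3  protocol version 31
_VERSION_RE = re.compile(
    r"rsync\s+version\s+v?(\S+)\s+protocol version\s+(\d+)"
)


def parse_rsync_version(output):
    """Return (version, protocol) parsed from `rsync --version` output."""
    m = _VERSION_RE.search(output)
    if m is None:
        raise ValueError("unrecognised rsync --version output")
    return m.group(1), int(m.group(2))


def versions_match(outputs_by_node):
    """outputs_by_node maps a node name to its `rsync --version` text."""
    parsed = {parse_rsync_version(text) for text in outputs_by_node.values()}
    return len(parsed) <= 1
```

If versions_match() is false, printing parse_rsync_version() per node
shows which node differs.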
>
> --
> Aravinda
> Kadalu Technologies
>
>
>
> ---- On Wed, 07 Feb 2024 20:11:47 +0530 *Anant Saraswat
> <anant.saraswat at techblue.co.uk>* wrote ---
>
> No, it was set up and running using the root user only.
>
> Do you think I should set it up using a dedicated non-root user? Will
> it keep track of the old files, or will it treat this as a new
> geo-replication session and copy all the files from scratch?
>
> ------------------------------------------------------------------------
> *From:* Strahil Nikolov <hunter86_bg at yahoo.com>
> *Sent:* 07 February 2024 2:36 PM
> *To:* Anant Saraswat <anant.saraswat at techblue.co.uk>; Aravinda
> <aravinda at kadalu.tech>
> *Cc:* gluster-users at gluster.org <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] __Geo-replication status is getting
> Faulty after few seconds
>
> Have you tried setting up Gluster geo-rep with a dedicated non-root user?
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Feb 6, 2024 at 16:38, Anant Saraswat
> <anant.saraswat at techblue.co.uk> wrote:
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> DISCLAIMER: This email and any files transmitted with it are
> confidential and intended solely for the use of the individual or entity
> to whom they are addressed. If you have received this email in error,
> please notify the sender. This message contains confidential information
> and is intended only for the individual named. If you are not the named
> addressee, you should not disseminate, distribute or copy this email.
> Please notify the sender immediately by email if you have received this
> email by mistake and delete this email from your system.
>
> If you are not the intended recipient, you are notified that disclosing,
> copying, distributing or taking any action in reliance on the contents
> of this information is strictly prohibited. Thanks for your cooperation.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786