[Gluster-devel] query about why glustershd can not afr_selfheal_recreate_entry because of "afr: Prevent null gfids in self-heal entry re-creation"
Ravishankar N
ravishankar at redhat.com
Tue Jan 16 15:12:50 UTC 2018
On 01/16/2018 02:22 PM, Lian, George (NSB - CN/Hangzhou) wrote:
>
> Hi,
>
> Thanks a lot for your update.
>
> Let me describe in more detail where this issue came from.
>
> The issue comes from a test case in our team, with the following
> steps:
>
> 1) Set up a glusterfs environment with replicate 2: two storage server
> nodes and two client nodes.
>
> 2) Generate a split-brain file; the copy on sn-0 is good, the copy on sn-1 is dirty.
>
Hi, sorry, I did not understand the test case. What type of split-brain
did you create (data/metadata, gfid, or file-type mismatch)?
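(If it helps to classify it: the trusted.afr.* values in your getfattr output below can be decoded by hand. As far as I know, the AFR changelog xattr is three big-endian 32-bit counters for pending data, metadata and entry operations; a quick Python sketch to decode them, assuming that layout:)

```python
import struct

def decode_afr_xattr(hex_value):
    """Decode a trusted.afr.* changelog value into its three
    big-endian 32-bit pending-operation counters."""
    raw = bytes.fromhex(hex_value[2:])  # strip the leading "0x"
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# Values from the sn-1 brick in the getfattr output below:
print(decode_afr_xattr("0x000000000000000000000054"))  # export-client-0
# -> {'data': 0, 'metadata': 0, 'entry': 84}
print(decode_afr_xattr("0x000000000000000000000001"))  # dirty
# -> {'data': 0, 'metadata': 0, 'entry': 1}
```

If sn-1 holds 84 pending entry operations against export-client-0 while sn-0 carries no trusted.afr xattrs at all, then sn-1 considers itself the only source for entry heal of testdir, which matches glustershd trying to recreate the missing entry on the other brick.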
>
> 3) Delete the directory before heal begins (at this point, the good
> file on sn-0 is removed by the “rm” command, while the dirty file on
> sn-1 is still there).
>
Delete from the backend brick directly?
>
> 4) After that, self-heal always fails, with the log attached to the
> previous mail.
>
Maybe you can write a script or a .t file (like the ones in
https://github.com/gluster/glusterfs/tree/master/tests/basic/afr) so
that your test can be understood unambiguously.
> Some command output is also attached, FYI.
>
> From my understanding, glusterfs may not be able to handle the
> split-brain file in this case. Could you share your comments and
> confirm whether some enhancement is needed for this case?
>
If you create a split-brain in gluster, self-heal cannot heal it. You
need to resolve it using one of the methods listed in
https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#heal-info-and-split-brain-resolution
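For context on the error in your logs: the check added by https://review.gluster.org/#/c/17981/ makes afr_selfheal_recreate_entry() bail out when the source's stat carries no usable type or gfid. The real code is C in afr-self-heal-entry.c; this is only an illustrative Python sketch of that guard, not the actual implementation:

```python
IA_INVAL = 0              # corresponds to glusterfs' IA_INVAL ia_type value
NULL_GFID = b"\x00" * 16  # all-zero gfid, as printed in the shd log

def can_recreate_entry(ia_type, gfid):
    """Entry re-creation must pick a FOP (mkdir/mknod/link) based on the
    source's file type; with no valid type or gfid it has to give up."""
    if ia_type == IA_INVAL or gfid == NULL_GFID:
        # shd logs "Invalid ia_type (0) or gfid(00000000-...)" and aborts
        return False
    return True

# The lookup of the entry on the source brick failed with ENODATA, so
# replies[source].poststat stayed empty and the heal refuses to proceed:
print(can_recreate_entry(IA_INVAL, NULL_GFID))  # -> False
```

So the question is not whether to relax this check, but why lookup on the source brick returns no valid stat for that entry in the first place.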
Thanks,
Ravi
> # rm -rf /mnt/export/testdir
> rm: cannot remove '/mnt/export/testdir/test_file': No data available
>
> [root@sn-1:/root]
> # ls -l /mnt/export/testdir/
> ls: cannot access '/mnt/export/testdir/IORFILE_82_2': No data available
> total 0
> -????????? ? ? ? ? ? test_file
>
> [root@sn-1:/root]
> # getfattr -m . -d -e hex /mnt/bricks/export/brick/testdir/
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/bricks/export/brick/testdir/
> trusted.afr.dirty=0x000000000000000000000001
> trusted.afr.export-client-0=0x000000000000000000000054
> trusted.gfid=0xb217d6af49024f189a69e0ccf5207572
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
> [root@sn-0:/var/log/glusterfs]
> # getfattr -m . -d -e hex /mnt/bricks/export/brick/testdir/
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/bricks/export/brick/testdir/
> trusted.gfid=0xb217d6af49024f189a69e0ccf5207572
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
> Best Regards
>
> George
>
> *From:*gluster-devel-bounces at gluster.org
> [mailto:gluster-devel-bounces at gluster.org] *On Behalf Of *Ravishankar N
> *Sent:* Tuesday, January 16, 2018 1:44 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou)
> <cynthia.zhou at nokia-sbell.com>; Gluster Devel <gluster-devel at gluster.org>
> *Subject:* Re: [Gluster-devel] query about why glustershd can not
> afr_selfheal_recreate_entry because of "afr: Prevent null gfids in
> self-heal entry re-creation"
>
> + gluster-devel
>
> On 01/15/2018 01:41 PM, Zhou, Cynthia (NSB - CN/Hangzhou) wrote:
>
> Hi glusterfs expert,
>
> Good day,
>
> While testing glusterfs self-heal, I found the following messages
> showing that when a dir/file type is wrong, the entry cannot
> get self-healed.
>
> Could you help check whether this is expected behavior? I see that the
> code change https://review.gluster.org/#/c/17981/ adds a check for
> iatt->ia_type, so what if a file's ia_type gets corrupted? In that
> case it can never be self-healed?
>
>
> Yes, without knowing the ia_type, afr_selfheal_recreate_entry()
> cannot decide which type of FOP (mkdir/link/mknod) to use to create the
> appropriate file on the sink. You would need to find out why the
> source brick is not returning a valid ia_type, i.e. why
> replies[source].poststat is not valid.
> Thanks,
> Ravi
>
>
> Thanks!
>
> //////////////////heal info output////////////////////////////
>
> [root@sn-0:/home/robot]
>
> # gluster v heal export info
>
> Brick sn-0.local:/mnt/bricks/export/brick
>
> Status: Connected
>
> Number of entries: 0
>
> Brick sn-1.local:/mnt/bricks/export/brick
>
> /testdir - Is in split-brain
>
> Status: Connected
>
> Number of entries: 1
>
> //////////////////////////////////////////sn-1 glustershd
> log///////////////////////////////////////////////////
>
> [2018-01-15 03:53:40.011422] I [MSGID: 108026]
> [afr-self-heal-entry.c:887:afr_selfheal_entry_do]
> 0-export-replicate-0: performing entry selfheal on
> b217d6af-4902-4f18-9a69-e0ccf5207572
>
> [2018-01-15 03:53:40.013994] W [MSGID: 114031]
> [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-export-client-1:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [No data available]
>
> [2018-01-15 03:53:40.014025] E [MSGID: 108037]
> [afr-self-heal-entry.c:92:afr_selfheal_recreate_entry]
> 0-export-replicate-0: Invalid ia_type (0) or
> gfid(00000000-0000-0000-0000-000000000000). source brick=1,
> pargfid=00000000-0000-0000-0000-000000000000, name=IORFILE_82_2
>
> //////////////////////////////////////gdb attached to sn-1
> glustershd/////////////////////////////////////////////
>
> [root@sn-1:/var/log/glusterfs]
>
> # gdb attach 2191
>
> GNU gdb (GDB) 8.0.1
>
> attach: No such file or directory.
>
> Attaching to process 2191
>
> [New LWP 2192]
>
> [New LWP 2193]
>
> [New LWP 2194]
>
> [New LWP 2195]
>
> [New LWP 2196]
>
> [New LWP 2197]
>
> [New LWP 2239]
>
> [New LWP 2241]
>
> [New LWP 2243]
>
> [New LWP 2245]
>
> [New LWP 2247]
>
> [Thread debugging using libthread_db enabled]
>
> Using host libthread_db library "/lib64/libthread_db.so.1".
>
> 0x00007f90aca037bd in __pthread_join (threadid=140259279345408,
> thread_return=0x0) at pthread_join.c:90
>
> 90 pthread_join.c: No such file or directory.
>
> (gdb) break afr_selfheal_recreate_entry
>
> Breakpoint 1 at 0x7f90a3b56dec: file afr-self-heal-entry.c, line 73.
>
> (gdb) c
>
> Continuing.
>
> [Switching to Thread 0x7f90a1b8e700 (LWP 2241)]
>
> Thread 9 "glustershdheal" hit Breakpoint 1,
> afr_selfheal_recreate_entry (frame=0x7f90980018d0, dst=0,
> source=1, sources=0x7f90a1b8ceb0 "", dir=0x7f9098011940,
> name=0x7f909c015d48 "IORFILE_82_2",
>
> inode=0x7f9098001bd0, replies=0x7f90a1b8c890) at
> afr-self-heal-entry.c:73
>
> 73 afr-self-heal-entry.c: No such file or directory.
>
> (gdb) n
>
> 74 in afr-self-heal-entry.c
>
> [... repeated "n" steps through lines 75-86 of afr-self-heal-entry.c ...]
>
> (gdb) n
>
> 87 in afr-self-heal-entry.c
>
> (gdb) print iatt->ia_type
>
> $1 = IA_INVAL
>
> (gdb) print gf_uuid_is_null(iatt->ia_gfid)
>
> $2 = 1
>
> (gdb) bt
>
> #0 afr_selfheal_recreate_entry (frame=0x7f90980018d0, dst=0,
> source=1, sources=0x7f90a1b8ceb0 "", dir=0x7f9098011940,
> name=0x7f909c015d48 "IORFILE_82_2", inode=0x7f9098001bd0,
> replies=0x7f90a1b8c890)
>
> at afr-self-heal-entry.c:87
>
> #1 0x00007f90a3b57d20 in __afr_selfheal_merge_dirent
> (frame=0x7f90980018d0, this=0x7f90a4024610, fd=0x7f9098413090,
> name=0x7f909c015d48 "IORFILE_82_2", inode=0x7f9098001bd0,
>
> sources=0x7f90a1b8ceb0 "", healed_sinks=0x7f90a1b8ce70
> "\001\001A\230\220\177", locked_on=0x7f90a1b8ce50
> "\001\001\270\241\220\177", replies=0x7f90a1b8c890) at
> afr-self-heal-entry.c:360
>
> #2 0x00007f90a3b57da5 in __afr_selfheal_entry_dirent
> (frame=0x7f90980018d0, this=0x7f90a4024610, fd=0x7f9098413090,
> name=0x7f909c015d48 "IORFILE_82_2", inode=0x7f9098001bd0, source=-1,
>
> sources=0x7f90a1b8ceb0 "", healed_sinks=0x7f90a1b8ce70
> "\001\001A\230\220\177", locked_on=0x7f90a1b8ce50
> "\001\001\270\241\220\177", replies=0x7f90a1b8c890) at
> afr-self-heal-entry.c:379
>
> #3 0x00007f90a3b5881f in afr_selfheal_entry_dirent
> (frame=0x7f90980018d0, this=0x7f90a4024610, fd=0x7f9098413090,
> name=0x7f909c015d48 "IORFILE_82_2", parent_idx_inode=0x0,
> subvol=0x7f90a4022240,
>
> full_crawl=_gf_true) at afr-self-heal-entry.c:610
>
> #4 0x00007f90a3b58da8 in afr_selfheal_entry_do_subvol
> (frame=0x7f90980110f0, this=0x7f90a4024610, fd=0x7f9098413090,
> child=1) at afr-self-heal-entry.c:742
>
> #5 0x00007f90a3b5953a in afr_selfheal_entry_do
> (frame=0x7f90980110f0, this=0x7f90a4024610, fd=0x7f9098413090,
> source=-1, sources=0x7f90a1b8d810 "",
>
> healed_sinks=0x7f90a1b8d7d0 "\001\001\270\241\220\177") at
> afr-self-heal-entry.c:908
>
> #6 0x00007f90a3b59b79 in __afr_selfheal_entry
> (frame=0x7f90980110f0, this=0x7f90a4024610, fd=0x7f9098413090,
> locked_on=0x7f90a1b8d920 "\001\001\\Z") at afr-self-heal-entry.c:1002
>
> #7 0x00007f90a3b5a051 in afr_selfheal_entry
> (frame=0x7f90980110f0, this=0x7f90a4024610, inode=0x7f9098011940)
> at afr-self-heal-entry.c:1112
>
> #8 0x00007f90a3b519b1 in afr_selfheal_do (frame=0x7f90980110f0,
> this=0x7f90a4024610, gfid=0x7f90a1b8db20
> "\262\027֯I\002O\030\232i\340\314\365 urP۸\241\220\177") at
> afr-self-heal-common.c:2459
>
> #9 0x00007f90a3b51aa7 in afr_selfheal (this=0x7f90a4024610,
> gfid=0x7f90a1b8db20 "\262\027֯I\002O\030\232i\340\314\365
> urP۸\241\220\177") at afr-self-heal-common.c:2500
>
> #10 0x00007f90a3b5cf1f in afr_shd_selfheal (healer=0x7f90a4033510,
> child=1, gfid=0x7f90a1b8db20 "\262\027֯I\002O\030\232i\340\314\365
> urP۸\241\220\177") at afr-self-heald.c:334
>
> #11 0x00007f90a3b5d2c8 in afr_shd_index_heal
> (subvol=0x7f90a4022240, entry=0x7f909c0169c0,
> parent=0x7f90a1b8dde0, data=0x7f90a4033510) at afr-self-heald.c:431
>
> #12 0x00007f90adc74654 in syncop_mt_dir_scan
> (frame=0x7f90a407b4f0, subvol=0x7f90a4022240, loc=0x7f90a1b8dde0,
> pid=-6, data=0x7f90a4033510, fn=0x7f90a3b5d17c <afr_shd_index_heal>,
>
> xdata=0x7f90a4002380, max_jobs=1, max_qlen=1024) at
> syncop-utils.c:407
>
> #13 0x00007f90a3b5d4e1 in afr_shd_index_sweep
> (healer=0x7f90a4033510, vgfid=0x7f90a3b84f38
> "glusterfs.xattrop_index_gfid") at afr-self-heald.c:481
>
> #14 0x00007f90a3b5d5bd in afr_shd_index_sweep_all
> (healer=0x7f90a4033510) at afr-self-heald.c:504
>
> #15 0x00007f90a3b5d894 in afr_shd_index_healer
> (data=0x7f90a4033510) at afr-self-heald.c:584
>
> #16 0x00007f90aca024a5 in start_thread (arg=0x7f90a1b8e700) at
> pthread_create.c:465
>
> #17 0x00007f90ac2e959f in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> (gdb)
>
> Best regards,
> Cynthia (周琳)
>
> MBB SM HETRAN SW3 MATRIX
>
> Storage
> Mobile: +86 (0)18657188311
>