[Gluster-devel] query about why glustershd can not afr_selfheal_recreate_entry because of "afr: Prevent null gfids in self-heal entry re-creation"

Ravishankar N ravishankar at redhat.com
Tue Jan 16 15:12:50 UTC 2018



On 01/16/2018 02:22 PM, Lian, George (NSB - CN/Hangzhou) wrote:
>
> Hi,
>
> Thanks a lot for your update.
>
> I would like to give more detail on where the issue came from.
>
> This issue came from a test case in our team; the steps are as 
> follows:
>
> 1) Set up a glusterfs ENV with replica 2: two storage server nodes 
> and two client nodes
>
> 2) Generate a split-brain file; sn-0 is normal, sn-1 is dirty.
>
Hi, sorry, I did not understand the test case. What type of split-brain 
did you create (data/metadata, gfid, or file-type mismatch)?
>
> 3) Delete the directory before the heal begins (in this phase, the 
> normal, correct file on sn-0 is deleted by the “rm” command; the 
> dirty file is still there)
>
Delete from the backend brick directly?
>
> 4) After that, the self-heal process always fails, with the log 
> attached in the last mail
>
Maybe you can write a script or a .t file (like the ones in 
https://github.com/gluster/glusterfs/tree/master/tests/basic/afr) so 
that your test can be understood unambiguously.
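As a starting point, a reproducer in the harness style of those tests might look roughly like this. This is only a hypothetical sketch: it assumes the standard `include.rc`/`volume.rc` macros (`TEST`, `kill_brick`, `$V0`, `$B0`, `$M0`, etc.) and guesses at the scenario; adjust the steps to match your actual test case.

```shell
#!/bin/bash
# Hypothetical .t-style sketch (NOT a verified reproducer): create a
# replica-2 volume, then write to the file with alternating bricks down
# to force a data split-brain. Requires the glusterfs test harness.

. $(dirname $0)/../../include.rc
. $(dirname $0)/../../volume.rc

cleanup

TEST glusterd
TEST $CLI volume create $V0 replica 2 $H0:$B0/${V0}{0,1}
TEST $CLI volume set $V0 self-heal-daemon off
TEST $CLI volume start $V0
TEST glusterfs --volfile-id=$V0 --volfile-server=$H0 $M0

TEST mkdir $M0/testdir
TEST touch $M0/testdir/test_file

# Brick 0 down, write -> brick 1 is ahead.
TEST kill_brick $V0 $H0 $B0/${V0}0
TEST `echo from-brick-1 > $M0/testdir/test_file`
TEST $CLI volume start $V0 force
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" afr_child_up_status $V0 0

# Brick 1 down, write again -> both bricks blame each other: split-brain.
TEST kill_brick $V0 $H0 $B0/${V0}1
TEST `echo from-brick-0 > $M0/testdir/test_file`
TEST $CLI volume start $V0 force

# "gluster v heal $V0 info" should now report the file in split-brain.
cleanup
```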

> Some command output is also attached below FYI.
>
> From my understanding, GlusterFS may not be able to handle the 
> split-brain file in this case. Could you share your comments and 
> confirm whether some enhancement will be done for this case?
>
If you create a split-brain in gluster, self-heal cannot heal it. You 
need to resolve it using one of the methods listed in 
https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#heal-info-and-split-brain-resolution
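For a data or metadata split-brain, the CLI-based resolution from that page looks roughly like the following. The volume name (`export`) and file path are taken from the outputs in this thread; which policy is appropriate depends on which copy you want to keep.

```shell
# List the entries currently in split-brain:
gluster volume heal export info split-brain

# Then resolve per file with one of the policies, e.g.:

# keep the copy with the latest mtime:
gluster volume heal export split-brain latest-mtime /testdir/test_file

# or keep the bigger copy:
gluster volume heal export split-brain bigger-file /testdir/test_file

# or declare one brick the source for this file:
gluster volume heal export split-brain source-brick \
    sn-0.local:/mnt/bricks/export/brick /testdir/test_file
```

Note that gfid or type-mismatch split-brains cannot be resolved this way and need manual intervention on the bricks, as the linked document describes.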

Thanks,
Ravi

> rm -rf /mnt/export/testdir
> rm: cannot remove '/mnt/export/testdir/test file': No data available
>
> [root@sn-1:/root]
> # ls -l /mnt/export/testdir/
> ls: cannot access '/mnt/export/testdir/IORFILE_82_2': No data available
> total 0
> -????????? ? ? ? ?            ? test_file
>
> [root@sn-1:/root]
> # getfattr -m . -d -e hex /mnt/bricks/export/brick/testdir/
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/bricks/export/brick/testdir/
> trusted.afr.dirty=0x000000000000000000000001
> trusted.afr.export-client-0=0x000000000000000000000054
> trusted.gfid=0xb217d6af49024f189a69e0ccf5207572
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
> [root@sn-0:/var/log/glusterfs]
> # getfattr -m . -d -e hex /mnt/bricks/export/brick/testdir/
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/bricks/export/brick/testdir/
> trusted.gfid=0xb217d6af49024f189a69e0ccf5207572
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
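For reference, the `trusted.afr.*` values in the getfattr dump above can be decoded. This small sketch assumes the usual AFR changelog layout of three big-endian 32-bit counters (pending data, metadata, and entry operations); note how sn-1 blames `export-client-0` (brick 0) for pending entry operations:

```python
# Decode a trusted.afr.* extended attribute as dumped by getfattr -e hex.
# Assumes the AFR changelog layout: three big-endian 32-bit counters
# for pending data, metadata and entry operations, in that order.

def decode_afr_xattr(hex_value: str) -> dict:
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    data, metadata, entry = (
        int.from_bytes(raw[i:i + 4], "big") for i in range(0, 12, 4)
    )
    return {"data": data, "metadata": metadata, "entry": entry}

# Values from the sn-1 getfattr output above:
print(decode_afr_xattr("0x000000000000000000000001"))
# trusted.afr.dirty -> {'data': 0, 'metadata': 0, 'entry': 1}
print(decode_afr_xattr("0x000000000000000000000054"))
# trusted.afr.export-client-0 -> {'data': 0, 'metadata': 0, 'entry': 84}
```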
>
> Best Regards
>
> George
>
> *From:*gluster-devel-bounces at gluster.org 
> [mailto:gluster-devel-bounces at gluster.org] *On Behalf Of *Ravishankar N
> *Sent:* Tuesday, January 16, 2018 1:44 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> <cynthia.zhou at nokia-sbell.com>; Gluster Devel <gluster-devel at gluster.org>
> *Subject:* Re: [Gluster-devel] query about why glustershd can not 
> afr_selfheal_recreate_entry because of "afr: Prevent null gfids in 
> self-heal entry re-creation"
>
> + gluster-devel
>
> On 01/15/2018 01:41 PM, Zhou, Cynthia (NSB - CN/Hangzhou) wrote:
>
>     Hi glusterfs expert,
>
>             Good day,
>
>     When doing some tests on glusterfs self-heal, I found the
>     following prints showing that when the dir/file type gets
>     corrupted, the file cannot be self-healed.
>
>     *Could you help to check whether this is expected behavior? I
>     see that the code change **https://review.gluster.org/#/c/17981/**
>     adds a check for iatt->ia_type, so what if a file’s ia_type gets
>     corrupted? In that case it will not get self-healed?*
>
>
> Yes, without knowing the ia_type, afr_selfheal_recreate_entry() 
> cannot decide what type of FOP to perform (mkdir/link/mknod) to create 
> the appropriate file on the sink. You would need to find out why the 
> source brick is not returning a valid ia_type, i.e. why 
> replies[source].poststat is not valid.
> Thanks,
> Ravi
>
>
>     Thanks!
>
>     //////////////////heal info output////////////////////////////
>
>     [root@sn-0:/home/robot]
>
>     # gluster v heal export info
>
>     Brick sn-0.local:/mnt/bricks/export/brick
>
>     Status: Connected
>
>     Number of entries: 0
>
>     Brick sn-1.local:/mnt/bricks/export/brick
>
>     /testdir - Is in split-brain
>
>     Status: Connected
>
>     Number of entries: 1
>
>     //////////////////////////////////////////sn-1 glustershd
>     log///////////////////////////////////////////////////
>
>     [2018-01-15 03:53:40.011422] I [MSGID: 108026]
>     [afr-self-heal-entry.c:887:afr_selfheal_entry_do]
>     0-export-replicate-0: performing entry selfheal on
>     b217d6af-4902-4f18-9a69-e0ccf5207572
>
>     [2018-01-15 03:53:40.013994] W [MSGID: 114031]
>     [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-export-client-1:
>     remote operation failed. Path: (null)
>     (00000000-0000-0000-0000-000000000000) [No data available]
>
>     [2018-01-15 03:53:40.014025] E [MSGID: 108037]
>     [afr-self-heal-entry.c:92:afr_selfheal_recreate_entry]
>     0-export-replicate-0: Invalid ia_type (0) or
>     gfid(00000000-0000-0000-0000-000000000000). source brick=1,
>     pargfid=00000000-0000-0000-0000-000000000000, name=IORFILE_82_2
>
>     //////////////////////////////////////gdb attached to sn-1
>     glustershd/////////////////////////////////////////////
>
>     [root@sn-1:/var/log/glusterfs]
>
>     # gdb attach 2191
>
>     GNU gdb (GDB) 8.0.1
>
>     Copyright (C) 2017 Free Software Foundation, Inc.
>
>     License GPLv3+: GNU GPL version 3 or later
>     <http://gnu.org/licenses/gpl.html>
>
>     This is free software: you are free to change and redistribute it.
>
>     There is NO WARRANTY, to the extent permitted by law.  Type "show
>     copying"
>
>     and "show warranty" for details.
>
>     This GDB was configured as "x86_64-linux-gnu".
>
>     Type "show configuration" for configuration details.
>
>     For bug reporting instructions, please see:
>
>     <http://www.gnu.org/software/gdb/bugs/>.
>
>     Find the GDB manual and other documentation resources online at:
>
>     <http://www.gnu.org/software/gdb/documentation/>.
>
>     For help, type "help".
>
>     Type "apropos word" to search for commands related to "word"...
>
>     attach: No such file or directory.
>
>     Attaching to process 2191
>
>     [New LWP 2192]
>
>     [New LWP 2193]
>
>     [New LWP 2194]
>
>     [New LWP 2195]
>
>     [New LWP 2196]
>
>     [New LWP 2197]
>
>     [New LWP 2239]
>
>     [New LWP 2241]
>
>     [New LWP 2243]
>
>     [New LWP 2245]
>
>     [New LWP 2247]
>
>     [Thread debugging using libthread_db enabled]
>
>     Using host libthread_db library "/lib64/libthread_db.so.1".
>
>     0x00007f90aca037bd in __pthread_join (threadid=140259279345408,
>     thread_return=0x0) at pthread_join.c:90
>
>     90      pthread_join.c: No such file or directory.
>
>     (gdb) break afr_selfheal_recreate_entry
>
>     Breakpoint 1 at 0x7f90a3b56dec: file afr-self-heal-entry.c, line 73.
>
>     (gdb) c
>
>     Continuing.
>
>     [Switching to Thread 0x7f90a1b8e700 (LWP 2241)]
>
>     Thread 9 "glustershdheal" hit Breakpoint 1,
>     afr_selfheal_recreate_entry (frame=0x7f90980018d0, dst=0,
>     source=1, sources=0x7f90a1b8ceb0 "", dir=0x7f9098011940,
>     name=0x7f909c015d48 "IORFILE_82_2",
>
>         inode=0x7f9098001bd0, replies=0x7f90a1b8c890) at
>     afr-self-heal-entry.c:73
>
>     73      afr-self-heal-entry.c: No such file or directory.
>
>     (gdb) n
>
>     74      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     75      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     76      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     77      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     78      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     79      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     80      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     81      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     82      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     83      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     85      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     86      in afr-self-heal-entry.c
>
>     (gdb) n
>
>     87      in afr-self-heal-entry.c
>
>     (gdb) print iatt->ia_type
>
>     $1 = IA_INVAL
>
>     (gdb) print gf_uuid_is_null(iatt->ia_gfid)
>
>     $2 = 1
>
>     (gdb) bt
>
>     #0  afr_selfheal_recreate_entry (frame=0x7f90980018d0, dst=0,
>     source=1, sources=0x7f90a1b8ceb0 "", dir=0x7f9098011940,
>     name=0x7f909c015d48 "IORFILE_82_2", inode=0x7f9098001bd0,
>     replies=0x7f90a1b8c890)
>
>         at afr-self-heal-entry.c:87
>
>     #1  0x00007f90a3b57d20 in __afr_selfheal_merge_dirent
>     (frame=0x7f90980018d0, this=0x7f90a4024610, fd=0x7f9098413090,
>     name=0x7f909c015d48 "IORFILE_82_2", inode=0x7f9098001bd0,
>
>         sources=0x7f90a1b8ceb0 "", healed_sinks=0x7f90a1b8ce70
>     "\001\001A\230\220\177", locked_on=0x7f90a1b8ce50
>     "\001\001\270\241\220\177", replies=0x7f90a1b8c890) at
>     afr-self-heal-entry.c:360
>
>     #2  0x00007f90a3b57da5 in __afr_selfheal_entry_dirent
>     (frame=0x7f90980018d0, this=0x7f90a4024610, fd=0x7f9098413090,
>     name=0x7f909c015d48 "IORFILE_82_2", inode=0x7f9098001bd0, source=-1,
>
>         sources=0x7f90a1b8ceb0 "", healed_sinks=0x7f90a1b8ce70
>     "\001\001A\230\220\177", locked_on=0x7f90a1b8ce50
>     "\001\001\270\241\220\177", replies=0x7f90a1b8c890) at
>     afr-self-heal-entry.c:379
>
>     #3  0x00007f90a3b5881f in afr_selfheal_entry_dirent
>     (frame=0x7f90980018d0, this=0x7f90a4024610, fd=0x7f9098413090,
>     name=0x7f909c015d48 "IORFILE_82_2", parent_idx_inode=0x0,
>     subvol=0x7f90a4022240,
>
>         full_crawl=_gf_true) at afr-self-heal-entry.c:610
>
>     #4  0x00007f90a3b58da8 in afr_selfheal_entry_do_subvol
>     (frame=0x7f90980110f0, this=0x7f90a4024610, fd=0x7f9098413090,
>     child=1) at afr-self-heal-entry.c:742
>
>     #5  0x00007f90a3b5953a in afr_selfheal_entry_do
>     (frame=0x7f90980110f0, this=0x7f90a4024610, fd=0x7f9098413090,
>     source=-1, sources=0x7f90a1b8d810 "",
>
>         healed_sinks=0x7f90a1b8d7d0 "\001\001\270\241\220\177") at
>     afr-self-heal-entry.c:908
>
>     #6  0x00007f90a3b59b79 in __afr_selfheal_entry
>     (frame=0x7f90980110f0, this=0x7f90a4024610, fd=0x7f9098413090,
>     locked_on=0x7f90a1b8d920 "\001\001\\Z") at afr-self-heal-entry.c:1002
>
>     #7  0x00007f90a3b5a051 in afr_selfheal_entry
>     (frame=0x7f90980110f0, this=0x7f90a4024610, inode=0x7f9098011940)
>     at afr-self-heal-entry.c:1112
>
>     #8  0x00007f90a3b519b1 in afr_selfheal_do (frame=0x7f90980110f0,
>     this=0x7f90a4024610, gfid=0x7f90a1b8db20
>     "\262\027֯I\002O\030\232i\340\314\365 urP۸\241\220\177") at
>     afr-self-heal-common.c:2459
>
>     #9  0x00007f90a3b51aa7 in afr_selfheal (this=0x7f90a4024610,
>     gfid=0x7f90a1b8db20 "\262\027֯I\002O\030\232i\340\314\365
>     urP۸\241\220\177") at afr-self-heal-common.c:2500
>
>     #10 0x00007f90a3b5cf1f in afr_shd_selfheal (healer=0x7f90a4033510,
>     child=1, gfid=0x7f90a1b8db20 "\262\027֯I\002O\030\232i\340\314\365
>     urP۸\241\220\177") at afr-self-heald.c:334
>
>     #11 0x00007f90a3b5d2c8 in afr_shd_index_heal
>     (subvol=0x7f90a4022240, entry=0x7f909c0169c0,
>     parent=0x7f90a1b8dde0, data=0x7f90a4033510) at afr-self-heald.c:431
>
>     #12 0x00007f90adc74654 in syncop_mt_dir_scan
>     (frame=0x7f90a407b4f0, subvol=0x7f90a4022240, loc=0x7f90a1b8dde0,
>     pid=-6, data=0x7f90a4033510, fn=0x7f90a3b5d17c <afr_shd_index_heal>,
>
>         xdata=0x7f90a4002380, max_jobs=1, max_qlen=1024) at
>     syncop-utils.c:407
>
>     #13 0x00007f90a3b5d4e1 in afr_shd_index_sweep
>     (healer=0x7f90a4033510, vgfid=0x7f90a3b84f38
>     "glusterfs.xattrop_index_gfid") at afr-self-heald.c:481
>
>     #14 0x00007f90a3b5d5bd in afr_shd_index_sweep_all
>     (healer=0x7f90a4033510) at afr-self-heald.c:504
>
>     #15 0x00007f90a3b5d894 in afr_shd_index_healer
>     (data=0x7f90a4033510) at afr-self-heald.c:584
>
>     #16 0x00007f90aca024a5 in start_thread (arg=0x7f90a1b8e700) at
>     pthread_create.c:465
>
>     #17 0x00007f90ac2e959f in clone () at
>     ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
>     (gdb)
>
>     Best regards,
>     *Cynthia (周琳)*
>
>     MBB SM HETRAN SW3 MATRIX
>
>     Storage
>     Mobile: +86 (0)18657188311
>
