[Gluster-devel] remaining entry in gluster volume heal info command even after reboot

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Sep 5 09:32:00 UTC 2018


Looks like the test case is a bit involved and also has modifications
directly on the brick. Could you let us know if there is any reason to
touch the brick directly?

On Wed, Sep 5, 2018 at 2:53 PM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.zhou at nokia-sbell.com> wrote:

> I will try to reproduce it (reboot + fstest) and tell you later. In the
> meantime, you can also simulate this issue locally with the following
> steps: at least the remaining entry appears, and the entry heal also quits
> because all three healed_sinks are empty. (I am not sure this is exactly
> the same as the issue reproduced by reboot + fstest, but I guess it should
> be the same.)
>
> 1>   Disable client quorum with the command “gluster v set <volname>
> cluster.quorum-type none”.
>
> 2>   Isolate sn-0 from sn-1 and sn-2
>
> iptables -I OUTPUT -d sn-1.local -j DROP
>
> iptables -I OUTPUT -d sn-2.local -j DROP
>
> iptables -I INPUT -s sn-2.local -j DROP
>
> iptables -I INPUT -s sn-1.local -j DROP
>
> 3>   Touch /mnt/export/testdir/common.txt on sn-0
>
> 4>   Touch /mnt/export/testdir/common.txt on sn-1
>
> 5>   On sn-1, delete all xattrs of /mnt/bricks/export/brick/testdir/common.txt
> until getfattr returns nothing:
>
> setfattr -x trusted.afr.dirty /mnt/bricks/export/brick/testdir/common.txt
>
> setfattr -x trusted.afr.export-client-0 /mnt/bricks/export/brick/testdir/common.txt
>
> setfattr -x trusted.gfid /mnt/bricks/export/brick/testdir/common.txt
>
> setfattr -x trusted.gfid2path.53be37be7f01389d /mnt/bricks/export/brick/testdir/common.txt
>
> After that, getfattr returns nothing:
>
> [root at sn-1:/home/robot]
>
> # getfattr -m . -d -e hex  /mnt/bricks/export/brick/testdir/common.txt
>
> [root at sn-1:/home/robot]
>
> 6>   Then delete the corresponding index entry for common.txt (named by its
> gfid) in /mnt/bricks/export/brick/.glusterfs/indices/xattrop/:
>
> [root at sn-1:/home/robot]
>
> # rm -rf /mnt/bricks/export/brick/.glusterfs/indices/xattrop/d0d237f7-0c43-4828-8720-dfb3792fe5fb
>
> 7>   Restore the network on sn-0:
>
> iptables -D OUTPUT -d sn-1.local -j DROP
>
> iptables -D OUTPUT -d sn-2.local -j DROP
>
> iptables -D INPUT -s sn-1.local -j DROP
>
> iptables -D INPUT -s sn-2.local -j DROP
>
> 8>   Touch /mnt/export/testdir/common.txt on sn-0 again.
>
> 9>   “gluster v heal export info” then shows the following, and the output
> stays like this for a long time:
>
> # gluster v heal export info
>
> Brick sn-0.local:/mnt/bricks/export/brick
>
> /testdir
>
> Status: Connected
>
> Number of entries: 1
>
>
>
> Brick sn-1.local:/mnt/bricks/export/brick
>
> Status: Connected
>
> Number of entries: 0
>
>
>
> Brick sn-2.local:/mnt/bricks/export/brick
>
> /testdir
>
> Status: Connected
>
> Number of entries: 1
>
>
> *From:* Pranith Kumar Karampuri <pkarampu at redhat.com>
> *Sent:* Wednesday, September 05, 2018 4:56 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
> *Cc:* Gluster Devel <gluster-devel at gluster.org>; Ravishankar N <
> ravishankar at redhat.com>
> *Subject:* Re: remaining entry in gluster volume heal info command even
> after reboot
>
>
>
>
>
> On Wed, Sep 5, 2018 at 1:27 PM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.zhou at nokia-sbell.com> wrote:
>
> Hi glusterfs experts:
>
>        Good day!
>
>        Recently, while testing my Gluster environment, I found that some
> entries remain in the output of “gluster v heal mstate info” *even after
> reboot*.
>
>     fstest_035ffc492ec43551a64087f9280ffe3e is a directory in /mnt/mstate,
> and only one file (fstest_458bb82d8884ed5c9dadec4ed93bec4e) exists in it.
>
> When I debugged with gdb on sn-0, I found the following:
>
> The changelog of the parent directory (fstest_035ffc492ec43551a64087f9280ffe3e)
> says an entry heal is needed, but the changelog, gfid and file type of the
> only entry in the directory show there is nothing to heal, so glustershd
> does nothing in each round of heal, and the entry remains.
>
> gdb shows that in each round of heal on sn-0, __afr_selfheal_entry exits at
> the check (if (AFR_COUNT(healed_sinks, priv->child_count) == 0)), because in
> this case all three healed_sinks are zero.
>
>
>
> What is the return value of this function in gdb?
>
>
>
>    Do you have any idea how to solve this issue from the GlusterFS side?
>      Thanks!
>
>
>
> [test steps]
>
>        Reboot the three sn nodes (sn-0, sn-1, sn-2 (arbiter)) sequentially,
> and run fstest on another node (with glusterfs clients).
>
>
>
> [problem description]
>
>
>
> Entries remain in the output of “gluster v heal mstate info”; even after
> rebooting sn-0 many times, the entries are still there!
>
>
>
> [root at sn-0:/home/robot]
>
> #  gluster v heal mstate info
>
> Brick sn-0.local:/mnt/bricks/mstate/brick
>
> /fstest_035ffc492ec43551a64087f9280ffe3e
>
> Status: Connected
>
> Number of entries: 1
>
>
>
> Brick sn-1.local:/mnt/bricks/mstate/brick
>
> Status: Connected
>
> Number of entries: 0
>
>
>
> Brick sn-2.local:/mnt/bricks/mstate/brick
>
> /fstest_035ffc492ec43551a64087f9280ffe3e
>
> Status: Connected
>
> Number of entries: 1
>
> ////////////////////////////// some env information //////////////////////////////
>
> # gluster v info mstate
>
> Volume Name: mstate
>
> Type: Replicate
>
> Volume ID: 1d896674-17a2-4ae7-aa7c-c6e22013df99
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 1 x (2 + 1) = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: sn-0.local:/mnt/bricks/mstate/brick
>
> Brick2: sn-1.local:/mnt/bricks/mstate/brick
>
> Brick3: sn-2.local:/mnt/bricks/mstate/brick (arbiter)
>
> Options Reconfigured:
>
> performance.client-io-threads: off
>
> nfs.disable: on
>
> transport.address-family: inet
>
> cluster.server-quorum-type: none
>
> cluster.quorum-reads: no
>
> cluster.favorite-child-policy: mtime
>
> cluster.consistent-metadata: on
>
> network.ping-timeout: 42
>
> cluster.quorum-type: auto
>
> server.allow-insecure: on
>
> cluster.server-quorum-ratio: 51%
>
> [root at sn-1:/home/robot]
>
>
>
>
>
> [root at sn-2:/root]
>
> # getfattr -m . -d -e hex /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/
>
> trusted.afr.dirty=0x000000000000000000000000
>
> trusted.afr.mstate-client-0=0x000000010000000000000003
>
> trusted.gfid=0xa0975560eaef4cb299467101de00446a
>
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
>
>
> [root at sn-2:/root]
>
> # getfattr -m . -d -e hex /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
> trusted.afr.mstate-client-0=0x000000000000000000000000
>
> trusted.gfid=0x9fc20f587f094182816390f056f7370f
>
> trusted.gfid2path.864159d77373ad5f=0x61303937353536302d656165662d346362322d393934362d3731303164653030343436612f6673746573745f3435386262383264383838346564356339646164656334656439336265633465
>
>
>
> # cd /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e
>
> [root at sn-2:/mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e]
>
> # ls
>
> fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
> [root at sn-2:/mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e]
>
> # stat fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
>   File: fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
>   Size: 0               Blocks: 8          IO Block: 4096   fifo
>
> Device: fd31h/64817d Inode: 22086       Links: 2
>
> Access: (0644/prw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
>
> Access: 2018-08-30 04:33:17.552870661 +0300
>
> Modify: 2018-08-30 04:33:17.552870661 +0300
>
> Change: 2018-08-30 04:33:17.553870661 +0300
>
> Birth: -
>
> [root at sn-2:/mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e]
>
> [root at sn-2:/root]
>
> # exit
>
> logout
>
> Connection to sn-2.local closed.
>
> [root at sn-0:/home/robot]
>
> # getfattr -m . -d -e hex /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/
>
> trusted.afr.dirty=0x000000000000000000000000
>
> trusted.afr.mstate-client-2=0x00000000000000000000001c
>
> trusted.gfid=0xa0975560eaef4cb299467101de00446a
>
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
>
>
> [root at sn-0:/home/robot]
>
> # getfattr -m . -d -e hex /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
> trusted.gfid=0x9fc20f587f094182816390f056f7370f
>
> trusted.gfid2path.864159d77373ad5f=0x61303937353536302d656165662d346362322d393934362d3731303164653030343436612f6673746573745f3435386262383264383838346564356339646164656334656439336265633465
>
>
>
> # ls /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e
>
> fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
> [root at sn-0:/home/robot]
>
> # stat /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
>   File: /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
>   Size: 0               Blocks: 8          IO Block: 4096   fifo
>
> Device: fd31h/64817d Inode: 21899       Links: 2
>
> Access: (0644/prw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
>
> Access: 2018-08-30 04:33:17.552870661 +0300
>
> Modify: 2018-08-30 04:33:17.552870661 +0300
>
> Change: 2018-08-30 04:33:17.263809048 +0300
>
> Birth: -
>
> [root at sn-0:/home/robot]
>
> [root at sn-0:/home/robot]
>
> # ssh sn-1.local
>
>
>
> USAGE OF THE ROOT ACCOUNT AND THE FULL BASH IS RECOMMENDED ONLY FOR
> LIMITED USE. PLEASE USE A NON-ROOT ACCOUNT AND THE SCLI SHELL (fsclish)
> AND/OR LIMITED BASH SHELL.
>
>
>
> Read /opt/nokia/share/security/readme_root.txt for more details.
>
>
>
> [root at sn-1:/root]
>
> # getfattr -m . -d -e hex /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/
>
> trusted.afr.dirty=0x000000000000000000000000
>
> trusted.afr.mstate-client-0=0x000000000000000000000000
>
> trusted.afr.mstate-client-2=0x000000000000000000000000
>
> trusted.gfid=0xa0975560eaef4cb299467101de00446a
>
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
>
>
> [root at sn-1:/root]
>
> # getfattr -m . -d -e hex /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
> trusted.afr.mstate-client-0=0x000000000000000000000000
>
> trusted.gfid=0x9fc20f587f094182816390f056f7370f
>
> trusted.gfid2path.864159d77373ad5f=0x61303937353536302d656165662d346362322d393934362d3731303164653030343436612f6673746573745f3435386262383264383838346564356339646164656334656439336265633465
>
>
>
> [root at sn-1:/root]
>
> [root at sn-1:/root]
>
> # ls /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e
>
> fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
> [root at sn-1:/root]
>
> # stat /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
>   File: /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e
>
>   Size: 0               Blocks: 8          IO Block: 4096   fifo
>
> Device: fd31h/64817d Inode: 22168       Links: 2
>
> Access: (0644/prw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
>
> Access: 2018-08-30 04:33:17.552870661 +0300
>
> Modify: 2018-08-30 04:33:17.552870661 +0300
>
> Change: 2018-08-30 04:33:17.037673648 +0300
>
> Birth: -
>
> [root at sn-1:/root]
>
> --
>
> Pranith
>


-- 
Pranith