[Bugs] [Bug 1661889] New: Metadata heal picks different brick each time as source if there are no pending xattrs.
bugzilla at redhat.com
Mon Dec 24 09:13:44 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1661889
Bug ID: 1661889
Summary: Metadata heal picks different brick each time as source if there are no pending xattrs.
Product: GlusterFS
Version: mainline
Status: NEW
Component: replicate
Severity: medium
Assignee: bugs at gluster.org
Reporter: ravishankar at redhat.com
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
Description of problem:
There were a few instances, reported both upstream and downstream, where an RHHI
setup had the shard xattrs missing from a file on all 3 copies of the replica,
potentially leading to a VM pause.
Comments that I had made on a downstream BZ regarding this problem:
> As for the xattrs missing on all bricks of the replica: even though
> metadata heal does a removexattr and a setxattr as part of healing, it
> does so only on the 'sink' bricks, so the xattr should still remain on the
> 'source' brick. I'm going through the code to see whether the brick where,
> say, the removexattr succeeded and the setxattr failed could be picked as
> the source for a subsequent spurious metadata heal, so that the xattr gets
> removed from all bricks.
> Okay, so I found one corner case where xattrs can go missing from all
> bricks. If a metadata heal is triggered (genuinely, or spuriously as in
> this case due to mismatching bitrot xattrs) and there are no AFR pending
> xattrs indicating which brick(s) are good and which are bad, then all bricks
> are considered sources. afr_choose_source_by_policy() then picks the local
> brick as the source, the other bricks are treated as sinks, and the metadata
> heal is initiated.
>
> One mount can pick its local brick (say brick1) as the source. During the
> metadata heal, the removexattr succeeds on the 2 sink bricks (brick2 and
> brick3) but the setxattr fails, say with ENOTCONN. Thus 2 bricks have their
> shard xattrs missing.
> In an RHHI setup, another mount that is local to one of the 2 sink bricks
> can then trigger a metadata heal on the same file, this time picking one of
> the bad bricks (say brick2) as the source. Brick1 is now a sink for this
> heal and the shard xattr gets removed from it, leaving all 3 bricks
> without the xattr. Let me see what the best way to fix this is.
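The two-mount sequence described in the comment above can be replayed with a toy Python model. This is a sketch under stated assumptions, not GlusterFS code: `Brick`, `metadata_heal`, `has_shard_xattr` and the `setxattr_fails_on` parameter are hypothetical names, the xattr key and value are illustrative only, and the real AFR heal path (locking, pending-xattr inspection, afr_choose_source_by_policy()) is reduced to "with no pending xattrs, the mount's local brick is the source; each sink gets removexattr followed by setxattr".

```python
from dataclasses import dataclass, field

# Illustrative xattr key; the model only cares whether it is present.
SHARD_KEY = "trusted.glusterfs.shard.block-size"


@dataclass
class Brick:
    name: str
    xattrs: dict = field(default_factory=dict)


def metadata_heal(bricks, local, setxattr_fails_on=()):
    """Toy model of one metadata heal when there are no AFR pending xattrs.

    Hypothetical simplification: every brick is an eligible source, so the
    brick local to the mount is picked, and all other bricks become sinks.
    """
    source = local
    wanted = dict(source.xattrs)  # snapshot the source's xattrs
    for sink in bricks:
        if sink is source:
            continue
        sink.xattrs.clear()  # removexattr step on the sink
        if sink.name in setxattr_fails_on:
            continue  # setxattr failed (e.g. ENOTCONN): sink stays stripped
        sink.xattrs.update(wanted)  # setxattr step copies the source's xattrs
    return source.name


def has_shard_xattr(bricks):
    return [SHARD_KEY in b.xattrs for b in bricks]


if __name__ == "__main__":
    bricks = [Brick(n, {SHARD_KEY: "64MB"}) for n in ("brick1", "brick2", "brick3")]
    # Mount 1 is local to brick1; setxattr fails on both sinks.
    metadata_heal(bricks, local=bricks[0], setxattr_fails_on={"brick2", "brick3"})
    print(has_shard_xattr(bricks))  # [True, False, False]
    # Mount 2 is local to brick2, which is now missing the xattr.
    metadata_heal(bricks, local=bricks[1])
    print(has_shard_xattr(bricks))  # [False, False, False]
```

Because the second heal has no pending xattrs to tell it that brick2 is bad, the model happily uses brick2 as the source and strips brick1, which is the corner case the comment describes.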
Version-Release number of selected component (if applicable):
This had a high chance of occurring in glusterfs 3.8 (RHGS-3.3.1) if bitrot was
enabled and then disabled, which caused spurious metadata heals to be launched
on every lookup of the file. (The bitrot bug itself has been fixed in
subsequent releases.)