[Bugs] [Bug 1686364] New: [ovirt-gluster] Rolling gluster upgrade from 3.12.5 to 5.3 led to shard on-disk xattrs disappearing

bugzilla at redhat.com bugzilla at redhat.com
Thu Mar 7 10:38:51 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1686364

            Bug ID: 1686364
           Summary: [ovirt-gluster] Rolling gluster upgrade from 3.12.5 to
                    5.3 led to shard on-disk xattrs disappearing
           Product: GlusterFS
           Version: 6
            Status: NEW
         Component: core
          Keywords: Triaged
          Severity: high
          Priority: high
          Assignee: bugs at gluster.org
          Reporter: atumball at redhat.com
                CC: bugs at gluster.org, kdhananj at redhat.com,
                    sabose at redhat.com
        Depends On: 1684385
            Blocks: 1672818 (glusterfs-6.0)
  Target Milestone: ---
    Classification: Community



+++ This bug was initially created as a clone of Bug #1684385 +++

Description of problem:

When gluster bits were upgraded in a hyperconverged ovirt-gluster setup, one
node at a time in online mode from 3.12.5 to 5.3, the following log messages
were seen -

[2019-02-26 16:24:25.126963] E [shard.c:556:shard_modify_size_and_block_count]
(-->/usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x82a45)
[0x7ff71d05ea45] -->/usr/lib64/glusterfs/5.3/xlator/features/shard.so(+0x5c77)
[0x7ff71cdb4c77] -->/usr/lib64/glusterfs/5.3/xlator/features/shard.so(+0x592e)
[0x7ff71cdb492e] ) 0-engine-shard: Failed to get
trusted.glusterfs.shard.file-size for 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7


Version-Release number of selected component (if applicable):


How reproducible:
1/1

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
The trusted.glusterfs.shard.file-size xattr should always be accessible.

Additional info:

--- Additional comment from Krutika Dhananjay on 2019-03-01 07:13:48 UTC ---

[root@tendrl25 glusterfs]# gluster v info engine

Volume Name: engine
Type: Replicate
Volume ID: bb26f648-2842-4182-940e-6c8ede02195f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: tendrl27.lab.eng.blr.redhat.com:/gluster_bricks/engine/engine
Brick2: tendrl26.lab.eng.blr.redhat.com:/gluster_bricks/engine/engine
Brick3: tendrl25.lab.eng.blr.redhat.com:/gluster_bricks/engine/engine
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable

--- Additional comment from Krutika Dhananjay on 2019-03-01 07:23:02 UTC ---

On further investigation, it was found that the shard xattrs were genuinely
missing on all 3 replicas -

[root@tendrl27 ~]# getfattr -d -m . -e hex
/gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file:
gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-1=0x000000000000000000000000
trusted.afr.engine-client-2=0x000000000000000000000000
trusted.gfid=0x3ad3f0c6a4e64b17bd2997c32ecc54d7
trusted.gfid2path.5f2a4f417210b896=0x64373265323737612d353761642d343136322d613065332d6339346463316231366230322f696473

[root@localhost ~]# getfattr -d -m . -e hex
/gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file:
gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x0000000e0000000000000000
trusted.afr.engine-client-2=0x000000000000000000000000
trusted.gfid=0x3ad3f0c6a4e64b17bd2997c32ecc54d7
trusted.gfid2path.5f2a4f417210b896=0x64373265323737612d353761642d343136322d613065332d6339346463316231366230322f696473

[root@tendrl25 ~]# getfattr -d -m . -e hex
/gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file:
gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x000000100000000000000000
trusted.afr.engine-client-1=0x000000000000000000000000
trusted.gfid=0x3ad3f0c6a4e64b17bd2997c32ecc54d7
trusted.gfid2path.5f2a4f417210b896=0x64373265323737612d353761642d343136322d613065332d6339346463316231366230322f696473
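
For comparison, on a healthy replica the same getfattr output would also list
the trusted.glusterfs.shard.block-size and trusted.glusterfs.shard.file-size
keys. A small standalone C probe (my illustration, using the standard Linux
getxattr(2) call, with the path taken from the outputs above) shows the error
any reader hits once those keys are gone, i.e. what the shard translator's
"Failed to get trusted.glusterfs.shard.file-size" message corresponds to:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(void)
{
    /* Brick path from the getfattr transcripts above */
    const char *path = "/gluster_bricks/engine/engine/"
                       "36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids";
    char buf[64];

    ssize_t ret = getxattr(path, "trusted.glusterfs.shard.file-size",
                           buf, sizeof(buf));
    if (ret < 0)
        /* On the affected bricks this fails with ENODATA ("No data
         * available") because the xattr is genuinely gone on disk. */
        fprintf(stderr, "getxattr failed: %s\n", strerror(errno));
    else
        printf("xattr present, %zd bytes\n", ret);
    return 0;
}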


Also from the logs, it appears the file underwent metadata self-heal moments
before these errors started to appear -
[2019-02-26 13:35:37.253896] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-engine-replicate-0:
performing metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7
[2019-02-26 13:35:37.254734] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr]
0-dict: key 'trusted.glusterfs.shard.file-size' is not sent on wire [Invalid
argument]
[2019-02-26 13:35:37.254749] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr]
0-dict: key 'trusted.glusterfs.shard.block-size' is not sent on wire [Invalid
argument]
[2019-02-26 13:35:37.255777] I [MSGID: 108026]
[afr-self-heal-common.c:1729:afr_log_selfheal] 0-engine-replicate-0: Completed
metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7. sources=[0]  sinks=2
[2019-02-26 13:35:37.258032] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-engine-replicate-0:
performing metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7
[2019-02-26 13:35:37.258792] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr]
0-dict: key 'trusted.glusterfs.shard.file-size' is not sent on wire [Invalid
argument]
[2019-02-26 13:35:37.258807] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr]
0-dict: key 'trusted.glusterfs.shard.block-size' is not sent on wire [Invalid
argument]
[2019-02-26 13:35:37.259633] I [MSGID: 108026]
[afr-self-heal-common.c:1729:afr_log_selfheal] 0-engine-replicate-0: Completed
metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7. sources=[0]  sinks=2 


Metadata heal, as we know, does three things: 1. bulk getxattr from the source
brick; 2. removexattr on the sink bricks; 3. bulk setxattr on the sink bricks.

But what's clear from these logs is the dict_to_xdr() messages at the time of
metadata heal, indicating that the shard xattrs were possibly not "sent on
wire" as part of step 3. This turns out to be due to the newly introduced
dict_to_xdr() code in 5.3, which is absent in 3.12.5.
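
To make the failure mode concrete, here is a minimal standalone model (my own
simplification, not the actual gluster code) of a typed serializer in the
style of dict_to_xdr(): entries whose type was never set, as happens for dicts
deserialized from a 3.12.5 peer, are skipped with exactly this kind of "not
sent on wire" warning, yet the call still reports success:

#include <stdio.h>

typedef enum { TYPE_UNSET = 0, TYPE_STR } entry_type_t;

typedef struct {
    const char  *key;
    const char  *value;
    entry_type_t type;   /* entries from an old peer arrive with TYPE_UNSET */
} entry_t;

/* Models dict_to_xdr(): only entries with a known type go on the wire. */
static int serialize(entry_t *entries, int n)
{
    for (int i = 0; i < n; i++) {
        if (entries[i].type == TYPE_UNSET) {
            fprintf(stderr,
                    "key '%s' is not sent on wire [Invalid argument]\n",
                    entries[i].key);
            continue;            /* dropped, but no error is propagated */
        }
        printf("on wire: %s = %s\n", entries[i].key, entries[i].value);
    }
    return 0;                    /* overall success despite dropped keys */
}

int main(void)
{
    /* Dict assembled from the getxattr reply off the 3.12.5 source brick;
     * the old wire format carries no per-entry type. */
    entry_t heal_dict[] = {
        { "trusted.gfid",                       "0x3ad3...", TYPE_STR },
        { "trusted.glusterfs.shard.file-size",  "0x...",     TYPE_UNSET },
        { "trusted.glusterfs.shard.block-size", "0x...",     TYPE_UNSET },
    };

    /* Step 2 of metadata heal already removed the xattrs on the sink, so
     * when step 3's setxattr silently omits the shard keys, they are gone
     * on disk, yet the heal is logged as "Completed". */
    if (serialize(heal_dict, 3) == 0)
        printf("Completed metadata selfheal. sources=[0] sinks=2\n");
    return 0;
}

Compiled and run, this prints the two warnings followed by a "Completed" line,
mirroring the log excerpt above.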

The bricks were upgraded to 5.3 in the order tendrl25 followed by tendrl26,
with tendrl27 still at 3.12.5 when this issue was hit -

Tendrl25:
[2019-02-26 12:47:53.595647] I [MSGID: 100030] [glusterfsd.c:2715:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 5.3 (args:
/usr/sbin/glusterfsd -s tendrl25.lab.eng.blr.redhat.com --volfile-id
engine.tendrl25.lab.eng.blr.redhat.com.gluster_bricks-engine-engine -p
/var/run/gluster/vols/engine/tendrl25.lab.eng.blr.redhat.com-gluster_bricks-engine-engine.pid
-S /var/run/gluster/aae83600c9a783dd.socket --brick-name
/gluster_bricks/engine/engine -l
/var/log/glusterfs/bricks/gluster_bricks-engine-engine.log --xlator-option
*-posix.glusterd-uuid=9373b871-cfce-41ba-a815-0b330f6975c8 --process-name brick
--brick-port 49153 --xlator-option engine-server.listen-port=49153)


Tendrl26:
[2019-02-26 13:35:05.718052] I [MSGID: 100030] [glusterfsd.c:2715:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 5.3 (args:
/usr/sbin/glusterfsd -s tendrl26.lab.eng.blr.redhat.com --volfile-id
engine.tendrl26.lab.eng.blr.redhat.com.gluster_bricks-engine-engine -p
/var/run/gluster/vols/engine/tendrl26.lab.eng.blr.redhat.com-gluster_bricks-engine-engine.pid
-S /var/run/gluster/8010384b5524b493.socket --brick-name
/gluster_bricks/engine/engine -l
/var/log/glusterfs/bricks/gluster_bricks-engine-engine.log --xlator-option
*-posix.glusterd-uuid=18fa886f-8d1a-427c-a5e6-9a4e9502ef7c --process-name brick
--brick-port 49153 --xlator-option engine-server.listen-port=49153)

Tendrl27:
[root@tendrl27 bricks]# rpm -qa | grep gluster
glusterfs-fuse-3.12.15-1.el7.x86_64
glusterfs-libs-3.12.15-1.el7.x86_64
glusterfs-3.12.15-1.el7.x86_64
glusterfs-server-3.12.15-1.el7.x86_64
glusterfs-client-xlators-3.12.15-1.el7.x86_64
glusterfs-api-3.12.15-1.el7.x86_64
glusterfs-events-3.12.15-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.4.x86_64
glusterfs-gnfs-3.12.15-1.el7.x86_64
glusterfs-geo-replication-3.12.15-1.el7.x86_64
glusterfs-cli-3.12.15-1.el7.x86_64
vdsm-gluster-4.20.46-1.el7.x86_64
python2-gluster-3.12.15-1.el7.x86_64
glusterfs-rdma-3.12.15-1.el7.x86_64

And as per the metadata heal logs, the source was brick 0 (corresponding to
tendrl27) and the sink was brick 2 (corresponding to tendrl25).
This means step 1 of metadata heal did a getxattr on tendrl27, which was still
at 3.12.5, and got back dicts in the old format, which carries no "value" type
(the type was only introduced in 5.3).
This same dict was then used for the setxattr in step 3, which silently failed
to add the "trusted.glusterfs.shard.block-size" and
"trusted.glusterfs.shard.file-size" xattrs to the setxattr request because of
the dict_to_xdr() conversion failure in protocol/client, yet the overall
operation succeeded. So afr thought the heal succeeded although the xattrs
that needed heal were never sent over the wire.
This left one or more files with their shard xattrs removed on-disk, causing
pretty much every subsequent operation on them to fail.

--- Additional comment from Krutika Dhananjay on 2019-03-01 07:29:29 UTC ---

So backward compatibility was broken with the introduction of the following
patch -

https://review.gluster.org/c/glusterfs/+/19098

commit 303cc2b54797bc5371be742543ccb289010c92f2
Author: Amar Tumballi <amarts@redhat.com>
Date:   Fri Dec 22 13:12:42 2017 +0530

    protocol: make on-wire-change of protocol using new XDR definition.

    With this patchset, some major things are changed in XDR, mainly:

    * Naming: Instead of gfs3/gfs4 settle for gfx_ for xdr structures
    * add iattx as a separate structure, and add conversion methods
    * the *_rsp structure is now changed, and is also reduced in number
      (ie, no need for different structures if it is similar to other responses).
    * use proper XDR methods for sending dict on wire.

    Also, with the change of xdr structure, there are changes needed
    outside of xlator protocol layer to handle these properly. Mainly
    because the abstraction was broken to support 0-copy RDMA with payload
    for write and read FOP. This made transport layer know about the xdr
    payload, hence with the change of xdr payload structure, transport layer
    needed to know about the change.

    Updates #384

    Change-Id: I1448fbe9deab0a1b06cb8351f2f37488cefe461f
    Signed-off-by: Amar Tumballi <amarts@redhat.com>


Any operation in a heterogeneous cluster which reads xattrs on-disk and
subsequently writes them back (like metadata heal, for instance) will cause
one or more on-disk xattrs to disappear.

In fact, the logs suggest even the dht on-disk layouts vanished -

[2019-02-26 13:35:30.253348] I [MSGID: 109092]
[dht-layout.c:744:dht_layout_dir_mismatch] 0-engine-dht:
/36ea5b11-19fb-4755-b664-088f6e5c4df2: Disk layout missing, gfid =
d0735acd-14ec-4ef9-8f5f-6a3c4ae12c08

--- Additional comment from Worker Ant on 2019-03-05 03:16:15 UTC ---

REVIEW: https://review.gluster.org/22300 (dict: handle STR_OLD data type in xdr
conversions) posted (#1) for review on master by Amar Tumballi
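
Going by the patch title, the fix teaches the xdr conversion to handle the
legacy STR_OLD data type, i.e. entries that arrived without the new type
information. Continuing the simplified model sketched in the comment above,
the shape of the fix would be roughly (illustrative only, not the actual
patch):

/* Treat legacy untyped entries as plain strings during conversion
 * instead of dropping them. */
static int serialize_fixed(entry_t *entries, int n)
{
    for (int i = 0; i < n; i++) {
        if (entries[i].type == TYPE_UNSET)
            entries[i].type = TYPE_STR;   /* legacy STR_OLD: send as string */
        printf("on wire: %s = %s\n", entries[i].key, entries[i].value);
    }
    return 0;   /* every key, including the shard xattrs, now goes on wire */
}

With this, the shard xattrs survive the round trip even when the source dict
came from an older peer.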


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1672818
[Bug 1672818] GlusterFS 6.0 tracker
https://bugzilla.redhat.com/show_bug.cgi?id=1684385
[Bug 1684385] [ovirt-gluster] Rolling gluster upgrade from 3.12.5 to 5.3 led to
shard on-disk xattrs disappearing