[Bugs] [Bug 1473150] New: features/shard: Lookup on shard 18 failed. Base file gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data available]", the [ No data available]

bugzilla at redhat.com bugzilla at redhat.com
Thu Jul 20 06:44:41 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1473150

            Bug ID: 1473150
           Summary: features/shard:Lookup on shard 18 failed. Base file
                    gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data
                    available]", the [No data available]
           Product: GlusterFS
           Version: mainline
         Component: sharding
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: jianwei1216 at qq.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org



Description of problem:
  While testing a write-behind+shard+dht+ec (2*(2+1)) cluster volume with
postmark, I found many ERROR messages in /var/log/mnt-test-.log,
such as "[shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on
shard 18 failed. Base file gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data
available]", where [No data available] corresponds to errno 61 (ENODATA).
  This errno (61) is not handled in shard_writev(), so shard_writev() fails
and unwinds the failure to the upper xlator (op_ret=-1, op_errno=61).
  Normally, when shard_writev() does STACK_UNWIND(op_ret=-1, op_errno=61), the
file's md5sum afterwards should be wrong, but it is actually correct!! Looking
at wb_fulfill_cbk(), it puts the failed wb_request back on the todo list and
retries the writev, which is why the md5sum comes out right.
  ===> With write-behind disabled, I cannot reproduce the problem. <===

  The errno (61) is generated by [storage/posix] posix_lookup(): buf.ia_gfid
is null, so posix_lookup() sets op_errno = ENODATA.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 posix_lookup()
 {
        op_ret = -1;
        if (gf_uuid_is_null (loc->pargfid) || (loc->name == NULL)) { /* not taken */
                /* nameless lookup */
                MAKE_INODE_HANDLE (real_path, this, loc, &buf);
        } else {  /* taken */
                MAKE_ENTRY_HANDLE (real_path, par_path, this, loc, &buf);
                /* <1> getting buf fails here, op_ret = -1 */

                if (gf_uuid_is_null (loc->inode->gfid)) { /* loc->inode->gfid is null */
                        posix_gfid_heal (this, real_path, loc, xdata); /* <2> taken */
                        {
                              if (!xattr_req)
                                    goto out;

                              if (sys_lstat (path, &stat) != 0)
                                    goto out; /* <3> returns ret = -1 with
                                                 errno = 2 (ENOENT), i.e. the
                                                 path does not exist yet */

                              ret = sys_lgetxattr (path, GFID_XATTR_KEY,
                                                   uuid_curr, 16);
                              if (ret != 16) {
                                        if (is_fresh_file (&stat)) {
                                                 ret = -1;
                                                 errno = ENOENT;
                                                 goto out;
                                        }
                              }

                             ret = posix_gfid_set (this, path, loc, xattr_req);
                        out:
                               return ret;
                        }

                        MAKE_ENTRY_HANDLE (real_path, par_path, this,
                                           loc, &buf);
                        /* <4> BUT this second call succeeds: op_ret = 0, yet
                           buf.ia_gfid is null -- the file now exists but its
                           gfid xattr has not been set */
                }
        }

  out:
        if (!op_ret && !gfidless && gf_uuid_is_null (buf.ia_gfid)) {
                /* <5> taken: op_ret = 0, gfidless == 0, buf.ia_gfid is null */
                gf_msg (this->name, GF_LOG_ERROR, ENODATA, P_MSG_NULL_GFID,
                        "buf->ia_gfid is null for "
                        "%s", (real_path) ? real_path: "");
                op_ret = -1;
                op_errno = ENODATA; /* <6> taken: op_errno = ENODATA */
        }
  }

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<



Version-Release number of selected component (if applicable):
  git url: https://review.gluster.org/glusterfs
  branch: master (commit 11eb8ba870457337c6067284dde4277e09764c0a)


GlusterFS configure summary
===========================
FUSE client          : yes
Infiniband verbs     : yes
epoll IO multiplex   : yes
argp-standalone      : no
fusermount           : yes
readline             : yes
georeplication       : yes
Linux-AIO            : yes
Enable Debug         : yes
Block Device xlator  : yes
glupy                : yes
Use syslog           : yes
XML output           : yes
Encryption xlator    : yes
Unit Tests           : no
Track priv ports     : yes
POSIX ACLs           : yes
Data Classification  : yes
SELinux features     : yes
firewalld-config     : no
Experimental xlators : yes
Events               : yes
EC dynamic support   : x64 sse avx
Use memory pools     : yes
Nanosecond m/atimes  : yes
Legacy gNFS server   : no

How reproducible:
postmark
write-behind + shard + dht + ec
Three CentOS 7.3 nodes with 2 disks on each node.
x86_64

Steps to Reproduce:
1. Create a volume with write-behind + shard + dht + ec (2 * (2+1)):
Volume Name: test
Type: Distributed-Disperse
Volume ID: a110d762-6ce0-4309-b9b2-29c4b9d3dd84
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: node-1.test:/mnt/disk1
Brick2: node-2.test:/mnt/disk3
Brick3: node-3.test:/mnt/disk5
Brick4: node-1.test:/mnt/disk2
Brick5: node-2.test:/mnt/disk4
Brick6: node-3.test:/mnt/disk6
Options Reconfigured:
performance.nfs.write-behind: disable
network.ping-timeout: 20
performance.least-prio-threads: 64
performance.high-prio-threads: 64
cluster.readdir-optimize: on
performance.io-thread-count: 64
features.lock-heal: on
nfs.rpc-auth-reject: *.*.*.*
server.allow-insecure: on
cluster.heal-timeout: 60
performance.stat-prefetch: off
performance.low-prio-threads: 64
features.shard: on
transport.address-family: inet
nfs.disable: on

mount.glusterfs 127.0.0.1:/test /mnt/test

2. Terminal-1# ./postmark conf
[conf]
set size 104800000 104800000
set number 66
set location /mnt/test
set read 65536
set write 65536
set bias create -1
set bias read 0
set transactions 1
run result.txt
show
quit
3. Terminal-2# After postmark finishes, grep the client log:
grep ' E ' /var/log/glusterfs/mnt-test-.log
[2017-07-20 05:53:06.944290] E [MSGID: 133010]
[shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 18
failed. Base file gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data
available]
[2017-07-20 05:53:25.095401] E [MSGID: 133010]
[shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 7
failed. Base file gfid = 1cbf5af2-b8a8-4327-a450-7335b7f84c52 [No data
available]
[2017-07-20 05:53:38.661017] E [MSGID: 133010]
[shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 4
failed. Base file gfid = 7cf1712c-0576-428f-a5ce-052eb5b6eddb [No data
available]
[2017-07-20 05:53:43.018596] E [MSGID: 133010]
[shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 20
failed. Base file gfid = acf750ff-c737-4806-a5e6-d2538579b984 [No data
available]
[2017-07-20 05:54:21.841236] E [MSGID: 133010]
[shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 21
failed. Base file gfid = 67e415ea-ba27-4a2c-89d0-c5330a624d9d [No data
available]
[2017-07-20 05:54:32.529585] E [MSGID: 133010]
[shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 4
failed. Base file gfid = f437eda1-ddd0-409b-b6d1-32201f478cb0 [No data
available]
[2017-07-20 05:54:33.727440] E [MSGID: 133010]
[shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 22
failed. Base file gfid = f437eda1-ddd0-409b-b6d1-32201f478cb0 [No data
available]
[2017-07-20 05:54:43.243084] E [MSGID: 133010]
[shard.c:1720:shard_common_lookup_shards_cbk] 0-test-shard: Lookup on shard 15
failed. Base file gfid = 4f56cfac-9062-4c37-8c30-e660b807b6ad [No data
available]

Actual results:
shard_common_lookup_shards_cbk() receives op_ret=-1, op_errno=ENODATA.

Expected results:
shard_common_lookup_shards_cbk() should handle op_errno=ENODATA, or
posix_lookup() should not return ENODATA here. After all, ENODATA does not
mean the file does not exist, and shard should be able to proceed with
writev().

Additional info:

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are on the CC list for the bug.
You are the assignee for the bug.
