[Bugs] [Bug 1729223] New: Ganesha-gfapi logs are flooded with error messages related to "gf_uuid_is_null(gfid)) [Invalid argument]" when lookups are running from multiple clients
bugzilla at redhat.com
bugzilla at redhat.com
Thu Jul 11 15:53:53 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1729223
Bug ID: 1729223
Summary: Ganesha-gfapi logs are flooded with error messages
related to "gf_uuid_is_null(gfid)) [Invalid argument]"
when lookups are running from multiple clients
Product: GlusterFS
Version: 4.1
Hardware: All
OS: All
Status: NEW
Component: libgfapi
Keywords: Triaged
Severity: high
Priority: medium
Assignee: bugs at gluster.org
Reporter: skoduri at redhat.com
QA Contact: bugs at gluster.org
CC: bugs at gluster.org, dang at redhat.com, ffilz at redhat.com,
grajoria at redhat.com, jthottan at redhat.com,
mbenjamin at redhat.com, msaini at redhat.com, pasik at iki.fi,
rhs-bugs at redhat.com, sankarshan at redhat.com,
skoduri at redhat.com, storage-qa-internal at redhat.com
Depends On: 1718316, 1720635, 1720636
Blocks: 1717784, 1726935
Target Milestone: ---
Classification: Community
+++ This bug was initially created as a clone of Bug #1720635 +++
+++ This bug was initially created as a clone of Bug #1718316 +++
+++ This bug was initially created as a clone of Bug #1717784 +++
Description of problem:
=========================
Ganesha-gfapi logs are flooded with errors messages related to
gf_uuid_is_null(gfid), when linux untars and lookups are running from multiple
clients-
---------
[2019-06-06 07:56:12.503603] E [glfs-handleops.c:1892:glfs_h_find_handle]
(-->/lib64/libgfapi.so.0(+0xe0ae) [0x7f7e91e8b0ae]
-->/lib64/libgfapi.so.0(+0x258f1) [0x7f7e91ea28f1]
-->/lib64/libgfapi.so.0(+0x257c4) [0x7f7e91ea27c4] ) 0-glfs_h_find_handle:
invalid argument: !(gf_uuid_is_null(gfid)) [Invalid argument]
---------
Version-Release number of selected component (if applicable):
===========================
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 Beta (Maipo)
# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-3.el7rhgs.x86_64
glusterfs-ganesha-6.0-3.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.7.3-3.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-3.el7rhgs.x86_64
How reproducible:
=====================
2/2
Steps to Reproduce:
======================
1.Create 4 node Ganesha cluster
2.Create 4*3 Distribute-replicate Volume.Export the volume via Ganesha
3.Mount the volume on 4 clients via v4.1 protocol
4.Run the following workload
Client 1: Run linux untars
Client 2: du -sh in loop
Client 3: ls -lRt in loop
Client 4: find's in loop
Actual results:
==================
While test is running,ganesha-gfapi logs are flooded with errors related to
"gf_uuid_is_null"
======
[2019-06-03 16:54:19.829136] E [glfs-handleops.c:1892:glfs_h_find_handle]
(-->/lib64/libgfapi.so.0(+0xe0ae) [0x7ff6902d00ae]
-->/lib64/libgfapi.so.0(+0x2594a) [0x7ff6902e794a]
-->/lib64/libgfapi.so.0(+0x257c4) [0x7ff6902e77c4] ) 0-glfs_h_find_handle:
invalid argument: !(gf_uuid_is_null(gfid)) [Invalid argument]
[2019-06-03 16:54:20.006163] E [glfs-handleops.c:1892:glfs_h_find_handle]
(-->/lib64/libgfapi.so.0(+0xe0ae) [0x7ff6902d00ae]
-->/lib64/libgfapi.so.0(+0x2594a) [0x7ff6902e794a]
-->/lib64/libgfapi.so.0(+0x257c4) [0x7ff6902e77c4] ) 0-glfs_h_find_handle:
invalid argument: !(gf_uuid_is_null(gfid)) [Invalid argument]
[2019-06-03 16:54:20.320293] E [glfs-handleops.c:1892:glfs_h_find_handle]
(-->/lib64/libgfapi.so.0(+0xe0ae) [0x7ff6902d00ae]
-->/lib64/libgfapi.so.0(+0x2594a) [0x7ff6902e794a]
-->/lib64/libgfapi.so.0(+0x257c4) [0x7ff6902e77c4] ) 0-glfs_h_find_handle:
invalid argument: !(gf_uuid_is_null(gfid)) [Invalid argument]
=====
# cat /var/log/ganesha/ganesha-gfapi.log | grep gf_uuid_is_null | wc -l
605340
Expected results:
===================
There should not be error messages in ganesha-gfapi.logs
Additional info:
===================
On narrowing down the test scenario,Seems to be the error messages are coming
when only du -sh and ls -lRt are running in loop from two different clients
--- Additional comment from RHEL Product and Program Management on 2019-06-06
08:10:27 UTC ---
This bug is automatically being proposed for the next minor release of Red Hat
Gluster Storage by setting the release flag 'rhgs‑3.5.0' to '?'.
If this bug should be proposed for a different release, please manually change
the proposed release flag.
--- Additional comment from Soumya Koduri on 2019-06-06 09:48:36 UTC ---
@Manisha,
are these clients connected to different NFS-Ganesha servers? On which machine
did you observe these errors? I do not see such messages in the sosreports
uploaded.
>>> On narrowing down the test scenario,Seems to be the error messages are coming when only du -sh and ls -lRt are running in loop from two different clients
Does this mean, these messages are not seen with just linux untar test?
--- Additional comment from Manisha Saini on 2019-06-06 10:16:00 UTC ---
(In reply to Soumya Koduri from comment #3)
> @Manisha,
>
> are these clients connected to different NFS-Ganesha servers? On which
> machine did you observe these errors? I do not see such messages in the
> sosreports uploaded.
Hi soumya,
All the clients are connected to single server VIP
I see there is some issue with how sosreport collecting ganesha logs.All logs
are not captured as part of sosreport.
>
> >>> On narrowing down the test scenario,Seems to be the error messages are coming when only du -sh and ls -lRt are running in loop from two different clients
>
> Does this mean, these messages are not seen with just linux untar test?
No.Not seen with only untars
--- Additional comment from Soumya Koduri on 2019-06-07 10:08:03 UTC ---
Thanks Manisha for sharing the setup and logs.
"0-glfs_h_find_handle: invalid argument: !(gf_uuid_is_null(gfid)) [Invalid
argument] "
The above message is logged while processing upcall requests. Somehow the gfid
passed has become NULL. IMO there are two issues to be considered here -
> there are so many upcall requests generated even though there is only single server serving all the clients.
Seems like the data being accessed is huge and hence the server is trying clean
up the inodes from the lru list. While destroying a inode, upcall xlator sends
cache invalidation request to all its clients to notify that the particular
file/inode entry is no more cached by the server. This logic can be optimized a
bit here.
For nameless lookups, server generates a dummy inode (say inodeD) and later
links it to inode (if there is no entry already present for that file/dir) in
the cbk path. So as part of lookup_cbk, though the inode (inodeD) received is
invalid, upcall xlator creates an inode_ctx entry as it eventually can get
linked to the inode table. However in certain cases, if there is already an
inode (say inodeC) present for that particular file, this new inode (inodeD)
created will be purged, which results in sending upcall notifications to the
clients.
in Manisha's testcase, as the data created is huge and being looked up in a
loop, there are many such dummy inode entries getting purged resulting in huge
number of upcall notifications sent to the client. We can avoid this issue to
an extent by checking if the inode is valid or not (i.e, linked or not) before
sending callback notifications.
note - this has been day-1 issue but good to be fixed.
* Another issue is gfid becoming NULL in upcall args.
> I couldn't reproduce this issue on my setup. However seems like in upcall xlator we already check if the gfid is not NULL before sending notification.
GF_VALIDATE_OR_GOTO("upcall_client_cache_invalidate",
!(gf_uuid_is_null(gfid)), out);
So that means somewhere in the client processing, gfid has become NULL. From
further code-reading I see a potential issue in upcall processing callback
function -
In glfs_cbk_upcall_data(),
--
args->fs = fs;
args->upcall_data = gf_memdup(upcall_data, sizeof(*upcall_data));
--
gf_memdup() may not be the right routine to use here as upcall_data structure
contains pointers to other data. This definitely needs to be fixed. However
would like to re-confirm if this caused gfid to become NULL. Request Manisha to
share setup (if possible) while the tests going on to confirm this theory.
Thanks!
--- Additional comment from Worker Ant on 2019-06-07 14:09:44 UTC ---
REVIEW: https://review.gluster.org/22839 (gfapi: fix incorrect initialization
of upcall syncop arguments) posted (#1) for review on master by soumya k
--- Additional comment from Worker Ant on 2019-06-14 12:11:53 UTC ---
REVIEW: https://review.gluster.org/22871 (gfapi: fix incorrect initialization
of upcall syncop arguments) posted (#1) for review on release-6 by soumya k
--- Additional comment from Worker Ant on 2019-07-05 09:20:16 UTC ---
REVIEW: https://review.gluster.org/22871 (gfapi: fix incorrect initialization
of upcall syncop arguments) merged (#7) on release-6 by hari gowtham
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1717784
[Bug 1717784] Ganesha-gfapi logs are flooded with error messages related to
"gf_uuid_is_null(gfid)) [Invalid argument]" when lookups are running from
multiple clients
https://bugzilla.redhat.com/show_bug.cgi?id=1718316
[Bug 1718316] Ganesha-gfapi logs are flooded with error messages related to
"gf_uuid_is_null(gfid)) [Invalid argument]" when lookups are running from
multiple clients
https://bugzilla.redhat.com/show_bug.cgi?id=1720635
[Bug 1720635] Ganesha-gfapi logs are flooded with error messages related to
"gf_uuid_is_null(gfid)) [Invalid argument]" when lookups are running from
multiple clients
https://bugzilla.redhat.com/show_bug.cgi?id=1720636
[Bug 1720636] Ganesha-gfapi logs are flooded with error messages related to
"gf_uuid_is_null(gfid)) [Invalid argument]" when lookups are running from
multiple clients
https://bugzilla.redhat.com/show_bug.cgi?id=1726935
[Bug 1726935] (glusterfs-6.4) - GlusterFS 6.4 tracker
--
You are receiving this mail because:
You are the QA Contact for the bug.
You are on the CC list for the bug.
You are the assignee for the bug.
More information about the Bugs
mailing list