[Bugs] [Bug 1720635] New: Ganesha-gfapi logs are flooded with error messages related to "gf_uuid_is_null(gfid)) [Invalid argument]" when lookups are running from multiple clients

bugzilla at redhat.com bugzilla at redhat.com
Fri Jun 14 12:09:29 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1720635

            Bug ID: 1720635
           Summary: Ganesha-gfapi logs are flooded with error messages
                    related to "gf_uuid_is_null(gfid)) [Invalid argument]"
                    when lookups are running from multiple clients
           Product: GlusterFS
           Version: 6
          Hardware: All
                OS: All
            Status: NEW
         Component: libgfapi
          Keywords: Triaged
          Severity: high
          Priority: medium
          Assignee: bugs at gluster.org
          Reporter: skoduri at redhat.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org, dang at redhat.com, ffilz at redhat.com,
                    grajoria at redhat.com, jthottan at redhat.com,
                    mbenjamin at redhat.com, msaini at redhat.com, pasik at iki.fi,
                    rhs-bugs at redhat.com, sankarshan at redhat.com,
                    skoduri at redhat.com, storage-qa-internal at redhat.com
        Depends On: 1718316
            Blocks: 1717784
  Target Milestone: ---
    Classification: Community



+++ This bug was initially created as a clone of Bug #1718316 +++

+++ This bug was initially created as a clone of Bug #1717784 +++

Description of problem:
=========================

Ganesha-gfapi logs are flooded with errors messages related to
gf_uuid_is_null(gfid), when linux untars and lookups are running from multiple
clients-

---------
[2019-06-06 07:56:12.503603] E [glfs-handleops.c:1892:glfs_h_find_handle]
(-->/lib64/libgfapi.so.0(+0xe0ae) [0x7f7e91e8b0ae]
-->/lib64/libgfapi.so.0(+0x258f1) [0x7f7e91ea28f1]
-->/lib64/libgfapi.so.0(+0x257c4) [0x7f7e91ea27c4] ) 0-glfs_h_find_handle:
invalid argument: !(gf_uuid_is_null(gfid)) [Invalid argument]

---------


Version-Release number of selected component (if applicable):
===========================
# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.7 Beta (Maipo)

# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-3.el7rhgs.x86_64
glusterfs-ganesha-6.0-3.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.7.3-3.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-3.el7rhgs.x86_64



How reproducible:
=====================
2/2


Steps to Reproduce:
======================
1.Create 4 node Ganesha cluster
2.Create 4*3 Distribute-replicate Volume.Export the volume via Ganesha
3.Mount the volume on 4 clients via v4.1 protocol 
4.Run the following workload

Client 1: Run linux untars
Client 2: du -sh in loop
Client 3: ls -lRt in loop
Client 4: find's in loop

Actual results:
==================

While test is running,ganesha-gfapi logs are flooded with errors related to
"gf_uuid_is_null"

======
[2019-06-03 16:54:19.829136] E [glfs-handleops.c:1892:glfs_h_find_handle]
(-->/lib64/libgfapi.so.0(+0xe0ae) [0x7ff6902d00ae]
-->/lib64/libgfapi.so.0(+0x2594a) [0x7ff6902e794a]
-->/lib64/libgfapi.so.0(+0x257c4) [0x7ff6902e77c4] ) 0-glfs_h_find_handle:
invalid argument: !(gf_uuid_is_null(gfid)) [Invalid argument]
[2019-06-03 16:54:20.006163] E [glfs-handleops.c:1892:glfs_h_find_handle]
(-->/lib64/libgfapi.so.0(+0xe0ae) [0x7ff6902d00ae]
-->/lib64/libgfapi.so.0(+0x2594a) [0x7ff6902e794a]
-->/lib64/libgfapi.so.0(+0x257c4) [0x7ff6902e77c4] ) 0-glfs_h_find_handle:
invalid argument: !(gf_uuid_is_null(gfid)) [Invalid argument]
[2019-06-03 16:54:20.320293] E [glfs-handleops.c:1892:glfs_h_find_handle]
(-->/lib64/libgfapi.so.0(+0xe0ae) [0x7ff6902d00ae]
-->/lib64/libgfapi.so.0(+0x2594a) [0x7ff6902e794a]
-->/lib64/libgfapi.so.0(+0x257c4) [0x7ff6902e77c4] ) 0-glfs_h_find_handle:
invalid argument: !(gf_uuid_is_null(gfid)) [Invalid argument]
=====

# cat /var/log/ganesha/ganesha-gfapi.log | grep gf_uuid_is_null | wc -l
605340


Expected results:
===================
There should not be error messages in ganesha-gfapi.logs

Additional info:
===================

On narrowing down the test scenario,Seems to be the error messages are coming
when only du -sh and ls -lRt are running in loop from two different clients

--- Additional comment from RHEL Product and Program Management on 2019-06-06
08:10:27 UTC ---

This bug is automatically being proposed for the next minor release of Red Hat
Gluster Storage by setting the release flag 'rhgs‑3.5.0' to '?'. 

If this bug should be proposed for a different release, please manually change
the proposed release flag.

--- Additional comment from Soumya Koduri on 2019-06-06 09:48:36 UTC ---

@Manisha,

are these clients connected to different NFS-Ganesha servers? On which machine
did you observe these errors? I do not see such messages in the sosreports
uploaded. 

>>> On narrowing down the test scenario,Seems to be the error messages are coming when only du -sh and ls -lRt are running in loop from two different clients

Does this mean, these messages are not seen with just linux untar test?

--- Additional comment from Manisha Saini on 2019-06-06 10:16:00 UTC ---

(In reply to Soumya Koduri from comment #3)
> @Manisha,
> 
> are these clients connected to different NFS-Ganesha servers? On which
> machine did you observe these errors? I do not see such messages in the
> sosreports uploaded. 

Hi soumya,

All the clients are connected to single server VIP 
I see there is some issue with how sosreport collecting ganesha logs.All logs
are not captured as part of sosreport.
> 
> >>> On narrowing down the test scenario,Seems to be the error messages are coming when only du -sh and ls -lRt are running in loop from two different clients
> 
> Does this mean, these messages are not seen with just linux untar test?

No.Not seen with only untars

--- Additional comment from Soumya Koduri on 2019-06-07 10:08:03 UTC ---

Thanks Manisha for sharing the setup and logs.

"0-glfs_h_find_handle: invalid argument: !(gf_uuid_is_null(gfid)) [Invalid
argument] "

The above message is logged while processing upcall requests. Somehow the gfid
passed has become NULL. IMO there are two issues to be considered here -

> there are so many upcall requests generated even though there is only single server serving all the clients.

Seems like the data being accessed is huge and hence the server is trying clean
up the inodes from the lru list. While destroying a inode, upcall xlator sends
cache invalidation request to all its clients to notify that the particular
file/inode entry is no more cached by the server. This logic can be optimized a
bit here. 
For nameless lookups, server generates a dummy inode (say inodeD) and later
links it to inode (if there is no entry already present for that file/dir) in
the cbk path. So as part of lookup_cbk, though the inode (inodeD) received is
invalid, upcall xlator creates an inode_ctx entry as it eventually can get
linked to the inode table. However in certain cases, if there is already an
inode (say inodeC) present for that particular file, this new inode (inodeD)
created will be purged, which results in sending upcall notifications to the
clients.

in Manisha's testcase, as the data created is huge and being looked up in a
loop, there are many such dummy inode entries getting purged resulting in huge
number of upcall notifications sent to the client. We can avoid this issue to
an extent by checking if the inode is valid or not (i.e, linked or not) before
sending callback notifications. 

note - this has been day-1 issue but good to be fixed.

* Another issue is gfid becoming NULL in upcall args. 

> I couldn't reproduce this issue on my setup. However seems like in upcall xlator we already check if the gfid is not NULL before sending notification.

    GF_VALIDATE_OR_GOTO("upcall_client_cache_invalidate",
                        !(gf_uuid_is_null(gfid)), out);

So that means somewhere in the client processing, gfid has become NULL. From
further code-reading I see a potential issue in upcall processing callback
function -

In glfs_cbk_upcall_data(),

--
    args->fs = fs;
    args->upcall_data = gf_memdup(upcall_data, sizeof(*upcall_data));
--

gf_memdup() may not be the right routine to use here as upcall_data structure
contains pointers to other data. This definitely needs to be fixed. However
would like to re-confirm if this caused gfid to become NULL. Request Manisha to
share setup (if possible) while the tests going on to confirm this theory.
Thanks!

--- Additional comment from Worker Ant on 2019-06-07 14:09:44 UTC ---

REVIEW: https://review.gluster.org/22839 (gfapi: fix incorrect initialization
of upcall syncop arguments) posted (#1) for review on master by soumya k


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1717784
[Bug 1717784] Ganesha-gfapi logs are flooded with error messages related to
"gf_uuid_is_null(gfid)) [Invalid argument]" when lookups are running from
multiple clients
https://bugzilla.redhat.com/show_bug.cgi?id=1718316
[Bug 1718316] Ganesha-gfapi logs are flooded with error messages related to
"gf_uuid_is_null(gfid)) [Invalid argument]" when lookups are running from
multiple clients
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list