[Gluster-devel] Spurious failure of ./tests/bugs/glusterd/bug-913555.t

Raghavendra Gowdappa rgowdapp at redhat.com
Wed Oct 19 04:13:09 UTC 2016



----- Original Message -----
> From: "Atin Mukherjee" <amukherj at redhat.com>
> To: "Oleksandr Natalenko" <oleksandr at natalenko.name>, "Nithya Balachandran" <nbalacha at redhat.com>, "Raghavendra
> Gowdappa" <rgowdapp at redhat.com>, "Shyam Ranganathan" <srangana at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Tuesday, October 18, 2016 9:58:07 PM
> Subject: Re: [Gluster-devel] Spurious failure of ./tests/bugs/glusterd/bug-913555.t
> 
> Final reminder before I take out the test case from the test file.
> 
> On Thursday 13 October 2016, Atin Mukherjee <amukherj at redhat.com> wrote:
> 
> >
> >
> > On Wednesday 12 October 2016, Atin Mukherjee <amukherj at redhat.com> wrote:
> >
> >> So the test fails (intermittently) in check_fs which tries to do a df on
> >> the mount point for a volume which is carved out of three bricks from 3
> >> nodes and one node is completely down. A quick look at the mount log
> >> reveals the following:
> >>
> >> [2016-10-10 13:58:59.279446]:++++++++++
> >> G_LOG:./tests/bugs/glusterd/bug-913555.t:
> >> TEST: 48 0 check_fs /mnt/glusterfs/0 ++++++++++
> >> [2016-10-10 13:58:59.287973] W [MSGID: 114031]
> >> [client-rpc-fops.c:2930:client3_3_lookup_cbk] 0-patchy-client-2:
> >> remote operation failed. Path: / (00000000-0000-0000-0000-000000000001)
> >> [Transport endpoint is not connected]
> >> [2016-10-10 13:58:59.288326] I [MSGID: 109063]
> >> [dht-layout.c:713:dht_layout_normalize] 0-patchy-dht: Found anomalies in
> >> / (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
> >> [2016-10-10 13:58:59.288352] W [MSGID: 109005]
> >> [dht-selfheal.c:2102:dht_selfheal_directory] 0-patchy-dht: Directory
> >> selfheal failed: 1 subvolumes down.Not fixing. path = /, gfid =
> >> [2016-10-10 13:58:59.288643] W [MSGID: 114031]
> >> [client-rpc-fops.c:2930:client3_3_lookup_cbk] 0-patchy-client-2:
> >> remote operation failed. Path: / (00000000-0000-0000-0000-000000000001)
> >> [Transport endpoint is not connected]
> >> [2016-10-10 13:58:59.288927] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk]
> >> 0-fuse: 00000000-0000-0000-0000-000000000001: failed to
> >> resolve (Stale file handle)
> >> [2016-10-10 13:58:59.288949] W [fuse-bridge.c:2597:fuse_opendir_resume]
> >> 0-glusterfs-fuse: 7: OPENDIR (00000000-0000-0000-0000-000000000001)
> >> resolution failed
> >> [2016-10-10 13:58:59.289505] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk]
> >> 0-fuse: 00000000-0000-0000-0000-000000000001: failed to
> >> resolve (Stale file handle)
> >> [2016-10-10 13:58:59.289524] W [fuse-bridge.c:3137:fuse_statfs_resume]
> >> 0-glusterfs-fuse: 8: STATFS (00000000-0000-0000-0000-000000000001)
> >> resolution fail
> >>
> >> DHT team - are these anomalies expected here? I also see opendir and
> >> statfs failing here.

I am not sure whether the anomalies are expected or not, but they have no bearing on statfs: irrespective of the self-heal result, the lookup succeeds as long as it succeeds on at least one subvolume, so I don't see a DHT issue here. However, the logs show that resolution of the root gfid failed, which is why statfs couldn't be resumed. It would be worthwhile to find out where and why the lookup on gfid 0x1 failed.
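
To make the failure mode concrete: as far as I remember, check_fs in the test harness is just a thin df wrapper. The sketch below is from memory of tests/volume.rc - both the helper definition and the EXPECT line are assumptions, not quotes of the current source:

    # rough shape of the helper (assumption; see tests/volume.rc)
    function check_fs {
            df $1 &> /dev/null    # df ends up issuing statfs(2) on the mount
            echo $?               # the harness compares this against the expected value
    }

    # rough shape of the failing check in bug-913555.t (assumption), matching
    # the "TEST: 48 0 check_fs /mnt/glusterfs/0" entry in the G_LOG above
    EXPECT 0 check_fs $M0

Since df boils down to a statfs(2) on the mount point, the failed resolution of gfid 0x1 above is exactly what surfaces as a non-zero exit status from check_fs, and hence as the spurious test failure.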

> >>
> >
> > Any luck with this? I don't see any relevance of having a check_fs test
> > w.r.t. the bug this test case is tagged to. If I don't hear back on this
> > in a few days, I'll go ahead and remove this check from the test to avoid
> > the spurious failure.
> >
> >
> >>
> >>
> >> On Wed, Oct 12, 2016 at 12:18 PM, Atin Mukherjee <amukherj at redhat.com>
> >> wrote:
> >>
> >>> I will take a look at it shortly.
> >>>
> >>> On Wed, Oct 12, 2016 at 12:08 PM, Oleksandr Natalenko <
> >>> oleksandr at natalenko.name> wrote:
> >>>
> >>>> Hello.
> >>>>
> >>>> Vijay asked me to drop a note about a spurious failure of the
> >>>> ./tests/bugs/glusterd/bug-913555.t test. Here are some examples:
> >>>>
> >>>> * https://build.gluster.org/job/centos6-regression/1069/consoleFull
> >>>> * https://build.gluster.org/job/centos6-regression/1076/consoleFull
> >>>>
> >>>> Could someone take a look at it?
> >>>>
> >>>> Also, the last two regression runs were broken because of this:
> >>>>
> >>>> ===
> >>>> Slave went offline during the build
> >>>> ===
> >>>>
> >>>> See these builds for details:
> >>>>
> >>>> * https://build.gluster.org/job/centos6-regression/1077/consoleFull
> >>>> * https://build.gluster.org/job/centos6-regression/1078/consoleFull
> >>>>
> >>>> Was that intentional?
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Regards,
> >>>>   Oleksandr
> >>>> _______________________________________________
> >>>> Gluster-devel mailing list
> >>>> Gluster-devel at gluster.org
> >>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> --Atin
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> --Atin
> >>
> >
> >
> > --
> > --Atin
> >
> 
> 
> --
> --Atin
> 

