[Gluster-devel] Spurious failures because of nfs and snapshots

Pranith Kumar Karampuri pkarampu at redhat.com
Wed May 21 01:16:57 UTC 2014


Hey,
    It seems that even after this fix was merged, the regression tests are still failing for the same script. You can check the logs at http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz

Relevant logs:
[2014-05-20 20:17:07.026045]  : volume create patchy build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : SUCCESS
[2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
[2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
[2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED : Failed to reconfigure barrier.
[2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
[2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED : Failed to reconfigure barrier.

Pranith

----- Original Message -----
> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> To: "Gluster Devel" <gluster-devel at gluster.org>
> Cc: "Joseph Fernandes" <josferna at redhat.com>, "Vijaikumar M" <vmallika at redhat.com>
> Sent: Tuesday, May 20, 2014 3:41:11 PM
> Subject: Re: Spurious failures because of nfs and snapshots
> 
> hi,
>     Please resubmit the patches on top of http://review.gluster.com/#/c/7753
>     to prevent frequent regression failures.
> 
> Pranith
> ----- Original Message -----
> > From: "Vijaikumar M" <vmallika at redhat.com>
> > To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> > Cc: "Joseph Fernandes" <josferna at redhat.com>, "Gluster Devel"
> > <gluster-devel at gluster.org>
> > Sent: Monday, May 19, 2014 2:40:47 PM
> > Subject: Re: Spurious failures because of nfs and snapshots
> > 
> > The brick disconnected with a ping-timeout.
> > 
> > Here are the log messages:
> > [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main]
> > 0-/build/install/sbin/glusterfsd: Started running
> > /build/install/sbin/glusterfsd version 3.5qa2 (args:
> > /build/install/sbin/glusterfsd -s build.gluster.org --volfile-id
> > /snaps/patchy_snap1/3f2ae3fbb4a74587b1a91013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3
> > -p
> > /var/lib/glusterd/snaps/patchy_snap1/3f2ae3fbb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid
> > -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name
> > /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l
> > /build/install/var/log/glusterfs/bricks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log
> > --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba7b4b0
> > --brick-port 49164 --xlator-option
> > 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164)
> > [2014-05-19 04:29:38.141118] I
> > [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting
> > ping-timeout to 30secs
> > [2014-05-19 04:30:09.139521] C
> > [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server
> > 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.
> > 
> > 
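To check whether other logs from a failed run show the same disconnect, one option is to search them for the ping-timer expiry message quoted above. This is only a sketch: find_ping_timeouts is a hypothetical helper name, and it assumes each log message sits on a single line in the real log file (the wrapping above comes from the mail client).

```shell
# find_ping_timeouts: hypothetical helper, not part of the Gluster test suite.
# Prints every log line produced by rpc_clnt_ping_timer_expired, the function
# named in the "has not responded" message quoted above.
# Returns non-zero when no such line is found (grep's usual behaviour).
find_ping_timeouts() {
    grep -F -- 'rpc_clnt_ping_timer_expired' "$@"
}
```

For example, find_ping_timeouts /build/install/var/log/glusterfs/bricks/*.log would scan the brick logs, taking the directory from the -l argument in the log above.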
> > 
> > Patch 'http://review.gluster.org/#/c/7753/' will fix the problem: with
> > it, the ping-timer is disabled by default for all rpc connections except
> > glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec).
> > 
> > 
> > Thanks,
> > Vijay
> > 
> > 
> > On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote:
> > > The latest build failure also has the same issue:
> > > Download it from here:
> > > http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz
> > >
> > > Pranith
> > >
> > > ----- Original Message -----
> > >> From: "Vijaikumar M" <vmallika at redhat.com>
> > >> To: "Joseph Fernandes" <josferna at redhat.com>
> > >> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Gluster Devel"
> > >> <gluster-devel at gluster.org>
> > >> Sent: Monday, 19 May, 2014 11:41:28 AM
> > >> Subject: Re: Spurious failures because of nfs and snapshots
> > >>
> > >> Hi Joseph,
> > >>
> > >> In the log mentioned below, it says ping-timeout is set to the default
> > >> value of 30sec. I think the issue is different.
> > >> Can you please point me to the logs where you were able to re-create
> > >> the problem?
> > >>
> > >> Thanks,
> > >> Vijay
> > >>
> > >>
> > >>
> > >> On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:
> > >>> hi Vijai, Joseph,
> > >>>       In 2 of the last 3 build failures,
> > >>>       http://build.gluster.org/job/regression/4479/console and
> > >>>       http://build.gluster.org/job/regression/4478/console, this
> > >>>       test (tests/bugs/bug-1090042.t) failed. Do you guys think it is
> > >>>       better to revert this test until the fix is available? If so,
> > >>>       please send a patch to revert the test case. You can re-submit
> > >>>       it along with the fix to the bug mentioned by Joseph.
> > >>>
> > >>> Pranith.
> > >>>
> > >>> ----- Original Message -----
> > >>>> From: "Joseph Fernandes" <josferna at redhat.com>
> > >>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> > >>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> > >>>> Sent: Friday, 16 May, 2014 5:13:57 PM
> > >>>> Subject: Re: Spurious failures because of nfs and snapshots
> > >>>>
> > >>>>
> > >>>> Hi All,
> > >>>>
> > >>>> tests/bugs/bug-1090042.t :
> > >>>>
> > >>>> I was able to reproduce the issue by running this test in a loop:
> > >>>>
> > >>>> for i in {1..135} ; do ./bugs/bug-1090042.t ; done
> > >>>>
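When hunting a spurious failure like this, it can help to stop at the first failing iteration so the logs still reflect that run. A minimal sketch follows; run_until_failure is a hypothetical helper, and it assumes the command exits non-zero when it fails, which may not hold for a .t script run directly rather than through a test harness.

```shell
# run_until_failure: hypothetical helper, not part of the Gluster test suite.
# Runs the given command up to max_runs times and stops at the first
# iteration whose exit status is non-zero.
run_until_failure() {
    local max_runs="$1"
    local i=1
    shift
    while [ "$i" -le "$max_runs" ]; do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "passed all $max_runs iterations"
}
```

Usage would look like: run_until_failure 135 ./bugs/bug-1090042.t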
> > >>>> When I checked the logs, I saw:
> > >>>> [2014-05-16 10:49:49.003978] I
> > >>>> [rpc-clnt.c:973:rpc_clnt_connection_init]
> > >>>> 0-management: setting frame-timeout to 600
> > >>>> [2014-05-16 10:49:49.004035] I
> > >>>> [rpc-clnt.c:988:rpc_clnt_connection_init]
> > >>>> 0-management: defaulting ping-timeout to 30secs
> > >>>> [2014-05-16 10:49:49.004303] I
> > >>>> [rpc-clnt.c:973:rpc_clnt_connection_init]
> > >>>> 0-management: setting frame-timeout to 600
> > >>>> [2014-05-16 10:49:49.004340] I
> > >>>> [rpc-clnt.c:988:rpc_clnt_connection_init]
> > >>>> 0-management: defaulting ping-timeout to 30secs
> > >>>>
> > >>>> The issue is with ping-timeout and is tracked under the bug
> > >>>>
> > >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729
> > >>>>
> > >>>>
> > >>>> The workaround is mentioned in
> > >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8
> > >>>>
> > >>>>
> > >>>> Regards,
> > >>>> Joe
> > >>>>
> > >>>> ----- Original Message -----
> > >>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> > >>>> To: "Gluster Devel" <gluster-devel at gluster.org>
> > >>>> Cc: "Joseph Fernandes" <josferna at redhat.com>
> > >>>> Sent: Friday, May 16, 2014 6:19:54 AM
> > >>>> Subject: Spurious failures because of nfs and snapshots
> > >>>>
> > >>>> hi,
> > >>>>       The latest build I fired for review.gluster.com/7766
> > >>>>       (http://build.gluster.org/job/regression/4443/console) failed
> > >>>>       because of a spurious failure. The script doesn't wait for the
> > >>>>       nfs export to be available. I fixed that, but interestingly I
> > >>>>       found quite a few scripts with the same problem. Some of the
> > >>>>       scripts rely on 'sleep 5', which could also lead to spurious
> > >>>>       failures if the export is not available within 5 seconds. We
> > >>>>       found that waiting for 20 seconds is better, but 'sleep 20'
> > >>>>       would unnecessarily delay the build execution. So if you guys
> > >>>>       are going to write any scripts which have to do nfs mounts,
> > >>>>       please do it the following way:
> > >>>>
> > >>>> EXPECT_WITHIN 20 "1" is_nfs_export_available;
> > >>>> TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;
> > >>>>
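The reason polling beats a fixed sleep is that it returns as soon as the export is ready and only waits the full 20 seconds in the worst case. The sketch below illustrates that idea only. wait_for_value is a hypothetical stand-in and is not how EXPECT_WITHIN is implemented in the test framework.

```shell
# wait_for_value: hypothetical illustration of poll-with-timeout.
# Usage: wait_for_value TIMEOUT_SECONDS EXPECTED COMMAND [ARGS...]
# Re-runs COMMAND about once a second until its output equals EXPECTED
# (returns 0) or TIMEOUT_SECONDS have elapsed (returns 1).
wait_for_value() {
    local timeout="$1"
    local expected="$2"
    local deadline
    shift 2
    deadline=$(( $(date +%s) + $timeout ))
    while :; do
        # Success path: return immediately, with no wasted waiting.
        if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
            return 0
        fi
        # Give up once the deadline has passed.
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep 1
    done
}
```

With the check function from the snippet above, the equivalent call would be: wait_for_value 20 "1" is_nfs_export_available, followed by the mount only if it returns success.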
> > >>>> Please review http://review.gluster.com/7773 :-)
> > >>>>
> > >>>> I saw one more spurious failure in a snapshot-related script,
> > >>>> tests/bugs/bug-1090042.t, on the next build fired by Niels.
> > >>>> Joseph (CCed) is debugging it. He agreed to reply with what he
> > >>>> finds and share it with us so that we won't introduce similar
> > >>>> bugs in future.
> > >>>>
> > >>>> I encourage you guys to share what you fix to prevent spurious
> > >>>> failures in future.
> > >>>>
> > >>>> Thanks
> > >>>> Pranith
> > >>>>
> > >>
> > 
> > 
> 

