[Gluster-devel] Spurious failures because of nfs and snapshots

Pranith Kumar Karampuri pkarampu at redhat.com
Wed May 21 10:26:22 UTC 2014



----- Original Message -----
> From: "Atin Mukherjee" <amukherj at redhat.com>
> To: gluster-devel at gluster.org, "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Sent: Wednesday, May 21, 2014 3:39:21 PM
> Subject: Re: Fwd: Re: [Gluster-devel] Spurious failures because of nfs and snapshots
> 
> 
> 
> On 05/21/2014 11:42 AM, Atin Mukherjee wrote:
> > 
> > 
> > On 05/21/2014 10:54 AM, SATHEESARAN wrote:
> >> Guys,
> >>
> >> This is the issue pointed out by Pranith with regard to Barrier.
> >> I was reading through it.
> >>
> >> But I wanted to bring it to your attention.
> >>
> >> -- S
> >>
> >>
> >> -------- Original Message --------
> >> Subject: 	Re: [Gluster-devel] Spurious failures because of nfs and
> >> snapshots
> >> Date: 	Tue, 20 May 2014 21:16:57 -0400 (EDT)
> >> From: 	Pranith Kumar Karampuri <pkarampu at redhat.com>
> >> To: 	Vijaikumar M <vmallika at redhat.com>, Joseph Fernandes
> >> <josferna at redhat.com>
> >> CC: 	Gluster Devel <gluster-devel at gluster.org>
> >>
> >>
> >>
> >> Hey,
> >>     Seems like even after this fix is merged, the regression tests are
> >>     failing for the same script. You can check the logs at
> >>     http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz
> > Pranith,
> > 
> > Is this the correct link? I don't see any log having this sequence there.
> > Also, looking at the log in this mail, this is expected as per the
> > barrier functionality: an enable request followed by another enable
> > should always fail, and the same holds for disable.
> > 
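The enable/disable behaviour described above can be sketched as a tiny state model. This is only an illustration: set_barrier and barrier_state are hypothetical names, the initial "disable" state is an assumption, and this is not how glusterd implements the barrier.

```shell
#!/bin/bash
# Hypothetical model of the toggle semantics; NOT glusterd's implementation.
# Assumption: the barrier starts out disabled.
barrier_state="disable"

# set_barrier <enable|disable>
# Fails when the requested state is already the current state.
set_barrier() {
    local requested=$1
    if [ "$requested" = "$barrier_state" ]; then
        echo "volume barrier patchy $requested : FAILED"
        return 1
    fi
    barrier_state=$requested
    echo "volume barrier patchy $requested : SUCCESS"
}

# Same request sequence as in the quoted log.
set_barrier enable
set_barrier enable
set_barrier disable
set_barrier disable
```

Under these assumptions the second request of each pair is rejected, which is the pattern the quoted log shows.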
> > Can you please confirm the link and which particular regression test is
> > causing this issue? Is it bug-1090042.t?
> > 
> > --Atin
> >>
> >> Relevant logs:
> >> [2014-05-20 20:17:07.026045]  : volume create patchy
> >> build.gluster.org:/d/backends/patchy1
> >> build.gluster.org:/d/backends/patchy2 : SUCCESS
> >> [2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
> >> [2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
> >> [2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED :
> >> Failed to reconfigure barrier.
> >> [2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
> >> [2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED :
> >> Failed to reconfigure barrier.
> >>
> This log is for bug-1092841.t and it's expected.

Damn :-(. I think I screwed up the timestamps while checking.... Sorry about that :-(. But there are failures. Check http://build.gluster.org/job/regression/4501/consoleFull

Pranith

> 
> --Atin
> >> Pranith
> >>
> >> ----- Original Message -----
> >>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >>> To: "Gluster Devel" <gluster-devel at gluster.org>
> >>> Cc: "Joseph Fernandes" <josferna at redhat.com>, "Vijaikumar M"
> >>> <vmallika at redhat.com>
> >>> Sent: Tuesday, May 20, 2014 3:41:11 PM
> >>> Subject: Re: Spurious failures because of nfs and snapshots
> >>>
> >>> hi,
> >>>     Please resubmit the patches on top of
> >>>     http://review.gluster.com/#/c/7753
> >>>     to prevent frequent regression failures.
> >>>
> >>> Pranith
> >>> ----- Original Message -----
> >>>> From: "Vijaikumar M" <vmallika at redhat.com>
> >>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >>>> Cc: "Joseph Fernandes" <josferna at redhat.com>, "Gluster Devel"
> >>>> <gluster-devel at gluster.org>
> >>>> Sent: Monday, May 19, 2014 2:40:47 PM
> >>>> Subject: Re: Spurious failures because of nfs and snapshots
> >>>>
> >>>> The brick disconnected with a ping timeout.
> >>>>
> >>>> Here is the log message:
> >>>> [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbin/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a91013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3 -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3fbb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l /build/install/var/log/glusterfs/bricks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba7b4b0 --brick-port 49164 --xlator-option 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164)
> >>>> [2014-05-19 04:29:38.141118] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting ping-timeout to 30secs
> >>>> [2014-05-19 04:30:09.139521] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.
> >>>>
> >>>>
> >>>>
> >>>> Patch 'http://review.gluster.org/#/c/7753/' will fix the problem: the
> >>>> ping-timer will be disabled by default for all RPC connections except
> >>>> glusterd-glusterd (set to 30 sec) and client-glusterd (set to 42 sec).
> >>>>
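To check whether a given log shows the same ping-timer disconnect, one option is to count occurrences of the expiry function name. A minimal sketch follows; the sample lines are copied from the log quoted above, while the temporary file and the count_ping_timeouts helper are hypothetical stand-ins for a real brick log and whatever tooling is actually in use.

```shell
#!/bin/bash
# Sample log lines (copied from the message above) written to a temporary
# file. In practice "$log" would point at a real brick log instead.
log=$(mktemp)
cat > "$log" <<'EOF'
[2014-05-19 04:29:38.141118] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting ping-timeout to 30secs
[2014-05-19 04:30:09.139521] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.
EOF

# count_ping_timeouts <logfile> (hypothetical helper)
# Prints how many lines mention the ping-timer expiry function.
count_ping_timeouts() {
    grep -c 'rpc_clnt_ping_timer_expired' "$1"
}

expiries=$(count_ping_timeouts "$log")
echo "ping-timer expiries: $expiries"
rm -f "$log"
```

A non-zero count suggests the disconnect came from the ping timer rather than from the test itself.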
> >>>>
> >>>> Thanks,
> >>>> Vijay
> >>>>
> >>>>
> >>>> On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote:
> >>>>> The latest build failure also has the same issue:
> >>>>> Download it from here:
> >>>>> http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz
> >>>>>
> >>>>> Pranith
> >>>>>
> >>>>> ----- Original Message -----
> >>>>>> From: "Vijaikumar M" <vmallika at redhat.com>
> >>>>>> To: "Joseph Fernandes" <josferna at redhat.com>
> >>>>>> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Gluster Devel"
> >>>>>> <gluster-devel at gluster.org>
> >>>>>> Sent: Monday, 19 May, 2014 11:41:28 AM
> >>>>>> Subject: Re: Spurious failures because of nfs and snapshots
> >>>>>>
> >>>>>> Hi Joseph,
> >>>>>>
> >>>>>> In the log mentioned below, it says ping-timeout is set to the
> >>>>>> default value of 30 sec. I think the issue is different.
> >>>>>> Can you please point me to the logs where you were able to re-create
> >>>>>> the problem?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vijay
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:
> >>>>>>> hi Vijai, Joseph,
> >>>>>>>       In 2 of the last 3 build failures,
> >>>>>>>       http://build.gluster.org/job/regression/4479/console,
> >>>>>>>       http://build.gluster.org/job/regression/4478/console this
> >>>>>>>       test (tests/bugs/bug-1090042.t) failed. Do you guys think it is
> >>>>>>>       better to revert this test until the fix is available? Please
> >>>>>>>       send a patch to revert the test case if you guys feel so. You
> >>>>>>>       can re-submit it along with the fix to the bug mentioned by
> >>>>>>>       Joseph.
> >>>>>>>
> >>>>>>> Pranith.
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>>> From: "Joseph Fernandes" <josferna at redhat.com>
> >>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >>>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> >>>>>>>> Sent: Friday, 16 May, 2014 5:13:57 PM
> >>>>>>>> Subject: Re: Spurious failures because of nfs and snapshots
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi All,
> >>>>>>>>
> >>>>>>>> tests/bugs/bug-1090042.t :
> >>>>>>>>
> >>>>>>>> I was able to reproduce the issue when this test is run in a
> >>>>>>>> loop:
> >>>>>>>>
> >>>>>>>> for i in {1..135} ; do ./bugs/bug-1090042.t ; done
> >>>>>>>>
> >>>>>>>> When I checked the logs:
> >>>>>>>> [2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> >>>>>>>> [2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-management: defaulting ping-timeout to 30secs
> >>>>>>>> [2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> >>>>>>>> [2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-management: defaulting ping-timeout to 30secs
> >>>>>>>>
> >>>>>>>> The issue is with ping-timeout and is tracked under the bug
> >>>>>>>>
> >>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The workaround is mentioned in
> >>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Joe
> >>>>>>>>
> >>>>>>>> ----- Original Message -----
> >>>>>>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >>>>>>>> To: "Gluster Devel" <gluster-devel at gluster.org>
> >>>>>>>> Cc: "Joseph Fernandes" <josferna at redhat.com>
> >>>>>>>> Sent: Friday, May 16, 2014 6:19:54 AM
> >>>>>>>> Subject: Spurious failures because of nfs and snapshots
> >>>>>>>>
> >>>>>>>> hi,
> >>>>>>>>       The latest build I fired for review.gluster.com/7766
> >>>>>>>>       (http://build.gluster.org/job/regression/4443/console) failed
> >>>>>>>>       because of a spurious failure. The script doesn't wait for the
> >>>>>>>>       nfs export to be available. I fixed that, but interestingly I
> >>>>>>>>       found quite a few scripts with the same problem. Some of the
> >>>>>>>>       scripts rely on 'sleep 5', which could also lead to spurious
> >>>>>>>>       failures if the export is not available within 5 seconds. We
> >>>>>>>>       found that waiting for 20 seconds is better, but 'sleep 20'
> >>>>>>>>       would unnecessarily delay the build execution. So if you guys
> >>>>>>>>       are going to write any scripts which have to do nfs mounts,
> >>>>>>>>       please do it the following way:
> >>>>>>>>
> >>>>>>>> EXPECT_WITHIN 20 "1" is_nfs_export_available;
> >>>>>>>> TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;
> >>>>>>>>
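The point of polling with a timeout instead of a fixed sleep can be shown with a small stand-alone sketch. wait_for and export_ready below are hypothetical helpers written for illustration; they are not the test framework's EXPECT_WITHIN or is_nfs_export_available, and the marker file merely simulates an export that appears after about two seconds.

```shell
#!/bin/bash
# wait_for <timeout-seconds> <expected-output> <command> [args...]
# Hypothetical helper: re-runs the command once a second and returns 0 as
# soon as its output equals the expected value, or 1 after the timeout.
wait_for() {
    local timeout=$1 expected=$2
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    while true; do
        # Succeed as soon as the command prints the expected value.
        if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
            return 0
        fi
        # Give up once the deadline has passed.
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep 1
    done
}

# Stand-in check (hypothetical): prints "1" once a marker file exists.
marker=$(mktemp -u)
export_ready() { [ -e "$marker" ] && echo 1 || echo 0; }

# Simulate the export becoming available after about two seconds.
( sleep 2; touch "$marker" ) &

if wait_for 20 "1" export_ready; then
    echo "export ready"
else
    echo "timed out"
fi
rm -f "$marker"
```

With this shape the script continues as soon as the condition holds, so the 20-second budget is only spent in full when the export never shows up.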
> >>>>>>>> Please review http://review.gluster.com/7773 :-)
> >>>>>>>>
> >>>>>>>> I saw one more spurious failure in a snapshot-related script,
> >>>>>>>> tests/bugs/bug-1090042.t, on the next build fired by Niels.
> >>>>>>>> Joseph (CCed) is debugging it. He agreed to reply with what he
> >>>>>>>> finds and share it with us so that we won't introduce similar bugs
> >>>>>>>> in future.
> >>>>>>>>
> >>>>>>>> I encourage you guys to share what you fix to prevent spurious
> >>>>>>>> failures in future.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Pranith
> >>>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> 

