[Gluster-devel] Fwd: Re: Spurious failures because of nfs and snapshots

Atin Mukherjee amukherj at redhat.com
Wed May 21 06:12:39 UTC 2014



On 05/21/2014 10:54 AM, SATHEESARAN wrote:
> Guys,
> 
> This is the issue pointed out by Pranith with regard to Barrier.
> I was reading through it.
> 
> But I wanted to bring it to your attention.
> 
> -- S
> 
> 
> -------- Original Message --------
> Subject: 	Re: [Gluster-devel] Spurious failures because of nfs and
> snapshots
> Date: 	Tue, 20 May 2014 21:16:57 -0400 (EDT)
> From: 	Pranith Kumar Karampuri <pkarampu at redhat.com>
> To: 	Vijaikumar M <vmallika at redhat.com>, Joseph Fernandes
> <josferna at redhat.com>
> CC: 	Gluster Devel <gluster-devel at gluster.org>
> 
> 
> 
> Hey,
>     Seems like even after this fix is merged, the regression tests are failing for the same script. You can check the logs at http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz
Pranith,

Is this the correct link? I don't see any log having this sequence there.
Also, going by the log in this mail, this behaviour is expected as per the
barrier functionality: an enable request followed by another enable
should always fail, and the same holds for disable.
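
To make that concrete, here is a minimal sketch in plain bash. It is a toy
model and not glusterd code: the barrier_state variable and the set_barrier
function are hypothetical names used only for illustration, and the message
text simply mirrors the log lines quoted in this mail.

```shell
#!/usr/bin/env bash
# Toy model of the barrier toggle (hypothetical, NOT glusterd code):
# a request that does not change the current state is rejected.
barrier_state="disable"

set_barrier() {
    local requested="$1"
    if [ "$requested" = "$barrier_state" ]; then
        echo "volume barrier patchy $requested : FAILED : Failed to reconfigure barrier."
        return 1
    fi
    barrier_state="$requested"
    echo "volume barrier patchy $requested : SUCCESS"
}

# Same sequence as in the quoted log: every second request is a repeat.
for request in enable enable disable disable; do
    set_barrier "$request" || true
done
```

If the test script issues the repeated enable and disable on purpose, the two
FAILED lines would be the expected result rather than the regression.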

Can you please confirm the link and tell us which particular regression
test is causing this issue? Is it bug-1090042.t?

--Atin
> 
> Relevant logs:
> [2014-05-20 20:17:07.026045]  : volume create patchy build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : SUCCESS
> [2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
> [2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
> [2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED : Failed to reconfigure barrier.
> [2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
> [2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED : Failed to reconfigure barrier.
> 
> Pranith
> 
> ----- Original Message -----
>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>> To: "Gluster Devel" <gluster-devel at gluster.org>
>> Cc: "Joseph Fernandes" <josferna at redhat.com>, "Vijaikumar M" <vmallika at redhat.com>
>> Sent: Tuesday, May 20, 2014 3:41:11 PM
>> Subject: Re: Spurious failures because of nfs and snapshots
>> 
>> hi,
>>     Please resubmit the patches on top of http://review.gluster.com/#/c/7753
>>     to prevent frequent regression failures.
>> 
>> Pranith
>> ----- Original Message -----
>> > From: "Vijaikumar M" <vmallika at redhat.com>
>> > To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>> > Cc: "Joseph Fernandes" <josferna at redhat.com>, "Gluster Devel"
>> > <gluster-devel at gluster.org>
>> > Sent: Monday, May 19, 2014 2:40:47 PM
>> > Subject: Re: Spurious failures because of nfs and snapshots
>> > 
>> > Brick disconnected with ping-timeout:
>> > 
>> > Here is the log message:
>> > [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main]
>> > 0-/build/install/sbin/glusterfsd: Started running
>> > /build/install/sbin/glusterfsd version 3.5qa2 (args:
>> > /build/install/sbin/glusterfsd -s build.gluster.org --volfile-id
>> > /snaps/patchy_snap1/3f2ae3fbb4a74587b1a91013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3
>> > -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3fbb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid
>> > -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name
>> > /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l
>> > /build/install/var/log/glusterfs/bricks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log
>> > --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba7b4b0
>> > --brick-port 49164 --xlator-option
>> > 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164)
>> > [2014-05-19 04:29:38.141118] I
>> > [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting
>> > ping-timeout to 30secs
>> > [2014-05-19 04:30:09.139521] C
>> > [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server
>> > 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.
>> > 
>> > 
>> > 
>> > Patch 'http://review.gluster.org/#/c/7753/' will fix the problem: with it,
>> > the ping-timer is disabled by default for all rpc connections except
>> > glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec).
>> > 
>> > 
>> > Thanks,
>> > Vijay
>> > 
>> > 
>> > On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote:
>> > > The latest build failure also has the same issue:
>> > > Download it from here:
>> > > http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz
>> > >
>> > > Pranith
>> > >
>> > > ----- Original Message -----
>> > >> From: "Vijaikumar M" <vmallika at redhat.com>
>> > >> To: "Joseph Fernandes" <josferna at redhat.com>
>> > >> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Gluster Devel"
>> > >> <gluster-devel at gluster.org>
>> > >> Sent: Monday, 19 May, 2014 11:41:28 AM
>> > >> Subject: Re: Spurious failures because of nfs and snapshots
>> > >>
>> > >> Hi Joseph,
>> > >>
>> > >> In the log mentioned below, it says ping-timeout is set to the default
>> > >> value of 30sec. I think the issue is different.
>> > >> Can you please point me to the logs where you were able to re-create
>> > >> the problem?
>> > >>
>> > >> Thanks,
>> > >> Vijay
>> > >>
>> > >>
>> > >>
>> > >> On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:
>> > >>> hi Vijai, Joseph,
>> > >>>       In 2 of the last 3 build failures,
>> > >>>       http://build.gluster.org/job/regression/4479/console,
>> > >>>       http://build.gluster.org/job/regression/4478/console this
>> > >>>       test (tests/bugs/bug-1090042.t) failed. Do you guys think it is
>> > >>>       better to revert this test until the fix is available? Please
>> > >>>       send a patch to revert the test case if you guys feel so. You can
>> > >>>       re-submit it along with the fix to the bug mentioned by Joseph.
>> > >>>
>> > >>> Pranith.
>> > >>>
>> > >>> ----- Original Message -----
>> > >>>> From: "Joseph Fernandes" <josferna at redhat.com>
>> > >>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>> > >>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
>> > >>>> Sent: Friday, 16 May, 2014 5:13:57 PM
>> > >>>> Subject: Re: Spurious failures because of nfs and snapshots
>> > >>>>
>> > >>>>
>> > >>>> Hi All,
>> > >>>>
>> > >>>> tests/bugs/bug-1090042.t :
>> > >>>>
>> > >>>> I was able to reproduce the issue, i.e. when this test is run in a loop:
>> > >>>>
>> > >>>> for i in {1..135} ; do ./bugs/bug-1090042.t ; done
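
When chasing a spurious failure like this, it can help to stop at the first
failing run so that the logs of that run are not overwritten by later ones.
A minimal sketch in bash follows; run_until_failure is a hypothetical helper
written for this mail, not something shipped with the test framework.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the glusterfs test framework):
# rerun a command up to "max" times and stop at the first failure.
run_until_failure() {
    local max="$1"
    shift
    local i
    for (( i = 1; i <= max; i++ )); do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
    done
    echo "all $max iterations passed"
}
```

It would be invoked as: run_until_failure 135 ./bugs/bug-1090042.t
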
>> > >>>>
>> > >>>> When I checked the logs:
>> > >>>> [2014-05-16 10:49:49.003978] I
>> > >>>> [rpc-clnt.c:973:rpc_clnt_connection_init]
>> > >>>> 0-management: setting frame-timeout to 600
>> > >>>> [2014-05-16 10:49:49.004035] I
>> > >>>> [rpc-clnt.c:988:rpc_clnt_connection_init]
>> > >>>> 0-management: defaulting ping-timeout to 30secs
>> > >>>> [2014-05-16 10:49:49.004303] I
>> > >>>> [rpc-clnt.c:973:rpc_clnt_connection_init]
>> > >>>> 0-management: setting frame-timeout to 600
>> > >>>> [2014-05-16 10:49:49.004340] I
>> > >>>> [rpc-clnt.c:988:rpc_clnt_connection_init]
>> > >>>> 0-management: defaulting ping-timeout to 30secs
>> > >>>>
>> > >>>> The issue is with ping-timeout and is tracked under the bug
>> > >>>>
>> > >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729
>> > >>>>
>> > >>>>
>> > >>>> The workaround is mentioned in
>> > >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8
>> > >>>>
>> > >>>>
>> > >>>> Regards,
>> > >>>> Joe
>> > >>>>
>> > >>>> ----- Original Message -----
>> > >>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>> > >>>> To: "Gluster Devel" <gluster-devel at gluster.org>
>> > >>>> Cc: "Joseph Fernandes" <josferna at redhat.com>
>> > >>>> Sent: Friday, May 16, 2014 6:19:54 AM
>> > >>>> Subject: Spurious failures because of nfs and snapshots
>> > >>>>
>> > >>>> hi,
>> > >>>>       The latest build I fired for review.gluster.com/7766
>> > >>>>       (http://build.gluster.org/job/regression/4443/console) failed
>> > >>>>       because of a spurious failure. The script doesn't wait for the
>> > >>>>       nfs export to be available. I fixed that, but interestingly I
>> > >>>>       found quite a few scripts with the same problem. Some of the
>> > >>>>       scripts rely on 'sleep 5', which could also lead to spurious
>> > >>>>       failures if the export is not available in 5 seconds. We found
>> > >>>>       that waiting for 20 seconds is better, but 'sleep 20' would
>> > >>>>       unnecessarily delay the build execution. So if you guys are
>> > >>>>       going to write any scripts which have to do nfs mounts, please
>> > >>>>       do it the following way:
>> > >>>>
>> > >>>> EXPECT_WITHIN 20 "1" is_nfs_export_available;
>> > >>>> TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;
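
The point of the pattern above is to poll for the condition with an upper
bound instead of sleeping for a fixed time, so a fast machine does not pay the
full 20 seconds. A rough sketch of that idea in plain bash follows. Note the
assumptions: wait_for is a hypothetical helper and not how EXPECT_WITHIN is
actually implemented, and fake_export_ready is only a stand-in for
is_nfs_export_available so that the sketch runs without a volume.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the implementation of EXPECT_WITHIN:
# poll a command until it prints the expected value or the timeout expires.
wait_for() {
    local timeout="$1" expected="$2"
    shift 2
    local deadline=$(( SECONDS + timeout ))
    while true; do
        if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
            return 0    # condition met, stop waiting immediately
        fi
        if [ "$SECONDS" -ge "$deadline" ]; then
            return 1    # timed out
        fi
        sleep 1
    done
}

# Stand-in for is_nfs_export_available (assumption for this sketch):
# prints "1" once the given marker file exists, "0" otherwise.
fake_export_ready() {
    if [ -e "$1" ]; then echo 1; else echo 0; fi
}

# Demo: the marker already exists, so wait_for returns on the first poll.
marker="${TMPDIR:-/tmp}/export_ready.$$"
touch "$marker"
wait_for 20 "1" fake_export_ready "$marker" && echo "export available, safe to mount"
rm -f "$marker"
```

Compared with 'sleep 20', the worst case is the same, but the common case
returns as soon as the check succeeds.
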
>> > >>>>
>> > >>>> Please review http://review.gluster.com/7773 :-)
>> > >>>>
>> > >>>> I saw one more spurious failure in a snapshot related script
>> > >>>> tests/bugs/bug-1090042.t on the next build fired by Niels.
>> > >>>> Joseph (CCed) is debugging it. He agreed to reply with what he finds
>> > >>>> and share it with us so that we won't introduce similar bugs in future.
>> > >>>>
>> > >>>> I encourage you guys to share what you fix to prevent spurious
>> > >>>> failures in future.
>> > >>>>
>> > >>>> Thanks
>> > >>>> Pranith
>> > >>>>
>> > >>
>> > 
>> > 
>> 