[Gluster-devel] Fwd: Re: Spurious failures because of nfs and snapshots

Atin Mukherjee amukherj at redhat.com
Wed May 21 10:09:21 UTC 2014



On 05/21/2014 11:42 AM, Atin Mukherjee wrote:
> 
> 
> On 05/21/2014 10:54 AM, SATHEESARAN wrote:
>> Guys,
>>
>> This is the issue pointed out by Pranith with regard to Barrier.
>> I was reading through it.
>>
>> But I wanted to bring it to your attention.
>>
>> -- S
>>
>>
>> -------- Original Message --------
>> Subject: 	Re: [Gluster-devel] Spurious failures because of nfs and
>> snapshots
>> Date: 	Tue, 20 May 2014 21:16:57 -0400 (EDT)
>> From: 	Pranith Kumar Karampuri <pkarampu at redhat.com>
>> To: 	Vijaikumar M <vmallika at redhat.com>, Joseph Fernandes
>> <josferna at redhat.com>
>> CC: 	Gluster Devel <gluster-devel at gluster.org>
>>
>>
>>
>> Hey,
>>     It seems that even after this fix was merged, the regression tests are still failing for the same script. You can check the logs at http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz
> Pranith,
> 
> Is this the correct link? I don't see any log containing this sequence there.
> Also, looking at the log in this mail, this is expected barrier
> behaviour: an enable request followed by another enable should always
> fail, and the same holds for disable.
> 
> Can you please confirm the link and which particular regression test is
> causing this issue? Is it bug-1090042.t?
> 
> --Atin
>>
>> Relevant logs:
>> [2014-05-20 20:17:07.026045]  : volume create patchy build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : SUCCESS
>> [2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
>> [2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
>> [2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED : Failed to reconfigure barrier.
>> [2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
>> [2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED : Failed to reconfigure barrier.
>>
This log is for bug-1092841.t and it's expected.
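The expected behaviour can be illustrated with a small shell model. This is a hypothetical sketch, not glusterd code: `barrier_set` and `BARRIER_STATE` are invented names, and the sketch assumes the barrier starts disabled. A request succeeds only when it changes the current state, which reproduces the SUCCESS/FAILED pattern in the bug-1092841.t log above.

```shell
#!/bin/sh
# Hypothetical model of the barrier toggle (not the glusterd implementation).
# Assumption: the barrier starts out disabled on a freshly started volume.
BARRIER_STATE=disable

# barrier_set <enable|disable>: succeed only when the request changes state;
# a repeated enable (or a repeated disable) is rejected, as in the log.
barrier_set() {
    if [ "$1" = "$BARRIER_STATE" ]; then
        echo "FAILED"
        return 1
    fi
    BARRIER_STATE=$1
    echo "SUCCESS"
}

# Replay the command sequence from the bug-1092841.t log.
# barrier_set is called directly (not inside $(...)) so that the state
# change persists in the current shell.
for op in enable enable disable disable; do
    printf 'volume barrier patchy %s : ' "$op"
    barrier_set "$op"
done
```

Run as-is, it prints SUCCESS, FAILED, SUCCESS, FAILED, in the same order as the four log entries.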

--Atin
>> Pranith
>>
>> ----- Original Message -----
>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>> To: "Gluster Devel" <gluster-devel at gluster.org>
>>> Cc: "Joseph Fernandes" <josferna at redhat.com>, "Vijaikumar M" <vmallika at redhat.com>
>>> Sent: Tuesday, May 20, 2014 3:41:11 PM
>>> Subject: Re: Spurious failures because of nfs and snapshots
>>>
>>> hi,
>>>     Please resubmit the patches on top of http://review.gluster.com/#/c/7753
>>>     to prevent frequent regression failures.
>>>
>>> Pranith
>>> ----- Original Message -----
>>>> From: "Vijaikumar M" <vmallika at redhat.com>
>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>> Cc: "Joseph Fernandes" <josferna at redhat.com>, "Gluster Devel"
>>>> <gluster-devel at gluster.org>
>>>> Sent: Monday, May 19, 2014 2:40:47 PM
>>>> Subject: Re: Spurious failures because of nfs and snapshots
>>>>
>>>> The brick disconnected because of a ping timeout.
>>>>
>>>> Here are the relevant log messages:
>>>> [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbin/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a91013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3 -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3fbb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l /build/install/var/log/glusterfs/bricks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba7b4b0 --brick-port 49164 --xlator-option 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164)
>>>> [2014-05-19 04:29:38.141118] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting ping-timeout to 30secs
>>>> [2014-05-19 04:30:09.139521] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.
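To check whether other regression runs hit the same ping-timer disconnect, one could count occurrences of the critical message quoted above in an extracted log archive. This is a minimal sketch: the function name and the directory layout are assumptions, and only the matched text comes from the log.

```shell
#!/bin/sh
# Hypothetical helper: count ping-timer disconnects under a log directory,
# e.g. one extracted from a regression log tarball.
# The search string is taken from the rpc_clnt_ping_timer_expired message.
count_ping_timeouts() {
    # -r: recurse, -h: omit file names, -c: print a match count per file;
    # awk then sums the per-file counts (prints 0 if nothing was found).
    grep -r -h -c 'has not responded in the last' "$1" 2>/dev/null |
        awk '{ total += $1 } END { print total + 0 }'
}

# Usage (paths are placeholders):
#   mkdir -p /tmp/logs && tar -xzf <archive>.tgz -C /tmp/logs
#   count_ping_timeouts /tmp/logs
```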
>>>>
>>>>
>>>>
>>>> Patch http://review.gluster.org/#/c/7753/ will fix the problem: it
>>>> disables the ping timer by default for all RPC connections except
>>>> glusterd-glusterd (set to 30 seconds) and client-glusterd (set to 42 seconds).
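The behaviour described for patch 7753 can be summarised as a per-connection lookup. The sketch below is a simplified, hypothetical model rather than the rpc-clnt code: the connection labels (including `brick-glusterd`) are invented, the `-ge` comparison is an approximation, and it assumes a timeout of 0 means the ping timer is disabled.

```shell
#!/bin/sh
# Simplified, hypothetical model of per-connection ping timeouts after
# patch 7753 as described in the mail: only glusterd<->glusterd (30s) and
# client<->glusterd (42s) keep a ping timer; every other connection gets 0,
# which this sketch treats as "timer disabled".
ping_timeout_for() {
    case "$1" in
        glusterd-glusterd) echo 30 ;;
        client-glusterd)   echo 42 ;;
        *)                 echo 0 ;;
    esac
}

# ping_timer_expired <connection> <seconds since last reply>
# Exits 0 when this model would tear the connection down.
ping_timer_expired() {
    ping_timeout=$(ping_timeout_for "$1")
    [ "$ping_timeout" -gt 0 ] && [ "$2" -ge "$ping_timeout" ]
}
```

Under this model, a brick-to-glusterd connection that goes 31 seconds without a reply (as in the log above) is no longer disconnected, while a client-to-glusterd connection silent for 43 seconds still is.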
>>>>
>>>>
>>>> Thanks,
>>>> Vijay
>>>>
>>>>
>>>> On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote:
>>>>> The latest build failure also has the same issue.
>>>>> You can download the logs from here:
>>>>> http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz
>>>>>
>>>>> Pranith
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Vijaikumar M" <vmallika at redhat.com>
>>>>>> To: "Joseph Fernandes" <josferna at redhat.com>
>>>>>> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Gluster Devel"
>>>>>> <gluster-devel at gluster.org>
>>>>>> Sent: Monday, 19 May, 2014 11:41:28 AM
>>>>>> Subject: Re: Spurious failures because of nfs and snapshots
>>>>>>
>>>>>> Hi Joseph,
>>>>>>
>>>>>> In the log mentioned below, it says ping-timeout is set to the default
>>>>>> value of 30 seconds, so I think the issue is different.
>>>>>> Can you please point me to the logs where you were able to re-create
>>>>>> the problem?
>>>>>>
>>>>>> Thanks,
>>>>>> Vijay
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:
>>>>>>> hi Vijai, Joseph,
>>>>>>>       In 2 of the last 3 build failures
>>>>>>>       (http://build.gluster.org/job/regression/4479/console and
>>>>>>>       http://build.gluster.org/job/regression/4478/console), this
>>>>>>>       test (tests/bugs/bug-1090042.t) failed. Do you guys think it is
>>>>>>>       better to revert this test until the fix is available? If so,
>>>>>>>       please send a patch to revert the test case. You can re-submit
>>>>>>>       it along with the fix for the bug mentioned by Joseph.
>>>>>>>
>>>>>>> Pranith.
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Joseph Fernandes" <josferna at redhat.com>
>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
>>>>>>>> Sent: Friday, 16 May, 2014 5:13:57 PM
>>>>>>>> Subject: Re: Spurious failures because of nfs and snapshots
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> tests/bugs/bug-1090042.t:
>>>>>>>>
>>>>>>>> I was able to reproduce the issue by running this test in a loop:
>>>>>>>>
>>>>>>>> for i in {1..135}; do ./bugs/bug-1090042.t; done
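The loop above only reruns the test. To see how often it actually fails, a variant can count failed runs. This is a sketch: `run_repeatedly` is a hypothetical helper, and it assumes a failed run either exits non-zero or prints a TAP line beginning with `not ok` (the TAP convention for a failed check).

```shell
#!/bin/sh
# Hypothetical helper: run_repeatedly <runs> <command> [args...]
# Reruns the command and reports how many runs failed. A run counts as
# failed if it exits non-zero or prints a TAP "not ok" line (assumption
# about how the test script reports failures).
run_repeatedly() {
    runs="$1"
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        if ! out=$("$@" 2>&1) || printf '%s\n' "$out" | grep -q '^not ok'; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}

# Example: run_repeatedly 135 ./bugs/bug-1090042.t
```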
>>>>>>>>
>>>>>>>> When I checked the logs, I found:
>>>>>>>> [2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>>>>>>>> [2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-management: defaulting ping-timeout to 30secs
>>>>>>>> [2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>>>>>>>> [2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-management: defaulting ping-timeout to 30secs
>>>>>>>>
>>>>>>>> The issue is with ping-timeout and is tracked under this bug:
>>>>>>>>
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729
>>>>>>>>
>>>>>>>>
>>>>>>>> The workaround is mentioned in
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Joe
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>> To: "Gluster Devel" <gluster-devel at gluster.org>
>>>>>>>> Cc: "Joseph Fernandes" <josferna at redhat.com>
>>>>>>>> Sent: Friday, May 16, 2014 6:19:54 AM
>>>>>>>> Subject: Spurious failures because of nfs and snapshots
>>>>>>>>
>>>>>>>> hi,
>>>>>>>>       The latest build I fired for review.gluster.com/7766
>>>>>>>>       (http://build.gluster.org/job/regression/4443/console) failed
>>>>>>>>       because of a spurious failure: the script doesn't wait for the
>>>>>>>>       nfs export to be available. I fixed that, but interestingly I
>>>>>>>>       found quite a few scripts with the same problem. Some of the
>>>>>>>>       scripts rely on 'sleep 5', which can also lead to spurious
>>>>>>>>       failures if the export is not available within 5 seconds. We
>>>>>>>>       found that waiting for up to 20 seconds is better, but
>>>>>>>>       'sleep 20' would unnecessarily delay the build execution. So if
>>>>>>>>       you guys are going to write any scripts that have to do nfs
>>>>>>>>       mounts, please do it the following way:
>>>>>>>>
>>>>>>>> EXPECT_WITHIN 20 "1" is_nfs_export_available;
>>>>>>>> TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;
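The point of EXPECT_WITHIN here is to poll instead of sleeping for a fixed time. The following sketch shows that idea in isolation; it is not the test framework's implementation, and `wait_for_output` is a hypothetical name. It re-runs a command about once per second until the output matches or the timeout passes, so a fast export returns almost immediately while a slow one still gets up to 20 seconds.

```shell
#!/bin/sh
# Hypothetical stand-in for the polling idea behind EXPECT_WITHIN (not the
# framework's own code). wait_for_output <timeout> <expected> <cmd> [args...]
# Returns 0 as soon as the command's output equals <expected> exactly,
# or 1 once roughly <timeout> seconds have passed.
wait_for_output() {
    timeout="$1"
    expected="$2"
    shift 2
    elapsed=0
    while :; do
        if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
            return 0
        fi
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        # Only sleep intervals are counted, so the real wait can be
        # slightly longer than <timeout> if the command itself is slow.
        elapsed=$((elapsed + 1))
    done
}
```

For example, `wait_for_output 20 1 is_nfs_export_available && mount -t nfs -o vers=3 $H0:/$V0 $N0` mirrors the two lines above, with `is_nfs_export_available`, `$H0`, `$V0` and `$N0` supplied by the test framework.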
>>>>>>>>
>>>>>>>> Please review http://review.gluster.com/7773 :-)
>>>>>>>>
>>>>>>>> I saw one more spurious failure, in the snapshot-related script
>>>>>>>> tests/bugs/bug-1090042.t, on the next build fired by Niels.
>>>>>>>> Joseph (CCed) is debugging it. He agreed to reply with what he finds
>>>>>>>> and share it with us so that we don't introduce similar bugs in the
>>>>>>>> future.
>>>>>>>>
>>>>>>>> I encourage you guys to share what you fix to prevent spurious
>>>>>>>> failures in the future.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Pranith
>>>>>>>>
>>>>>>
>>>>
>>>>
>>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>>

