[Gluster-devel] Spurious failures because of nfs and snapshots

Vijaikumar M vmallika at redhat.com
Mon May 19 09:10:47 UTC 2014


The brick disconnected because of a ping timeout.

Here are the log messages:
[2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbin/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a91013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3 -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3fbb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l /build/install/var/log/glusterfs/bricks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba7b4b0 --brick-port 49164 --xlator-option 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164)
[2014-05-19 04:29:38.141118] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting ping-timeout to 30secs
[2014-05-19 04:30:09.139521] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.



Patch 'http://review.gluster.org/#/c/7753/' will fix the problem. With it,
the ping timer is disabled by default for all rpc connections except
glusterd-glusterd (set to 30 sec) and client-glusterd (set to 42 sec).
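
To check whether another failed run hit the same disconnect, the extracted logs can be scanned for the ping-timer expiry message shown above. This is only a rough sketch: find_ping_timeouts is a hypothetical helper name, and the default log directory is an assumption taken from the brick log path in the message above.

```shell
#!/bin/bash
# Sketch only: list the log files that contain the ping-timer expiry message.
# find_ping_timeouts is a hypothetical helper, not part of the test framework.
find_ping_timeouts() {
    # Assumed default: the log directory seen in the brick's -l argument above.
    local logdir="${1:-/build/install/var/log/glusterfs}"
    # -r: search recursively, -l: print only the names of matching files.
    grep -rl 'rpc_clnt_ping_timer_expired' "$logdir" 2>/dev/null
}
```

For example, after unpacking a log tarball from the build machine, running find_ping_timeouts on the unpacked directory should print the brick or client logs in which the timer expired, if any.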


Thanks,
Vijay


On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote:
> The latest build failure also has the same issue:
> Download it from here:
> http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz
>
> Pranith
>
> ----- Original Message -----
>> From: "Vijaikumar M" <vmallika at redhat.com>
>> To: "Joseph Fernandes" <josferna at redhat.com>
>> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Gluster Devel" <gluster-devel at gluster.org>
>> Sent: Monday, 19 May, 2014 11:41:28 AM
>> Subject: Re: Spurious failures because of nfs and snapshots
>>
>> Hi Joseph,
>>
>> In the log mentioned below, it says ping-timeout is set to the default
>> value of 30 sec. I think the issue is different.
>> Can you please point me to the logs where you were able to re-create
>> the problem?
>>
>> Thanks,
>> Vijay
>>
>>
>>
>> On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:
>>> hi Vijai, Joseph,
>>>       In 2 of the last 3 build failures,
>>>       http://build.gluster.org/job/regression/4479/console,
>>>       http://build.gluster.org/job/regression/4478/console this
>>>       test(tests/bugs/bug-1090042.t) failed. Do you guys think it is better
>>>       to revert this test until the fix is available? Please send a patch
>>>       to revert the test case if you guys feel so. You can re-submit it
>>>       along with the fix to the bug mentioned by Joseph.
>>>
>>> Pranith.
>>>
>>> ----- Original Message -----
>>>> From: "Joseph Fernandes" <josferna at redhat.com>
>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
>>>> Sent: Friday, 16 May, 2014 5:13:57 PM
>>>> Subject: Re: Spurious failures because of nfs and snapshots
>>>>
>>>>
>>>> Hi All,
>>>>
>>>> tests/bugs/bug-1090042.t :
>>>>
>>>> I was able to reproduce the issue by running this test in a loop:
>>>>
>>>> for i in {1..135} ; do ./bugs/bug-1090042.t ; done
>>>>
>>>> When I checked the logs, I found:
>>>> [2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init]
>>>> 0-management: setting frame-timeout to 600
>>>> [2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init]
>>>> 0-management: defaulting ping-timeout to 30secs
>>>> [2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init]
>>>> 0-management: setting frame-timeout to 600
>>>> [2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init]
>>>> 0-management: defaulting ping-timeout to 30secs
>>>>
>>>> The issue is with ping-timeout and is tracked under the bug
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729
>>>>
>>>>
>>>> The workaround is mentioned in
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8
>>>>
>>>>
>>>> Regards,
>>>> Joe
>>>>
>>>> ----- Original Message -----
>>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>> To: "Gluster Devel" <gluster-devel at gluster.org>
>>>> Cc: "Joseph Fernandes" <josferna at redhat.com>
>>>> Sent: Friday, May 16, 2014 6:19:54 AM
>>>> Subject: Spurious failures because of nfs and snapshots
>>>>
>>>> hi,
>>>>       The latest build I fired for review.gluster.com/7766
>>>>       (http://build.gluster.org/job/regression/4443/console) failed because
>>>>       of a spurious failure. The script doesn't wait for the nfs export to
>>>>       be available. I fixed that, but interestingly I found quite a few
>>>>       scripts with the same problem. Some of the scripts rely on 'sleep 5',
>>>>       which could also lead to spurious failures if the export is not
>>>>       available within 5 seconds. We found that waiting for 20 seconds is
>>>>       better, but 'sleep 20' would unnecessarily delay the build execution.
>>>>       So if you guys are going to write any scripts that have to do nfs
>>>>       mounts, please do it the following way:
>>>>
>>>> EXPECT_WITHIN 20 "1" is_nfs_export_available;
>>>> TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;
>>>>
>>>> Please review http://review.gluster.com/7773 :-)
>>>>
>>>> I saw one more spurious failure in a snapshot related script
>>>> tests/bugs/bug-1090042.t on the next build fired by Niels.
>>>> Joseph (CCed) is debugging it. He agreed to reply with what he finds and
>>>> share it with us so that we won't introduce similar bugs in future.
>>>>
>>>> I encourage you guys to share what you fix to prevent spurious failures in
>>>> future.
>>>>
>>>> Thanks
>>>> Pranith
>>>>
>>
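
On the nfs part of the thread: the idea behind waiting with a bound instead of a fixed 'sleep 20' can be sketched in plain shell as below. This is not how EXPECT_WITHIN is implemented in the test framework; wait_for is a hypothetical helper that only illustrates retrying a check until it passes or the time limit runs out.

```shell
#!/bin/bash
# Sketch only: retry a command once per second until it succeeds or roughly
# <timeout> seconds have passed (the command's own run time is not counted).
# wait_for is a hypothetical helper; it is NOT the framework's EXPECT_WITHIN.
wait_for() {
    local timeout="$1"
    shift
    local elapsed=0
    until "$@"; do
        # Give up once the allowed number of retries has been used.
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A call such as 'wait_for 20 some_check' (some_check being a placeholder for whatever command reports that the export is available) returns as soon as the check passes, so a run where the export comes up in 2 seconds does not pay for the full 20.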
