[Gluster-devel] regression failed : snapshot/bug-1316437.t

Atin Mukherjee amukherj at redhat.com
Mon Jul 25 13:44:16 UTC 2016


On Mon, Jul 25, 2016 at 7:12 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

>
>
> On Mon, Jul 25, 2016 at 5:37 PM, Atin Mukherjee <amukherj at redhat.com>
> wrote:
>
>>
>>
>> On Mon, Jul 25, 2016 at 4:34 PM, Avra Sengupta <asengupt at redhat.com>
>> wrote:
>>
>>> The crux of the problem is that, as of today, brick processes on restart
>>> try to reuse the old port they were using, assuming that no other process
>>> will be using it and without consulting pmap_registry_alloc() before using
>>> it. With a recent change, pmap_registry_alloc() reassigns older ports that
>>> were used but are now free. Hence snapd now gets a port that was
>>> previously used by a brick and tries to bind to it, whereas the older
>>> brick process, without consulting the pmap table, blindly tries to connect
>>> to it, and hence we see this problem.
>>>
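
(For illustration only, this is roughly what even a best-effort availability
probe would look like before reusing a remembered port; this is not GlusterFS
code, and port_seems_free() is an invented name. Note that even such a probe
is racy, since the port can be taken between the probe and the real bind,
which is why the authoritative answer has to come from the pmap registry
rather than the socket layer.)

/* Illustrative sketch only, not GlusterFS code. Probes whether a TCP
 * port is currently bindable. Even a probe like this is racy: the port
 * can be taken between the probe and the actual bind, so the pmap
 * registry remains the only authoritative source. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int
port_seems_free (uint16_t port)
{
        struct sockaddr_in addr;
        int                sock, ret;

        sock = socket (AF_INET, SOCK_STREAM, 0);
        if (sock < 0)
                return 0;

        memset (&addr, 0, sizeof (addr));
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl (INADDR_ANY);
        addr.sin_port        = htons (port);

        /* bind() succeeding means no listener currently owns the port */
        ret = bind (sock, (struct sockaddr *) &addr, sizeof (addr));
        close (sock);

        return (ret == 0);
}
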
>>> Now coming to the fix: I feel the brick process should not try to get the
>>> older port and should just take a new port every time it comes up. We will
>>> not run out of ports with this change because pmap now allocates old ports
>>> again, and the port previously used by the brick process will eventually
>>> be reused. If anyone sees any concern with this approach, please feel free
>>> to raise it now.
>>>
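
(If the fix goes the "new port every time" route, one standard way to get it
is to bind to port 0 and read back what the kernel picked. This is a sketch,
not the actual patch, and it assumes the process would then register the
kernel-assigned port with the pmap service so clients can find it.)

/* Illustrative sketch only: bind to port 0 so the kernel assigns a free
 * ephemeral port, then recover the assigned port with getsockname(). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int
main (void)
{
        struct sockaddr_in addr;
        socklen_t          len = sizeof (addr);
        int                sock;

        sock = socket (AF_INET, SOCK_STREAM, 0);
        if (sock < 0)
                return 1;

        memset (&addr, 0, sizeof (addr));
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl (INADDR_ANY);
        addr.sin_port        = htons (0);   /* 0 => kernel chooses the port */

        if (bind (sock, (struct sockaddr *) &addr, sizeof (addr)) != 0 ||
            getsockname (sock, (struct sockaddr *) &addr, &len) != 0) {
                perror ("bind/getsockname");
                close (sock);
                return 1;
        }

        printf ("kernel assigned port %u\n", ntohs (addr.sin_port));
        close (sock);
        return 0;
}

Since the kernel only hands out ports that are actually free, a restarted
brick could then never collide with a port that pmap has meanwhile
reassigned to snapd.
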
>>
>> Looks to be OK, but I'll think it through and get back to you in a day or
>> two if I have any objections.
>>
>
> If we are conservative about bricks binding to a different port on a
> restart, I've an alternative approach here [1]. It has neither a
> full-fledged commit message nor a BZ; I've just put it up for your input.
>


>
> [1] http://review.gluster.org/15005
>
>
>>
>>
>>> While awaiting feedback from you guys, I have sent this patch (
>>> http://review.gluster.org/15001), which moves the said test case to the
>>> bad tests for now; once we collectively reach a conclusion on the fix,
>>> we will remove it from the bad tests.
>>>
>>> Regards,
>>> Avra
>>>
>>>
>>> On 07/25/2016 02:33 PM, Avra Sengupta wrote:
>>>
>>> The failure suggests that the port snapd is trying to bind to is already
>>> in use. But snapd has been modified to use a new port every time. I am
>>> looking into this.
>>>
>>> On 07/25/2016 02:23 PM, Nithya Balachandran wrote:
>>>
>>> More failures:
>>>
>>> https://build.gluster.org/job/rackspace-regression-2GB-triggered/22452/console
>>>
>>> I see these messages in the snapd.log:
>>>
>>> [2016-07-22 05:31:52.482282] I
>>> [rpcsvc.c:2199:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured
>>> rpc.outstanding-rpc-limit with value 64
>>> [2016-07-22 05:31:52.482352] W [MSGID: 101002]
>>> [options.c:954:xl_opt_validate] 0-patchy-server: option 'listen-port' is
>>> deprecated, preferred is 'transport.socket.listen-port', continuing with
>>> correction
>>> [2016-07-22 05:31:52.482436] E [socket.c:771:__socket_server_bind]
>>> 0-tcp.patchy-server: binding to  failed: Address already in use
>>> [2016-07-22 05:31:52.482447] E [socket.c:774:__socket_server_bind]
>>> 0-tcp.patchy-server: Port is already in use
>>> [2016-07-22 05:31:52.482459] W [rpcsvc.c:1630:rpcsvc_create_listener]
>>> 0-rpc-service: listening on transport failed
>>> [2016-07-22 05:31:52.482469] W [MSGID: 115045] [server.c:1061:init]
>>> 0-patchy-server: creation of listener failed
>>> [2016-07-22 05:31:52.482481] E [MSGID: 101019]
>>> [xlator.c:433:xlator_init] 0-patchy-server: Initialization of volume
>>> 'patchy-server' failed, review your volfile again
>>> [2016-07-22 05:31:52.482491] E [MSGID: 101066]
>>> [graph.c:324:glusterfs_graph_init] 0-patchy-server: initializing translator
>>> failed
>>> [2016-07-22 05:31:52.482499] E [MSGID: 101176]
>>> [graph.c:670:glusterfs_graph_activate] 0-graph: init failed
>>>
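
(The two E lines from __socket_server_bind above are a plain EADDRINUSE at
bind() time. A minimal standalone reproduction follows; this is not GlusterFS
code, and port 49152 is an arbitrary example.)

/* Illustrative sketch only: the second listener binds the same port the
 * first one already holds and fails with EADDRINUSE, the same errno
 * string ("Address already in use") that appears in snapd.log. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int
listen_on (uint16_t port)
{
        struct sockaddr_in addr;
        int                sock;

        sock = socket (AF_INET, SOCK_STREAM, 0);
        if (sock < 0)
                return -1;

        memset (&addr, 0, sizeof (addr));
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
        addr.sin_port        = htons (port);

        if (bind (sock, (struct sockaddr *) &addr, sizeof (addr)) != 0 ||
            listen (sock, 16) != 0) {
                perror ("listen_on");  /* second call: Address already in use */
                close (sock);
                return -1;
        }
        return sock;
}

int
main (void)
{
        int first  = listen_on (49152);  /* succeeds */
        int second = listen_on (49152);  /* fails with EADDRINUSE */

        if (first >= 0)
                close (first);
        if (second >= 0)
                close (second);
        return 0;
}
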
>>> On Mon, Jul 25, 2016 at 12:00 PM, Ashish Pandey <aspandey at redhat.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> The following test has failed 3 times in the last two days -
>>>>
>>>> ./tests/bugs/snapshot/bug-1316437.t
>>>>
>>>> https://build.gluster.org/job/rackspace-regression-2GB-triggered/22445/consoleFull
>>>>
>>>> https://build.gluster.org/job/rackspace-regression-2GB-triggered/22470/consoleFull
>>>>
>>>> Please take a look at it and check whether it is a spurious failure or not.
>>>>
>>>> Ashish
>>>>
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>>
>> --Atin
>>
>
>
>
> --
>
> --Atin
>



-- 

--Atin