[Gluster-devel] infiniband failing when too many clients connect at once
Mickey Mazarick
mic at digitaltadpole.com
Wed Mar 19 06:39:33 UTC 2008
Thanks so much for all the help getting this working for us.
The only problem we are still seeing is that the servers seem to hang
when lots of clients connect at once.
Nothing is reported in the log files of the client; it literally just
freezes.
For the benefit of the list, the commands do the following:
/bu/scripts/EX runs the command supplied sequentially
on each of our 6 storage servers.
/etc/init.d/glustersystem mounts or unmounts a gluster mount point
as a service
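
For readers without access to these scripts, a wrapper like EX could look
roughly like the sketch below. It is not the actual /bu/scripts/EX; the
SERVERS and RUNNER variables and the storage1..storage6 host names are
placeholders made up for illustration.

```shell
#!/bin/sh
# Sketch only: NOT the real /bu/scripts/EX.
# SERVERS and RUNNER are hypothetical names; the host names are placeholders.
SERVERS="${SERVERS:-storage1 storage2 storage3 storage4 storage5 storage6}"
RUNNER="${RUNNER:-ssh}"

run_on_all() {
    # Run the supplied command on each server in turn, one at a time.
    for host in $SERVERS; do
        echo "== $host =="
        # </dev/null keeps the remote command from consuming our stdin.
        $RUNNER "$host" "$1" </dev/null || echo "command failed on $host" >&2
    done
}

# Only run when a command was actually supplied on the command line.
if [ "$#" -gt 0 ]; then
    run_on_all "$1"
fi
```

Because the hosts are visited one after another, the clients on different
servers still start within a short time of each other, which is the situation
that triggers the hang described above.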
For a demonstration, ssh to RTPST201 and run:
/bu/scripts/EX "/etc/init.d/glustersystem stop"
/bu/scripts/EX "/etc/init.d/glustersystem start"
After that, no gluster mount will work until you run:
/bu/scripts/EX "/etc/init.d/glusterserver restart"
The servers all crash with the errors:
2008-03-19 01:58:41 C [ib-verbs-server.c:231:gf_transport_fini]
ib-verbs/server: server: called fini on transport: 0x527bc0
2008-03-19 01:58:42 C [ib-verbs.c:1551:ib_verbs_tcp_notify]
transport/ib-verbs: server: notify (2) called on tcp socket
2008-03-19 01:58:42 C [ib-verbs.c:1458:ib_verbs_disconnect]
transport/ib-verbs: server: peer disconnected, cleaning up
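
To check whether these bursts line up with the client start-ups, the critical
ib-verbs entries can be counted per minute. A rough sketch, assuming each
entry is a single line in the real server log (the entries above are wrapped
by the mail client); the function name is made up for illustration:

```shell
# Count critical ("C") ib-verbs log entries per minute.
# Assumes one entry per line: DATE TIME LEVEL [file:line:function] message
count_critical_per_minute() {
    # Print "DATE HH:MM" for each matching entry, then tally identical minutes.
    awk '$3 == "C" && /ib-verbs/ { print $1, substr($2, 1, 5) }' "$@" | sort | uniq -c
}
```

Call it with the path to the server log as its argument, or pipe the log into
it on stdin.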
Thanks again!
-Mickey Mazarick
Mickey Mazarick wrote:
> Did you make any changes to the server? It's working great; I just want
> to know if I can take credit. ;-)
> I reinstalled OFED with the latest version (1.3).
>
> That seems to have cleared up the weird problem with afr not failing
> over.
>
> Once again, thanks for your help :-)
>
> -Mickey Mazarick
>
> Amar S. Tumballi wrote:
>> Hi Mickey,
>> Is it possible for you to get me a remote login, so that I can debug
>> it sooner?
>>
>> Regards,
>> Amar
>>
>> On Tue, Mar 18, 2008 at 12:35 PM, Mickey Mazarick
>> <mic at digitaltadpole.com> wrote:
>>
>> Yes, IB mounting works now :-)
>>
>> I ran
>> find /system -type f -exec head -n 1 {} \; >/dev/null
>> and the client process crashed and locked up the mount after a few
>> minutes...
>>
>> The server has these errors exactly once a minute, but no errors
>> in the
>> client log, it just too:
>> 2008-03-18 15:30:30 C [ib-verbs.c:1551:ib_verbs_tcp_notify]
>> transport/ib-verbs: server: notify (2) called on tcp socket
>> 2008-03-18 15:30:30 C [ib-verbs.c:1458:ib_verbs_disconnect]
>> transport/ib-verbs: server: peer disconnected, cleaning up
>> 2008-03-18 15:30:30 C [ib-verbs-server.c:231:gf_transport_fini]
>> ib-verbs/server: server: called fini on transport: 0x57c680
>> 2008-03-18 15:30:30 C [ib-verbs.c:1551:ib_verbs_tcp_notify]
>> transport/ib-verbs: server: notify (2) called on tcp socket
>> 2008-03-18 15:30:30 C [ib-verbs.c:1458:ib_verbs_disconnect]
>> transport/ib-verbs: server: peer disconnected, cleaning up
>> 2008-03-18 15:30:30 C [ib-verbs-server.c:231:gf_transport_fini]
>> ib-verbs/server: server: called fini on transport: 0x5a3760
>> 2008-03-18 15:31:29 C [ib-verbs.c:1551:ib_verbs_tcp_notify]
>> transport/ib-verbs: server: notify (2) called on tcp socket
>> 2008-03-18 15:31:29 C [ib-verbs.c:1458:ib_verbs_disconnect]
>> transport/ib-verbs: server: peer disconnected, cleaning up
>> 2008-03-18 15:31:29 C [ib-verbs-server.c:231:gf_transport_fini]
>> ib-verbs/server: server: called fini on transport: 0x5349c0
>> 2008-03-18 15:31:29 C [ib-verbs.c:1551:ib_verbs_tcp_notify]
>> transport/ib-verbs: server: notify (2) called on tcp socket
>> 2008-03-18 15:31:29 C [ib-verbs.c:1458:ib_verbs_disconnect]
>> transport/ib-verbs: server: peer disconnected, cleaning up
>> 2008-03-18 15:31:29 C [ib-verbs-server.c:231:gf_transport_fini]
>> ib-verbs/server: server: called fini on transport: 0x5c3880
>>
>>
>>
>> Amar S. Tumballi wrote:
>> > Hi Mickey,
>> > With the current latest revision (patch-708), the ib-verbs transport
>> > is working fine. You can try it.
>> >
>> > Regards,
>> > Amar
>> >
>> > On Sun, Mar 16, 2008 at 7:03 PM, Amar S. Tumballi
>> > <amar at zresearch.com> wrote:
>> >
>> > Hi Mickey,
>> > I am working on that. You can revert to patch-700 or earlier
>> > until I see what's happening.
>> >
>> > Regards,
>> > Amar
>> >
>> >
>> > On Sun, Mar 16, 2008 at 12:33 PM, Mickey Mazarick
>> > <mic at digitaltadpole.com> wrote:
>> >
>> > On the client I get the message "attempting to pipeline
>> > handshake", but we never see any contents. The filesystem hangs
>> > completely until we unmount.
>> > I'll see if I can dig up more info/logs later.
>> >
>> > -Mickey Mazarick
>> > --
>> >
>> >
>> > _______________________________________________
>> > Gluster-devel mailing list
>> > Gluster-devel at nongnu.org
>> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
>> >
>> >
>> >
>> >
>> > --
>> > Amar Tumballi
>> > Gluster/GlusterFS Hacker
>> > [bulde on #gluster/irc.gnu.org]
>> > http://www.zresearch.com - Commoditizing Supercomputing and
>> > Superstorage!
>>
>>
>
>