[Gluster-users] Fedora 11 - 2.6.31 Kernel - Fuse 2.8.0 - Infiniband

Mickey Mazarick mic at digitaltadpole.com
Tue Sep 22 19:28:21 UTC 2009


Sorry, the mail daemon just batched me the rest of this conversation and 
I see this is already resolved. Please ignore.

-Mic

Mickey Mazarick wrote:
> I had some difficulty getting OFED 1.3 working on kernel 2.6.27 about 
> 6 months back.  It took some patching, but I did find that you needed 
> to have SRQ enabled for it to work. The ibv_srq_pingpong test app 
> was a good test for whether it would work with gluster or not.
>
> I also had to upgrade the firmware on the Mellanox cards I have to 
> enable SRQ (shared receive queue).
>
> -Mic
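
A rough way to check for SRQ support before trying ibv_srq_pingpong is to look at the max_srq attribute that `ibv_devinfo -v` prints per device; a value of 0 would suggest the HCA or its firmware does not expose shared receive queues. The sketch below only parses text. The helper name is hypothetical and the sample output is made up for illustration, not captured from the machines in this thread.

```python
import re


def max_srq_per_device(devinfo_text):
    """Map each hca_id in `ibv_devinfo -v` style text to its max_srq value."""
    result = {}
    current = None
    for line in devinfo_text.splitlines():
        m = re.match(r"\s*hca_id:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        # Matches "max_srq:" only, not "max_srq_wr:" or "max_srq_sge:".
        m = re.match(r"\s*max_srq:\s*(\d+)", line)
        if m and current is not None:
            result[current] = int(m.group(1))
    return result


# Illustrative text only (assumed shape of the tool's output).
SAMPLE = """\
hca_id: mthca0
        max_qp:                         64512
        max_srq:                        0
        max_srq_wr:                     0
"""

if __name__ == "__main__":
    for dev, count in max_srq_per_device(SAMPLE).items():
        print(dev, "SRQ available" if count > 0 else "no SRQ reported")
```

On a real host the text could come from `subprocess.run(["ibv_devinfo", "-v"], capture_output=True, text=True).stdout` instead of SAMPLE.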
>
> Nathan Stratton wrote:
>>
>> Hate to post again, but anyone have any ideas on this?
>>
>> -Nathan
>>
>> On Fri, 18 Sep 2009, Nathan Stratton wrote:
>>
>>>
>>> Has anyone been able to get Infiniband working with the 2.6.31 kernel 
>>> and fuse 2.8.0? My config works fine on my CentOS 2.6.18 box, so I 
>>> know that is OK.
>>>
>>> Infiniband looks good:
>>>
>>> [root at xen1 src]# lsmod |grep ib
>>> ib_ucm                 13752  0
>>> ib_uverbs              32256  2 rdma_ucm,ib_ucm
>>> ib_ipoib               68880  0
>>> ib_mthca              123700  0
>>>
>>> [root at xen1 src]# ibv_devices
>>>    device                 node GUID
>>>    ------              ----------------
>>>    mthca0              0005ad00000327e8
>>>
>>> Gluster looks like it starts OK, but I can't touch the mount and 
>>> after a while it times out. Debug logs:
>>>
>>>
>>> [2009-09-18 19:36:17] D [glusterfsd.c:354:_get_specfp] glusterfs: 
>>> loading volume file /usr/local/etc/glusterfs/glusterfs.vol
>>> ================================================================================ 
>>>
>>> Version      : glusterfs 2.0.6 built on Sep 18 2009 09:54:43
>>> TLA Revision : v2.0.6
>>> Starting Time: 2009-09-18 19:36:17
>>> Command line : glusterfs -L DEBUG -l /var/log/glusterfs.log 
>>> --disable-direct-io-mode /share
>>> PID          : 8303
>>> System name  : Linux
>>> Nodename     : xen1.hou.blinkmind.com
>>> Kernel Release : 2.6.31
>>> Hardware Identifier: x86_64
>>>
>>> Given volfile:
>>> +------------------------------------------------------------------------------+ 
>>>
>>>  1: volume brick0
>>>  2:  type protocol/client
>>>  3:  option transport-type ib-verbs/client
>>>  4:  option remote-host 172.16.0.200
>>>  5:  option remote-port 6997
>>>  6:  option transport.address-family inet/inet6
>>>  7:  option remote-subvolume brick
>>>  8: end-volume
>>>  9:
>>> 10: volume mirror0
>>> 11:  type protocol/client
>>> 12:  option transport-type ib-verbs/client
>>> 13:  option remote-host 172.16.0.201
>>> 14:  option remote-port 6997
>>> 15:  option transport.address-family inet/inet6
>>> 16:  option remote-subvolume brick
>>> 17: end-volume
>>> 18:
>>> 19: volume brick1
>>> 20:  type protocol/client
>>> 21:  option transport-type ib-verbs/client
>>> 22:  option remote-host 172.16.0.202
>>> 23:  option remote-port 6997
>>> 24:  option transport.address-family inet/inet6
>>> 25:  option remote-subvolume brick
>>> 26: end-volume
>>> 27:
>>> 28: volume mirror1
>>> 29:  type protocol/client
>>> 30:  option transport-type ib-verbs/client
>>> 31:  option remote-host 172.16.0.203
>>> 32:  option remote-port 6997
>>> 33:  option transport.address-family inet/inet6
>>> 34:  option remote-subvolume brick
>>> 35: end-volume
>>> 36:
>>> 37: volume brick2
>>> 38:  type protocol/client
>>> 39:  option transport-type ib-verbs/client
>>> 40:  option remote-host 172.16.0.204
>>> 41:  option remote-port 6997
>>> 42:  option transport.address-family inet/inet6
>>> 43:  option remote-subvolume brick
>>> 44: end-volume
>>> 45:
>>> 46: volume mirror2
>>> 47:  type protocol/client
>>> 48:  option transport-type ib-verbs/client
>>> 49:  option remote-host 172.16.0.205
>>> 50:  option remote-port 6997
>>> 51:  option transport.address-family inet/inet6
>>> 52:  option remote-subvolume brick
>>> 53: end-volume
>>> 54:
>>> 55: volume block0
>>> 56:  type cluster/replicate
>>> 57:  subvolumes brick0 mirror0
>>> 58: end-volume
>>> 59:
>>> 60: volume block1
>>> 61:  type cluster/replicate
>>> 62:  subvolumes brick1 mirror1
>>> 63: end-volume
>>> 64:
>>> 65: volume block2
>>> 66:  type cluster/replicate
>>> 67:  subvolumes brick2 mirror2
>>> 68: end-volume
>>> 69:
>>> 70: volume unify
>>> 71:  type cluster/distribute
>>> 72:  subvolumes block0 block1 block2
>>> 73: end-volume
>>> 74:
>>>
>>> +------------------------------------------------------------------------------+ 
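
The volfile above repeats one client stanza six times, then stacks three replicate pairs under a single distribute volume. A minimal sketch of generating the same layout, assuming consecutive addresses starting at 172.16.0.200 as in the listing; the function names are hypothetical and not part of GlusterFS.

```python
def client_volume(name, host, port=6997):
    """One protocol/client stanza using the ib-verbs transport."""
    return "\n".join([
        f"volume {name}",
        " type protocol/client",
        " option transport-type ib-verbs/client",
        f" option remote-host {host}",
        f" option remote-port {port}",
        " option transport.address-family inet/inet6",
        " option remote-subvolume brick",
        "end-volume",
    ])


def cluster_volume(name, kind, subvolumes):
    """A cluster/* stanza (replicate or distribute) over named subvolumes."""
    return "\n".join([
        f"volume {name}",
        f" type cluster/{kind}",
        f" subvolumes {' '.join(subvolumes)}",
        "end-volume",
    ])


def build_volfile(prefix="172.16.0.", first_host=200, pairs=3):
    """Clients first, then one replicate per pair, then distribute on top."""
    clients, replicas, blocks = [], [], []
    for i in range(pairs):
        brick, mirror, block = f"brick{i}", f"mirror{i}", f"block{i}"
        clients.append(client_volume(brick, f"{prefix}{first_host + 2 * i}"))
        clients.append(client_volume(mirror, f"{prefix}{first_host + 2 * i + 1}"))
        replicas.append(cluster_volume(block, "replicate", [brick, mirror]))
        blocks.append(block)
    top = cluster_volume("unify", "distribute", blocks)
    return "\n\n".join(clients + replicas + [top]) + "\n"


if __name__ == "__main__":
    print(build_volfile(), end="")
```

Generating the stanzas keeps the six client definitions identical apart from name and host, which makes a typo in one of them easier to rule out.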
>>>
>>> [2009-09-18 19:36:17] D [glusterfsd.c:1205:main] glusterfs: running 
>>> in pid 8303
>>> [2009-09-18 19:36:17] D [client-protocol.c:5952:init] brick0: 
>>> defaulting frame-timeout to 30mins
>>> [2009-09-18 19:36:17] D [client-protocol.c:5963:init] brick0: 
>>> defaulting ping-timeout to 10
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> brick0: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> brick0: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [client-protocol.c:5952:init] mirror0: 
>>> defaulting frame-timeout to 30mins
>>> [2009-09-18 19:36:17] D [client-protocol.c:5963:init] mirror0: 
>>> defaulting ping-timeout to 10
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> mirror0: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> mirror0: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [client-protocol.c:5952:init] brick1: 
>>> defaulting frame-timeout to 30mins
>>> [2009-09-18 19:36:17] D [client-protocol.c:5963:init] brick1: 
>>> defaulting ping-timeout to 10
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> brick1: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> brick1: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [client-protocol.c:5952:init] mirror1: 
>>> defaulting frame-timeout to 30mins
>>> [2009-09-18 19:36:17] D [client-protocol.c:5963:init] mirror1: 
>>> defaulting ping-timeout to 10
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> mirror1: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> mirror1: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [client-protocol.c:5952:init] brick2: 
>>> defaulting frame-timeout to 30mins
>>> [2009-09-18 19:36:17] D [client-protocol.c:5963:init] brick2: 
>>> defaulting ping-timeout to 10
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> brick2: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> brick2: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [client-protocol.c:5952:init] mirror2: 
>>> defaulting frame-timeout to 30mins
>>> [2009-09-18 19:36:17] D [client-protocol.c:5963:init] mirror2: 
>>> defaulting ping-timeout to 10
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> mirror2: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [transport.c:141:transport_load] transport: 
>>> attempt to load file 
>>> /usr/local/lib/glusterfs/2.0.6/transport/ib-verbs.so
>>> [2009-09-18 19:36:17] D [xlator.c:276:_volume_option_value_validate] 
>>> mirror2: no range check required for 'option remote-port 6997'
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick0: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick0: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror0: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror0: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick1: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick1: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror1: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror1: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick2: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick2: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror2: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror2: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick0: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick0: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror0: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror0: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick1: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick1: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror1: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror1: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick2: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] brick2: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror2: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] D [client-protocol.c:6280:notify] mirror2: got 
>>> GF_EVENT_PARENT_UP, attempting connect on transport
>>> [2009-09-18 19:36:17] N [glusterfsd.c:1224:main] glusterfs: 
>>> Successfully started
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] brick0: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] brick0: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] mirror0: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] mirror0: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] brick1: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] brick1: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] mirror1: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] mirror1: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] brick2: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] brick2: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] mirror2: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 19:36:17] D [client-protocol.c:6294:notify] mirror2: got 
>>> GF_EVENT_CHILD_UP
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] brick0: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] brick0: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] brick0: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] brick0: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] mirror0: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] mirror0: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] mirror0: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] mirror0: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] brick1: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] brick1: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] brick1: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] brick1: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] mirror1: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] mirror1: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] mirror1: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] mirror1: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] brick2: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] brick2: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] brick2: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] brick2: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] mirror2: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] mirror2: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] mirror2: 
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17. 
>>> frame-timeout = 1800
>>> [2009-09-18 20:06:18] D 
>>> [client-protocol.c:5491:client_setvolume_cbk] mirror2: setvolume 
>>> failed (Transport endpoint is not connected)
>>> [2009-09-18 20:06:18] D [dht-common.c:820:dht_lookup] unify: no 
>>> subvolume in layout for path=/, checking on all the subvols to see 
>>> if it is a directory
>>> [2009-09-18 20:06:18] D [dht-common.c:113:dht_lookup_dir_cbk] unify: 
>>> lookup of / on block0 returned error (Transport endpoint is not 
>>> connected)
>>> [2009-09-18 20:06:18] D [dht-common.c:113:dht_lookup_dir_cbk] unify: 
>>> lookup of / on block1 returned error (Transport endpoint is not 
>>> connected)
>>> [2009-09-18 20:06:18] D [dht-common.c:113:dht_lookup_dir_cbk] unify: 
>>> lookup of / on block2 returned error (Transport endpoint is not 
>>> connected)
>>> [2009-09-18 20:06:18] D [fuse-bridge.c:2385:fuse_root_lookup_cbk] 
>>> fuse: first lookup on root failed.
>>> [2009-09-18 20:06:18] W [fuse-bridge.c:1841:fuse_statfs_cbk] 
>>> glusterfs-fuse: 2: ERR => -1 (Transport endpoint is not connected)
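
In the log above, every SETVOLUME frame was sent at 19:36:17 and bailed at 20:06:18, which is 1801 seconds and so just past the 1800-second frame-timeout. That suggests the handshake never got a reply at all, rather than failing partway. A small sketch that pulls those numbers out of a pasted log, including the mail quote markers and line wrapping; the function name is hypothetical and the regular expression is only matched against the call_bail format shown in this thread.

```python
import re
from datetime import datetime

STAMP = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
BAIL_RE = re.compile(
    r"\[" + STAMP + r"\] E \[client-protocol\.c:\d+:call_bail\] (\S+): "
    r"bailing out frame (\w+)\(\d+\) frame sent = " + STAMP + r"\. "
    r"frame-timeout = (\d+)"
)


def bail_delays(log_text):
    """Return (volume, op, seconds_waited, frame_timeout) per call_bail entry."""
    # Drop mail quote markers and rejoin entries that the mailer wrapped.
    lines = [re.sub(r"^[> ]+", "", line).strip() for line in log_text.splitlines()]
    flat = " ".join(line for line in lines if line)
    fmt = "%Y-%m-%d %H:%M:%S"
    out = []
    for logged, volume, op, sent, timeout in BAIL_RE.findall(flat):
        waited = datetime.strptime(logged, fmt) - datetime.strptime(sent, fmt)
        out.append((volume, op, int(waited.total_seconds()), int(timeout)))
    return out


# One entry copied from the log in this thread.
SAMPLE = """\
>>> [2009-09-18 20:06:18] E [client-protocol.c:289:call_bail] brick0:
>>> bailing out frame SETVOLUME(0) frame sent = 2009-09-18 19:36:17.
>>> frame-timeout = 1800
"""

if __name__ == "__main__":
    for volume, op, waited, timeout in bail_delays(SAMPLE):
        print(f"{volume}: {op} waited {waited}s (frame-timeout {timeout}s)")
```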
>>>
>>>
>>>
>>> Nathan Stratton                                CTO, BlinkMind, Inc.
>>> nathan at robotics.net                         nathan at blinkmind.com
>>> http://www.robotics.net                        http://www.blinkmind.com
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>



