[Gluster-users] hanging "df" (3.1, infiniband)
Lana Deere
lana.deere at gmail.com
Tue Oct 19 23:23:12 UTC 2010
Running "yum install libibverbs" tells me:
[root at storage0 ~]# yum install libibverbs
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* addons: mirror.umoss.org
* base: mirror.vcu.edu
* extras: mirror.atlanticmetro.net
* updates: holmes.umflint.edu
Setting up Install Process
Package libibverbs-1.1.3-2.el5.x86_64 already installed and latest version
Package libibverbs-1.1.3-2.el5.i386 already installed and latest version
Nothing to do
Perhaps it is installed but turned off somehow?
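One thing worth checking (a sketch, not something confirmed in this thread): the ibv_devinfo warning quoted below, "no userspace device-specific driver found", usually means libibverbs itself is fine but the hardware-specific provider library is missing. For the Mellanox ConnectX HCA in the lspci output, that provider would be libmlx4; the grep pattern and the libmlx4 suggestion here are my assumptions.

```shell
# Hypothetical follow-up check: libibverbs alone is not enough; each HCA
# needs a userspace provider library (libmlx4 for Mellanox ConnectX).
# Simulate the warning that ibv_devinfo printed on this machine:
devinfo_output='libibverbs: Warning: no userspace device-specific driver found for
/sys/class/infiniband_verbs/uverbs0
No IB devices found'

# That warning text is the tell-tale symptom of a missing provider library.
if printf '%s\n' "$devinfo_output" | grep -q 'no userspace device-specific driver'; then
    echo 'provider library missing; candidate fix: yum install libmlx4'
fi
```

If the provider for the HCA really is absent, installing it and restarting glusterd would be the next thing to try.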
.. Lana (lana.deere at gmail.com)
On Tue, Oct 19, 2010 at 7:15 PM, Craig Carl <craig at gluster.com> wrote:
> Lana -
> Looks like you have the IPoIB stack installed, but not support for
> ibverbs. Let's try this -
>
> # yum install libibverbs
> # service glusterd restart
>
> Thanks,
>
> Craig
>
> --
> Craig Carl
> Senior Systems Engineer; Gluster, Inc.
> Cell - (408) 829-9953 (California, USA)
> Office - (408) 770-1884
> Gtalk - craig.carl at gmail.com
> Twitter - @gluster
> Installing Gluster Storage Platform, the movie!
> http://rackerhacker.com/2010/08/11/one-month-with-glusterfs-in-production/
>
>
> ________________________________
> From: "Lana Deere" <lana.deere at gmail.com>
> To: "Craig Carl" <craig at gluster.com>
> Cc: gluster-users at gluster.org, landman at scalableinformatics.com
> Sent: Tuesday, October 19, 2010 4:02:11 PM
> Subject: Re: [Gluster-users] hanging "df" (3.1, infiniband)
>
> They show up in ibhosts and I can ping or ssh via IPoIB to them, but
> perhaps they are not completely configured properly. Or perhaps I
> have mixed some references to the regular Ethernet into the
> configuration for rdma? Anyway, here are the outputs you requested:
>
> [root at storage0 ~]# lsmod
> Module Size Used by
> iptable_filter 36161 0
> ip_tables 55201 1 iptable_filter
> x_tables 50505 1 ip_tables
> fuse 83057 1
> autofs4 63049 3
> hidp 83521 2
> rfcomm 104937 0
> l2cap 89409 10 hidp,rfcomm
> bluetooth 118853 5 hidp,rfcomm,l2cap
> lockd 101553 0
> sunrpc 199945 2 lockd
> cpufreq_ondemand 42449 8
> acpi_cpufreq 47937 0
> freq_table 38977 2 cpufreq_ondemand,acpi_cpufreq
> ib_iser 69569 0
> libiscsi2 77765 1 ib_iser
> scsi_transport_iscsi2 74073 2 ib_iser,libiscsi2
> scsi_transport_iscsi 35017 1 scsi_transport_iscsi2
> ib_srp 67465 0
> rds 401393 0
> ib_sdp 144285 0
> ib_ipoib 113057 0
> ipoib_helper 35537 2 ib_ipoib
> ipv6 435489 77 ib_ipoib
> xfrm_nalgo 43333 1 ipv6
> crypto_api 42945 1 xfrm_nalgo
> rdma_ucm 47681 0
> rdma_cm 68437 4 ib_iser,rds,ib_sdp,rdma_ucm
> ib_ucm 50121 0
> ib_uverbs 68720 2 rdma_ucm,ib_ucm
> ib_umad 50153 0
> ib_cm 72809 4 ib_srp,ib_ipoib,rdma_cm,ib_ucm
> iw_cm 43465 1 rdma_cm
> ib_addr 41929 1 rdma_cm
> ib_sa 74953 4 ib_srp,ib_ipoib,rdma_cm,ib_cm
> mlx4_ib 94461 0
> ib_mad 70629 4 ib_umad,ib_cm,ib_sa,mlx4_ib
> ib_core 104901 15 ib_iser,ib_srp,rds,ib_sdp,ib_ipoib,rdma_ucm,rdma_cm,ib_ucm,ib_uverbs,ib_umad,ib_cm,iw_cm,ib_sa,mlx4_ib,ib_mad
> xfs 508625 1
> loop 48721 0
> dm_mirror 54737 0
> dm_multipath 56921 0
> scsi_dh 42177 1 dm_multipath
> raid456 152417 1
> xor 38865 1 raid456
> video 53197 0
> backlight 39873 1 video
> sbs 49921 0
> power_meter 47053 0
> hwmon 36553 1 power_meter
> i2c_ec 38593 1 sbs
> dell_wmi 37601 0
> wmi 41985 1 dell_wmi
> button 40545 0
> battery 43849 0
> asus_acpi 50917 0
> acpi_memhotplug 40517 0
> ac 38729 0
> parport_pc 62313 0
> lp 47121 0
> parport 73165 2 parport_pc,lp
> mlx4_en 107985 0
> joydev 43969 0
> i2c_i801 41813 0
> igb 122709 0
> i2c_core 56641 2 i2c_ec,i2c_i801
> 8021q 57425 1 igb
> shpchp 70893 0
> mlx4_core 152773 2 mlx4_ib,mlx4_en
> serio_raw 40517 0
> dca 41221 1 igb
> sg 70377 0
> pcspkr 36289 0
> dm_raid45 99657 0
> dm_message 36289 1 dm_raid45
> dm_region_hash 46145 1 dm_raid45
> dm_log 44993 3 dm_mirror,dm_raid45,dm_region_hash
> dm_mod 101649 4 dm_mirror,dm_multipath,dm_raid45,dm_log
> dm_mem_cache 38977 1 dm_raid45
> mpt2sas 159337 12
> scsi_transport_sas 66753 1 mpt2sas
> ahci 69705 6
> libata 209489 1 ahci
> sd_mod 56513 32
> scsi_mod 196953 10 ib_iser,libiscsi2,scsi_transport_iscsi2,ib_srp,scsi_dh,sg,mpt2sas,scsi_transport_sas,libata,sd_mod
> raid1 56001 3
> ext3 168913 2
> jbd 94769 1 ext3
> uhci_hcd 57433 0
> ohci_hcd 56309 0
> ehci_hcd 66125 0
> [root at storage0 ~]# ibv_devinfo
> libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
> No IB devices found
> [root at storage0 ~]# lspci
> 00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)
> 00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)
> 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)
> 00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 22)
> 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22)
> 00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 22)
> 00:13.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller (rev 22)
> 00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 22)
> 00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 22)
> 00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 22)
> 00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 22)
> 00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
> 00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
> 00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
> 00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
> 00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
> 00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
> 00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
> 00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> 00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
> 00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
> 00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
> 01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)
> 05:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
> 06:01.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)
> [root at storage0 ~]# /etc/init.d/openibd status
> Low level hardware support loaded:
> mlx4_ib
>
> Upper layer protocol modules:
> ib_iser ib_srp rds ib_sdp ib_ipoib
>
> User space access modules:
> rdma_ucm ib_ucm ib_uverbs ib_umad
>
> Connection management modules:
> rdma_cm ib_cm iw_cm
>
> Configured IPoIB interfaces: ib0
> Currently active IPoIB interfaces: ib0
> [root at storage0 ~]#
>
>
>
> .. Lana (lana.deere at gmail.com)
>
>
>
>
>
>
> On Tue, Oct 19, 2010 at 6:48 PM, Craig Carl <craig at gluster.com> wrote:
>> Lana -
>> The first couple of lines of the log identify our problem -
>>
>> [2010-10-19 07:47:49.315416] C [rdma.c:3817:rdma_init] rpc-transport/rdma: No IB devices found
>> [2010-10-19 07:47:49.315438] E [rdma.c:4744:init] rdma.management: Failed to initialize IB Device
>> [2010-10-19 07:47:49.315452] E [rpc-transport.c:965:rpc_transport_load] rpc-transport: 'rdma' initialization failed
>>
>> Are you sure your IB cards are working? Can you send the output of -
>>
>> # lsmod
>> # ibv_devinfo
>> # lspci
>> # /etc/init.d/openibd status
>>
>>
>>
>> Thanks,
>>
>> Craig
>>
>> --
>> Craig Carl
>> Senior Systems Engineer; Gluster, Inc.
>> Cell - (408) 829-9953 (California, USA)
>> Office - (408) 770-1884
>> Gtalk - craig.carl at gmail.com
>> Twitter - @gluster
>> Installing Gluster Storage Platform, the movie!
>> http://rackerhacker.com/2010/08/11/one-month-with-glusterfs-in-production/
>>
>>
>> ________________________________
>> From: "Lana Deere" <lana.deere at gmail.com>
>> To: "Craig Carl" <craig at gluster.com>
>> Cc: gluster-users at gluster.org, landman at scalableinformatics.com
>> Sent: Tuesday, October 19, 2010 3:29:41 PM
>> Subject: Re: [Gluster-users] hanging "df" (3.1, infiniband)
>>
>> For the last little while I've been using storage0 as both client and
>> server, so those files are both client and server files at the same
>> time. If it would be helpful, I could go back to using a different
>> host as client (but then 'df' will hang instead of reporting the
>> Transport message).
>>
>> [root at storage0 ~]# cat /etc/glusterd/.cmd_log_history
>> [2010-10-19 07:54:36.244333] peer probe : on host storage1:24007
>> [2010-10-19 07:54:36.249891] peer probe : on host storage1:24007 FAILED
>> [2010-10-19 07:54:43.745558] peer probe : on host storage2:24007
>> [2010-10-19 07:54:43.750752] peer probe : on host storage2:24007 FAILED
>> [2010-10-19 07:54:48.915378] peer probe : on host storage3:24007
>> [2010-10-19 07:54:48.920595] peer probe : on host storage3:24007 FAILED
>> [2010-10-19 07:59:49.737251] Volume create : on volname: RaidData attempted
>> [2010-10-19 07:59:49.737314] Volume create : on volname: RaidData type:DEFAULT count:4 bricks: storage0:/data storage1:/data storage2:/data storage3:/data
>> [2010-10-19 07:59:49.737631] Volume create : on volname: RaidData SUCCESS
>> [2010-10-19 08:01:36.909963] volume start : on volname: RaidData SUCCESS
>>
>> The /var/log file was pretty big, so I put it on pastebin:
>> http://pastebin.com/m6WbHPUp
>>
>>
>> .. Lana (lana.deere at gmail.com)
>>
>>
>>
>>
>>
>>
>> On Tue, Oct 19, 2010 at 6:10 PM, Craig Carl <craig at gluster.com> wrote:
>>> Lana -
>>> Can you also post the contents of
>>>
>>> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>>> and
>>> /etc/glusterd/.cmd_log_history
>>>
>>> on both the client and server to the list?
>>
>