[Gluster-users] hanging "df" (3.1, infiniband)

Lana Deere lana.deere at gmail.com
Tue Oct 19 23:02:11 UTC 2010


They show up in ibhosts and I can ping or ssh to them over IPoIB, but
perhaps they are not completely configured properly.  Or perhaps I
have mixed some references to the regular Ethernet network into the
rdma configuration?  Anyway, here are the outputs you requested:

[root@storage0 ~]# lsmod
Module                  Size  Used by
iptable_filter         36161  0
ip_tables              55201  1 iptable_filter
x_tables               50505  1 ip_tables
fuse                   83057  1
autofs4                63049  3
hidp                   83521  2
rfcomm                104937  0
l2cap                  89409  10 hidp,rfcomm
bluetooth             118853  5 hidp,rfcomm,l2cap
lockd                 101553  0
sunrpc                199945  2 lockd
cpufreq_ondemand       42449  8
acpi_cpufreq           47937  0
freq_table             38977  2 cpufreq_ondemand,acpi_cpufreq
ib_iser                69569  0
libiscsi2              77765  1 ib_iser
scsi_transport_iscsi2    74073  2 ib_iser,libiscsi2
scsi_transport_iscsi    35017  1 scsi_transport_iscsi2
ib_srp                 67465  0
rds                   401393  0
ib_sdp                144285  0
ib_ipoib              113057  0
ipoib_helper           35537  2 ib_ipoib
ipv6                  435489  77 ib_ipoib
xfrm_nalgo             43333  1 ipv6
crypto_api             42945  1 xfrm_nalgo
rdma_ucm               47681  0
rdma_cm                68437  4 ib_iser,rds,ib_sdp,rdma_ucm
ib_ucm                 50121  0
ib_uverbs              68720  2 rdma_ucm,ib_ucm
ib_umad                50153  0
ib_cm                  72809  4 ib_srp,ib_ipoib,rdma_cm,ib_ucm
iw_cm                  43465  1 rdma_cm
ib_addr                41929  1 rdma_cm
ib_sa                  74953  4 ib_srp,ib_ipoib,rdma_cm,ib_cm
mlx4_ib                94461  0
ib_mad                 70629  4 ib_umad,ib_cm,ib_sa,mlx4_ib
ib_core               104901  15 ib_iser,ib_srp,rds,ib_sdp,ib_ipoib,rdma_ucm,rdma_cm,ib_ucm,ib_uverbs,ib_umad,ib_cm,iw_cm,ib_sa,mlx4_ib,ib_mad
xfs                   508625  1
loop                   48721  0
dm_mirror              54737  0
dm_multipath           56921  0
scsi_dh                42177  1 dm_multipath
raid456               152417  1
xor                    38865  1 raid456
video                  53197  0
backlight              39873  1 video
sbs                    49921  0
power_meter            47053  0
hwmon                  36553  1 power_meter
i2c_ec                 38593  1 sbs
dell_wmi               37601  0
wmi                    41985  1 dell_wmi
button                 40545  0
battery                43849  0
asus_acpi              50917  0
acpi_memhotplug        40517  0
ac                     38729  0
parport_pc             62313  0
lp                     47121  0
parport                73165  2 parport_pc,lp
mlx4_en               107985  0
joydev                 43969  0
i2c_i801               41813  0
igb                   122709  0
i2c_core               56641  2 i2c_ec,i2c_i801
8021q                  57425  1 igb
shpchp                 70893  0
mlx4_core             152773  2 mlx4_ib,mlx4_en
serio_raw              40517  0
dca                    41221  1 igb
sg                     70377  0
pcspkr                 36289  0
dm_raid45              99657  0
dm_message             36289  1 dm_raid45
dm_region_hash         46145  1 dm_raid45
dm_log                 44993  3 dm_mirror,dm_raid45,dm_region_hash
dm_mod                101649  4 dm_mirror,dm_multipath,dm_raid45,dm_log
dm_mem_cache           38977  1 dm_raid45
mpt2sas               159337  12
scsi_transport_sas     66753  1 mpt2sas
ahci                   69705  6
libata                209489  1 ahci
sd_mod                 56513  32
scsi_mod              196953  10 ib_iser,libiscsi2,scsi_transport_iscsi2,ib_srp,scsi_dh,sg,mpt2sas,scsi_transport_sas,libata,sd_mod
raid1                  56001  3
ext3                  168913  2
jbd                    94769  1 ext3
uhci_hcd               57433  0
ohci_hcd               56309  0
ehci_hcd               66125  0
[root@storage0 ~]# ibv_devinfo
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
No IB devices found
[root@storage0 ~]# lspci
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI
Express Root Port 1 (rev 22)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI
Express Root Port 3 (rev 22)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express
Root Port 5 (rev 22)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI
Express Root Port 7 (rev 22)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI
Express Root Port 9 (rev 22)
00:13.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub I/OxAPIC
Interrupt Controller (rev 22)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management
Registers (rev 22)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch
Pad Registers (rev 22)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status
and RAS Registers (rev 22)
00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 22)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2
EHCI Controller #2
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2
EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA
AHCI Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic
SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)
05:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe
2.0 5GT/s - IB QDR / 10GigE] (rev b0)
06:01.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW
WPCM450 (rev 0a)
[root@storage0 ~]# /etc/init.d/openibd status
Low level hardware support loaded:
        mlx4_ib

Upper layer protocol modules:
        ib_iser ib_srp rds ib_sdp ib_ipoib

User space access modules:
        rdma_ucm ib_ucm ib_uverbs ib_umad

Connection management modules:
        rdma_cm ib_cm iw_cm

Configured IPoIB interfaces: ib0
Currently active IPoIB interfaces: ib0
[root@storage0 ~]#
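
The kernel side looks loaded (mlx4_ib is in, and openibd shows ib0 up), so
my best guess at the ibv_devinfo warning is that the userspace verbs
provider for the ConnectX card is missing or not being found by libibverbs.
A couple of checks I could run next (package name and path are my assumption
of the usual RHEL/OFED layout, so this is a sketch rather than something I
have verified here):

# rpm -qa | grep -i mlx4
    (should list the userspace provider package, libmlx4, next to the kernel bits)
# ls /etc/libibverbs.d/
    (the *.driver files libibverbs scans at startup; no mlx4 entry here would
     match the "no userspace device-specific driver found" warning)

If libmlx4 turns out to be missing, installing it and re-running ibv_devinfo
should show the HCA, which is presumably what gluster's rdma transport needs
before "No IB devices found" goes away.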



.. Lana (lana.deere at gmail.com)






On Tue, Oct 19, 2010 at 6:48 PM, Craig Carl <craig at gluster.com> wrote:
> Lana -
>  The first couple of lines of the log identify our problem -
>
> [2010-10-19 07:47:49.315416] C [rdma.c:3817:rdma_init] rpc-transport/rdma:
> No IB devices found
> [2010-10-19 07:47:49.315438] E [rdma.c:4744:init] rdma.management: Failed to
> initialize IB Device
> [2010-10-19 07:47:49.315452] E [rpc-transport.c:965:rpc_transport_load]
> rpc-transport: 'rdma' initialization failed
>
> Are you sure your IB cards are working? Can you send the output of -
>
> # lsmod
> # ibv_devinfo
> # lspci
> # /etc/init.d/openibd status
>
>
>
> Thanks,
>
> Craig
>
> --
> Craig Carl
> Senior Systems Engineer; Gluster, Inc.
> Cell - (408) 829-9953 (California, USA)
> Office - (408) 770-1884
> Gtalk - craig.carl at gmail.com
> Twitter - @gluster
> Installing Gluster Storage Platform, the movie!
> http://rackerhacker.com/2010/08/11/one-month-with-glusterfs-in-production/
>
>
> ________________________________
> From: "Lana Deere" <lana.deere at gmail.com>
> To: "Craig Carl" <craig at gluster.com>
> Cc: gluster-users at gluster.org, landman at scalableinformatics.com
> Sent: Tuesday, October 19, 2010 3:29:41 PM
> Subject: Re: [Gluster-users] hanging "df" (3.1, infiniband)
>
> For the last little while I've been using storage0 as both client and
> server, so those files are both client and server files at the same
> time.  If it would be helpful, I could go back to using a different
> host as client (but then 'df' will hang instead of reporting the
> Transport message).
>
> [root@storage0 ~]# cat /etc/glusterd/.cmd_log_history
> [2010-10-19 07:54:36.244333] peer probe :  on host storage1:24007
> [2010-10-19 07:54:36.249891] peer probe : on host storage1:24007 FAILED
> [2010-10-19 07:54:43.745558] peer probe :  on host storage2:24007
> [2010-10-19 07:54:43.750752] peer probe : on host storage2:24007 FAILED
> [2010-10-19 07:54:48.915378] peer probe :  on host storage3:24007
> [2010-10-19 07:54:48.920595] peer probe : on host storage3:24007 FAILED
> [2010-10-19 07:59:49.737251] Volume create : on volname: RaidData attempted
> [2010-10-19 07:59:49.737314] Volume create : on volname: RaidData
> type:DEFAULT count:4 bricks: storage0:/data storage1:/data
> storage2:/data storage3:/data
> [2010-10-19 07:59:49.737631] Volume create : on volname: RaidData SUCCESS
> [2010-10-19 08:01:36.909963] volume start : on volname: RaidData SUCCESS
>
> The log file under /var/log/glusterfs was pretty big, so I put it on pastebin:
>     http://pastebin.com/m6WbHPUp
>
>
> .. Lana (lana.deere at gmail.com)
>
>
>
>
>
>
> On Tue, Oct 19, 2010 at 6:10 PM, Craig Carl <craig at gluster.com> wrote:
>> Lana -
>>    Can you also post the contents of
>>
>> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>> and
>> /etc/glusterd/.cmd_log_history
>>
>> on both the client and server to the list?
>
