[Gluster-users] RDMA Failure with 3.1.3

scandic scandic at rootblog.org
Wed Apr 6 20:31:16 UTC 2011

I just installed GlusterFS 3.1.3 on Arch Linux x86_64 with Kernel and libibverbs 1.1.4. It's a two-node replicated setup over RDMA:

gluster> peer status
Number of Peers: 1
Hostname: easyStor-252
Uuid: 6766b083-8a12-41a8-984e-dd3ca2f1c536
State: Peer in Cluster (Connected)

gluster> volume info all
Volume Name: Storage-1
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: rdma
Brick1: easyStor-251:/var/export
Brick2: easyStor-252:/var/export
Options Reconfigured:
auth.allow: 10.1.*

When I try to mount using "mount -t glusterfs easyStor-251:/Storage-1 /mnt" I get no output. When I try to chdir to /mnt or call df the process hangs.

Here's my logfile content:
[2011-04-06 21:20:50.678395] W [io-stats.c:1644:init] 0-Storage-1: dangling volume. check volfile 
[2011-04-06 21:20:50.678516] W [dict.c:1205:data_to_str] 0-dict: @data=(nil)
[2011-04-06 21:20:50.678545] W [dict.c:1205:data_to_str] 0-dict: @data=(nil)
[2011-04-06 21:20:50.682292] C [rdma.c:3953:rdma_init] 0-rpc-transport/rdma: No IB devices found
[2011-04-06 21:20:50.682371] E [rdma.c:4826:init] 0-Storage-1-client-1: Failed to initialize IB Device
[2011-04-06 21:20:50.682397] E [rpc-transport.c:849:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
pending frames:

patchset: v3.1.3
signal received: 6

So it seems it can't find the IB device. According to lsmod output the modules are loaded:
Module                  Size  Used by
xprtrdma               38186  0 
svcrdma                31379  0 
sunrpc                190513  2 xprtrdma,svcrdma
rdma_ucm               12356  0 
rdma_cm                29317  3 xprtrdma,svcrdma,rdma_ucm
iw_cm                   7033  1 rdma_cm
ib_addr                 4965  1 rdma_cm
ib_ucm                 11398  0 
ib_uverbs              27035  2 rdma_ucm,ib_ucm
ib_umad                10506  0 
ib_ipoib               71040  0 
ib_cm                  30040  3 rdma_cm,ib_ucm,ib_ipoib
ib_sa                  18440  4 rdma_ucm,rdma_cm,ib_ipoib,ib_cm
ipv6                  277133  27 bridge,ib_addr,ib_ipoib
ib_mad                 35558  4 ib_umad,ib_cm,ib_sa,ib_mthca 
ib_core                44827  13 xprtrdma,svcrdma,rdma_ucm,rdma_cm,iw_cm,ib_ucm,ib_uverbs,ib_umad,ib_ipoib,ib_cm,ib_sa,ib_mthca,ib_mad
fuse                   64671  3

dmesg also shows that the HCA has been found:
ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
ib_mthca: Initializing 0000:0a:00.0
ib_mthca 0000:0a:00.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32
ib_mthca 0000:0a:00.0: setting latency timer to 64
ib_mthca 0000:0a:00.0: HCA FW version 1.0.800 is old (1.2.000 is current).
ib_mthca 0000:0a:00.0: If you have problems, try updating your HCA FW.
ib_mthca 0000:0a:00.0: irq 68 for MSI/MSI-X
ib_mthca 0000:0a:00.0: irq 69 for MSI/MSI-X
ib_mthca 0000:0a:00.0: irq 70 for MSI/MSI-X
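
Since the kernel driver loads but GlusterFS reports "No IB devices found", one way to narrow this down is to compare what the kernel has registered with what userspace can reach. The following is only a rough sketch: it assumes the usual sysfs layout (/sys/class/infiniband) and uverbs device nodes under /dev/infiniband, and the function names are made up for illustration. It does not call libibverbs itself, so a clean result would still leave the userspace verbs provider (for mthca hardware, libmthca) and device permissions as things to check.

```python
import os


def list_ib_devices(sysfs_dir="/sys/class/infiniband"):
    """Return the HCA names the kernel has registered, sorted.

    The default path is an assumption about the usual sysfs layout.
    """
    try:
        return sorted(os.listdir(sysfs_dir))
    except OSError:
        # Directory missing or unreadable: treat as "no devices".
        return []


def list_uverbs_nodes(dev_dir="/dev/infiniband"):
    """Return the uverbs device nodes visible to userspace, sorted."""
    try:
        return sorted(n for n in os.listdir(dev_dir) if n.startswith("uverbs"))
    except OSError:
        return []


def diagnose(hcas, uverbs_nodes):
    """Turn the two listings into a short hint about where to look next."""
    if not hcas:
        return "kernel has not registered any HCA"
    if not uverbs_nodes:
        return "HCA registered, but no uverbs device node for userspace"
    return "kernel side looks fine; check the userspace verbs provider and permissions"


if __name__ == "__main__":
    hcas = list_ib_devices()
    nodes = list_uverbs_nodes()
    print("HCAs:", hcas)
    print("uverbs nodes:", nodes)
    print(diagnose(hcas, nodes))
```

If both lists are non-empty and the mount still fails, the ibv_devices tool that ships with libibverbs shows what the verbs library itself detects, which is closer to what the rdma transport sees.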

There's no firewall active. It would be great if someone could point me in the right direction.
