[Gluster-users] infiniband speed

Christopher Hawkins chawkins at bplinux.com
Tue Jan 11 00:43:18 UTC 2011


I am testing InfiniBand for the first time. It seems I should be able to get a lot more speed than I am seeing with some pretty basic tests. Maybe someone running InfiniBand can confirm whether what I am seeing is way out of line, and/or help diagnose?

I have two systems connected, running 3.1.2qa3. With 3.1.1, InfiniBand wouldn't even start; it gave an error about being unable to initialize RDMA. But with the latest version and an upgrade to OFED 1.5.2, everything starts up with no errors and I can create a volume and mount it.
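
For reference, the two volumes were created along these lines (a sketch reconstructed from the volume info below, not the exact commands I typed):

gluster volume create test2_volume replica 2 transport tcp bravo:/cluster/shadow/test2 backup:/cluster/shadow/test2
gluster volume create test_volume replica 2 transport rdma bravo:/cluster/shadow/test backup:/cluster/shadow/test
gluster volume start test2_volume
gluster volume start test_volume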

The underlying InfiniBand seems OK, and a basic ibv_rc_pingpong test shows I can move data pretty fast:
81920000 bytes in 0.23 seconds = 2858.45 Mbit/sec
10000 iters in 0.23 seconds = 22.93 usec/iter
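
(That run used 8KB messages over 10000 iterations; the invocation was something like the following, reconstructed from the byte and iteration counts above:)

ibv_rc_pingpong -s 8192 -n 10000          # on the server
ibv_rc_pingpong -s 8192 -n 10000 bravo    # on the client, pointing at the server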

So now I have two volumes created: one that uses tcp over a gig-e link and one that uses rdma. I mount them and run some file copy tests... and the results are almost exactly the same? What?

gluster volume info

Volume Name: test2_volume
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: bravo:/cluster/shadow/test2
Brick2: backup:/cluster/shadow/test2

Volume Name: test_volume
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: bravo:/cluster/shadow/test
Brick2: backup:/cluster/shadow/test

mount:
glusterfs#localhost:/test_volume on /mnt/test type fuse (rw,allow_other,default_permissions,max_read=131072)
glusterfs#localhost:/test2_volume on /mnt/test2 type fuse (rw,allow_other,default_permissions,max_read=131072)
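
For completeness, both were mounted the standard way, something like:

mount -t glusterfs localhost:/test_volume /mnt/test
mount -t glusterfs localhost:/test2_volume /mnt/test2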


time cp files.tar /mnt/test2/

real    0m11.159s
user    0m0.123s
sys     0m1.244s

files.tar is a single 390MB file, so this is about 35MB/s. Fine for gig-e.
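
To rule out client-side caching on later runs, something like dd with a trailing fsync should give a more honest number than cp (the output filename here is arbitrary):

dd if=/dev/zero of=/mnt/test2/ddtest bs=1M count=390 conv=fsync
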
----------------------------

time cp files.tar /mnt/test/

real    0m5.656s
user    0m0.116s
sys     0m0.962s

69MB/s... ehh. Faster, at least. But on a few runs this was no faster at all. Maybe a cache effect?
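
Dropping the page cache between runs would settle that; a quick sketch (needs root):

sync; echo 3 > /proc/sys/vm/drop_caches
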
----------------------------

time cp -av /usr/src/kernels /mnt/test2/

real    0m49.605s
user    0m0.681s
sys     0m2.593s

The kernels dir is 34MB of small files. I thought the low latency of IB would really show an improvement here.
-----------------------------

time cp -av /usr/src/kernels /mnt/test/

real    0m56.046s
user    0m0.625s
sys     0m2.675s

It took LONGER? That can't be right. 
------------------------------

And finally, this error is appearing in the rdma mount log every 3 seconds on both nodes:

[2011-01-10 19:46:56.728127] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to  failed (Connection refused)
[2011-01-10 19:46:59.738291] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to  failed (Connection refused)
[2011-01-10 19:47:02.748260] E [rdma.c:4428:tcp_connect_finish] test_volume-client-1: tcp connect to  failed (Connection refused)
(... and so on, repeating every 3 seconds)

But there are no restrictions in the config; everything is set to allow *. So my questions are: can anyone else tell me what kind of basic file copy performance they see using IB? And what can I do to troubleshoot?
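
In case it helps, here is the sort of thing I can run and post output from, e.g. to confirm the brick daemons are up and listening on both nodes (a sketch):

ps ax | grep glusterfsd
netstat -ltnp | grep gluster
gluster peer status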

Thanks List and Devs, 

Chris


