[Gluster-users] RDMA "not fully supported" by GlusterFS 3.3.0 ?!

Ling Ho ling at slac.stanford.edu
Mon Jul 23 19:30:02 UTC 2012

On 07/16/2012 09:16 AM, Philippe Muller wrote:
> Hi RedHat & GlusterFS users,
> Last weekend, I worked on a GlusterFS cluster upgrade, from 3.0.3 to 
> 3.3.0.
> We were using hand-made volume files defining 2 volumes, a distributed 
> one, and a replicated-distribute one; both using the "transport-type 
> ib-verbs" option.
> One of our objectives was to use the "gluster" CLI tool (which didn't 
> exist in 3.0.3, from what I remember).
> Here is what we did:
> 1 - Shut down all glusterfs instances
> 2 - Install Gluster 3.3.0
> 3 - Start glusterd on all hosts
> 4 - Create a trusted pool with all our hosts
> 5 - Create "compatible volumes" using the CLI tool; using the same 
> bricks we were using with our hand-made volfiles and using the "rdma" 
> transport (since ib-verbs was no longer an option...)
> 6 - Mount the volumes
> Of course, we tested that scenario on VMs. No issues with data. We 
> tested everything except... RDMA!
> When we finally made the upgrade, everything went fine, except 
> mounting the volumes. We got this kind of error message in the log files:
> "E [rdma.c:4458:tcp_connect_finish] 0-zodiac-client-3: tcp connect to 
> failed (Connection refused)"
> (notice the two spaces between "connect to" and "failed")
> That reminded me of an earlier problem we had with the subnet manager running on the IB switch. But this time, the switch wasn't responsible; IPoIB was still running fine...
> I scratched my head more than once, thinking about what I could possibly have forgotten. Then I searched for all information I could find about RDMA and 3.3.0.
> Here is what I found:
> - On page 123 of the "GlusterFS Administration Guide 3.3.0", a small note saying: "NOTE: with 3.3.0 release, transport type 'rdma' and 'tcp,rdma' are not fully supported."
> - On July 7, Ling Ho started a thread on this mailing list, with very similar symptoms: http://www.mail-archive.com/gluster-users@gluster.org/msg09326.html ; but he didn't get any answer.
> Given the urgency of the upgrade, we weren't sure rolling back to 3.0.3 was a good option (since we don't know precisely which XFS attributes were modified by 3.3.0 on the backend FS). So we switched to TCP (over IPoIB).
> It's working. We are now running 3.3.0. But we are no longer taking advantage of RDMA.
> So here are a few questions:
> - Did I miss something that prevented me from using RDMA in 3.3.0?
> - Is there a way to use RDMA in 3.3.0?
> - Is there any official communication about the 3.3.0 RDMA issue?
> - Is there a 3.3.x release with RDMA support planned? If so, when?
> - Will the RDMA transport be dropped in future releases?
> Thanks!
> (and yeah, despite that issue, I still love GlusterFS :-)
> Philippe Muller
I just came back from a one-week vacation. Yes, I didn't get any reply 
from the list, and was not able to get RDMA working when the server is 
configured for tcp,rdma. When I was doing testing, I had set up the 
server using rdma only and totally missed this.

I ended up using TCP over IPoIB. The performance is much better than 
TCP over 10G/s. However, since I am in a mixed environment, I have to 
do some static routing on the gluster server, basically routing the 
IPoIB subnet to the 10G/s subnet which the bricks are all set up 
with. Things have been working fine.
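
For anyone following along, steps 4 to 6 of the quoted upgrade (trusted pool, volume creation with an explicit transport, mount) can be sketched roughly as below. This is only a dry-run helper that builds the command lines and prints them; it does not run anything. The host names, brick paths, replica count and mount point are hypothetical placeholders, and only the volume name "zodiac" is taken from the log line above. The transport defaults to tcp since that is what ended up working here; pass "rdma" or "tcp,rdma" to see what the original attempt would have looked like.

```python
# Dry-run sketch: builds (but never executes) gluster CLI command lines.
# All hosts, brick paths and the mount point are hypothetical examples.

VALID_TRANSPORTS = ("tcp", "rdma", "tcp,rdma")


def peer_probe_cmds(hosts):
    """Step 4: probe every host except the first, which runs the commands."""
    return [["gluster", "peer", "probe", host] for host in hosts[1:]]


def volume_create_cmd(name, bricks, transport="tcp", replica=None):
    """Step 5: 'gluster volume create' with an explicit transport type."""
    if transport not in VALID_TRANSPORTS:
        raise ValueError("unknown transport: %r" % (transport,))
    cmd = ["gluster", "volume", "create", name]
    if replica is not None:
        cmd += ["replica", str(replica)]
    cmd += ["transport", transport]
    return cmd + list(bricks)


def volume_start_cmd(name):
    """A volume has to be started before clients can mount it."""
    return ["gluster", "volume", "start", name]


def mount_cmd(server, volume, mountpoint):
    """Step 6: native client mount of the volume."""
    return ["mount", "-t", "glusterfs", "%s:/%s" % (server, volume), mountpoint]


hosts = ["server1", "server2", "server3", "server4"]   # hypothetical
bricks = ["%s:/export/brick1" % h for h in hosts]      # hypothetical

plan = peer_probe_cmds(hosts)
plan.append(volume_create_cmd("zodiac", bricks, transport="tcp", replica=2))
plan.append(volume_start_cmd("zodiac"))
plan.append(mount_cmd("server1", "zodiac", "/mnt/zodiac"))

for cmd in plan:
    print(" ".join(cmd))
```

Printing the plan first makes it easy to review the transport and brick order before running anything by hand on a real cluster.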


