Philippe Muller philippe.muller at gmail.com
Mon Jul 16 16:16:26 UTC 2012

Hi RedHat & GlusterFS users,

Last weekend, I worked on a GlusterFS cluster upgrade, from 3.0.3 to 3.3.0.
We were using hand-made volume files defining 2 volumes, a distributed one
and a distributed-replicated one, both using "transport-type ib-verbs".

One of our objectives was to use the "gluster" CLI tool (which didn't
exist in 3.0.3, from what I remember).

Here is what we did:
1 - Shut down all glusterfs instances
2 - Install GlusterFS 3.3.0
3 - Start glusterd on all hosts
4 - Create a trusted pool with all our hosts
5 - Create "compatible volumes" using the CLI tool; using the same bricks
we were using with our hand-made volfiles and using the "rdma" transport
(since ib-verbs was no longer an option...)
6 - Mount the volumes
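
For anyone who wants the concrete shape of steps 4 to 6, here is a rough
dry-run sketch. It only prints the commands, it does not run them. The host
names, brick paths, volume names and mount point are all made up for
illustration; they are not our real layout. The mount line is the plain
form, and an rdma-only volume may need extra mount options that are not
shown here.

```shell
#!/bin/sh
# Dry-run sketch of steps 4 to 6: each command is printed, never executed.
# node1..node4, /export/*, "dist", "repl" and /mnt/dist are hypothetical.

HOSTS="node1 node2 node3 node4"

plan() {
    # 4 - trusted pool: probe every other host from node1
    for h in $HOSTS; do
        [ "$h" = "node1" ] || echo "gluster peer probe $h"
    done

    # 5 - a distributed volume, one brick per host, over rdma
    bricks=""
    for h in $HOSTS; do
        bricks="$bricks $h:/export/dist"
    done
    echo "gluster volume create dist transport rdma$bricks"

    # 5 - a distributed-replicated volume (replica 2), over rdma
    bricks=""
    for h in $HOSTS; do
        bricks="$bricks $h:/export/repl"
    done
    echo "gluster volume create repl replica 2 transport rdma$bricks"

    echo "gluster volume start dist"
    echo "gluster volume start repl"

    # 6 - mount one of the volumes on a client
    echo "mount -t glusterfs node1:/dist /mnt/dist"
}

plan
```

Reviewing the printed commands before running them by hand is cheap
insurance when the bricks already hold data.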

Of course, we tested that scenario on VMs. No issues with data. We tested
everything except... RDMA!

When we finally did the upgrade, everything went fine, except mounting the
volumes. We got this kind of error message in the log files:
"E [rdma.c:4458:tcp_connect_finish] 0-zodiac-client-3: tcp connect to  failed (Connection refused)"
(notice the 2 white spaces between "connect to" and "failed")
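
To check whether a log shows the same symptom, a small filter along these
lines might help. It is only a sketch: it assumes the empty address leaves
nothing but spaces between "connect to" and "failed", and the function name
is made up.

```shell
#!/bin/sh
# Sketch: print log lines where the address in
# "tcp connect to <addr> failed" is empty (only spaces in between).
# Reads the files given as arguments, or stdin when none are given.
# Example (log directory is an assumption; adjust to your setup):
#   empty_peer_lines /var/log/glusterfs/*.log
empty_peer_lines() {
    grep -E 'tcp connect to +failed' "$@"
}
```

Lines that do carry an address between "connect to" and "failed" are not
matched, so any output points at this specific symptom.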

That reminded me of an earlier issue we had with the subnet
manager running on the IB switch. But this time, the switch wasn't
responsible; IPoIB was still running fine...

I scratched my head more than once, thinking about what I could
possibly have forgotten. Then I searched for all the information I could
find about RDMA and 3.3.0.

Here is what I found:
- On page 123 of the "GlusterFS Administration Guide 3.3.0", a small
note saying: "NOTE: with 3.3.0 release, transport type 'rdma' and
'tcp,rdma' are not fully supported."

- On July 7, Ling Ho started a thread on this mailing list, with very
similar symptoms:
http://www.mail-archive.com/gluster-users@gluster.org/msg09326.html ;
but he didn't get any answer.

Given the urgency of the upgrade, we weren't sure rolling back to 3.0.3 was
a good option (since we don't know precisely which XFS attributes were
modified by 3.3.0 on the backend FS). So we switched to TCP (over

It's working. We are now running 3.3.0. But we are no longer taking
advantage of RDMA.

So here are a few questions:
- Did I miss something that prevented me from using RDMA in 3.3.0?
- Is there a way to use RDMA in 3.3.0?

- Is there any official communication about the 3.3.0 RDMA issue?
- Is a 3.3.x release with RDMA support planned? If so, when?
- Will the RDMA transport be dropped in future releases?

Thanks!
(and yeah, despite that issue, I still love GlusterFS :-)

Philippe Muller
