[Gluster-users] Fwd: RDMA status in Gluster 3.4.1 (mounts hanging)

Shane StClair shane at axiomalaska.com
Tue Oct 8 23:22:37 UTC 2013


Hi all,

After many days of experimentation, reading docs and mailing list archives,
asking on IRC, etc., I believe the crippled state of RDMA in current versions
of Gluster (3.3.x - 3.4.1) is a known issue. I'd like to confirm that, share
my findings, and ask about any status updates or timelines.
After noticing that RDMA mounts were hanging on a new install of Gluster
3.4.1, I tested a series of different Gluster volumes. Simple (single
brick), distributed, replicate, and distributed-replicate volumes were each
tested with both the tcp and rdma transport types. Detailed results are below,
but the short version is that while *all volume types worked over tcp, only
the simple (single brick) volume worked over rdma. All other volume types
failed over rdma*, meaning that mount commands from the client hung forever.

*Environment details:*
OS: Debian Wheezy
Server type: Dell M610
Gluster version: 3.4.1, from Gluster Debian repository
Infiniband software: OFED 1.4.2, from Debian Wheezy stock packages
Infiniband card info: http://fpaste.org/45305/81273796/
Loaded modules: http://fpaste.org/45306/73881138/
*RDMA successful configs:*
Single brick

*RDMA failed configs:*
Distributed (2 bricks)
Replicate (2 bricks)
Distributed-Replicate (2 x 2 bricks)

*TCP successful configs (all):*
Single brick
Distributed (2 bricks)
Replicate (2 bricks)
Distributed-Replicate (2 x 2 bricks)

*Example RDMA volume creation command:*
gluster volume create dist-rdma transport rdma \
    192.168.255.120:/home/axiom/dist-rdma-1 \
    192.168.255.120:/home/axiom/dist-rdma-2
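For anyone who wants to reproduce the full test matrix, here is a minimal sketch that prints (but does not run) one `gluster volume create` command per layout and transport. It assumes the same server address and brick directory as the example above; the helper name `create_cmd` and the volume/brick names for the single, replicate and distributed-replicate layouts are hypothetical, since the post only shows the distributed command. With `replica 2`, consecutive bricks form a replica set, so four bricks give the 2 x 2 distributed-replicate layout. Depending on the Gluster version, creating replica bricks on a single server may prompt for confirmation or require `force`.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the "gluster volume create" command
# for one cell of the test matrix. SERVER and BASE mirror the example above;
# the volume and brick names for the other layouts are made up.
SERVER=192.168.255.120
BASE=/home/axiom

create_cmd() {
  local layout="$1" transport="$2"
  local name="$layout-$transport" opts="" nbricks i bricks=""
  case "$layout" in
    single)    nbricks=1 ;;
    dist)      nbricks=2 ;;
    repl)      nbricks=2; opts="replica 2 " ;;   # one replica pair
    dist-repl) nbricks=4; opts="replica 2 " ;;   # 2 x 2: bricks 1+2, 3+4
    *) echo "unknown layout: $layout" >&2; return 1 ;;
  esac
  # Build the space-separated brick list: SERVER:BASE/<name>-1 ... -N
  i=1
  while [ "$i" -le "$nbricks" ]; do
    bricks="$bricks $SERVER:$BASE/$name-$i"
    i=$((i + 1))
  done
  echo "gluster volume create $name ${opts}transport $transport$bricks"
}

# Print all eight combinations (4 layouts x 2 transports).
for transport in tcp rdma; do
  for layout in single dist repl dist-repl; do
    create_cmd "$layout" "$transport"
  done
done
```

`create_cmd dist rdma` reproduces the example command above exactly. Each printed volume still has to be started (`gluster volume start NAME`) before it can be mounted.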

*Example RDMA mounting command:*
mount -t glusterfs -o transport=rdma 192.168.255.120:/dist-rdma dist-rdma
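Because the failure mode here is a mount that never returns, a timeout wrapper makes the tcp/rdma comparison easier to script. A minimal sketch, assuming GNU coreutils `timeout` is available; the helper `try_with_timeout` and the 30-second limit are illustrative choices, not part of Gluster. A process stuck in uninterruptible kernel sleep cannot be killed at all, in which case even this wrapper may block, and a leftover glusterfs client process or half-finished mount may still need cleaning up by hand.

```shell
#!/bin/sh
# Hypothetical helper: run a command (e.g. the RDMA mount above) under GNU
# coreutils `timeout`, so a hang is reported instead of blocking forever.
# Usage: try_with_timeout SECONDS COMMAND [ARGS...]
try_with_timeout() {
  local secs="$1" status
  shift
  # Send SIGTERM after $secs seconds, then SIGKILL 5 seconds later if needed.
  timeout -k 5 "$secs" "$@"
  status=$?
  case "$status" in
    0)       echo "OK: $*" ;;
    # 124: timed out (SIGTERM); 137: had to be killed with SIGKILL
    124|137) echo "HUNG (timed out after ${secs}s): $*" ;;
    *)       echo "FAILED (exit $status): $*" ;;
  esac
  return "$status"
}

# Example (as root, with an existing mount point directory named dist-rdma):
# try_with_timeout 30 mount -t glusterfs -o transport=rdma \
#     192.168.255.120:/dist-rdma dist-rdma
```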

*Logs from example failed RDMA config (distributed/two bricks):*
gluster volume info: http://fpaste.org/45298/38127208/
gluster volume status: http://fpaste.org/45299/13812721/
glusterd.vol.log excerpt: http://fpaste.org/45302/13812722/
client log: http://fpaste.org/45303/38127234/

These results partly agree with Justin Clift's findings during GlusterFest
testing (http://www.gluster.org/community/documentation/index.php/GlusterFest),
which evolved into this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=978148

However, the bug report says that only the distributed-replicate volume
variant fails, while I'm also seeing distributed and replicate volumes fail.

I'd be happy to create a new bug or update the existing bug if needed. Let
me know if any additional information is needed.

Also, I'm not sure whether there's a proper place to post a warning about
RDMA's status, but it seems that a handful of people have banged their heads
against this problem. I'd suggest that if the resources don't exist to
address this issue by 3.4.2, a warning be issued when creating an RDMA
volume, or perhaps that RDMA volume creation be disabled altogether.

Please let me know if we can be of any help in the future (testing, log
output, etc).

Best,
Shane

-- 
Shane StClair
Software Engineer
Axiom Consulting & Design
http://www.axiomalaska.com