[Gluster-users] Issue in RDMA transport
raghavendra at gluster.com
Wed Feb 23 05:25:49 UTC 2011
Is it possible for you to repeat tests after patching glusterfs with attached patch? Also please send us the entire server log files after you run into the issue.
----- Original Message -----
> From: "Beat Rubischon" <beat at 0x1b.ch>
> To: gluster-users at gluster.org
> Sent: Monday, February 21, 2011 10:51:23 AM
> Subject: [Gluster-users] Issue in RDMA transport
> I found some memory corruption in the RDMA transport layer.
> Setup is CentOS 5.5, Mellanox OFED 1.5.2 / OpenFabrics OFED 1.5.2,
> ConnectX-2 cards, GlusterFS 3.1.2 / Git Master Branch.
> Application is ANSYS CFX with transient cases, running with unusual
> core counts such as 6 or 12.
> Symptoms are failure during the write out of the case. Errors are
> recorded in the brick's and client's logs:
> [2011-02-04 15:41:19.688110] W [fuse-bridge.c:1761:fuse_writev_cbk]
> glusterfs-fuse: 29810266: WRITE => -1 (Bad address)
> [2011-02-04 15:41:19.687733] E [posix.c:2504:posix_writev] home-posix:
> write failed: offset 538534184, Bad address
> I was able to reproduce the error using a single brick and a single
> client. Running server and client on the same system did not trigger
> the error; the data must cross the wire for the bug to appear.
> Switching to TCP over IPoIB was a successful workaround.
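For reference, the IPoIB workaround amounts to selecting the tcp transport in the volume's protocol/client translator. A hedged volfile fragment in the 3.1-era syntax (volume, host, and subvolume names here are hypothetical, not taken from the reporter's setup):

```
volume home-client
  type protocol/client
  option transport-type tcp
  # remote-host points at the server's IPoIB address
  option remote-host server1.ib.example.com
  option remote-subvolume home-posix
end-volume
```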
> It looks like a pointer in the iovec structure used by writev is
> corrupted during transport over RDMA. I can imagine the debugging
> will be rather hard; hopefully you'll be able to find the root
> cause. Feel free to ask for additional logs or traces, and I'll try
> to provide them.
> \|/ Beat Rubischon <beat at 0x1b.ch>
> ( 0-0 ) http://www.0x1b.ch/~beat/
> My experiences, thoughts and dreams: http://www.0x1b.ch/blog/
> Gluster-users mailing list
> Gluster-users at gluster.org