[Gluster-users] rdma problems with glusterfs 3.1.0
Lana Deere
lana.deere at gmail.com
Thu Oct 28 16:50:26 UTC 2010
Michael,
I was seeing at least a similar symptom to the "Transport endpoint is
not connected" message you list, and in my case it was because I was
using a version of OFED that was too old.  Once I moved to OFED 1.5.1,
that problem went away.
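If it helps, here is a quick way to check which OFED release a node is
running (this assumes the ofed_info script that ships with OFED and the
standard libibverbs tools are installed; adjust for your setup):

  ofed_info -s        # prints the installed OFED release string
  ibv_devinfo | head  # confirms the HCA is visible to the verbs layer

Checking both the servers and the client should show whether anything
is still on a release older than 1.5.1.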
You might look in the archives for the thread "hanging "df" (3.1,
infiniband)" from Oct 19th, which contains the record of diagnosis and
repair, in case it offers you any help.
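If you want to confirm it is the rdma transport dropping the
connection, the glusterfs logs are the first place I would look (this
assumes the default log locations; the client log is named after the
mount point):

  # on the client, for a volume mounted at /gfsvol1
  grep -i -e rdma -e disconnect /var/log/glusterfs/gfsvol1.log

  # on each server, the brick logs
  grep -i -e rdma -e disconnect /var/log/glusterfs/bricks/*.log

Errors around the time of the failed write should show whether the rdma
connection itself went away.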
.. Lana (lana.deere at gmail.com)
On Thu, Oct 28, 2010 at 11:26 AM, Michael Galloway
<michael.d.galloway at gmail.com> wrote:
> Good day all,
>
> I’ve built a new glusterfs volume using 20 nodes of one of my clusters, each
> with a 2TB SATA disk formatted with ext3 (the systems run CentOS 5.2, x86_64).
> The volume is such:
>
> Volume Name: gfsvol1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 10 x 2 = 20
> Transport-type: rdma
> Bricks:
> Brick1: node002:/gfs
> Brick2: node003:/gfs
> Brick3: node004:/gfs
> Brick4: node005:/gfs
> Brick5: node006:/gfs
> Brick6: node007:/gfs
> Brick7: node008:/gfs
> Brick8: node009:/gfs
> Brick9: node010:/gfs
> Brick10: node011:/gfs
> Brick11: node012:/gfs
> Brick12: node013:/gfs
> Brick13: node014:/gfs
> Brick14: node015:/gfs
> Brick15: node016:/gfs
> Brick16: node017:/gfs
> Brick17: node019:/gfs
> Brick18: node020:/gfs
> Brick19: node021:/gfs
> Brick20: node022:/gfs
>
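> For reference, the volume was created roughly like this (reconstructed
> from memory of the 3.1 CLI, so it may not be verbatim):
>
>   gluster volume create gfsvol1 replica 2 transport rdma \
>       node002:/gfs node003:/gfs node004:/gfs node005:/gfs \
>       node006:/gfs node007:/gfs node008:/gfs node009:/gfs \
>       node010:/gfs node011:/gfs node012:/gfs node013:/gfs \
>       node014:/gfs node015:/gfs node016:/gfs node017:/gfs \
>       node019:/gfs node020:/gfs node021:/gfs node022:/gfs
>   gluster volume start gfsvol1
>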
> The volume mounts on a client:
>
> [root at moldyn ~]# mount -t glusterfs -o transport=rdma node002:/gfsvol1
> /gfsvol1
> [root at moldyn ~]# df
> Filesystem 1K-blocks Used Available Use% Mounted on
> glusterfs#node002:/gfsvol1
> 19228583424 2001664 18249825792 1% /gfsvol1
>
> I get this error on a copy into the gluster volume:
>
> [mgx at moldyn ~]$ cp -R pmemd/ /gfsvol1/mgx/pmemd
> cp: writing `/gfsvol1/mgx/pmemd/fmdrun.out': Transport endpoint is not
> connected
> cp: closing `/gfsvol1/mgx/pmemd/fmdrun.out': Resource temporarily
> unavailable
>
> It did copy files; it just failed on that one:
>
> /gfsvol1/mgx/pmemd/:
> total 4357376
> -rw-rw-r-- 1 root root 514711552 Oct 27 13:02 fmdrun.out
> -rw-rw-r-- 1 mgx mgx 4754 Oct 27 13:01 fmdrun.out.new
> -rw-rw-r-- 1 mgx mgx 851832631 Oct 27 13:03 fmdrun.out_run1
> -rw-rw-r-- 1 mgx mgx 81 Oct 27 13:01 mdinfo
> -rw------- 1 mgx mgx 803 Oct 27 13:02 md.out
> -rw-rw-r-- 1 mgx mgx 342 Oct 27 13:03 md.sub
> -rw-rw-r-- 1 mgx mgx 1567835776 Oct 27 13:02 new.mdcrd
> -rw-rw-r-- 1 mgx mgx 1522326100 Oct 27 13:01 new.mdcrd_run1
> -rw-rw-r-- 1 mgx mgx 155957 Oct 27 13:02 new.rst
> -rw-rw-r-- 1 mgx mgx 155957 Oct 27 13:01 old.rst
> drwxrwxr-x 3 mgx mgx 40960 Oct 27 13:01 rbenew
> -rw-rw-r-- 1 mgx mgx 1008 Oct 27 13:03 vp_mdrun.in
> -rw-rw-r-- 1 mgx mgx 26190 Oct 27 13:01 vp.prmtop
> -rw-rw-r-- 1 mgx mgx 348092 Oct 27 13:01 vp_wat.prmtop
>
> pmemd/:
> total 4711216
> -rw-rw-r-- 1 mgx mgx 876818259 Apr 2 2010 fmdrun.out
> -rw-rw-r-- 1 mgx mgx 4754 Mar 19 2010 fmdrun.out.new
> -rw-rw-r-- 1 mgx mgx 851832631 Mar 6 2010 fmdrun.out_run1
> -rw-rw-r-- 1 mgx mgx 81 Apr 2 2010 mdinfo
> -rw------- 1 mgx mgx 803 Apr 2 2010 md.out
> -rw-rw-r-- 1 mgx mgx 342 Mar 31 2010 md.sub
> -rw-rw-r-- 1 mgx mgx 1567835776 Apr 2 2010 new.mdcrd
> -rw-rw-r-- 1 mgx mgx 1522326100 Mar 6 2010 new.mdcrd_run1
> -rw-rw-r-- 1 mgx mgx 155957 Apr 2 2010 new.rst
> -rw-rw-r-- 1 mgx mgx 155957 Mar 9 2010 old.rst
> drwxrwxr-x 3 mgx mgx 4096 Mar 31 2010 rbenew
> -rw-rw-r-- 1 mgx mgx 1008 Mar 2 2010 vp_mdrun.in
> -rw-rw-r-- 1 mgx mgx 26190 Mar 2 2010 vp.prmtop
> -rw-rw-r-- 1 mgx mgx 348092 Mar 2 2010 vp_wat.prmtop
>
> The fmdrun.out file is truncated and has the wrong ownership.
>
> The volume was created following the 3.1 documentation.
>
> Where is the problem? Gluster? IB? My IB stack is OFED 1.3.1 and I have SDR
> Mellanox HCAs.
>
> --- michael
>