[Gluster-users] rdma problems with glusterfs 3.1.0
Lana Deere
lana.deere at gmail.com
Thu Oct 28 16:50:26 UTC 2010
Michael,
I was seeing at least a similar symptom to the "Transport endpoint is
not connected" message you list, and in my case it was because I was
using a version of OFED that was too old.  Once I moved to OFED 1.5.1,
that problem went away.
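If it helps, here is a quick way to check which OFED release a node is
running (this assumes the ofed_info script that ships with OFED and the
standard libibverbs tools are installed; adjust for your setup):

  ofed_info -s        # prints the installed OFED release string
  ibv_devinfo | head  # confirms the HCA is visible to the verbs layer

Checking both the servers and the client should show whether anything
is still on a release older than 1.5.1.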
You might look in the archives for the thread "hanging "df" (3.1,
infiniband)" from Oct 19th, which contains the record of diagnosis and
repair, in case it offers you any help.
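If you want to confirm it is the rdma transport dropping the
connection, the glusterfs logs are the first place I would look (this
assumes the default log locations; the client log is named after the
mount point):

  # on the client, for a volume mounted at /gfsvol1
  grep -i -e rdma -e disconnect /var/log/glusterfs/gfsvol1.log

  # on each server, the brick logs
  grep -i -e rdma -e disconnect /var/log/glusterfs/bricks/*.log

Errors around the time of the failed write should show whether the rdma
connection itself went away.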
.. Lana (lana.deere at gmail.com)
On Thu, Oct 28, 2010 at 11:26 AM, Michael Galloway
<michael.d.galloway at gmail.com> wrote:
> Good day all,
>
> I’ve built a new glusterfs volume using 20 nodes of one of my clusters, each
> with a 2TB SATA disk formatted with ext3 (the systems run CentOS 5.2, x86_64).
> The volume is such:
>
> Volume Name: gfsvol1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 10 x 2 = 20
> Transport-type: rdma
> Bricks:
> Brick1: node002:/gfs
> Brick2: node003:/gfs
> Brick3: node004:/gfs
> Brick4: node005:/gfs
> Brick5: node006:/gfs
> Brick6: node007:/gfs
> Brick7: node008:/gfs
> Brick8: node009:/gfs
> Brick9: node010:/gfs
> Brick10: node011:/gfs
> Brick11: node012:/gfs
> Brick12: node013:/gfs
> Brick13: node014:/gfs
> Brick14: node015:/gfs
> Brick15: node016:/gfs
> Brick16: node017:/gfs
> Brick17: node019:/gfs
> Brick18: node020:/gfs
> Brick19: node021:/gfs
> Brick20: node022:/gfs
>
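> For reference, the volume was created roughly like this (reconstructed
> from memory of the 3.1 CLI, so it may not be verbatim):
>
>   gluster volume create gfsvol1 replica 2 transport rdma \
>       node002:/gfs node003:/gfs node004:/gfs node005:/gfs \
>       node006:/gfs node007:/gfs node008:/gfs node009:/gfs \
>       node010:/gfs node011:/gfs node012:/gfs node013:/gfs \
>       node014:/gfs node015:/gfs node016:/gfs node017:/gfs \
>       node019:/gfs node020:/gfs node021:/gfs node022:/gfs
>   gluster volume start gfsvol1
>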
> The volume mounts on a client:
>
> [root at moldyn ~]# mount -t glusterfs -o transport=rdma node002:/gfsvol1
> /gfsvol1
> [root at moldyn ~]# df
> Filesystem 1K-blocks Used Available Use% Mounted on
> glusterfs#node002:/gfsvol1
> 19228583424 2001664 18249825792 1% /gfsvol1
>
> I get this error on a copy into the gluster volume:
>
> [mgx at moldyn ~]$ cp -R pmemd/ /gfsvol1/mgx/pmemd
> cp: writing `/gfsvol1/mgx/pmemd/fmdrun.out': Transport endpoint is not
> connected
> cp: closing `/gfsvol1/mgx/pmemd/fmdrun.out': Resource temporarily
> unavailable
>
> It did copy files; it just failed on that one:
>
> /gfsvol1/mgx/pmemd/:
> total 4357376
> -rw-rw-r-- 1 root root 514711552 Oct 27 13:02 fmdrun.out
> -rw-rw-r-- 1 mgx mgx 4754 Oct 27 13:01 fmdrun.out.new
> -rw-rw-r-- 1 mgx mgx 851832631 Oct 27 13:03 fmdrun.out_run1
> -rw-rw-r-- 1 mgx mgx 81 Oct 27 13:01 mdinfo
> -rw------- 1 mgx mgx 803 Oct 27 13:02 md.out
> -rw-rw-r-- 1 mgx mgx 342 Oct 27 13:03 md.sub
> -rw-rw-r-- 1 mgx mgx 1567835776 Oct 27 13:02 new.mdcrd
> -rw-rw-r-- 1 mgx mgx 1522326100 Oct 27 13:01 new.mdcrd_run1
> -rw-rw-r-- 1 mgx mgx 155957 Oct 27 13:02 new.rst
> -rw-rw-r-- 1 mgx mgx 155957 Oct 27 13:01 old.rst
> drwxrwxr-x 3 mgx mgx 40960 Oct 27 13:01 rbenew
> -rw-rw-r-- 1 mgx mgx 1008 Oct 27 13:03 vp_mdrun.in
> -rw-rw-r-- 1 mgx mgx 26190 Oct 27 13:01 vp.prmtop
> -rw-rw-r-- 1 mgx mgx 348092 Oct 27 13:01 vp_wat.prmtop
>
> pmemd/:
> total 4711216
> -rw-rw-r-- 1 mgx mgx 876818259 Apr 2 2010 fmdrun.out
> -rw-rw-r-- 1 mgx mgx 4754 Mar 19 2010 fmdrun.out.new
> -rw-rw-r-- 1 mgx mgx 851832631 Mar 6 2010 fmdrun.out_run1
> -rw-rw-r-- 1 mgx mgx 81 Apr 2 2010 mdinfo
> -rw------- 1 mgx mgx 803 Apr 2 2010 md.out
> -rw-rw-r-- 1 mgx mgx 342 Mar 31 2010 md.sub
> -rw-rw-r-- 1 mgx mgx 1567835776 Apr 2 2010 new.mdcrd
> -rw-rw-r-- 1 mgx mgx 1522326100 Mar 6 2010 new.mdcrd_run1
> -rw-rw-r-- 1 mgx mgx 155957 Apr 2 2010 new.rst
> -rw-rw-r-- 1 mgx mgx 155957 Mar 9 2010 old.rst
> drwxrwxr-x 3 mgx mgx 4096 Mar 31 2010 rbenew
> -rw-rw-r-- 1 mgx mgx 1008 Mar 2 2010 vp_mdrun.in
> -rw-rw-r-- 1 mgx mgx 26190 Mar 2 2010 vp.prmtop
> -rw-rw-r-- 1 mgx mgx 348092 Mar 2 2010 vp_wat.prmtop
>
> The fmdrun.out file is truncated and has the wrong ownership.
>
> The volume was created following the 3.1 documentation.
>
> Where is the problem? Gluster? IB? My IB stack is OFED 1.3.1 and I have SDR
> Mellanox HCAs.
>
> --- michael
>