[Gluster-users] rdma problems with glusterfs 3.1.0

Michael Galloway michael.d.galloway at gmail.com
Thu Oct 28 15:26:24 UTC 2010


Good day all,

I’ve built a new glusterfs volume using 20 nodes of one of my clusters, each
with a 2TB SATA disk formatted with ext3 (the systems run CentOS 5.2, x86_64).
The volume is configured as follows:

Volume Name: gfsvol1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 10 x 2 = 20
Transport-type: rdma
Bricks:
Brick1: node002:/gfs
Brick2: node003:/gfs
Brick3: node004:/gfs
Brick4: node005:/gfs
Brick5: node006:/gfs
Brick6: node007:/gfs
Brick7: node008:/gfs
Brick8: node009:/gfs
Brick9: node010:/gfs
Brick10: node011:/gfs
Brick11: node012:/gfs
Brick12: node013:/gfs
Brick13: node014:/gfs
Brick14: node015:/gfs
Brick15: node016:/gfs
Brick16: node017:/gfs
Brick17: node019:/gfs
Brick18: node020:/gfs
Brick19: node021:/gfs
Brick20: node022:/gfs

The volume mounts on a client:

[root@moldyn ~]# mount -t glusterfs -o transport=rdma node002:/gfsvol1 /gfsvol1
[root@moldyn ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
glusterfs#node002:/gfsvol1
                     19228583424   2001664 18249825792   1% /gfsvol1

I get this error on a copy into the gluster volume:

[mgx@moldyn ~]$ cp -R pmemd/ /gfsvol1/mgx/pmemd
cp: writing `/gfsvol1/mgx/pmemd/fmdrun.out': Transport endpoint is not connected
cp: closing `/gfsvol1/mgx/pmemd/fmdrun.out': Resource temporarily unavailable

It did copy the files; it failed only on that one:

/gfsvol1/mgx/pmemd/:
total 4357376
-rw-rw-r-- 1 root root  514711552 Oct 27 13:02 fmdrun.out
-rw-rw-r-- 1 mgx  mgx        4754 Oct 27 13:01 fmdrun.out.new
-rw-rw-r-- 1 mgx  mgx   851832631 Oct 27 13:03 fmdrun.out_run1
-rw-rw-r-- 1 mgx  mgx          81 Oct 27 13:01 mdinfo
-rw------- 1 mgx  mgx         803 Oct 27 13:02 md.out
-rw-rw-r-- 1 mgx  mgx         342 Oct 27 13:03 md.sub
-rw-rw-r-- 1 mgx  mgx  1567835776 Oct 27 13:02 new.mdcrd
-rw-rw-r-- 1 mgx  mgx  1522326100 Oct 27 13:01 new.mdcrd_run1
-rw-rw-r-- 1 mgx  mgx      155957 Oct 27 13:02 new.rst
-rw-rw-r-- 1 mgx  mgx      155957 Oct 27 13:01 old.rst
drwxrwxr-x 3 mgx  mgx       40960 Oct 27 13:01 rbenew
-rw-rw-r-- 1 mgx  mgx        1008 Oct 27 13:03 vp_mdrun.in
-rw-rw-r-- 1 mgx  mgx       26190 Oct 27 13:01 vp.prmtop
-rw-rw-r-- 1 mgx  mgx      348092 Oct 27 13:01 vp_wat.prmtop

pmemd/:
total 4711216
-rw-rw-r-- 1 mgx mgx  876818259 Apr  2  2010 fmdrun.out
-rw-rw-r-- 1 mgx mgx       4754 Mar 19  2010 fmdrun.out.new
-rw-rw-r-- 1 mgx mgx  851832631 Mar  6  2010 fmdrun.out_run1
-rw-rw-r-- 1 mgx mgx         81 Apr  2  2010 mdinfo
-rw------- 1 mgx mgx        803 Apr  2  2010 md.out
-rw-rw-r-- 1 mgx mgx        342 Mar 31  2010 md.sub
-rw-rw-r-- 1 mgx mgx 1567835776 Apr  2  2010 new.mdcrd
-rw-rw-r-- 1 mgx mgx 1522326100 Mar  6  2010 new.mdcrd_run1
-rw-rw-r-- 1 mgx mgx     155957 Apr  2  2010 new.rst
-rw-rw-r-- 1 mgx mgx     155957 Mar  9  2010 old.rst
drwxrwxr-x 3 mgx mgx       4096 Mar 31  2010 rbenew
-rw-rw-r-- 1 mgx mgx       1008 Mar  2  2010 vp_mdrun.in
-rw-rw-r-- 1 mgx mgx      26190 Mar  2  2010 vp.prmtop
-rw-rw-r-- 1 mgx mgx     348092 Mar  2  2010 vp_wat.prmtop

The copied fmdrun.out is truncated (514711552 bytes versus 876818259 in the
source) and has the wrong ownership (root:root instead of mgx:mgx).
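
The comparison of the two listings can also be scripted instead of read by
eye. The following is only a minimal sketch, assuming bash with GNU find and
GNU stat; the function name compare_trees is hypothetical and was not part of
the original test. It reports every regular file whose size or owner differs
between the source tree and the copy, and every file missing from the copy.

```shell
#!/bin/bash
# Hypothetical helper: compare size and owner of each regular file under
# SRC with the file at the same relative path under DST.
# Prints one line per problem and returns non-zero if any was found.
compare_trees() {
    local src=$1 dst=$2 status=0 rel a b
    # %P prints the path relative to the starting directory (GNU find).
    while IFS= read -r -d '' rel; do
        if [ ! -e "$dst/$rel" ]; then
            echo "MISSING  $rel"
            status=1
            continue
        fi
        # %s = size in bytes, %U:%G = owner user and group (GNU stat).
        a=$(stat -c '%s %U:%G' "$src/$rel")
        b=$(stat -c '%s %U:%G' "$dst/$rel")
        if [ "$a" != "$b" ]; then
            echo "MISMATCH $rel: source [$a] copy [$b]"
            status=1
        fi
    done < <(cd "$src" && find . -type f -printf '%P\0')
    return $status
}
```

For the copy above it would be called as: compare_trees pmemd /gfsvol1/mgx/pmemd
It checks only metadata, so a file with the right size and owner but corrupt
contents would go unnoticed; a checksum comparison would be needed for that.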

The volume was created following the 3.1 documentation.

Where does the problem lie: Gluster or InfiniBand? My IB stack is OFED 1.3.1
and I have SDR Mellanox HCAs.

--- michael

