[Gluster-users] problems with replication & NFS
Lonni J Friedman
netllama at gmail.com
Thu Sep 13 19:17:52 UTC 2012
Greetings,
I'm trying to set up a small GlusterFS test cluster in order to gauge
the feasibility of using it in a large production environment. I've
been working through the official Admin Guide
(Gluster_File_System-3.3.0-Administration_Guide-en-US.pdf) along with
the website setup instructions (
http://www.gluster.org/community/documentation/index.php/Getting_started_overview
).
What I have are two Fedora16-x86_64 servers, each with a 20GB
XFS-formatted partition set aside as a brick. I'm using version 3.3.0.
I set up the volume for replication, and it appears to be set up and working:
####
$ gluster volume info gv0
Volume Name: gv0
Type: Replicate
Volume ID: 6c9fbbc7-e382-4f26-afae-60f8658207c5
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.31.99.166:/mnt/sdb1
Brick2: 10.31.99.165:/mnt/sdb1
####
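For reference, the setup steps I followed were roughly these (hostnames
and brick paths as above; the exact commands are reconstructed from
memory, so treat this as a sketch rather than a verbatim transcript):

```shell
# On 10.31.99.166, with glusterd running on both servers:
gluster peer probe 10.31.99.165

# Create a 2-way replicated volume, one brick per server
gluster volume create gv0 replica 2 \
    10.31.99.166:/mnt/sdb1 10.31.99.165:/mnt/sdb1

gluster volume start gv0
```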
This is where my problems begin. I assumed that if replication was
truly working, then any changes to the contents of /mnt/sdb1 on one
brick would automatically be replicated to the other brick. However,
that isn't happening. In fact, nothing seems to be happening. I've
added new files and changed pre-existing ones, yet none of it ever
replicates to the other brick. Both bricks were empty prior to
formatting the filesystem and setting them up for this test instance.
Surely I must be missing something obvious, as something this
fundamental and basic must work, right?
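In case it matters, my replication test was simply writing directly into
the brick directory on one server and then looking for the file on the
other (filename is just an example; paths as above):

```shell
# On 10.31.99.166: create a file directly in the brick directory
echo "replication test" > /mnt/sdb1/testfile

# On 10.31.99.165: check whether it showed up -- it never does
ls -l /mnt/sdb1/testfile
```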
Next problem: my production environment would need to access the volume
via NFS (rather than the 'native' Gluster client). I set up a 3rd
system (also Fedora16-x86_64) and was able to successfully NFS-mount
the gluster volume. Or so I thought. When I attempted to simply list
the files on the mount point (using 'ls'), it seemed to work at first,
but shortly afterwards it failed with a cryptic "Invalid argument"
error. So I manually unmounted, then remounted, and tried again. Once
again, it worked ok for a few seconds, then died again with the same
"Invalid argument" error:
########
[root at cuda-fs3 basebackups]# mount -t nfs -o vers=3,mountproto=tcp
10.31.99.165:/gv0 /mnt/gv0
[root at cuda-fs3 basebackups]# ls -l /mnt/gv0/
total 8
-rw-r--r-- 0 root root 6670 Sep 13 10:21 foo1
[root at cuda-fs3 basebackups]# ls -l /mnt/gv0/
total 8
-rw-r--r-- 0 root root 6670 Sep 13 10:21 foo1
[root at cuda-fs3 basebackups]# ls -l /mnt/gv0/
ls: cannot access /mnt/gv0/foo1: Invalid argument
total 0
-????????? ? ? ? ? ? foo1
########
The duration between the mount command invocation and the failed 'ls'
was literally about 5 seconds. I have numerous other traditional NFS
mounts that work just fine; it's only the gluster volume that exhibits
this behavior. I did some googling, and this bug seems to match my
problem exactly:
https://bugzilla.redhat.com/show_bug.cgi?id=800755
I can't quite tell from the bug whether it's actually fixed in the
released 3.3.0 or not. Can someone clarify whether NFS is supposed to
work in 3.3.0? Am I doing something wrong?
thanks!