[Gluster-devel] Re: NFS reexport status

Brent A Nelson brent at phys.ufl.edu
Wed Aug 8 16:25:15 UTC 2007


On Wed, 8 Aug 2007, Krishna Srinivas wrote:

> Hi Brent,
>
> Thanks. So if you use storage/posix under afr, you don't see the
> problem with NFS reexport.

Correct, that worked fine.  Once I introduced protocol/client and
protocol/server, though, rsync -aH /usr/ /mount/nfs0/ gave I/O errors and
left an inconsistent copy.
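
For reference, the layering I mean is roughly the following (just a sketch
with placeholder names and directories, not the actual spec files I sent).
The working case has the storage/posix bricks directly under cluster/afr in
a single glusterfs process; the failing case differs only in that each brick
is served by a glusterfsd and reached over localhost:

glusterfsd side (one brick shown):

volume brick0
  type storage/posix
  option directory /data/brick0
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.brick0.allow *
  subvolumes brick0
end-volume

glusterfs side:

volume client0
  type protocol/client
  option transport-type tcp/client
  option remote-host 127.0.0.1
  option remote-subvolume brick0
end-volume

volume mirror0
  type cluster/afr
  subvolumes client0 client1
end-volume

(client1 is defined the same way against the second brick.)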

> We are not able to reproduce this behaviour here.

Did you try with the spec files I sent you (they only need two directories 
available on a single machine), with an rsync of your /usr partition 
to the NFS reexport (this can also be done via localhost, no additional 
machines needed)? You are using the kernel NFS server, I assume, not one 
of the user-mode NFS servers?
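
In case it is useful, the single-machine reproduction I have in mind is
roughly this (paths are placeholders; the exports line is the no_root_squash
one from earlier in the thread):

  glusterfs -f client.vol /mnt/gluster
  echo '/mnt/gluster *(rw,no_root_squash,sync,fsid=3)' >> /etc/exports
  exportfs -ra
  mount -t nfs localhost:/mnt/gluster /mnt/nfs0
  rsync -aH /usr/ /mnt/nfs0/

i.e. everything on one box, with the kernel nfsd doing the reexport.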

> Can you give us access to your machines? is it possible?
>

Yes, if the above doesn't do the trick, we can coordinate some way to get 
you access.  Do you have an SSH public key I could add as an authorized 
key?

Thanks,

Brent

> On 8/8/07, Brent A Nelson <brent at phys.ufl.edu> wrote:
>> Today, I tried switching to the Gluster-modified fuse-2.7.0, but I still
>> encountered the same misbehavior with NFS reexport.  Heads-up: like
>> someone else on the mailing list, I found that GlusterFS performance is
>> MUCH slower with 2.7.0 than with my old 2.6.3, at least for simple "du"
>> tests...
>>
>> Since that didn't help, I thought I'd try to figure out the simplest specs
>> that exhibit the issue; see attached.  I first tried glusterfs (no glusterfsd);
>> it worked for a simple afr as well as unification of two afrs with no NFS
>> reexport trouble.  As soon as I introduced a glusterfsd exporting to the
>> glusterfs via protocol/client and protocol/server (via localhost),
>> however, the rsync problems appeared.  I didn't see the issues with du in
>> this simple setup, though (perhaps that problem will disappear when this
>> problem is fixed, perhaps not).
>>
>> Thanks,
>>
>> Brent
>>
>> On Tue, 7 Aug 2007, Krishna Srinivas wrote:
>>
>>> Hi Brent,
>>>
>>> Those messages in the log are harmless; I have removed them from the
>>> source. Can you mail the spec files? I will see again whether it can be
>>> reproduced.
>>>
>>> Thanks
>>> Krishna
>>>
>>>
>>> On 8/7/07, Brent A Nelson <brent at phys.ufl.edu> wrote:
>>>> I added debugging to all the AFR subvolumes.  On the du test, all it
>>>> produced were lines like this over and over:
>>>> 2007-08-06 17:23:41 C [dict.c:1094:data_to_ptr] libglusterfs/dict:
>>>> @data=(nil)
>>>>
>>>> For the rsync (in addition to the @data=(nil) messages):
>>>> rsync -a /tmp/blah/usr0/ /tmp/blah/nfs0/
>>>> rsync: readdir("/tmp/blah/usr0/share/perl/5.8.8/unicore/lib/gc_sc"):
>>>> Input/output error (5)
>>>> rsync: readdir("/tmp/blah/usr0/share/man/man1"): Input/output error (5)
>>>> rsync: readdir("/tmp/blah/usr0/share/man/man8"): Input/output error (5)
>>>> rsync: readdir("/tmp/blah/usr0/bin"): Input/output error (5)
>>>> rsync:
>>>> readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/linux"):
>>>> Input/output error (5)
>>>> rsync:
>>>> readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/config"):
>>>> Input/output error (5)
>>>> rsync:
>>>> readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/linux"):
>>>> Input/output error (5)
>>>> rsync: writefd_unbuffered failed to write 2672 bytes [sender]: Broken pipe
>>>> (32)
>>>> rsync: close failed on "/tmp/blah/nfs0/games/.banner.vl3iqI": Operation
>>>> not permitted (1)
>>>> rsync: connection unexpectedly closed (98 bytes received so far) [sender]
>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(454)
>>>> [sender=2.6.9]
>>>>
>>>> The debug output is:
>>>> 2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] mirror3:
>>>> (path=/nfs0/games/.banner.vl3iqI child=share3-0) op_ret=-1 op_errno=61
>>>> 2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] mirror3:
>>>> (path=/nfs0/games/.banner.vl3iqI child=share3-1) op_ret=-1 op_errno=61
>>>> 2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] ns0:
>>>> (path=/nfs0/games/.banner.vl3iqI child=ns0-0) op_ret=-1 op_errno=61
>>>> 2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] ns0:
>>>> (path=/nfs0/games/.banner.vl3iqI child=ns0-1) op_ret=-1 op_errno=61
>>>>
>>>> This is new behavior; rsync never used to actually die, it just made
>>>> incomplete copies.
>>>>
>>>>
>>>> On Tue, 7 Aug 2007, Krishna Srinivas wrote:
>>>>
>>>>> Hi Brent,
>>>>>
>>>>> Can you put "option debug on" in the afr subvolume, try the
>>>>> du/rsync operations, and mail the log?
>>>>>
>>>>> We are not able to reproduce the problem here; NFS is working
>>>>> fine over AFR.
>>>>>
>>>>> Thanks
>>>>> Krishna
>>>>>
>>>>> On 8/4/07, Krishna Srinivas <krishna at zresearch.com> wrote:
>>>>>> rsync was failing for me without no_root_squash, so I thought that
>>>>>> might have been the culprit.
>>>>>>
>>>>>> If I put no_root_squash, NFS over AFR works fine for me.
>>>>>>
>>>>>> Yes, you are right: for some reason readdir() is not functioning
>>>>>> properly, which I think is why the paths are getting corrupted.
>>>>>>
>>>>>> I will get back to you.
>>>>>>
>>>>>> Thanks
>>>>>> Krishna
>>>>>>
>>>>>> On 8/4/07, Brent A Nelson <brent at phys.ufl.edu> wrote:
>>>>>>> All of my tests were done with no_root_squash already, and all tests were
>>>>>>> done as root.
>>>>>>>
>>>>>>> Without AFR, gluster and NFS reexports work fine with du and rsync.
>>>>>>>
>>>>>>> With AFR, gluster by itself is fine, but du and rsync from an NFS client
>>>>>>> do not work properly.  rsync gives lots of I/O errors and occasional "file
>>>>>>> has vanished" messages for paths where the last element is junk.  du gives
>>>>>>> incorrect sizes (smaller than they should be) and occasionally gives "no
>>>>>>> such file or directory", also for paths where the last element is junk.
>>>>>>> See the output below for examples of this junk from both commands.  Perhaps
>>>>>>> if you could figure out how those paths are getting corrupted, the whole
>>>>>>> problem will be resolved...
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Brent
>>>>>>>
>>>>>>> On Sat, 4 Aug 2007, Krishna Srinivas wrote:
>>>>>>>
>>>>>>>> Hi Brent,
>>>>>>>>
>>>>>>>> Can you add no_root_squash to the exports file, reexport and mount
>>>>>>>> using NFS, and try the rsync as root to see if it works?
>>>>>>>>
>>>>>>>> like: "/mnt/gluster *(rw,no_root_squash,sync,fsid=3)"
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Krishna
>>>>>>>>
>>>>>>>> On 8/4/07, Brent A Nelson <brent at phys.ufl.edu> wrote:
>>>>>>>>> Whoops, scratch that.  I accidentally tested the second GlusterFS directory,
>>>>>>>>> not the final NFS mount.  Even with the GlusterFS reexport of the original
>>>>>>>>> GlusterFS, the issue is still present.
>>>>>>>>>
>>>>>>>>> Thanks and sorry for the confusion,
>>>>>>>>>
>>>>>>>>> Brent
>>>>>>>>>
>>>>>>>>> On Fri, 3 Aug 2007, Brent A Nelson wrote:
>>>>>>>>>
>>>>>>>>>> I do have a workaround that can hide this bug, thanks to the wonderful
>>>>>>>>>> flexibility of GlusterFS and the fact that it is itself POSIX-compliant.
>>>>>>>>>> If I mount the GlusterFS as usual, then use another glusterfsd/glusterfs
>>>>>>>>>> pair to export and mount it, and NFS reexport THAT, the problem does not
>>>>>>>>>> appear.
>>>>>>>>>>
>>>>>>>>>> Presumably, server-side AFR instead of client-side would also bypass the
>>>>>>>>>> issue (not tested)...
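>>>>>>>>>>
>>>>>>>>>> Roughly, the extra pair just wraps the first mount (placeholder names and
>>>>>>>>>> paths here, not my real specs): the second glusterfsd exports the first
>>>>>>>>>> GlusterFS mountpoint as an ordinary storage/posix brick,
>>>>>>>>>>
>>>>>>>>>> volume wrap
>>>>>>>>>>   type storage/posix
>>>>>>>>>>   option directory /mnt/gluster1
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> volume server
>>>>>>>>>>   type protocol/server
>>>>>>>>>>   option transport-type tcp/server
>>>>>>>>>>   option auth.ip.wrap.allow *
>>>>>>>>>>   subvolumes wrap
>>>>>>>>>> end-volume
>>>>>>>>>>
>>>>>>>>>> and the second glusterfs mounts it via protocol/client at /mnt/gluster2,
>>>>>>>>>> which is what the kernel nfsd then exports instead of /mnt/gluster1.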
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Brent
>>>>>>>>>>
>>>>>>>>>> On Fri, 3 Aug 2007, Brent A Nelson wrote:
>>>>>>>>>>
>>>>>>>>>>> I turned off self-heal on all the AFR volumes, remounted and reexported (I
>>>>>>>>>>> didn't delete the data; let me know if that is needed).
>>>>>>>>>>>
>>>>>>>>>>> du -sk /tmp/blah/* (via NFS)
>>>>>>>>>>> du: cannot access `/tmp/blah/usr0/include/c++/4.1.2/\a': No such file or
>>>>>>>>>>> directory
>>>>>>>>>>> 171832  /tmp/blah/usr0
>>>>>>>>>>> 109476  /tmp/blah/usr0-copy
>>>>>>>>>>> du: cannot access `/tmp/blah/usr1/include/sys/\337O\004': No such file or
>>>>>>>>>>> directory
>>>>>>>>>>> du: cannot access
>>>>>>>>>>> `/tmp/blah/usr1/src/linux-headers-2.6.20-16/include/asm-ia64/\v': No such
>>>>>>>>>>> file or directory
>>>>>>>>>>> du: cannot access
>>>>>>>>>>> `/tmp/blah/usr1/src/linux-headers-2.6.20-16/include/asm-ia64/&\324\004': No
>>>>>>>>>>> such file or directory
>>>>>>>>>>> du: cannot access
>>>>>>>>>>> `/tmp/blah/usr1/src/linux-headers-2.6.20-16/drivers/\006': No such file or
>>>>>>>>>>> directory
>>>>>>>>>>> 117472  /tmp/blah/usr1
>>>>>>>>>>> 58392   /tmp/blah/usr1-copy
>>>>>>>>>>>
>>>>>>>>>>> It appears that self-heal isn't the culprit.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Brent
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 3 Aug 2007, Krishna Srinivas wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Brent,
>>>>>>>>>>>>
>>>>>>>>>>>> Can you turn self-heal off (option self-heal off) and see how it
>>>>>>>>>>>> behaves?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Krishna
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/3/07, Brent A Nelson <brent at phys.ufl.edu> wrote:
>>>>>>>>>>>>> A hopefully relevant strace snippet:
>>>>>>>>>>>>>
>>>>>>>>>>>>> open("share/perl/5.8.8/unicore/lib/jt",
>>>>>>>>>>>>> O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
>>>>>>>>>>>>> fstat64(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>>>>>>>>>>>>> fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
>>>>>>>>>>>>> mmap2(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
>>>>>>>>>>>>> 0) = 0xb7c63000
>>>>>>>>>>>>> getdents64(3, /* 6 entries */, 1048576) = 144
>>>>>>>>>>>>> lstat64("share/perl/5.8.8/unicore/lib/jt/C.pl", {st_mode=S_IFREG|0644,
>>>>>>>>>>>>> st_size=220, ...}) = 0
>>>>>>>>>>>>> lstat64("share/perl/5.8.8/unicore/lib/jt/U.pl", {st_mode=S_IFREG|0644,
>>>>>>>>>>>>> st_size=251, ...}) = 0
>>>>>>>>>>>>> lstat64("share/perl/5.8.8/unicore/lib/jt/D.pl", {st_mode=S_IFREG|0644,
>>>>>>>>>>>>> st_size=438, ...}) = 0
>>>>>>>>>>>>> lstat64("share/perl/5.8.8/unicore/lib/jt/R.pl", {st_mode=S_IFREG|0644,
>>>>>>>>>>>>> st_size=426, ...}) = 0
>>>>>>>>>>>>> getdents64(3, /* 0 entries */, 1048576) = 0
>>>>>>>>>>>>> munmap(0xb7c63000, 1052672)             = 0
>>>>>>>>>>>>> close(3)                                = 0
>>>>>>>>>>>>> open("share/perl/5.8.8/unicore/lib/gc_sc",
>>>>>>>>>>>>> O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
>>>>>>>>>>>>> fstat64(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>>>>>>>>>>>>> fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
>>>>>>>>>>>>> mmap2(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
>>>>>>>>>>>>> 0) = 0xb7c63000
>>>>>>>>>>>>> getdents64(3, 0xb7c63024, 1048576)      = -1 EIO (Input/output error)
>>>>>>>>>>>>> write(2, "rsync: readdir(\"/tmp/blah/usr0/s"..., 91rsync:
>>>>>>>>>>>>> readdir("/tmp/blah/usr0/share/perl/5.8.8/unicore/lib/gc_sc"):
>>>>>>>>>>>>> Input/output
>>>>>>>>>>>>> error (5)) = 91
>>>>>>>>>>>>> write(2, "\n", 1
>>>>>>>>>>>>> )                       = 1
>>>>>>>>>>>>> munmap(0xb7c63000, 1052672)             = 0
>>>>>>>>>>>>> close(3)                                = 0
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Brent
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 2 Aug 2007, Brent A Nelson wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> NFS reexport of a unified GlusterFS seems to be working fine as of TLA
>>>>>>>>>>>>>> 409.
>>>>>>>>>>>>>> I can make identical copies of a /usr area local-to-glusterfs and
>>>>>>>>>>>>>> glusterfs-to-glusterfs, hardlinks and all.  Awesome!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, this is not true when AFR is added to the mix (rsync
>>>>>>>>>>>>>> glusterfs-to-glusterfs via NFS reexport):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/lib/perl/5.8.8/auto/POSIX"): Input/output
>>>>>>>>>>>>>> error (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/perl/5.8.8"): Input/output error
>>>>>>>>>>>>>> (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/i18n/locales"): Input/output error
>>>>>>>>>>>>>> (5)
>>>>>>>>>>>>>> rsync:
>>>>>>>>>>>>>> readdir("/tmp/blah/usr0/share/locale-langpack/en_GB/LC_MESSAGES"):
>>>>>>>>>>>>>> Input/output error (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/groff/1.18.1/font/devps"):
>>>>>>>>>>>>>> Input/output
>>>>>>>>>>>>>> error (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/man/man1"): Input/output error (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/man/man8"): Input/output error (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/man/man7"): Input/output error (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/X11/xkb/symbols"): Input/output
>>>>>>>>>>>>>> error
>>>>>>>>>>>>>> (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/Africa"):
>>>>>>>>>>>>>> Input/output
>>>>>>>>>>>>>> error (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/Asia"): Input/output
>>>>>>>>>>>>>> error (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/America"):
>>>>>>>>>>>>>> Input/output
>>>>>>>>>>>>>> error (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/zoneinfo/Asia"): Input/output error
>>>>>>>>>>>>>> (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/doc"): Input/output error (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/consolefonts"): Input/output error
>>>>>>>>>>>>>> (5)
>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/bin"): Input/output error (5)
>>>>>>>>>>>>>> rsync:
>>>>>>>>>>>>>> readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-sparc64"):
>>>>>>>>>>>>>> Input/output error (5)
>>>>>>>>>>>>>> rsync:
>>>>>>>>>>>>>> readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/linux"):
>>>>>>>>>>>>>> Input/output error (5)
>>>>>>>>>>>>>> rsync:
>>>>>>>>>>>>>> readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-mips"):
>>>>>>>>>>>>>> Input/output error (5)
>>>>>>>>>>>>>> rsync:
>>>>>>>>>>>>>> readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-parisc"):
>>>>>>>>>>>>>> Input/output error (5)
>>>>>>>>>>>>>> file has vanished:
>>>>>>>>>>>>>> "/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-sparc/\#012"
>>>>>>>>>>>>>> rsync:
>>>>>>>>>>>>>> readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/config"):
>>>>>>>>>>>>>> Input/output error (5)
>>>>>>>>>>>>>> rsync:
>>>>>>>>>>>>>> readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/linux"):
>>>>>>>>>>>>>> Input/output error (5)
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas? Meanwhile, I'll try to track it down with strace (the output
>>>>>>>>>>>>>> will be huge, but maybe I'll get lucky)...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Brent
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>




