[Gluster-devel] Re: NFS reexport status
Brent A Nelson
brent at phys.ufl.edu
Wed Aug 8 17:01:46 UTC 2007
Hmm, I take part of my statement back. rsync gives a few I/O errors in
this simple scenario, BUT the copies seem to be good when checked later
(this is not the case with more complex specs, I believe). Perhaps
failures occur when writing to one subvolume but not the other, and then
self-heal fixes it. This would be consistent with the fact that the first
du I run just after the rsync tends to be different from all subsequent
ones.
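
A minimal sketch of one way to check a copy after the fact, assuming Python 3 is available on the client. The function names (file_digest, tree_digests, compare_trees) are made up for illustration and are not part of GlusterFS or rsync. It hashes every regular file under two trees and reports what is missing, extra, or different:

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a regular file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest.

    os.walk ignores directory-listing errors by default, so a directory
    that cannot be read simply contributes no files here.
    """
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and special files
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(src, dst):
    """Return (missing, extra, differing) relative paths, each sorted."""
    a, b = tree_digests(src), tree_digests(dst)
    missing = sorted(set(a) - set(b))   # in src but not in dst
    extra = sorted(set(b) - set(a))     # in dst but not in src
    differing = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, extra, differing
```

Calling compare_trees("/usr", "/mount/nfs0") right after the rsync and again a little later would show whether the set of missing or differing files shrinks between runs, which is what one would expect if something like self-heal were repairing the copy in the background.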
Also odd: my rsyncs stopped complaining after a while. Unmounting the NFS
and remounting brought the misbehavior back, though.
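
To see which directories fail to list right after a remount, without running a full rsync, a small walker along these lines could be used. This is only a sketch: failing_dirs is a hypothetical helper, and it relies on the standard os.walk onerror hook rather than anything GlusterFS-specific.

```python
import errno
import os


def failing_dirs(root):
    """Walk root and return (path, errno name) for each directory
    whose listing raised an OSError (for example EIO)."""
    failures = []

    def record(err):
        # os.walk passes the OSError raised while listing a directory.
        name = errno.errorcode.get(err.errno, str(err.errno))
        failures.append((err.filename, name))

    for _ in os.walk(root, onerror=record):
        pass  # we only care about the errors collected by record()
    return failures
```

Running failing_dirs("/mount/nfs0") once immediately after mounting and once after the errors have stopped would give two lists to compare against the readdir() failures rsync reports.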
Thanks,
Brent
On Wed, 8 Aug 2007, Brent A Nelson wrote:
> On Wed, 8 Aug 2007, Krishna Srinivas wrote:
>
>> Hi Brent,
>>
>> Thanks. So if you use storage/posix under afr, you don't see the
>> problem in the NFS reexport.
>
> Correct, that worked fine. Once I introduced protocol/client and
> protocol/server, though, rsync -aH /usr/ /mount/nfs0/ gives I/O errors and an
> inconsistent copy.
>
>> We are not able to reproduce this behaviour here.
>
> Did you try with the spec files I sent you (they only need two directories
> available on a single machine), with an rsync of your /usr partition to the
> NFS reexport (this can also be done via localhost, no additional machines
> needed)? You are using the kernel NFS server, I assume, not one of the
> user-mode NFS servers?
>
>> Can you give us access to your machines? Is that possible?
>>
>
> Yes, if the above doesn't do the trick, we can coordinate some way to get you
> access. Do you have an SSH public key I could add as an authorized key?
>
> Thanks,
>
> Brent
>
>> On 8/8/07, Brent A Nelson <brent at phys.ufl.edu> wrote:
>>> Today, I tried switching to the Gluster-modified fuse-2.7.0, but I still
>>> encountered the same misbehavior with NFS reexport. Heads-up: like
>>> someone else on the mailing list, I found that GlusterFS performance is
>>> MUCH slower with 2.7.0 than with my old 2.6.3, at least for simple "du"
>>> tests...
>>>
>>> Failing that, I thought I'd try to figure out the simplest specs to
>>> exhibit the issue; see attached. I first tried glusterfs (no glusterfsd);
>>> it worked for a simple afr as well as unification of two afrs with no NFS
>>> reexport trouble. As soon as I introduced a glusterfsd exporting to the
>>> glusterfs via protocol/client and protocol/server (via localhost),
>>> however, the rsync problems appeared. I didn't see the issues with du in
>>> this simple setup, though (perhaps that problem will disappear when this
>>> problem is fixed, perhaps not).
>>>
>>> Thanks,
>>>
>>> Brent
>>>
>>> On Tue, 7 Aug 2007, Krishna Srinivas wrote:
>>>
>>>> Hi Brent,
>>>>
>>>> Those messages in the log are harmless; I have removed them from the
>>>> source. Can you mail the spec files? I will check again whether it can
>>>> be reproduced.
>>>>
>>>> Thanks
>>>> Krishna
>>>>
>>>>
>>>> On 8/7/07, Brent A Nelson <brent at phys.ufl.edu> wrote:
>>>>> I added debugging to all the AFR subvolumes. On the du test, all it
>>>>> produced were lines like this over and over:
>>>>> 2007-08-06 17:23:41 C [dict.c:1094:data_to_ptr] libglusterfs/dict: @data=(nil)
>>>>>
>>>>> For the rsync (in addition to the @data=(nil) messages):
>>>>> rsync -a /tmp/blah/usr0/ /tmp/blah/nfs0/
>>>>> rsync: readdir("/tmp/blah/usr0/share/perl/5.8.8/unicore/lib/gc_sc"): Input/output error (5)
>>>>> rsync: readdir("/tmp/blah/usr0/share/man/man1"): Input/output error (5)
>>>>> rsync: readdir("/tmp/blah/usr0/share/man/man8"): Input/output error (5)
>>>>> rsync: readdir("/tmp/blah/usr0/bin"): Input/output error (5)
>>>>> rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/linux"): Input/output error (5)
>>>>> rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/config"): Input/output error (5)
>>>>> rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/linux"): Input/output error (5)
>>>>> rsync: writefd_unbuffered failed to write 2672 bytes [sender]: Broken pipe (32)
>>>>> rsync: close failed on "/tmp/blah/nfs0/games/.banner.vl3iqI": Operation not permitted (1)
>>>>> rsync: connection unexpectedly closed (98 bytes received so far) [sender]
>>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(454) [sender=2.6.9]
>>>>>
>>>>> The debug output is:
>>>>> 2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] mirror3: (path=/nfs0/games/.banner.vl3iqI child=share3-0) op_ret=-1 op_errno=61
>>>>> 2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] mirror3: (path=/nfs0/games/.banner.vl3iqI child=share3-1) op_ret=-1 op_errno=61
>>>>> 2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] ns0: (path=/nfs0/games/.banner.vl3iqI child=ns0-0) op_ret=-1 op_errno=61
>>>>> 2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] ns0: (path=/nfs0/games/.banner.vl3iqI child=ns0-1) op_ret=-1 op_errno=61
>>>>>
>>>>> This is new behavior; rsync didn't actually die before, it just made
>>>>> incomplete copies.
>>>>>
>>>>>
>>>>> On Tue, 7 Aug 2007, Krishna Srinivas wrote:
>>>>>
>>>>>> Hi Brent,
>>>>>>
>>>>>> Can you put "option debug on" in afr subvolume and try the
>>>>>> du/rsync operations and mail the log?
>>>>>>
>>>>>> We are not able to reproduce the problem here, nfs is working
>>>>>> fine over afr.
>>>>>>
>>>>>> Thanks
>>>>>> Krishna
>>>>>>
>>>>>> On 8/4/07, Krishna Srinivas <krishna at zresearch.com> wrote:
>>>>>>> rsync was failing for me without no_root_squash, so I thought that
>>>>>>> might have been the culprit.
>>>>>>>
>>>>>>> If I put no_root_squash, NFS over afr works fine for me.
>>>>>>>
>>>>>>> Yes, you are right: for some reason readdir() is not functioning
>>>>>>> properly, which I think is why the paths are getting corrupted.
>>>>>>>
>>>>>>> Will get back to you.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Krishna
>>>>>>>
>>>>>>> On 8/4/07, Brent A Nelson <brent at phys.ufl.edu> wrote:
>>>>>>>> All of my tests were done with no_root_squash already, and all tests
>>>>>>>> were done as root.
>>>>>>>>
>>>>>>>> Without AFR, gluster and NFS reexports work fine with du and rsync.
>>>>>>>>
>>>>>>>> With AFR, gluster by itself is fine, but du and rsync from an NFS
>>>>>>>> client do not work properly. rsync gives lots of I/O errors and
>>>>>>>> occasional "file has vanished" messages for paths where the last
>>>>>>>> element is junk. du gives incorrect sizes (smaller than they should
>>>>>>>> be) and occasionally gives "no such file or directory", also for
>>>>>>>> paths where the last element is junk. See the output below for
>>>>>>>> examples of this junk from both. Perhaps if you could figure out how
>>>>>>>> those paths are getting corrupted, the whole problem will be
>>>>>>>> resolved...
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Brent
>>>>>>>>
>>>>>>>> On Sat, 4 Aug 2007, Krishna Srinivas wrote:
>>>>>>>>
>>>>>>>>> Hi Brent,
>>>>>>>>>
>>>>>>>>> Can you add no_root_squash to exports file and reexport and mount
>>>>>>>>> using nfs and try to rsync as root and see if it works?
>>>>>>>>>
>>>>>>>>> like: "/mnt/gluster *(rw,no_root_squash,sync,fsid=3)"
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Krishna
>>>>>>>>>
>>>>>>>>> On 8/4/07, Brent A Nelson <brent at phys.ufl.edu> wrote:
>>>>>>>>>> Whoops, scratch that. I accidentally tested the 2nd GlusterFS
>>>>>>>>>> directory, not the final NFS mount. Even with the GlusterFS reexport
>>>>>>>>>> of the original GlusterFS, the issue is still present.
>>>>>>>>>>
>>>>>>>>>> Thanks and sorry for the confusion,
>>>>>>>>>>
>>>>>>>>>> Brent
>>>>>>>>>>
>>>>>>>>>> On Fri, 3 Aug 2007, Brent A Nelson wrote:
>>>>>>>>>>
>>>>>>>>>>> I do have a workaround which can hide this bug, thanks to the
>>>>>>>>>>> wonderful flexibility of GlusterFS and the fact that it in itself is
>>>>>>>>>>> POSIX. If I mount the GlusterFS as usual, but then use another
>>>>>>>>>>> glusterfs/glusterfsd pair to export and mount it and NFS reexport
>>>>>>>>>>> THAT, the problem does not appear.
>>>>>>>>>>>
>>>>>>>>>>> Presumably, server-side AFR instead of client-side would also bypass
>>>>>>>>>>> the issue (not tested)...
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Brent
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 3 Aug 2007, Brent A Nelson wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I turned off self-heal on all the AFR volumes, remounted and
>>>>>>>>>>>> reexported (I didn't delete the data; let me know if that is needed).
>>>>>>>>>>>>
>>>>>>>>>>>> du -sk /tmp/blah/* (via NFS)
>>>>>>>>>>>> du: cannot access `/tmp/blah/usr0/include/c++/4.1.2/\a': No such file or directory
>>>>>>>>>>>> 171832 /tmp/blah/usr0
>>>>>>>>>>>> 109476 /tmp/blah/usr0-copy
>>>>>>>>>>>> du: cannot access `/tmp/blah/usr1/include/sys/\337O\004': No such file or directory
>>>>>>>>>>>> du: cannot access `/tmp/blah/usr1/src/linux-headers-2.6.20-16/include/asm-ia64/\v': No such file or directory
>>>>>>>>>>>> du: cannot access `/tmp/blah/usr1/src/linux-headers-2.6.20-16/include/asm-ia64/&\324\004': No such file or directory
>>>>>>>>>>>> du: cannot access `/tmp/blah/usr1/src/linux-headers-2.6.20-16/drivers/\006': No such file or directory
>>>>>>>>>>>> 117472 /tmp/blah/usr1
>>>>>>>>>>>> 58392 /tmp/blah/usr1-copy
>>>>>>>>>>>>
>>>>>>>>>>>> It appears that self-heal isn't the culprit.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Brent
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, 3 Aug 2007, Krishna Srinivas wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Brent,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you turn self-heal off (option self-heal off) and see how it
>>>>>>>>>>>>> behaves?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Krishna
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 8/3/07, Brent A Nelson <brent at phys.ufl.edu> wrote:
>>>>>>>>>>>>>> A hopefully relevant strace snippet:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> open("share/perl/5.8.8/unicore/lib/jt", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
>>>>>>>>>>>>>> fstat64(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>>>>>>>>>>>>>> fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
>>>>>>>>>>>>>> mmap2(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7c63000
>>>>>>>>>>>>>> getdents64(3, /* 6 entries */, 1048576) = 144
>>>>>>>>>>>>>> lstat64("share/perl/5.8.8/unicore/lib/jt/C.pl", {st_mode=S_IFREG|0644, st_size=220, ...}) = 0
>>>>>>>>>>>>>> lstat64("share/perl/5.8.8/unicore/lib/jt/U.pl", {st_mode=S_IFREG|0644, st_size=251, ...}) = 0
>>>>>>>>>>>>>> lstat64("share/perl/5.8.8/unicore/lib/jt/D.pl", {st_mode=S_IFREG|0644, st_size=438, ...}) = 0
>>>>>>>>>>>>>> lstat64("share/perl/5.8.8/unicore/lib/jt/R.pl", {st_mode=S_IFREG|0644, st_size=426, ...}) = 0
>>>>>>>>>>>>>> getdents64(3, /* 0 entries */, 1048576) = 0
>>>>>>>>>>>>>> munmap(0xb7c63000, 1052672) = 0
>>>>>>>>>>>>>> close(3) = 0
>>>>>>>>>>>>>> open("share/perl/5.8.8/unicore/lib/gc_sc", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
>>>>>>>>>>>>>> fstat64(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>>>>>>>>>>>>>> fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
>>>>>>>>>>>>>> mmap2(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7c63000
>>>>>>>>>>>>>> getdents64(3, 0xb7c63024, 1048576) = -1 EIO (Input/output error)
>>>>>>>>>>>>>> write(2, "rsync: readdir(\"/tmp/blah/usr0/s"..., 91rsync: readdir("/tmp/blah/usr0/share/perl/5.8.8/unicore/lib/gc_sc"): Input/output error (5)) = 91
>>>>>>>>>>>>>> write(2, "\n", 1
>>>>>>>>>>>>>> ) = 1
>>>>>>>>>>>>>> munmap(0xb7c63000, 1052672) = 0
>>>>>>>>>>>>>> close(3) = 0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Brent
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 2 Aug 2007, Brent A Nelson wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> NFS reexport of a unified GlusterFS seems to be working fine as of
>>>>>>>>>>>>>>> TLA 409. I can make identical copies of a /usr area
>>>>>>>>>>>>>>> local-to-glusterfs and glusterfs-to-glusterfs, hardlinks and all.
>>>>>>>>>>>>>>> Awesome!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, this is not true when AFR is added to the mix (rsync
>>>>>>>>>>>>>>> glusterfs-to-glusterfs via NFS reexport):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/lib/perl/5.8.8/auto/POSIX"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/perl/5.8.8"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/i18n/locales"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/locale-langpack/en_GB/LC_MESSAGES"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/groff/1.18.1/font/devps"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/man/man1"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/man/man8"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/man/man7"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/X11/xkb/symbols"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/Africa"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/Asia"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/America"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/zoneinfo/Asia"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/doc"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/share/consolefonts"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/bin"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-sparc64"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/linux"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-mips"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-parisc"): Input/output error (5)
>>>>>>>>>>>>>>> file has vanished: "/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-sparc/\#012"
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/config"): Input/output error (5)
>>>>>>>>>>>>>>> rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/linux"): Input/output error (5)
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any ideas? Meanwhile, I'll try to track it down in strace (the
>>>>>>>>>>>>>>> output will be huge, but maybe I'll get lucky)...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Brent
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Gluster-devel mailing list
>>>>>>>>>>>>>> Gluster-devel at nongnu.org
>>>>>>>>>>>>>> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>