[Gluster-users] Odd error with older Gluster/NFS3 tail swallowing

Wed Apr 10 11:36:30 UTC 2013

On Tue, Apr 09, 2013 at 09:44:26AM -0400, Whit Blauvelt wrote:
> Had some data loss with an older - 3.1.4 - Gluster share last night. Now
> trying to see what the best lessons are to learn from it. Obviously it's too
> old a version for a bug report to matter. Wondering if anyone recognizes
> this particular sort of error condition though. 
> 
> It's a 300 G replicated share, mounted by Gluster's NFS3 to several systems.
> It was getting fuller than we like it to be, at over 80%, so I copied a
> directory containing 26 G off of it. Checked the copy and it was good. Then I
> went and "rm -r"'d that directory. After a few minutes it complained "Cannot
> delete directory, directory not empty," citing a subdirectory. Strange.
> 
> So I stopped the process and looked in that subdirectory. The subdirectory
> had within it ... the whole of the Gluster share. Damn. Yes, the "rm -r"
> had, due to this illusion, managed to wipe out over half of the share
> because it had descended into other directories at its root level via this
> illusion. 

When an NFS client sees the filesystem this wrongly, is failing RAM on the
client the likely culprit? Is the
scheme something along the lines of there being a base address for the root
of the NFS mount, with addresses within the mount being at an offset from
that, so that loss of the offset for the address of a subdirectory could
result in the subdirectory seeming itself to contain the whole NFS mount?

Obviously I have no knowledge on this level. Have any of you filesystem gurus
seen this before, or do you have a hypothesis?
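
As a guard before any future recursive delete, here is a minimal Python
sketch that walks a tree and reports the first subdirectory whose
(device, inode) identity matches one of its own ancestors or the mount
root. The function name find_directory_loop and its mount_root parameter
are made up for illustration. It also assumes the confused client reports
the same inode number for the bogus subdirectory as for the real
directory, which I have not confirmed for this failure.

```python
import os


def find_directory_loop(root, mount_root=None):
    """Return the first directory under root that has the same
    (device, inode) identity as one of its ancestors (or as mount_root,
    if given). Return None when no such directory is found."""

    def identity(path):
        st = os.lstat(path)
        return (st.st_dev, st.st_ino)

    forbidden = {identity(root)}
    if mount_root is not None:
        # Hypothetical extra check: a subdirectory that looks like the
        # root of the mount is exactly the symptom described above.
        forbidden.add(identity(mount_root))

    # Each stack entry is (path, identities that must not reappear below it).
    stack = [(root, frozenset(forbidden))]
    while stack:
        path, ancestors = stack.pop()
        with os.scandir(path) as entries:
            for entry in entries:
                # Skip files and symlinks; only real directories are descended.
                if not entry.is_dir(follow_symlinks=False):
                    continue
                ident = identity(entry.path)
                if ident in ancestors:
                    return entry.path
                stack.append((entry.path, ancestors | {ident}))
    return None
```

Calling find_directory_loop("/path/to/dir", mount_root="/path/to/mount")
before the "rm -r" and refusing to delete unless it returns None would be
the idea. The paths here are placeholders.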

Thanks,
Whit