On 01/03/2011 12:28 PM, Lana Deere wrote:
> I have seen this same problem but have not been able to find a
> workaround other than to delete the file from the server directly.  I
> was not able to figure out a way to reproduce the symptom reliably,
> but in my case I suspect it was related to heavy concurrent access.
> Does that seem plausible in light of your access patterns?

Lana (et al)

Check your system times.  Make sure all the clocks are sync'ed.  A quick

	pdsh date

(assuming you have pdsh installed/configured across your storage nodes) 
will tell you.

We've seen some odd problems, such as files disappearing, that were 
caused at least in part by clock skew between nodes.
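If eyeballing the dates is awkward on a larger cluster, the spread can be reduced to a single number. This is only a rough sketch: max_skew is a hypothetical helper name, and it assumes each input line has the "host: value" shape that pdsh prints, with epoch seconds (from date +%s) as the last field.

```shell
# Hypothetical helper (not part of pdsh or gluster): reads lines of the
# form "host: epoch_seconds" on stdin and prints the difference between
# the largest and smallest timestamp, in seconds.
max_skew() {
    awk '{
            t = $NF + 0                      # last field is the timestamp
            if (NR == 1 || t < min) min = t
            if (NR == 1 || t > max) max = t
         }
         END { if (NR > 0) print max - min }'
}

# Canned input so the sketch runs without a cluster:
printf 'node1: 1294075680\nnode2: 1294075681\nnode3: 1294075692\n' | max_skew
```

On a live cluster the equivalent would be "pdsh date +%s | max_skew". A second or two of difference can come from pdsh itself, since not every node runs the command at the same instant, so only a larger spread points to unsynced clocks.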

That said, there is definitely still a bug lurking in DHT that fixing 
the clocks won't address. It looks similar to this problem and has to 
do with strange permissions.  The other thing we've tried (see an email 
from around Dec 2010) is turning off some of the stat caching and 
related options.

Try this and see if it helps:

[root@manager ~]# gluster volume set nfs performance.cache-refresh-timeout 0
Set volume successful

[root@manager ~]# gluster volume set nfs performance.stat-prefetch 0
Set volume successful

One customer decided to stop using the NFS interface and switch to the 
native gluster interface, since this bug was less visible, or at least 
less impactful, there.  We have some support tickets open on this 
(though one was closed yesterday without being resolved, so we have to 
re-open it).



