[Gluster-users] mkdir produces stale file handles

Stefan Solbrig stefan.solbrig at ur.de
Thu Sep 19 15:02:50 UTC 2019


Thanks for the help!

> > Thanks for the quick answer!
> >
> > I think I can reduce data on the "full" bricks, solving the problem temporarily.
> >
> > The thing is that the behavior changed from 3.12 to 6.5: 3.12 didn't have problems with almost full bricks, so I thought everything was fine. Then, after the upgrade, I ran into this problem. This might be a corner case that will go away once no one uses 3.12 any more.
> >
> > But I think I can create a situation with 6.5 only that reproduces the error. Suppose I have a brick that is 99% full, so a write() will succeed. After the write, the brick can be 100% full, so a subsequent mkdir() will produce stale file handles (i.e., bricks that have different directory trees).  The funny thing is that the mkdir() on the user side does not produce an error.   Clearly, no one should ever let the file system get to 99%, but still, mkdir should fail then...
> 
> I think there is a soft and a hard limit that prevents creation of files/folders when a specific threshold is hit, but that threshold might be per brick instead of per replica set.

There is the cluster.min-free-disk option, which says that the server should look for a different brick if the hash would place the file on a brick with less than min-free-disk free space.   However, this seems to be only a "should": if all bricks have less free space than min-free-disk, then the file is written anyway.
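
For reference, the option is set per volume; the volume name "testvol" and the 10% threshold below are just placeholders, not my real setup:

  gluster volume set testvol cluster.min-free-disk 10%

As far as I know it accepts either a percentage or an absolute size.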

Apart from that, I have some really large bricks (around 200 TB each), which means that even if these are 99% full, there are still 2 TB left (a significant amount).  The logic of "do not create a directory if the brick is 100% full" seems to be hard-coded; I didn't find a setting to disable it.
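
Just to illustrate the numbers: checking a brick directly on the storage node (the path below is a placeholder) shows the remaining space, and a 200 TB brick at 99% use still reports roughly 2 TB available:

  df -h /data/brick1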

Nonetheless, I think I can construct a test case where a sequence of write() and mkdir() creates stale file handles even though all userland operations succeed.   Should I consider this a bug and make the effort to construct a test case?  (Not on my production system, but on a toy model; it will take me a few days...)
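
Roughly, the toy model I have in mind would be something like the following (volume name, host names, paths, and sizes are placeholders, not my real setup):

  # on the storage node: fill the target brick to ~99% outside of gluster
  fallocate -l 198T /data/brick1/filler

  # on a client, via the FUSE mount: this write still succeeds,
  # but it can push the brick to 100%
  dd if=/dev/zero of=/mnt/testvol/somedir/bigfile bs=1M count=2048

  # mkdir reports success to the user ...
  mkdir /mnt/testvol/somedir/newdir && echo "mkdir ok"

  # ... but the new directory is missing on the full brick:
  ssh server1 ls -ld /data/brick1/somedir/newdir   # "No such file or directory"
  ssh server2 ls -ld /data/brick2/somedir/newdir   # directory exists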

> > What remains:  is there a recommended way how to deal with the situation that I have some bricks that don't have all directories?
> 
> I think that you can mount the gluster volume and run a find with stat that will force a sync.
> find /rhev/mnt/full-path/directory-missing-on-some-bricks -iname '*' -exec stat {} \;

Thanks a lot! That indeed fixed the missing directories!   (I didn't know that a "stat" triggers a sync of the bricks.)
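
In case someone finds this thread later: after the find/stat run, one can check directly on each brick that the previously missing directories are back (host names and brick paths are placeholders):

  ssh server1 ls -ld /data/brick1/previously-missing-dir
  ssh server2 ls -ld /data/brick2/previously-missing-dir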

best wishes,
Stefan


