While it's still early, our testing shows this issue is fixed in
glusterfs 7.2 (we were at 416).

Closing the loop in case people search for this.


On Sun, Jan 26, 2020 at 12:04:00PM -0600, Erik Jacobson wrote:
> > One last reply to myself.
> One of the test cases my test scripts triggered turned out to actually
> be due to my NFS RW mount options.
> OLD RW NFS mount options:
> "rw,noatime,nocto,actimeo=3600,lookupcache=all,nolock,tcp,vers=3"
> NEW options that work better:
> "rw,noatime,nolock,tcp,vers=3"
> I had copied the RO NFS options we use which try to be aggressive about
> caching. The RO root image doesn't change much and we want it as fast
> as possible. The options are not appropriate for RW areas that change.
> (Even though it's a single image file we care about).
> So now my test scripts run clean but since what we see on larger systems
> is right after reboot, the caching shouldn't matter. In the real problem
> case, the RW stuff is done once after reboot.
> FWIW I attached my current test scripts; my last batch had some errors.
> The search continues for the actual problem, which I'm struggling to
> reproduce @ 366 NFS clients.
> I believe yesterday, when I posted about actual HANGS, that is the real
> problem we're tracking. I hit that once in my test scripts - only once.
> My script was otherwise hitting a "file doesn't really exist even though
> cached" issue and it was tricking my scripts.
> In any case, I'm changing the RW NFS options we use regardless.
> Erik
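
For anyone landing here from a search: the difference between the two RW
option strings quoted above can be checked mechanically. The following is
only a minimal sketch. The function names (parse_options, removed_options)
are made up for illustration and are not part of any NFS or Gluster tool;
only the two option strings come from the message itself.

```python
# Option strings copied verbatim from the quoted message.
OLD_RW = "rw,noatime,nocto,actimeo=3600,lookupcache=all,nolock,tcp,vers=3"
NEW_RW = "rw,noatime,nolock,tcp,vers=3"


def parse_options(option_string):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (e.g. "nolock") map to None.
    (Hypothetical helper, for illustration only.)
    """
    options = {}
    for item in option_string.split(","):
        key, _, value = item.partition("=")
        options[key] = value or None
    return options


def removed_options(old, new):
    """Return the options present in `old` but absent from `new`,
    in the order they appear in `old`."""
    old_opts = parse_options(old)
    new_opts = parse_options(new)
    return [
        key if value is None else f"{key}={value}"
        for key, value in old_opts.items()
        if key not in new_opts
    ]


removed = removed_options(OLD_RW, NEW_RW)
print(removed)  # ['nocto', 'actimeo=3600', 'lookupcache=all']
```

The three options that drop out are the caching-related ones copied over
from the RO mounts, which matches the explanation in the quoted message.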

