[Gluster-devel] help needed with glfsheal

Wed Nov 5 14:48:00 UTC 2014

Hello

I have been investigating a spurious failure in tests/basic/afr/self-heald.t
on NetBSD. It happens in the test that checks that ongoing I/O is not
considered as pending heal: sometimes gluster volume heal info lists
entries when it should not.

Looking at the logs, I see the lovely message "seekdir(...) failed (...)
Invalid argument (offset reused from another DIR * structure?)" from the 
index xlator. This is because an offset obtained from one DIR * is reused
with another one.

That problem was supposed to be fixed here:
http://review.gluster.org/8936

I first thought of the bug I fixed recently: there is such a spurious
message when we hit the end of the directory. I fixed it in the posix
xlator but not in the index xlator, so I added the same fix there:
http://review.gluster.org/9047

But I still get the same message. Adding more debug messages, I see that
brick2 complains about the offset reported at EOF for brick1. This means
that somewhere cli-side there is a component that keeps state between
openings of the same directory on different bricks. Does anyone have an
idea where the offending code could be?

I guess it may impact Linux too, despite the absence of a message in the
logs, as readdir may start skipping the beginning of the directory because
the offset is not reset to 0 when moving from one brick to another.
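
To illustrate the pattern suspected above, here is a minimal sketch in plain
C. It is not GlusterFS code: struct brick_cursor, cursor_open, cursor_drain
and scan_two_bricks are hypothetical names, and two opendir() calls on the
same path stand in for the same directory opened on two bricks. The point
is only that each DIR * keeps its own offset and that a new handle starts
at 0 instead of inheriting the offset recorded at EOF on the previous one.

```c
#include <dirent.h>
#include <stddef.h>

/* Hypothetical per-brick cursor: the offset belongs to this DIR * only. */
struct brick_cursor {
    DIR  *dir;
    long  offset;   /* value from telldir() on this same dir, nothing else */
};

/* Open a directory and start at offset 0, never at an inherited value. */
static int cursor_open(struct brick_cursor *c, const char *path)
{
    c->dir = opendir(path);
    c->offset = 0;
    return c->dir != NULL ? 0 : -1;
}

/* Read every remaining entry, recording the offset from this handle. */
static int cursor_drain(struct brick_cursor *c)
{
    int count = 0;

    while (readdir(c->dir) != NULL) {
        c->offset = telldir(c->dir);
        count++;
    }
    return count;
}

/*
 * Scan the same path twice, as if it were the same directory on two
 * bricks. The second cursor is initialised from scratch, so the EOF
 * offset of the first one is never passed to seekdir() on the second.
 * Returns 0 on success and stores the entry counts of both scans.
 */
int scan_two_bricks(const char *path, int *first, int *second)
{
    struct brick_cursor b1, b2;

    if (cursor_open(&b1, path) != 0)
        return -1;
    *first = cursor_drain(&b1);
    closedir(b1.dir);

    if (cursor_open(&b2, path) != 0)
        return -1;
    *second = cursor_drain(&b2);
    closedir(b2.dir);

    return 0;
}
```

If the second scan instead began with seekdir() using the first handle's
EOF offset, it would be expected to either fail as in the NetBSD log or
silently return fewer entries, which is the Linux concern described above.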

-- 
Emmanuel Dreyfus
manu at netbsd.org