[Gluster-devel] [Gluster-Maintainers] Master branch lock down: RCA for tests (UNSOLVED ./tests/basic/stats-dump.t)

Shyam Ranganathan srangana at redhat.com
Mon Aug 13 19:51:07 UTC 2018


On 08/13/2018 03:34 PM, Niels de Vos wrote:
> On Mon, Aug 13, 2018 at 02:32:19PM -0400, Shyam Ranganathan wrote:
>> On 08/12/2018 08:42 PM, Shyam Ranganathan wrote:
>>> As a means of keeping the focus going and squashing the remaining tests
>>> that were failing sporadically, request each test/component owner to,
>>>
>>> - respond to this mail changing the subject (testname.t) to the test
>>> name that they are responding to (adding more than one in case they have
>>> the same RCA)
>>> - with the current RCA and status of the same
>>>
>>> List of tests and current owners as per the spreadsheet that we were
>>> tracking are:
>>>
>>> ./tests/basic/stats-dump.t		TBD
>>
>> This test fails as follows:
>>
>>   01:07:31 not ok 20 , LINENUM:42
>>   01:07:31 FAILED COMMAND: grep .queue_size
>> /var/lib/glusterd/stats/glusterfsd__d_backends_patchy1.dump
>>
>>   18:35:43 not ok 21 , LINENUM:43
>>   18:35:43 FAILED COMMAND: grep .queue_size
>> /var/lib/glusterd/stats/glusterfsd__d_backends_patchy2.dump
>>
>> Basically when grep'ing for a pattern in the stats dump it is not
>> finding the second grep pattern of "queue_size" in one or the other bricks.
>>
>> The above seems incorrect, if it found "aggr.fop.write.count" it stands
>> to reason that it found a stats dump, further there is a 2 second sleep
>> as well in the test case and the dump interval is 1 second.
>>
>> The only reason for this to fail could hence possibly be that the file
>> was just (re)opened (by the io-stats dumper thread) for overwriting
>> content, at which point the fopen uses the mode "w+", and the file was
>> hence truncated, and the grep CLI also opened the file at the same time,
>> and hence found no content.
> 
> This sounds like a dangerous approach in any case. Truncating a file
> while there are potential other readers should probably not be done. I
> wonder if there is a good reason for this.
> 
> A safer solution would be to create a new temporary file, write the
> stats to that and once done rename it to the expected filename. Any
> process reading from the 'old' file will have its file-descriptor open
> and can still read the previous, but consistent contents.

Correct. I do not know the answer to why it was so, but here is where it
is so,
https://github.com/gluster/glusterfs/blob/58d2c13c7996d6d192cc792eca372538673f808e/xlators/debug/io-stats/src/io-stats.c#L3224

Just so that I am not dreaming up the RCA :)

I guess the ideal fix would be to actually use the tmpfile rename
pattern here, to avoid readers from reading empty files.

It can be argued that, there maybe readers who hold the file open and
read contents from the same, and hence the file should remain the same
etc. but this needs more eyes and thought before we decide either way.

> 
> Niels
> 


More information about the Gluster-devel mailing list