[Bugs] [Bug 1138970] file corruption during concurrent read/write

Wed Oct 22 05:48:40 UTC 2014

https://bugzilla.redhat.com/show_bug.cgi?id=1138970

--- Comment #16 from Raghavendra G <rgowdapp at redhat.com> ---
Changed the Python script to do unbuffered reads/writes by passing 0 as the
buffer size (the third argument of open):

  writer = RecordWriter(open(fname, 'a', 0))
  reader = RecordReader(open(fname, 'r', 0))

But the issue is still reproducible: the test script still hangs when run on a
gluster volume mounted with direct-io-mode=yes. Following is the output of
strace:

[pid 22338] set_robust_list(0x7f2a364f2a20, 0x18) = 0
[pid 22337] lseek(4, 0, SEEK_CUR)       = 0
[pid 22337] read(4,  <unfinished ...>
[pid 22338] write(3, "\0\0\0\1\0001\00010", 9 <unfinished ...>
[pid 22337] <... read resumed> "", 8)   = 0
[pid 22337] select(0, NULL, NULL, NULL, {0, 1000} <unfinished ...>
[pid 22338] <... write resumed> )       = 9
[pid 22338] select(0, NULL, NULL, NULL, {0, 3000} <unfinished ...>
[pid 22337] <... select resumed> )      = 0 (Timeout)
[pid 22337] lseek(4, 0, SEEK_SET)       = 0
[pid 22337] read(4, "\0\0\0\0\0\0\0\0", 8) = 8
[pid 22337] select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
[pid 22337] lseek(4, 8, SEEK_SET)       = 8
[pid 22337] select(0, NULL, NULL, NULL, {0, 1000} <unfinished ...>
[pid 22338] <... select resumed> )      = 0 (Timeout)
[pid 22338] write(3, "\0\0\0\1\0002\00021", 9 <unfinished ...>
[pid 22337] <... select resumed> )      = 0 (Timeout)

After the last read in the above output, process 22337 (the reader) never
issues another read or an fstat. It spends all its time repeating:

[pid 22337] lseek(4, 8, SEEK_SET)       = 8
[pid 22337] select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)

This behaviour of the reader is puzzling: without issuing a stat or read call,
how can it detect whether the file is growing? I think this is why the reader
is stuck in the infinite loop of the _read function in the attached test
script.
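
For comparison, here is a minimal sketch of a reader loop that issues an fstat
on every poll, so that file growth is always observed through a system call.
This is not the _read function from the attached test script; wait_for_bytes
and read_exact are hypothetical names, and the poll interval and timeout are
arbitrary values. It assumes raw file descriptors (os.open/os.read) are
acceptable, so that no Python or stdio buffering sits between the script and
the kernel:

```python
import os
import time


def wait_for_bytes(fd, offset, nbytes, poll_interval=0.001, timeout=5.0):
    """Poll fstat until the file holds at least offset + nbytes bytes.

    Returns True once enough data is present, False if the timeout expires.
    """
    deadline = time.time() + timeout
    while True:
        # One fstat per iteration: this is how file growth is detected.
        if os.fstat(fd).st_size >= offset + nbytes:
            return True
        if time.time() >= deadline:
            return False
        time.sleep(poll_interval)


def read_exact(fd, offset, nbytes, poll_interval=0.001, timeout=5.0):
    """Read exactly nbytes starting at offset, or return None on timeout/EOF."""
    if not wait_for_bytes(fd, offset, nbytes, poll_interval, timeout):
        return None
    os.lseek(fd, offset, os.SEEK_SET)
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = os.read(fd, remaining)  # unbuffered read(2) on the descriptor
        if not chunk:
            return None  # unexpected EOF despite the size check
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

With a reader written along these lines, strace should show a stat-family call
on each poll iteration instead of the bare lseek/select pair seen above, which
would help separate the behaviour of the script from that of the mount.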

So the only assumption left to verify is whether the issue is caused by
buffering in the kernel VFS layer, but the hang described above prevents me
from verifying it.
