[Gluster-users] 3.6.2, file-write events out of order - data missing temporarily

Kingsley gluster at gluster.dogwind.com
Fri Apr 10 08:31:54 UTC 2015


Hi,

We're running gluster 3.6.2 on CentOS 7, using a replicate-only volume
with 4-way replication.

We have 10 hosts mounting the volume - 6 running CentOS 6 that submit
jobs to a "to-process" directory on the gluster volume, and 4 running
CentOS 7 that process entries from that directory.

So that the 4 "processor" machines don't read partly written files, the
submitting machines first write each file to a tmpspool subdirectory (a
subdirectory of the to-process directory on the gluster volume) and then
move it into the main to-process directory once it is fully written, e.g.:

cp /localdir/job1234.txt /mnt/gv0/to_process/tmpspool
mv /mnt/gv0/to_process/tmpspool/job1234.txt /mnt/gv0/to_process
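
For reference, the two steps above can be wrapped in a small helper. This
is only a sketch: the name submit_job and its arguments are made up for
illustration, and it assumes the tmpspool subdirectory already exists on
the same volume as the target directory, so that mv is a plain rename.

```shell
# Hypothetical helper (not part of gluster): stage a job file in tmpspool,
# then rename it into the directory the processors watch.
# Usage: submit_job SRC QUEUE_DIR
submit_job() {
    src=$1
    queue=$2
    name=$(basename "$src")
    tmp="$queue/tmpspool/$name"

    # Step 1: write the whole file into the staging subdirectory.
    cp "$src" "$tmp" || return 1

    # Step 2: move it into place only after the copy has finished.
    mv "$tmp" "$queue/$name"
}
```

It would be called as: submit_job /localdir/job1234.txt /mnt/gv0/to_process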

These job files are small (less than 500 bytes).

However, if one of the processor machines picks up one of the files
shortly after it appears, it sees a smaller (i.e. not fully written)
file. If it waits a few seconds and tries again, the file is complete.

Is this a known bug that might be fixed in 3.6.3, or is it a new issue?

In one recent case, a 441-byte file was moved from tmpspool into
to_process by the client machine, but one of the processing machines
read it from to_process as a 391-byte file with the last 2 lines
missing. When read again 3 seconds later, all of the data was in place.

Curiously, when there is data missing, it's always whole lines; the
temporarily-short file never seems to end half way along a line of text.
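
To make the symptom easier to catch on a processor machine, something
like the following could be used. It is a rough diagnostic sketch, not a
fix: the function name size_changed and the default 3 second delay are
my own choices, and it only compares byte counts from two reads.

```shell
# Hypothetical diagnostic: report whether a file's byte count differs
# between two reads taken DELAY seconds apart.
# Usage: size_changed FILE [DELAY]
# Returns 0 if the size changed, 1 if it stayed the same, 2 on read error.
size_changed() {
    file=$1
    delay=${2:-3}

    first=$(wc -c < "$file") || return 2
    sleep "$delay"
    second=$(wc -c < "$file") || return 2

    if [ "$first" -ne "$second" ]; then
        echo "$file: $first bytes, then $second bytes after ${delay}s"
        return 0
    fi
    return 1
}
```

For example: size_changed /mnt/gv0/to_process/job1234.txt 3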

Cheers,
Kingsley.


