readStream with highWaterMark causes 'data' event to miss data
- Version: v14.4.0 (also tried with v12.17.0)
- Platform: Darwin Marks-MacBook-Pro.local 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64
- Subsystem:
Explanation?
The code below will constantly append to a file (/tmp/test.txt) and constantly read the file, pushing the data into an in-memory buffer. This is essentially a "tail".
After a few iterations the buffer ends up holding incorrect data, because one of the 'data' events delivers a chunk that is missing bytes.
What steps will reproduce the bug?
const fs = require('fs');

// Options
const highWaterMark = 10

// Storage
let stream;
let lastBytePosition = 0;
let buffer = Buffer.alloc(0);
let counter = 0;

// Initialise a test file
fs.writeFileSync('/tmp/test.txt', '');

// Constantly append data to the file
setInterval(() => {
  counter = counter + 1
  const line = `hello at ${counter}\n`;
  fs.writeFileSync('/tmp/test.txt', line, { flag: 'a' });
}, 1)

// Constantly read the file
setInterval(() => {
  if (stream) {
    return
  }
  stream = fs.createReadStream('/tmp/test.txt', { highWaterMark, start: lastBytePosition });
  stream.on('data', chunk => {
    lastBytePosition = lastBytePosition + chunk.length;
    buffer = Buffer.concat([buffer, chunk]);
  });
  stream.on('end', () => {
    stream = null
  })
}, 10)

// Let's keep checking the buffer until the inevitable break
setInterval(() => {
  const hasBrokenLine = buffer.toString().split('\n').slice(0, -1).filter(line => {
    return !line.startsWith('hello')
  })
  if (hasBrokenLine.length > 0) {
    console.log(buffer.toString('utf8'));
    process.exit();
  }
}, 10)
How often does it reproduce? Is there a required condition?
Every time
What is the expected behaviour?
The script would never exit, and every line in the buffer would start with 'hello'.
What do you see instead?
The output reads:
node-bug % node test.js
hello at 1
hello at 2
hello at 3
hello at 4
hello at 5
hello at 6
hello at 7
hello at 8
hello at 9
hello at 10
hello at 11
hello at 12
hello at 13
hello at 14
hello at 15
hello at 16
hello at 17
hello at 18
hello at 19
hello at 20
hello at 21
hello at 22
hello at 23
hello at 24
hello at 25
26
Note: the last line should be "hello at 26".
Additional information
- The dirty line is not always the same line, nor does it always have the same format. The missing data varies.
- This doesn't happen when you choose a highWaterMark greater than the size of the file change.
- I have checked the /tmp/test.txt file and the data is as expected.
- I have played with increasing the timers, and while the success rate is higher, it inevitably breaks after only a few more seconds.
- I have tried on Linux (via Docker for Mac) too and the same issue occurs.
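
For comparison, here is a minimal sketch of the same "tail" done without streams, assuming the only goal is to pick up bytes appended since the last read. `readAppended` is a hypothetical helper (not part of Node's API). It uses `fs.readSync` with an explicit position and advances that position by the number of bytes actually returned, so a short read at the current end of file cannot skip data. It is offered as a possible workaround only, not as a diagnosis of the stream behaviour.

```javascript
const fs = require('fs');

// Hypothetical helper: read everything in `filePath` from byte offset
// `position` to the current end of file, in chunks of `chunkSize` bytes.
// Returns the bytes read and the new position to resume from.
function readAppended(filePath, position, chunkSize = 10) {
  const fd = fs.openSync(filePath, 'r');
  const chunks = [];
  try {
    const chunk = Buffer.alloc(chunkSize);
    let bytesRead;
    // readSync returns 0 once the current end of file is reached.
    while ((bytesRead = fs.readSync(fd, chunk, 0, chunkSize, position)) > 0) {
      // Copy only the bytes that were actually read; `chunk` is reused.
      chunks.push(Buffer.from(chunk.subarray(0, bytesRead)));
      // Advance by the real byte count, not by the requested size.
      position += bytesRead;
    }
  } finally {
    fs.closeSync(fd);
  }
  return { data: Buffer.concat(chunks), position };
}
```

In the script above this would replace the `createReadStream` block: call `readAppended('/tmp/test.txt', lastBytePosition)` on each tick, append `data` to `buffer`, and store the returned `position` in `lastBytePosition`.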