Diagnostic code for fixed bug still present in v4.x's V8
- Version: v4.0.0 and later in the v4.x release line.
- Platform: all 64bits platforms except AIX.
- Subsystem: V8.
Lately, I came across several occurrences of crashes that look like duplicates of https://github.com/nodejs/node/issues/3898. Here's the callstack of one of these occurrences as displayed by mdb_v8
's ::jsstack
command:
> ::jsstack
native: v8::base::OS::Abort+9
native: v8::internal::PointersUpdatingVisitor::VisitPointers+0x90
native: v8::internal::HeapObject::IterateBody+0x569
native: v8::internal::MarkCompactCollector::EvacuateNewSpaceAndCandidate...
native: v8::internal::MarkCompactCollector::SweepSpaces+0x141
native: v8::internal::MarkCompactCollector::CollectGarbage+0x48
native: v8::internal::Heap::MarkCompact+0x60
native: v8::internal::Heap::PerformGarbageCollection+0x4c0
native: v8::internal::Heap::CollectGarbage+0x238
native: v8::internal::Heap::HandleGCRequest+0x8f
native: v8::internal::StackGuard::HandleInterrupts+0x31c
native: v8::internal::Runtime_StackGuard+0x2b
(1 internal frame elided)
js: <anonymous> (as ScatterGather.scatter)
(1 internal frame elided)
(1 internal frame elided)
(1 internal frame elided)
js: InnerArrayForEach
js: forEach
(1 internal frame elided)
js: <anonymous> (as MessageIndexer.get_modified_threads)
(1 internal frame elided)
(1 internal frame elided)
(1 internal frame elided)
(1 internal frame elided)
js: <anonymous> (as RiakCacheClient.write_cache)
(1 internal frame elided)
js: <anonymous> (as RiakRequest.callback)
js: <anonymous> (as RiakRequest.on_response)
(1 internal frame elided)
(1 internal frame elided)
js: <anonymous> (as PoolRequestSet.handle_response)
(1 internal frame elided)
js: <anonymous> (as PoolEndpoint.request_succeeded)
js: <anonymous> (as PoolEndpointRequest.on_end)
(1 internal frame elided)
js: emit
js: endReadableNT
js: nextTickCallbackWith2Args
js: _tickDomainCallback
(1 internal frame elided)
(1 internal frame elided)
native: v8::internal::Execution::Call+0x107
native: v8::Function::Call+0xff
native: v8::Function::Call+0x41
native: node::AsyncWrap::MakeCallback+0x22c
native: node::StreamBase::EmitData+0xdd
native: node::StreamWrap::OnReadImpl+0x14a
native: node::StreamWrap::OnRead+0x7c
native: uv__read+0x2ef
native: uv__stream_io+0x2a8
native: uv__io_poll+0x22a
native: uv_run+0x15e
native: node::Start+0x558
native: _start+0x6c
Unfortunately, I can't reproduce that crash and I can't share the core files that were provided to me.
My understanding was that, as mentioned in #3898, that crash was triggered by some diagnostic code that was added in order to investigate another V8 issue.
What surprised me is that it seems that the fix for the original issue was integrated in the v4.x branch as part of the upgrade to V8 4.5.103.24 with https://github.com/nodejs/node/commit/c43172578e3e10c2de84fc2ce0d6a7fb02a440d8 but the diagnostic code is still there.
The commit that upgrade V8 to version 4.5.103.24 in node v4.x doesn't show the fix for the original issue (https://github.com/v8/v8/commit/8606664b37f4dc4b42106563984c19e4f72d9d3a) in the GitHub web UI, but using git show c43172578e3e10c2de84fc2ce0d6a7fb02a440d8
in a local checkout of the v4.x-staging
branch shows that the fix was indeed integrated into the v4.x
and v4.x-staging
branches.
Another thing that surprised me is that c4317257 represents the upgrade of V8 to version 4.5.103.24, but the fix for the original issue (https://github.com/v8/v8/commit/8606664b37f4dc4b42106563984c19e4f72d9d3a) doesn't seem to be part of V8's 4.5.103.24 release:
➜ v8 git:(master) ✗ git tag --contains 8606664b37f4dc4b42106563984c19e4f72d9d3a | grep 4.5.103.24
➜ v8 git:(master) ✗
It's very likely that I'm missing something about how release branches/tags are organized in V8's repository though.
Now that raises a few questions:
- The current situation seems to be that node v4.x branches have the fix for a bug but that the diagnostic code for that bug, which makes some node programs crash, is still present. Is that on purpose, or is it an inconsistency that fell through the cracks?
- If the fix for that bug is present, how come the diagnostic code still triggers crashes? If I understand correctly, the diagnostic code aborts if a JS object has an address whose value may have been overwritten by a double. "Whether the address of an object may have been overwritten by a double" is determined by checking if its address is
> 1 << 48
. Is it possible for a valid JS object to have an address that is >1 << 48
for a 64bits process? If so, does that mean that the diagnostic code was overly conservative and triggered false positives? Basically, I'm trying to determine whether it makes sense to remove that diagnostic code by back porting https://codereview.chromium.org/1420253002 to v4.x-staging now that the original bug has been fixed.
FWIW, I tested a back port of https://codereview.chromium.org/1420253002 to v4.x-staging with make check
and make test-v8
and it doesn't trigger any regression.
/cc @nodejs/v8.