compiler: lifecycle issue on finalizers with tables #2102
assert_unlinkable
assert_uninstantiable
Sorry, rather than fixing the spectest, I strongly think that we should fix the lifecycle issue of the engine's code in general, e.g. by holding a reference to the imported functions etc. in the compiled module. Anyway, thank you for hunting down the root cause!
ok the root cause seems to be that related #1608
Amazing 🙌
We discussed on Slack and agreed that there was a reference problem, this is likely the fix.
Nice work 👏
Yikes, I missed your updated PR. 😔
This seems great. 👍
Sorry for the duplication!
Yeah, sorry for not being clearer, but it looks like this doesn't fix the reproducer; see the failing tests :/
assert_uninstantiable
update: this might be the culprit with putting the finalizer on the
wazero/internal/engine/compiler/engine.go, lines 676 to 679 in 68409c7
casting the
I followed @ncruces' observation that setting
This is still rough at the edges (including some linting errors that I have intentionally left for now); I am pushing the commit just to give you an idea. Let me know what you think.
EDIT: heh, I also forgot about amd64. Well, you get the idea. Tests have been blindly updated to pass, but they might be simplified or be useless now in some cases (?)
commit e9ad3fa temporarily reverts the patch to prove that the hammer test in e240147 demonstrates the issue in wazevo too. The solution seems to be, again, to use
EDIT: proof https://github.com/tetratelabs/wazero/actions/runs/8183672859/job/22376887112
needs rebasing anyway due to #2130
Signed-off-by: Edoardo Vacchi <[email protected]>
…oid auto-close Signed-off-by: Edoardo Vacchi <[email protected]>
…e` to avoid auto-close" This reverts commit 5ddc150. Signed-off-by: Edoardo Vacchi <[email protected]>
it does not look like
At this time, I am not confident in this solution, but at least now we have a reproducer, and we still know that it is a very specific edge case that we should be able to work around by deferring the closure of modules.
[EDIT] more info: by reverting fde2d90 and 3458d18 the hammer test fails almost reliably, but with fde2d90 and 3458d18 it just becomes flakier, i.e. the current change doesn't seem to solve the issue completely and/or there is another issue at play.
I will take over this in a different PR!
thanks!
So this was definitely a journey :D This PR should fix #2039 for the old compiler, but really it addresses an issue that is unrelated to the file cache, or the compiler (well, not directly) for that matter. It was indeed related to the finalizer, but it is much more articulated than you'd think.
Indeed, the problem is a lifecycle issue that involves finalizers, and it definitely relates to invoking code that is no longer mmapped; however, it was not due to a module being unmapped while executing, but to a module doing a call_indirect on a module that was collected.

The spec tests generally do not fail because the GC is not performed at each iteration, but if we force it to collect at the end of each unit test (ab4f3d8) the spec tests will fail very predictably (see https://github.com/tetratelabs/wazero/actions/runs/8086797452/job/22097337665 and https://github.com/tetratelabs/wazero/actions/runs/8086797455/job/22097342252)
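To illustrate the "force a collection after each unit" trick, here is a minimal standalone sketch. It is not the code from ab4f3d8; runWithForcedGC is a hypothetical helper that only shows the idea of calling runtime.GC() after every step and checking that collections really happened.

```go
package main

import (
	"fmt"
	"runtime"
)

// runWithForcedGC (hypothetical helper) runs every step and forces a full
// collection after each one, so anything that became unreachable during a
// step is collected promptly instead of at some arbitrary later point.
// It returns how many GC cycles completed while the steps were running.
func runWithForcedGC(steps ...func()) uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for _, step := range steps {
		step()
		runtime.GC() // blocks until the collection has completed
	}
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	allocated := 0
	cycles := runWithForcedGC(
		func() { allocated += len(make([]byte, 1<<20)) },
		func() { allocated += len(make([]byte, 1<<20)) },
	)
	fmt.Println("bytes allocated by steps:", allocated)
	fmt.Println("GC cycles completed:", cycles)
}
```

In a regular Go test the same effect can be had with t.Cleanup(runtime.GC), which makes this kind of lifecycle bug surface much more predictably.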
In fact, by allowing a bit of output, we learn that one of the tests in linking.wast enters (->) a code segment with a given address (in this case 0x12aba0000) in assert_uninstantiable, then leaves it (<-), but then this segment is immediately released (!!), i.e. finalized. This is also where we learn that assert_return/line:388, later, fails spectacularly:

Now, how is it possible that 0x12ab90000 causes a fault at 0x12aba0000? That's because linking.wast:388 is referencing a function from another module (these are linking-related tests after all).

Now, last question. Why is segment 0x12aba0000 being released at all? Well, here's the kicker: the module in assert_uninstantiable/line:371 is instantiated with

but this flags the compiled module to be closed on a compilation failure:

But on close, the module gets deleted from the engine, and in turn

this removes the last actual reference from memory, making it eligible for a future collection.
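The effect of "deleting the module from the engine drops the last reference" can be reproduced with a toy registry. This is only a sketch under assumptions: toyEngine, segment, add and remove are hypothetical names and do not mirror wazero's engine; the finalizer stands in for the munmap of the code segment.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// segment stands in for an mmap'd code region (hypothetical type).
type segment struct{ code [4096]byte }

// toyEngine is a hypothetical registry that owns compiled segments by ID.
type toyEngine struct{ modules map[string]*segment }

// add registers a new segment and returns a channel that is closed when the
// segment's finalizer runs, i.e. once the GC has found it unreachable.
//
//go:noinline
func (e *toyEngine) add(id string) <-chan struct{} {
	released := make(chan struct{})
	seg := &segment{}
	runtime.SetFinalizer(seg, func(*segment) { close(released) })
	e.modules[id] = seg
	return released
}

// remove mimics closing a module: the engine forgets the segment.
func (e *toyEngine) remove(id string) { delete(e.modules, id) }

// releasedWithin forces a few collections and reports whether the finalizer
// ran, waiting up to perTry after each collection.
func releasedWithin(released <-chan struct{}, perTry time.Duration) bool {
	for i := 0; i < 3; i++ {
		runtime.GC()
		select {
		case <-released:
			return true
		case <-time.After(perTry):
		}
	}
	return false
}

// demo reports whether the segment was released before and after removal.
func demo() (beforeRemove, afterRemove bool) {
	e := &toyEngine{modules: map[string]*segment{}}
	released := e.add("m")
	beforeRemove = releasedWithin(released, 50*time.Millisecond)
	e.remove("m") // drops the only reference to the segment
	afterRemove = releasedWithin(released, time.Second)
	return beforeRemove, afterRemove
}

func main() {
	before, after := demo()
	fmt.Println("released while registered:", before)
	fmt.Println("released after removal:", after)
}
```

While the map entry exists the segment cannot be finalized; once it is removed, the next collection is free to release it, even if some other module still remembers its address.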
So, the solution is not to use InstantiateWithConfig() and instead compile then instantiate. This won't auto-close the module, resolving the issue.

I am marking this as a draft because this indeed fixes the problem, but I have not written a proper test for it yet.
Huge thanks to @achille-roussel and @ncruces for the help!! 🙏
EDIT: meanwhile I added a proper reproducer, and we dug further into the issue. The real problem is that function objects hold a reference to a compiledCode that is independent from the one in compiledModule; so, when a compiledModule gets collected, the function ends up referencing an invalid memory address (no longer mmap'd). We tried putting the finalizer on *compiledCode instead, but this still won't work.

This might be the reason:
wazero/internal/engine/compiler/engine.go, lines 676 to 679 in 68409c7

Casting the unsafe.Pointer() to a uintptr (as @ncruces noted offline) makes it decay to a plain integer, so the GC will no longer account for it.
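The uintptr observation can be checked in isolation. The sketch below is not the engine.go code; segment, newSegment and the other names are hypothetical. It keeps one object alive through a real pointer and another only through its address stored as a uintptr, then looks at whether the finalizer fires.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
	"unsafe"
)

// segment stands in for a compiled code region (hypothetical type).
type segment struct{ code [4096]byte }

// newSegment allocates a segment whose finalizer closes the returned channel.
//
//go:noinline
func newSegment() (*segment, <-chan struct{}) {
	released := make(chan struct{})
	seg := &segment{}
	runtime.SetFinalizer(seg, func(*segment) { close(released) })
	return seg, released
}

// newSegmentAddr returns only the numeric address of a fresh segment.
//
//go:noinline
func newSegmentAddr() (uintptr, <-chan struct{}) {
	seg, released := newSegment()
	return uintptr(unsafe.Pointer(seg)), released
}

// releasedWithin forces a few collections and reports whether the finalizer ran.
func releasedWithin(released <-chan struct{}, perTry time.Duration) bool {
	for i := 0; i < 3; i++ {
		runtime.GC()
		select {
		case <-released:
			return true
		case <-time.After(perTry):
		}
	}
	return false
}

// releasedWhileHeldAsPointer keeps a real pointer for the whole wait.
func releasedWhileHeldAsPointer() bool {
	seg, released := newSegment()
	got := releasedWithin(released, 50*time.Millisecond)
	runtime.KeepAlive(seg) // the pointer is live up to this point
	return got
}

// releasedWhileHeldAsUintptr keeps only the address as an integer.
func releasedWhileHeldAsUintptr() bool {
	addr, released := newSegmentAddr()
	got := releasedWithin(released, time.Second)
	_ = addr // still "held", but the GC sees just a number
	return got
}

func main() {
	fmt.Println("released while held as pointer:", releasedWhileHeldAsPointer())
	fmt.Println("released while held as uintptr:", releasedWhileHeldAsUintptr())
}
```

The pointer-held segment cannot be finalized before runtime.KeepAlive, whereas the one remembered only as a uintptr is expected to be released by the forced collections, which is the situation a function ends up in when it keeps a raw code address.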