
ThreadLocalValue is not fiber-aware #15088

Open
straight-shoota opened this issue Oct 16, 2024 · 7 comments
Labels: kind:bug (A bug in the code. Does not apply to documentation, specs, etc.), status:discussion, topic:multithreading, topic:stdlib:concurrency

Comments

@straight-shoota (Member) commented Oct 16, 2024

Crystal::ThreadLocalValue(T) is an internal mechanism to store data in a thread-local datastore, i.e. it holds a different value for every thread.

However, threads are a pretty insignificant concept in the Crystal runtime. Their primary purpose is to serve as a vehicle for running fibers.

This is particularly relevant for the upcoming execution contexts from crystal-lang/rfcs#2: Currently with -Dpreview_mt, fibers are pinned to a thread. But the new schedulers will move fibers between threads. We cannot expect a fiber to run on the same thread consistently. This invalidates the very reason to use thread-local values as a means to store contextual information for a sequentially executed section of code.

Current uses of ThreadLocalValue:

  • readers, writers and their events in IO::Evented (most of that usage goes away in Event Loop #14996)
  • recursive reference detection
    • Reference::ExecRecursive: #to_s and #pretty_print for Array, Hash, Deque
    • Reference::ExecRecursiveClone: Object#clone (for reference types)
  • PCRE2 bindings:
    • Regex::PCRE2.@@jit_stack: This is exclusively used as scrap space for the pcre2_match function and can be reused as soon as the function returns. There should be no risk of the currently executing fiber switching threads during that, so keeping this thread-local seems reasonable and more efficient than a per-fiber allocation. Alternative proposals are welcome, though.
    • Regex::PCRE2#@match_data: This is used within Regex#match_impl and provides scrap space for pcre2_match, but it also stores output data (ovector) which is cloned afterwards to adopt it into the GC (and re-use the buffer). It might be feasible to replace this shared space with a fresh GC allocation on each match.

Only the usage in IO::Evented is intentionally thread-scoped. For the other uses, a fiber-scoped value would actually be more appropriate. But thread scope is an acceptable fill-in when the usage context guarantees that there's no chance for a fiber swap to interrupt execution. For example, Regex::PCRE2#@match_data is a re-usable buffer and there should be no potential yield point between writing and reading the buffer.

The situation is different with ExecRecursive though: It's used in the implementation of Array#to_s and similar methods. As these work directly with IO, there are many possibilities for a fiber swap. And this is indeed a problem, even in a single threaded runtime.

Demonstration with a forced fiber swap within Array#to_s, simulating a natural trigger through IO operations:

require "wait_group"

class YieldInspect
  def inspect(io : IO)
    io << "YieldInspect"

    Fiber.yield
  end
end

WaitGroup.wait do |wg|
  ary = [YieldInspect.new]

  2.times do |i|
    x = i
    wg.spawn do
      puts "#{i}: #{ary.to_s}"
    end
  end
end

This program prints:

[...]
[YieldInspect]

When the first fiber executes Array#to_s, it puts ary into the thread-local reference registry. Upon executing YieldInspect#inspect, it yields to the other fiber. When the second fiber executes Array#to_s, it finds ary already registered and thus assumes a recursive iteration, which it indicates with ... in the output.
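
For context, the recursion guard behind this is roughly shaped like the following sketch (illustrative only; the actual Reference::ExecRecursive implementation differs in detail, but the essential point is that the registry is shared per thread rather than per fiber):

# Illustrative sketch of a per-thread recursion guard. REGISTRY stands in for
# the ThreadLocalValue-backed hash; because it is shared by all fibers running
# on the thread, a second fiber entering the guard for the same object finds
# the key already present and wrongly reports a recursion.
REGISTRY = Hash(Tuple(UInt64, Symbol), Bool).new

def exec_recursive(obj : Reference, method : Symbol, &) : Bool
  key = {obj.object_id, method}
  if REGISTRY.has_key?(key)
    false # looks recursive: the caller prints "..." instead of descending
  else
    REGISTRY[key] = true
    begin
      yield
    ensure
      REGISTRY.delete(key)
    end
    true
  end
end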

I believe most of these uses of ThreadLocalValue are conceptually incorrect. They should use fiber-local values instead, which would ensure correct semantics.
So an obvious solution would be to introduce Crystal::FiberLocalValue(T) as a (partial) replacement with basically the same implementation, just using Fiber.current as a key.
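
A minimal sketch of what that could look like, assuming the same hash-plus-lock shape as ThreadLocalValue (the use of Mutex and the exact method set are illustrative, not the actual internal API):

# :nodoc:
# Illustrative sketch only: same idea as Crystal::ThreadLocalValue(T), but
# keyed on the current fiber instead of the current thread.
class Crystal::FiberLocalValue(T)
  @values = Hash(Fiber, T).new
  @mutex = Mutex.new

  # Returns the value for the current fiber, initializing it with the block
  # if none is stored yet.
  def get(&block : -> T) : T
    @mutex.synchronize do
      @values.fetch(Fiber.current) do
        @values[Fiber.current] = yield
      end
    end
  end

  def get? : T?
    @mutex.synchronize { @values[Fiber.current]? }
  end

  def set(value : T) : T
    @mutex.synchronize { @values[Fiber.current] = value }
  end
end

As noted below, entries would also need to be removed when their fiber terminates, otherwise the hash keeps references to dead fibers alive.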

However, we might also take this opportunity to consider alternative implementations of the value storage or avoiding it entirely through refactoring the code that uses it.

@yxhuvud (Contributor) commented Oct 17, 2024

So an obvious solution would be to introduce Crystal::FiberLocalValue(T) as a (partial) replacement with basically the same implementation, just using Fiber.current

Then we'd also have to delete it on fiber deletion, so there'd potentially be some contention with many fibers. (Do we do that on thread destruction for thread locals? We probably should, but the lifecycle of threads is not fully explored, as they are not typically dynamically created or destroyed (yet).)

But - do we really need that? How about just adding a potentially empty pointer to the fiber itself? Is there even any need for synchronization for fibers? Is it allowed to refer to the fiber-local value of a different fiber?

(I kinda like having threadlocals around though even if they are not used very often)

@ysbaddaden (Contributor) commented Oct 17, 2024

Sadly: there can be nested recursive checks at the same time in the same fiber, so we need something fiber-aware for the recursion checks. If the checks clean up after themselves in an ensure block, I guess we don't need to clean anything when the fiber dies? 🤔

IO::Evented (libevent only): if we have an incentive to actually fix it, we could migrate to a pair of Hash(IO, Deque) right on each EventLoop instance instead.

PCRE: since there isn't any concurrency issue, it could just use ivars on Thread; that would be much simpler (no need for a mutex, hash, ...).
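
A minimal sketch of that idea, reopening Thread with an ivar (the ivar name is hypothetical; the real bindings would store a LibPCRE2 JIT stack pointer rather than a bare Void*):

# Illustrative only: per-thread scrap space as a plain ivar on Thread instead
# of a ThreadLocalValue keyed by Thread.current. Only the owning thread ever
# reads or writes its slot, so no mutex or hash lookup is needed.
class Thread
  # Hypothetical ivar; lazily created by the PCRE2 code on first use.
  property pcre2_jit_stack : Void* = Pointer(Void).null
end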

@straight-shoota (Member, Author)

there can be nested recursive checks at the same time in the same fiber

What do you mean by that? I'd expect there's always only a single recursive check per fiber.

@straight-shoota (Member, Author)

do we do that on thread destruction for thread locals? We probably should, but the lifecycle of threads is not fully explored, as they are not typically dynamically created or destroyed (yet)

That's a good point. With the preview_mt runtime, threads don't get destroyed, so this wasn't really relevant. But in the future they will be. So we should find a way to clean up.

The thread-local ivars in PCRE2 (@match_data) and IO::Evented all get cleaned up in the respective finalizer.
Thread-local class variables, however, do not get cleaned up. Until now there was no real reason to do so.

@straight-shoota (Member, Author)

But - do we really need that? How about just adding a potentially empty pointer to the fiber itself? Is there even any need for synchronization for fibers? Is it allowed to refer to the fiber-local value of a different fiber?

Yeah I agree this could be a good solution.

I don't think fiber-locals (or thread-locals) should or need to be accessed from outside their respective scope.

@crysbot commented Oct 18, 2024

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/charting-the-route-to-multi-threading-support/7320/1

@straight-shoota (Member, Author)

Storing context information in the fiber itself is also pretty neat for cleanup. When the fiber is freed, the references go out of scope automatically. There's no need to explicitly clean up references, as would be necessary for the hash in ThreadLocalValue.
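
A minimal sketch of that approach for the recursion registry, assuming a lazily allocated ivar on Fiber (the ivar name and key type are illustrative, not existing API):

# Illustrative only: keep the recursion registry on the fiber itself. Only the
# owning fiber ever touches it, so no synchronization is required, and the
# registry is collected together with the fiber.
class Fiber
  # Hypothetical ivar, lazily allocated on first use; keyed by {object_id, method}.
  @exec_recursive_registry : Hash(Tuple(UInt64, Symbol), Bool)?

  def exec_recursive_registry : Hash(Tuple(UInt64, Symbol), Bool)
    @exec_recursive_registry ||= Hash(Tuple(UInt64, Symbol), Bool).new
  end
end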
