Proposal: Avoid blocking sends to a channel without timeout #131
Oof, I hope not! I can't speak to specific packages that may serve specific purposes for you, but in general, it should be very rare that you make a nonblocking send on a channel. Go already gives you async where it matters, on syscalls; you want your application code to be nice and synchronous, i.e. deterministic, so that it takes advantage of the runtime. In the hopefully rare cases you use channels to communicate between components, they should almost always be blocking sends and recvs on chans with capacity zero. Buffered channels are usually queues, and queues exist solely to solve burstiness. That means they belong at the edges of a system, not within it 😇
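The blocking, capacity-zero style described above might look like this minimal sketch (the function and channel names are illustrative, not from any real package):

```go
package main

import "fmt"

// synchronousHandoff demonstrates a blocking send/recv pair on an
// unbuffered channel: the sending goroutine blocks until the receiver
// is ready, so the two goroutines rendezvous deterministically.
func synchronousHandoff() int {
	results := make(chan int) // capacity zero: the send blocks until a receive
	go func() {
		results <- 42 // blocks here until the recv below runs
	}()
	return <-results
}

func main() {
	fmt.Println(synchronousHandoff())
}
```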
This issue isn't to indicate that all sends must be non-blocking, but that blocking sends probably should have a fallback, e.g., context cancellation or timeout.
When a chan send occurs in a context-cancelable context 😉 I agree, you want that cancelation respected. But I don't think it's correct to claim blocking sends should have a fallback/timeout generally.
It's true that there's not really a "one-size-fits-all" approach, but I don't think the issue is whether the send is sync/async, or whether channel usage is error-prone: the issue is that it isn't defensive - it's fragile - unless you add an unblocking strategy, and channel capacity is not an unblocking strategy. In any event, I would argue that one should have a good reason for not having any control mechanisms in place.
Absolutely true! A rule like "Don't add capacity to a channel without knowing exactly why" would be 😎
I'm not sure the rule generalizes in the way you're expressing it here. A goroutine blocked in a chan send, or recv, or select statement, isn't by itself a problem — it's only a problem if that introduces deadlocks, races, etc. as a consequence. And if that can happen, adding a timeout to the select statement isn't any better than adding capacity to the channel — both just paper over the underlying structural errors in the code. To be clear, I get the idea you're trying to convey, here. I tend to express it slightly differently, not in terms of the channel operations, but in terms of goroutine lifecycle management. From my own style guide —
The key points here being the first two. And just for context, so it doesn't look like I'm swooping in here with all these unsolicited complaints 😅 — the Uber style guide is frequently referenced as a source of truth among the Gophers I help and mentor and stuff, so I'm somewhat motivated to nudge the (few!) points that in my experience tend to lead those folks down suboptimal paths.
They're good points overall. Like anything, there are exceptions, but in common cases these hold true. This is effectively the spirit in which this proposal is intended.
I think that's a situational assessment, and is predicated on the effect that it has on the rest of the program relative to that behavior (i.e., it could either be a feature or a defect, depending on how you're rationalizing it). If the effects of said blocking are insulated from the rest of the program, or if transitive backpressure is intended, then yeah, I agree with you that it's not an issue in the same sense as I'm describing. That argument would not hold for situations with leaks, obviously. However, we should avoid assuming specialized application behaviors/contracts/semantics/etc, so I think that we need to treat those "situationally okay" scenarios as exceptions, rather than the common case (hence taking a more general stance that "a bug is the possibility of defective behavior, regardless of how many times it manifests" rather than "it's not a bug if we haven't observed it [yet]"), and observing that a lot of the time, basic flow management or other synchronization tools reduce the possibility of various related bugs. (We definitely can't cover every possible related bug, and there is certainly some amount of RTFM required because folks should understand the implications of what they're doing, but I'm personally of the opinion that it's good to document behaviors that encourage thoughtfulness, clear intent, and consistently nominal behavior, which I hope that this falls into.)
We appreciate the feedback! Glad to hear that folks are (hopefully) finding it helpful, and we're also quite motivated to ensure not only that it's as useful to others as it is for us, but also that it evolves as we (and the community) do.
Sure! But a goroutine blocked in a chan send, recv, or select does not by itself constitute the possibility of defective behavior, any more than does an integer with a value of 0, say. The possibility of deadlock, or a goroutine leak, or whatever else is the defective behavior, and that requires more than just the one goroutine's code to assess. Similarly, a function that uses that integer as a denominator is required for its 0 value to be worrisome. The thing I'm guarding against here is the addition of timeout cases to channel operations that serve no purpose. Today I wrote some code that selected on one of four channels; three were errors that may or may not come, and one was the result of an operation that would take an indeterminate amount of time. The function containing that select statement is meant to block until one of those results has manifested. It would be incorrect to add a timeout to the select, or to thread a context through, in this situation. But this rule would suggest that my code is faulty. Or, even easier: consider signal.Notify — same semantics! Sometimes goroutines can block indefinitely. And it's not a rare situation: the Go runtime is optimized for exactly this situation. So it's not necessarily a problem. Right?
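A simplified two-channel version of the scenario described above (names and the sleep duration are illustrative) — a function whose contract is to block until a result or an error manifests, so a timeout case would be incorrect:

```go
package main

import (
	"fmt"
	"time"
)

// await is *meant* to block indefinitely until one of the channels
// fires; adding a timeout case here would change the function's
// contract, not make it more robust.
func await(result <-chan string, errc <-chan error) (string, error) {
	select {
	case r := <-result:
		return r, nil
	case err := <-errc:
		return "", err
	}
}

func main() {
	result := make(chan string)
	errc := make(chan error)
	go func() {
		time.Sleep(10 * time.Millisecond) // stand-in for indeterminate work
		result <- "done"
	}()
	r, err := await(result, errc)
	fmt.Println(r, err)
}
```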
This is essentially the crux of the issue. I'm not suggesting that we add channel dogma that should be followed in all situations, but rather to be sensitive to context/semantics and provide reasonable heuristics for when you probably should (and if there are any overwhelmingly common cases, call them out as examples). Agreed that it doesn't make sense to "just add timeouts" everywhere because that itself somehow magically makes channel usage "resilient" or "correct", because it doesn't. To that end, whatever guidance we add should probably be something to the effect of: "if your channel send/recv should not block for an unknown amount of time (potentially forever), add a timeout/etc case". (TBD, of course; we're not in a hurry to rush any guidance.) |
Semi-related to "Channel Size is One or None"
We've seen cases of deadlocks/unexpected blocking caused by blocking sends to a channel (example: tchannel-go#528). It's generally much safer to use a select with another case for a timeout / ctx.Done(), or a non-blocking send. There are some use-cases where it's safe; typically when the number of writers is bounded by the channel size and can be verified easily, so not sure how generally applicable this is.
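The non-blocking send mentioned above is a select with a default case, so the send either succeeds immediately or is abandoned; a sketch with an illustrative helper name:

```go
package main

import "fmt"

// trySend performs a non-blocking send: the default case fires
// whenever no receiver is ready and the buffer (if any) is full,
// and the function reports whether the value was delivered.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(trySend(ch, 1)) // buffer has room: delivered
	fmt.Println(trySend(ch, 2)) // buffer full, no receiver: dropped
}
```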