Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the blocking wait during message routing #413

Merged
merged 3 commits into from
Jul 16, 2021

Conversation

mdickinson
Copy link
Member

@mdickinson mdickinson commented Jul 15, 2021

This PR simultaneously improves a failure mode and makes failure more likely. :-)

In more detail:

When a message is sent from a background task to the corresponding future, the following sequence of events occurs:

  1. The background task puts the message on the message queue (a regular queue.Queue instance)
  2. The background task sends a "ping" to the message router to let it know that there's a message to be processed
  3. When the message router receives the "ping", it gets the message from the message queue and dispatches it.

Currently, in the main branch, for step 3 we use a blocking message_queue.get() operation to get the message. In effect we're assuming that if a message has been put on the queue, message_queue.get() will not block for long while the message is retrieved. If something goes wrong with some part of the machinery and we somehow get a ping without a corresponding message, then the router will block forever. That's not a great failure mode.

With this PR, we use a non-blocking message_queue.get operation to get the message. We're now making a stronger assumption than before, namely that we don't need to wait at all for the message that's been put on the queue to arrive. I believe that assumption is justified, though it's difficult to extract a promise of this from the documentation. Now if something goes wrong (a ping arriving without a message being queued), the router will crash with a queue.Empty exception. This seems like a better failure mode.

While we're touching the route_until methods, this PR takes a liberty and cleans up the implementations of those methods slightly. It also adds docstrings for the _route_message private method.

Closes #313 (really, this PR makes #313 invalid, but it resolves the issue that prompted #313).

N.B. The duplication between the MultithreadingRouter and MultiprocessingRouter is still biting us. (cf. #383)

@mdickinson
Copy link
Member Author

FWIW, with this change I've run the test suite a couple of hundred times locally and not observed any related failures. (I did see a couple of occurrences of #412, but that's not related.)

Copy link
Contributor

@rahulporuri rahulporuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few suggestions

traits_futures/multiprocessing_router.py Show resolved Hide resolved
traits_futures/multiprocessing_router.py Outdated Show resolved Hide resolved
traits_futures/multithreading_router.py Show resolved Hide resolved
traits_futures/multithreading_router.py Outdated Show resolved Hide resolved
@mdickinson mdickinson merged commit 018b8ab into main Jul 16, 2021
@mdickinson mdickinson deleted the refactor/remove-router-blocking-wait branch July 16, 2021 06:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add timeout and fallback code for blocking wait in message router
2 participants