CV2-3551 add local queue consumption and re-work a ton of the startup flow to accommodate #33
Conversation
if messages_with_queues:
    logger.debug(f"About to respond to: ({messages_with_queues})")
    bodies = [schemas.Message(**json.loads(message.body)) for message, queue in messages_with_queues]
    for body in bodies:
not in this PR, but we should probably call this in parallel
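For illustration, parallelizing that loop could look roughly like the sketch below — a thread pool fanning out the parsed bodies, with `handle_message` standing in for whatever per-body work the loop currently does (a placeholder, not code from this PR):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: process message bodies concurrently instead of one at a time.
# `handle_message` is a placeholder for the existing per-body handler.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(handle_message, body) for body in bodies]
    for future in futures:
        future.result()  # re-raise any exception from handling a body
```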
Cosign
logger.info("Beginning callback loop...")
while True:
    queue.send_callbacks()
probably want some (tuneable?) time interval in here?
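One possible shape for that, as a rough sketch — the `CALLBACK_INTERVAL_SECONDS` environment variable is just an assumed name for illustration:

```python
import os
import time

# Hypothetical knob: let deployments tune how often callbacks are flushed.
interval = float(os.environ.get("CALLBACK_INTERVAL_SECONDS", "1.0"))

logger.info("Beginning callback loop...")
while True:
    queue.send_callbacks()
    time.sleep(interval)  # pause between iterations instead of spinning hot
```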
what happens when the worker dies?
This all happens in the context of SQS messages, so if a worker dies, the message eventually gets tossed back into the pool of available messages and some other worker will pick it up. It should be fairly foolproof, since it just ties into existing, already-foolproof SQS machinery.
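That redelivery only works if a message is deleted from the queue after it has been fully processed; a minimal boto3 sketch of the pattern (the queue URL and handler are placeholders, not code from this PR):

```python
import boto3

sqs = boto3.client("sqs")

# Receive a message; it stays invisible to other consumers for VisibilityTimeout seconds.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, VisibilityTimeout=60)
for msg in resp.get("Messages", []):
    handle_message(msg["Body"])  # if the worker dies here, the delete below never runs...
    # ...so the message becomes visible again and another worker picks it up.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```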
I don't think there will be data loss, because queues, but if the worker dies we will stop pulling things off the queue while other parts of the system are still happily putting things in. Will AWS know that the service needs to be restarted?
In our discussions so far, my understanding is that one of the big reasons to use SQS directly is to make the system scale according to queue depth: as the queue grows, we would increase the number of workers to address it. A hung machine would still potentially need to be taken out back individually, but it shouldn't arrest all work entirely.
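As a rough sketch of what that scaling signal could key off of — `ApproximateNumberOfMessages` is a real SQS attribute, but the threshold and `scale_up_workers` helper are made up for illustration:

```python
import boto3

sqs = boto3.client("sqs")

# Read the current queue depth; autoscaling policies typically key on this attribute.
attrs = sqs.get_queue_attributes(
    QueueUrl=queue_url,
    AttributeNames=["ApproximateNumberOfMessages"],
)
depth = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

if depth > 100:  # hypothetical threshold
    scale_up_workers()  # stand-in for whatever scaling mechanism is chosen
```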
Alright, so a few things happening here as per conversations last week:
This should allow us to run everything end-to-end locally at this point. We should be able to just set MODEL_NAME=mean_tokens__Model, start up the environment, and operate full round trips.