Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track application restart generations #2564

Open
wants to merge 25 commits into
base: stable-6.0
Choose a base branch
from

Conversation

CamJN
Copy link
Member

@CamJN CamJN commented Sep 9, 2024

The generation number groups all application processes according to the most recent restart initiation. We use this information to intelligently route to newer generations if there is capacity.

Fixes one aspect of #2551.

@CamJN CamJN self-assigned this Sep 9, 2024
@CamJN CamJN added this to the 6.0.24 milestone Sep 9, 2024
@FooBarWidget FooBarWidget changed the title app generations feature Track application restart generations Sep 17, 2024
not that there’s a point in this test really
src/agent/Core/ApplicationPool/Process.h Outdated Show resolved Hide resolved
src/agent/Core/ApplicationPool/Process.h Outdated Show resolved Hide resolved
test/cxx/Core/ApplicationPool/PoolTest.cpp Outdated Show resolved Hide resolved
test/cxx/Core/ApplicationPool/PoolTest.cpp Show resolved Hide resolved
test/cxx/Core/ApplicationPool/ProcessTest.cpp Show resolved Hide resolved
test/cxx/Core/ApplicationPool/ProcessTest.cpp Outdated Show resolved Hide resolved
}

TEST_METHOD(16) {
// Test that the correct process from the pool is routed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is incomplete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, I asked for help with this one.

*/
Process *
Group::findEnabledProcessWithLowestBusyness() const {
Group::findBestEnabledProcess() const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My head is clearer today and I thought of a problem with the routing algorithm. Recall that the algorithm has now turned into essentially this:

Select the first process according to the order generation DESC, startTime ASC, busyness ASC.

This means that during a restart, new requests are only routed to next-generation processes, while previous-generation processes are only allowed to finish their existing requests. That's pretty bad.

I suggest a two-phase routing algorithm:

  1. First, we select the "best process" according to the algorithm as you have already coded in this branch.
  2. If it turns out that this process isTotallyBusy(), then we fallback to the old algorithm where we select the least busy process amongst all generations. This way we won't waste the old processes.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes more sense to me than the last algorithm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also since we know we're going to punt if the process isTotallyBusy() there's no need to loop twice, just reject if isTotallyBusy() is true as we iterate.

It does raise another question though, what do we do if all processes are totally busy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a possible solution. I keep track of the best process which is totally busy, until I find one that is not totally busy, at the end of the loop I return either the best process which is not totally busy or if all processes are totally busy, then the best totally busy process.

The mechanism for choosing which process is best is decided as above for both processes, but tracked separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Passenger Enterprise zero-downtime restarts & request routing can prolong outages and delay recovery
2 participants