Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix connections in the pool depleting due to unfulfilled Promises and timeouts #683

Merged
merged 5 commits into from
Aug 5, 2018
Merged

Fix connections in the pool depleting due to unfulfilled Promises and timeouts #683

merged 5 commits into from
Aug 5, 2018

Conversation

willmorgan
Copy link
Collaborator

@willmorgan willmorgan commented Jul 24, 2018

Fixes #610
Closes #671

The number of available connections will routinely, gradually reduce to 0 over a period of hours. This seems to be prevalent on Azure hosted SQL Server instances.

We've spent a few weeks monitoring the problem and logging out connection pool metrics - the number of available connections (that can be loaned), the number of pool factory create operations (resolves to _poolCreate in this library), and so on...

{"pool_all_objects_size":"9","potentially_allocable_resource_count":"10","threshold":"10","pool_reported_size":"10","available_connections":"9","factory_create_operations":"1"}

Unless you're monitoring connection metrics yourself, the problem will manifest when you see EREQUEST errors. We believe this to be due to the generic pool library being unable to loan a connection to a borrower, as the "count" calculation also takes into account the number of factory create operations in progress: https://github.com/coopernurse/node-pool/blob/master/lib/Pool.js#L638-L647

This means that while you may have a pool max size of 10, you could have 10 factory create operations in progress, resulting in 0 usable connections. When node-pool evaluates whether more connections need to be created, it won't create any due to the above functionality: https://github.com/coopernurse/node-pool/blob/master/lib/Pool.js#L332-L340

In master as of now, the _poolCreate method does not always resolve upon an error, and additionally, there is currently no explicit timeout handling on establishing connections.

Therefore we now explicitly timeout if the connectTimeout config option is exceeded (thanks @jacqt), and reject the _poolCreate promise on any error. In practice this doesn't solve the ESOCKET problem with Azure (we're still working on them to get this fixed), but it should stop the catastrophic pool exhaustion.

@willmorgan
Copy link
Collaborator Author

AppVeyor build seems to fail due to msnodesql8...

@dhensby
Copy link
Collaborator

dhensby commented Jul 25, 2018

The first 3 commits can all be squashed ;)

@dhensby
Copy link
Collaborator

dhensby commented Aug 2, 2018

@patriksimek could we get some eyeballs on this, please?

@dhensby
Copy link
Collaborator

dhensby commented Aug 2, 2018

Any thoughts on why the appvayor tests are broken? @willmorgan

@willmorgan
Copy link
Collaborator Author

Any thoughts on why the appvayor tests are broken?

Nope. It isn't obvious. It's failing on an entirely different driver to the one that this PR attempts to patch.

@patriksimek
Copy link
Collaborator

Changes seem fine to me. I'm not sure about the timeout check - Tedious is already doing that. Do we need to duplicate the timeout check? Is the timeout check built in Tedious not working correctly?

@willmorgan
Copy link
Collaborator Author

Hi @patriksimek, I've checked logs on a deployed instance we have had the code running on for a few weeks now. There was one instance of the message occurring, but looking at Tedious's core code, it looks like it should be getting handled by their own connectTimeout checks.

I've removed the manual check for now.

What else needs to be done to get this merged?

@patriksimek patriksimek merged commit 06468cd into tediousjs:master Aug 5, 2018
@patriksimek
Copy link
Collaborator

Thanks, I'll release a new version with the patch after I figure out what causes AppVeyor to fail.

@patriksimek
Copy link
Collaborator

Released in 4.2.1. Please, install with npm i mssql@next.

@willmorgan willmorgan deleted the pool-fix-contrib branch August 5, 2018 22:00
learnmews added a commit to learnmews/loopback-connector-mssql that referenced this pull request Aug 28, 2018
upgrade mssql dependency to 4.2.1. mssql package has been updated to resolve an issue related to resource timeout error. Loopback connector is dependent on this package and can face similar issue. 

tediousjs/node-mssql#610
tediousjs/node-mssql#683
@dhensby
Copy link
Collaborator

dhensby commented Sep 12, 2018

@patriksimek - sorry to double up on this, but 4.2.1 has been tagged for over a month and we've been using it in prod for the majority of that time with no ill-effects, please can we get 4.2.1 promoted to @stable? see #703

Thanks!

@patriksimek
Copy link
Collaborator

@dhensby Yep, I have just promoted 4.2.1 to latest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ResourceRequest timed out
4 participants