Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server down on freshly estabilshed connection #131

Open
david0 opened this issue Jun 14, 2022 · 10 comments
Open

Server down on freshly estabilshed connection #131

david0 opened this issue Jun 14, 2022 · 10 comments

Comments

@david0
Copy link

david0 commented Jun 14, 2022

Hi, we get the following error on an connection that has been freshly connected before:

Caused by: LDAPException(resultCode=81 (server down), errorMessage='The connection is not established.', ldapSDKVersion=5.0.0-verboselogging-SNAPSHOT, revision=286d9a21386aee242dfaa6cc1052a606f11dedc8)
	at com.unboundid.ldap.sdk.LDAPConnection.registerResponseAcceptor(LDAPConnection.java:4725)
	at com.unboundid.ldap.sdk.SimpleBindRequest.process(SimpleBindRequest.java:561)
	at com.unboundid.ldap.sdk.LDAPConnection.processBindOperation(LDAPConnection.java:4400)
	at com.unboundid.ldap.sdk.LDAPConnection.bind(LDAPConnection.java:2311)
	at com.unboundid.ldap.sdk.LDAPConnection.bind(LDAPConnection.java:2274)

This is part of our tests where we switch connections a lot between two servers. The LDAPConnection was used before on a completely different target system.

Looking at the logs, to me there seems to be a race condition that the old LDAPConnectionReader is shut down after the new connection was established (I added some additional logs):


[07:22:28,457]  INFO - Change LDAP operations connection to runner:39807

...
(LDAPConnection.connect)
INFO: { "timestamp":"2022-06-14T07:22:28.458Z", "debug-type":"connect", "level":"INFO", "thread-id":1, "thread-name":"Test worker", "connection-id":2, "disconnected-from-address":"runner", "disconnected-from-port":58875, "disconnect-type":"RECONNECT", "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }
WARNING: { "timestamp":"2022-06-14T07:22:28.459Z", "debug-type":"connect", "level":"WARNING", "thread-id":915, "thread-name":"Background connect thread for runner/172.17.0.1:39807", "message":"Connecting to 39807", "caught-exception":{ "message":"Connecting", "stack-trace":[ "run(ConnectThread.java:165)" ] }, "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }
WARNING: { "timestamp":"2022-06-14T07:22:28.458Z", "debug-type":"connect", "level":"WARNING", "thread-id":914, "thread-name":"Connection reader for connection 2 to runner:58875 (closed)", "message":"Closing connection to 58875", "caught-exception":{ "message":"Closing connection to 58875", "stack-trace":[ "logClosingConnection(Debug.java:1651)", "close(LDAPConnectionInternals.java:697)", "setClosed(LDAPConnection.java:4681)", "closeInternal(LDAPConnectionReader.java:1017)", "run(LDAPConnectionReader.java:419)" ] }, "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }
java.lang.Exception: Closing connection to 58875
	at com.unboundid.util.Debug.logClosingConnection(Debug.java:1651)
	at com.unboundid.ldap.sdk.LDAPConnectionInternals.close(LDAPConnectionInternals.java:697)
	at com.unboundid.ldap.sdk.LDAPConnection.setClosed(LDAPConnection.java:4681)
	at com.unboundid.ldap.sdk.LDAPConnectionReader.closeInternal(LDAPConnectionReader.java:1017)
	at com.unboundid.ldap.sdk.LDAPConnectionReader.run(LDAPConnectionReader.java:419)

INFO: { "timestamp":"2022-06-14T07:22:28.459Z", "debug-type":"connect", "level":"INFO", "thread-id":1, "thread-name":"Test worker", "connected-to-address":"runner", "connected-to-port":39807, "connection-id":2, "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }

INFO: { "timestamp":"2022-06-14T07:22:28.460Z", "debug-type":"connect", "level":"INFO", "thread-id":1, "thread-name":"Test worker", "message":"Setting the SO_TIMEOUT value for connection LDAPConnection(not connected) to 300000ms.", "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }

INFO: { "timestamp":"2022-06-14T07:22:28.460Z", "debug-type":"connect", "level":"INFO", "thread-id":914, "thread-name":"Connection reader for connection 2 to runner:58875 (closed)", "connection-id":2, "disconnected-from-address":"runner", "disconnected-from-port":58875, "disconnect-type":"RECONNECT", "ldap-sdk-version":"5.0.0-verboselogging-SNAPSHOT", "ldap-sdk-revision":"286d9a21386aee242dfaa6cc1052a606f11dedc8" }

[07:22:28,461] (LDAPConnection.bind fails with error 81, error shown)

Imho it is not to be guaranteed that LDAPConnectionReader.close will always wait for the thread shut down. If the shut down takes longer than 500ms, it will return anyway which could lead to this situation I suppose (That seems to be not the case here, but there at least it proves the point).

@dirmgr
Copy link
Collaborator

dirmgr commented Jun 14, 2022

I'm currently looking into this. I have thus far not been able to see the problem for myself. Do you have a reproducible test case that seems to work?

As an alternative, you could try using the LDAP SDK in synchronous mode rather than the default asynchronous mode. When run in synchronous mode, it doesn't maintain a separate listener thread per connection, which actually yields slightly better performance and less overhead, but you can't process asynchronous operations or use the same connection for multiple concurrent operations on separate threads. To use synchronous mode, create an LDAPConnectionOptions object, call setUseSynchronousMode(true) on that object, and then pass that object as an argument when creating the LDAPConnection object.

@david0
Copy link
Author

david0 commented Jun 15, 2022

I was not yet successful reproducing that problem with a smaller test.

I changed our code to always use a fresh LDAPConnection and -since then- it is stable.

Will it help you if I try to use synchronous mode? We are using async operations at some places and using new LDAPConnection objects all the time is an acceptable workaround. So I would only dare to try sync mode if that helps you somehow.

@dirmgr
Copy link
Collaborator

dirmgr commented Jun 15, 2022

Thanks for the update. It may well have something to do with the use of asynchronous operations, especially if there are still any outstanding operations in progress when attempting to close the connection. I'll continue looking into this.

For now, your approach of simply creating a new connection object is probably the best one to use. There's no need to test with synchronous mode if you need asynchronous operations.

@dirmgr
Copy link
Collaborator

dirmgr commented Jun 15, 2022

When testing with asynchronous operations, I was able to reproduce an occasional failure every few thousand attempts or so. I've just committed a couple of changes, and with those changes in place, I've not been able to reproduce that failure.

But if you've got the changes in place to use new connections rather than to keep using the same connection object, I'd recommend keeping the new behavior.

@david0
Copy link
Author

david0 commented Jun 20, 2022

I tested your commit from master with out old code again, unfortunately still I get the server down.

Looking deeper at our code I can now say that we only use async*-operations at one special place with a dedicated connection. Its does not share LDAPConnection objects with the place where we get the server downs.

@totaljfb
Copy link

I got the same error message and this is 100% replicable in my test case.
LDAPException(resultCode=81 (server down), numEntries=0, numReferences=0, errorMessage='The connection is not established.', ldapSDKVersion=6.0.5, revision=b50104702e7cc07334a9c64953a91f3e6dbdb27c')

The workaround is to establish a new LDAP connection, then everything works.

@dirmgr I may have a chance to verify it if you can let me know how to set LDAP connection in synchronous mode.

@dirmgr
Copy link
Collaborator

dirmgr commented Jun 22, 2022

@totaljfb Could you describe your test case? Is it something that could be converted into a standalone program for testing?

To use the LDAP SDK in synchronous mode, you need to use the LDAPConnectionOptions.setUseSynchronousMode method, and then provide that LDAPConnectionOptions object to the connection when creating it (or at least set it before the connection is established). For example:

LDAPConnectionOptions options = new LDAPConnectionOptions();
options.setUseSynchronousMode(true);

LDAPConnection conn = new LDAPConnection(
     sslUtil.createSSLSocketFactory(),
     options, host, port);

@totaljfb
Copy link

@dirmgr Thank you.

In my test case, the LDAPConnection is created at the beginning of the process, then it gets some user groups info from the LDAP Server, then it executes some other tasks and then uses the same connection object to retrieve some user info from the LDAP Server. It calls the LDAPConnection search funtion and errors happens right after that line.

Unfortunately the I cannot replicate the issue today, everything is working. I will see if have another chance to test it in the furture. I have updated the code to use a new connection though.

Thank you for your help!

@dirmgr
Copy link
Collaborator

dirmgr commented Jun 23, 2022

@totaljfb are you intentionally closing and re-establishing the connection at any point in the process, or are you just trying to continue using a connection that had previously been used and finding that it is no longer valid? If it's the latter, then that could be attributed to a few other things, like a networking issue or the server closing the connection after being idle for too long or after a maximum connection duration had been reached.

In any case, I would actually recommend using a connection pool rather than an individual connection. Even if you only need one connection, using a connection pool can provide substantial benefits. Unlike a lot of APIs, a connection pool in the UnboundID LDAP SDK for Java can be used like an individual connection (for example, you can just call pool.search to perform a search on a pooled connection rather than manually checking out a connnection, issuing a search, and releasing the connection back to the pool). Doing this has several benefits, including:

  • A connection pool can be configured to automatically retry operations that fail in a way that suggests the connection might no longer be valid. In this case, it will throw the former connection away, establish a new connection (authenticating it if appropriate), and re-send the request on that new connection.
  • A connection pool can be configured with a maximum connection age. If the server imposes a time limit on connections that are established, then this can be useful for ensuring that your connection gets periodically refreshed and replaced with a new one.
  • A connection pool can be configured to automatically perform background health checks on connections to ensure that they're still valid. Not only can this help prevent the server from seeing the connection as idle, on connections that aren't all that frequently used, it can help detect and work around problems resulting from lost connections before your application needs to send a request.
  • You can create a connection pool with a ServerSet rather than just a single server address. In the event that the primary server goes down or becomes unavailable, a connection pool can switch to a different server.

@totaljfb
Copy link

@dirmgr The answer to your question, it's the latter. I just wanted to reuse the LDAPConenction object which was created at the beginning of the process. Your analysis of the root cause makes sense to me. And, thanks for the suggestion for using connection pool as a best practice, I appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants