
Possible connection leak in ClusterBlockingConnectionPool #458

Open
FranGM opened this issue Jun 2, 2021 · 2 comments · May be fixed by #473

Comments

@FranGM
Contributor

FranGM commented Jun 2, 2021

Hey there!

I've been tracking down an issue that I believe could have been caused (or at least made worse) by a possible connection leak in ClusterBlockingConnectionPool. Let me try to explain my theory to see if it makes sense.

In get_connection_by_node we pull connections from the queue one by one until we find either a connection already established to this node or a None (a free slot that lets us create a new connection).

My concern is with the logic that follows if we run out of items to pull from the queue:

            # queue is full of connections to other nodes
            if len(connections_to_other_nodes) == self.max_connections:
                # is the earliest released / longest un-used connection
                connection_to_clear = connections_to_other_nodes.pop()
                self._connections.remove(connection_to_clear)
                connection_to_clear.disconnect()
                connection = None  # get a new connection
            else:
                # Note that this is not caught by the redis cluster client and will be
                # raised unless handled by application code.
                # ``ConnectionError`` is raised when timeout is hit on the queue.
                raise ConnectionError("No connection available")

If we have pulled exactly max_connections connections from the pool and all of them are for different nodes, we repurpose one of them; otherwise we raise an exception. Note that when raising this exception we don't put any of the connections we pulled back into the pool.
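As an aside, even a minimal put-back before raising would avoid losing those drained connections. Something like this sketch (it assumes the pool's internal queue is reachable as self.pool, which is a guess on my part rather than the verified attribute name):

    # Sketch only: return every connection we drained from the queue before
    # raising, so they are not lost to the pool. Assumes the internal queue
    # is reachable as self.pool (the real attribute name may differ).
    for other_connection in connections_to_other_nodes:
        self.pool.put_nowait(other_connection)
    raise ConnectionError("No connection available")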

I think the check on
if len(connections_to_other_nodes) == self.max_connections:
is a bug: it assumes that every connection is sitting in the queue at all times. Connections that are currently in use won't be in the pool, so most of the time there are fewer than max_connections items in the queue, and we can end up raising an exception even though we hold connections we could repurpose.
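As a toy illustration of the counting problem (a standalone script using a plain LifoQueue, not the pool's actual internals): with max_connections = 3 and one connection currently checked out by another caller, the queue can only ever yield two items, so the == max_connections branch is unreachable and we raise instead of repurposing.

    # Toy illustration, not the library's code: one of three slots is in use,
    # so draining the queue never yields max_connections items.
    from queue import Empty, LifoQueue

    max_connections = 3
    pool = LifoQueue(max_connections)
    pool.put_nowait("conn-to-node-B")
    pool.put_nowait("conn-to-node-C")
    # the third slot is held by a caller that is using its connection right now

    connections_to_other_nodes = []
    try:
        while True:
            connections_to_other_nodes.append(pool.get_nowait())
    except Empty:
        pass

    print(len(connections_to_other_nodes) == max_connections)  # False -> would raise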

I'm currently testing a fix that changes that line to if len(connections_to_other_nodes) > 0:, but I wanted to hear your opinion on this hypothesis before submitting a PR.
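Roughly, the branch would become something like this (a sketch reusing the names from the snippet above, not necessarily the exact diff in the PR):

    # Sketch of the proposed condition: repurpose an idle connection to another
    # node whenever one exists, instead of only when the queue happened to
    # contain exactly max_connections of them.
    if len(connections_to_other_nodes) > 0:
        connection_to_clear = connections_to_other_nodes.pop()
        self._connections.remove(connection_to_clear)
        connection_to_clear.disconnect()
        connection = None  # a new connection to the wanted node will be created
    else:
        # The queue is genuinely empty, so give up as before.
        raise ConnectionError("No connection available")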

@Grokzen
Owner

Grokzen commented Jun 2, 2021

@FranGM I couldn't say for sure; I was not the one who made the blocking connection pool implementation, so to say anything about it I would have to dig into that code and understand it before I can really answer your question. I have never really been a fan of this shared style of connection pool as it causes these kinds of issues and others. In 3.0 this kind of error should maybe be gone (not sure about the blocking solution though), but each node will get its own individual connection pool again, so each node should sort itself out in the exact same way that redis-py already does. There will be even less reimplementation of code needed, which I like much more.

@FranGM linked a pull request Jul 19, 2021 that will close this issue
@FranGM
Contributor Author

FranGM commented Jul 19, 2021

Hey @Grokzen, sorry for the radio silence, I've been busy lately.

Yeah, that makes sense. We've had our own fix running in production for a while, so I've gone ahead and made a PR for it (along with a couple of other things). We've been running our own internally built version of redis-py-cluster, but we'd love to go back to an unpatched version if possible.
