
Possible connection leak in ClusterBlockingConnectionPool #458

Open
FranGM opened this issue Jun 2, 2021 · 2 comments · May be fixed by #473

Comments

@FranGM
Contributor

FranGM commented Jun 2, 2021

Hey there!

I've been tracking down an issue that I believe could have been caused (or at least made worse) by a possible connection leak in ClusterBlockingConnectionPool. Let me try to explain my theory to see if it makes sense.

In get_connection_by_node we pull connections from the queue one by one until we find either a connection already established to this node or a None (a free slot that lets us create a new connection).

My concern is with the logic that follows if we run out of items to pull from the queue:

            # queue is full of connections to other nodes
            if len(connections_to_other_nodes) == self.max_connections:
                # is the earliest released / longest un-used connection
                connection_to_clear = connections_to_other_nodes.pop()
                self._connections.remove(connection_to_clear)
                connection_to_clear.disconnect()
                connection = None  # get a new connection
            else:
                # Note that this is not caught by the redis cluster client and will be
                # raised unless handled by application code.
                # ``ConnectionError`` is raised when timeout is hit on the queue.
                raise ConnectionError("No connection available")

If we have pulled exactly max_connections connections from the pool and all of them are for different nodes, we repurpose one of them; otherwise we raise an exception. Note that when raising this exception we don't put any of the connections we pulled back into the pool.
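As an aside, even a minimal put-back before raising would avoid losing those drained connections. Something like this sketch (it assumes the pool's internal queue is reachable as self.pool, which is a guess on my part rather than the verified attribute name):

    # Sketch only: return every connection we drained from the queue before
    # raising, so they are not lost to the pool. Assumes the internal queue
    # is reachable as self.pool (the real attribute name may differ).
    for other_connection in connections_to_other_nodes:
        self.pool.put_nowait(other_connection)
    raise ConnectionError("No connection available")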

I think the check on
if len(connections_to_other_nodes) == self.max_connections:
is a bug: it assumes that every connection is sitting in the queue at all times. Connections that are currently in use won't be in the pool, so most of the time there are fewer than max_connections items in the queue, and we can end up raising an exception even though we hold connections we could repurpose.
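As a toy illustration of the counting problem (a standalone script using a plain LifoQueue, not the pool's actual internals): with max_connections = 3 and one connection currently checked out by another caller, the queue can only ever yield two items, so the == max_connections branch is unreachable and we raise instead of repurposing.

    # Toy illustration, not the library's code: one of three slots is in use,
    # so draining the queue never yields max_connections items.
    from queue import Empty, LifoQueue

    max_connections = 3
    pool = LifoQueue(max_connections)
    pool.put_nowait("conn-to-node-B")
    pool.put_nowait("conn-to-node-C")
    # the third slot is held by a caller that is using its connection right now

    connections_to_other_nodes = []
    try:
        while True:
            connections_to_other_nodes.append(pool.get_nowait())
    except Empty:
        pass

    print(len(connections_to_other_nodes) == max_connections)  # False -> would raise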

I'm currently testing a fix that changes that line to if len(connections_to_other_nodes) > 0:, but I wanted to hear your opinion on this hypothesis before submitting a PR.
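Roughly, the branch would become something like this (a sketch reusing the names from the snippet above, not necessarily the exact diff in the PR):

    # Sketch of the proposed condition: repurpose an idle connection to another
    # node whenever one exists, instead of only when the queue happened to
    # contain exactly max_connections of them.
    if len(connections_to_other_nodes) > 0:
        connection_to_clear = connections_to_other_nodes.pop()
        self._connections.remove(connection_to_clear)
        connection_to_clear.disconnect()
        connection = None  # a new connection to the wanted node will be created
    else:
        # The queue is genuinely empty, so give up as before.
        raise ConnectionError("No connection available")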

@Grokzen
Owner

Grokzen commented Jun 2, 2021

@FranGM I couldn't say for sure; I was not the one who made the blocking connection pool implementation, so to say anything about it I would have to dig into that code and understand it before I can really answer your question. I have never really been a fan of this shared style of connection pool as it causes these kinds of issues and others. In 3.0 this kind of error should maybe be gone (not sure about the blocking solution though), but each node will get its own individual connection pool again, so each node should sort itself out in the exact same way that redis-py already does. There will be even less reimplementation of code needed, which I like much more.

@FranGM linked a pull request Jul 19, 2021 that will close this issue
@FranGM
Contributor Author

FranGM commented Jul 19, 2021

Hey @Grokzen, sorry for the radio silence, I've been busy lately.

Yeah, that makes sense. We've had our own fix running in production for a while, so I've gone ahead and made a PR for it (along with a couple of other things). We've been running our own internally built version of redis-py-cluster, but we'd love to go back to an unpatched version if possible.
