This repository has been archived by the owner on Jan 9, 2024. It is now read-only.

Exceptions raised when scan_iter() is called during failover #389

Open
ofhellsfire opened this issue Sep 16, 2020 · 1 comment
Labels
3.0.0 All issues that will be looked at for 3.0.0 release

Comments

@ofhellsfire

Issue Description:
Several exceptions are raised (redis.exceptions.ConnectionError is one of them) when the scan_iter() method is called during a failover.

Scenario

Env

  • Linux OS
  • Redis Cluster: 3 master + 3 slave nodes running locally via Docker Compose
  • python 3.6.12
  • redis-py version: 3.5.3
  • redis-py-cluster: 2.1.0

Steps to reproduce

Expected result
Script proceeds without exceptions.

Actual Result
The script gets stuck and a number of exceptions are raised.

Output

...
Key: 8: 2020-09-15 17:16:54.135722
Key: 13: 2020-09-15 17:16:54.135748
Key: 49: 2020-09-15 17:16:54.135776
Traceback (most recent call last):
  File "rediscluster_failover_scan_iter_test.py", line 25, in <module>
    for key in rc.scan_iter(match='*', count=10):
  File "/home/venv/lib/python3.6/site-packages/rediscluster/client.py", line 969, in scan_iter
    raw_resp = conn.read_response()
  File "/home/venv/lib/python3.6/site-packages/redis/connection.py", line 739, in read_response
    response = self._parser.read_response()
  File "/home/venv/lib/python3.6/site-packages/redis/connection.py", line 324, in read_response
    raw = self._buffer.readline()
  File "/home/venv/lib/python3.6/site-packages/redis/connection.py", line 256, in readline
    self._read_from_socket()
  File "/home/venv/lib/python3.6/site-packages/redis/connection.py", line 201, in _read_from_socket
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.
Key: 45: 2020-09-15 17:16:54.139949
...
Key: 49: 2020-09-15 17:16:54.140345
Traceback (most recent call last):
  File "/home/venv/lib/python3.6/site-packages/redis/connection.py", line 559, in connect
    sock = self._connect()
  File "/home/venv/lib/python3.6/site-packages/redis/connection.py", line 615, in _connect
    raise err
  File "/home/venv/lib/python3.6/site-packages/redis/connection.py", line 603, in _connect
    sock.connect(socket_address)
OSError: [Errno 113] No route to host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "rediscluster_failover_scan_iter_test.py", line 25, in <module>
    for key in rc.scan_iter(match='*', count=10):
  File "/home/venv/lib/python3.6/site-packages/rediscluster/client.py", line 967, in scan_iter
    conn.send_command(*pieces)
  File "/home/venv/lib/python3.6/site-packages/redis/connection.py", line 726, in send_command
    check_health=kwargs.get('check_health', True))
  File "/home/venv/lib/python3.6/site-packages/redis/connection.py", line 698, in send_packed_command
    self.connect()
  File "/home/venv/lib/python3.6/site-packages/redis/connection.py", line 563, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 113 connecting to 172.19.0.5:6379. No route to host.
Key: 45: 2020-09-15 17:18:01.548966
...
@ofhellsfire
Author

@Grokzen Just in case, here is how we temporarily patched this issue for ourselves:

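    # Needs `time` and `redis.exceptions.ConnectionError` imported in the module.
    def scan_iter(self, match=None, count=None, _type=None):
        success_flg = False
        retry_count = -1
        while (not success_flg and retry_count < self.cluster_down_retry_attempts):
            try:
                # Note: each retry restarts the scan from the beginning, so keys
                # yielded before the failure are yielded again.
                yield from self._scan_iter(match, count, _type)
                success_flg = True
            except ConnectionError:
                # Drop stale connections and forget the cached cluster layout
                # before trying again.
                self.connection_pool.disconnect()
                self.connection_pool.nodes.reset()
                retry_count += 1
                if retry_count < self.cluster_down_retry_attempts:
                    time.sleep(self.cluster_down_retry_timeout)
                else:
                    # Retries exhausted: re-raise the last ConnectionError.
                    raise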

...
    # original scan_iter()
    def _scan_iter(self, match=None, count=None, _type=None):
        ...

I know this is not optimal (you've written about it somewhere), but we needed a quick fix: on the one hand we cannot afford a Redis client failure, and on the other hand handling the failure in the app code would not be optimal in the long run.
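
For anyone reusing this idea: because the wrapper restarts the scan after a connection error, keys that were already yielded before the failure come back a second time. Below is a minimal, library-independent sketch of the same retry pattern that also skips keys it has already produced. The names `retry_scan` and `FlakyScanner` are hypothetical and not part of redis-py or redis-py-cluster, and the built-in `ConnectionError` is only a stand-in here; with the real client you would presumably pass `redis.exceptions.ConnectionError` via `errors`.

```python
import time


def retry_scan(make_iter, reset, attempts=3, delay=0.0, errors=(ConnectionError,)):
    """Yield items from make_iter(), restarting the scan after a connection error.

    Hypothetical helper. Each retry restarts the iteration from the beginning,
    so items already yielded are remembered in `seen` and skipped on later passes.
    """
    seen = set()
    failures = 0
    while True:
        try:
            for item in make_iter():
                if item not in seen:
                    seen.add(item)
                    yield item
            return  # the scan completed without an error
        except errors:
            failures += 1
            if failures > attempts:
                raise  # retries exhausted: surface the last error
            reset()  # e.g. drop connections and refresh the cluster layout
            time.sleep(delay)


class FlakyScanner:
    """Stand-in for a cluster client: the first scan fails part-way through."""

    def __init__(self, keys, fail_at):
        self.keys = keys
        self.fail_at = fail_at
        self.resets = 0

    def scan(self):
        for index, key in enumerate(self.keys):
            if self.resets == 0 and index == self.fail_at:
                raise ConnectionError("Connection closed by server.")
            yield key

    def reset(self):
        self.resets += 1


scanner = FlakyScanner(["a", "b", "c", "d"], fail_at=2)
result = list(retry_scan(scanner.scan, scanner.reset))
```

The trade-off is memory: `seen` grows with the number of keys scanned, so for a very large keyspace it may be more practical to let duplicates through and make the consumer idempotent.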

@Grokzen added the 3.0.0 label (All issues that will be looked at for 3.0.0 release) on Sep 17, 2020