Crash in grpc ("Too many open files") #10

Open
amk43 opened this issue Jul 22, 2020 · 2 comments

Comments

amk43 commented Jul 22, 2020

Posting here as recommended by YC support.

We use Ubuntu 18.04.3, Python 3.6.9, yandexcloud 0.34.0, and grpc 1.28.1.

Our application continuously starts and stops instances in YC, making no more than a few hundred API requests an hour (probably fewer). After running this way for some time (perhaps a couple of days), the application inevitably crashes with a stack trace like this:

Traceback (most recent call last):
  File "./dispatcher.py", line 86, in runInstance
    disks = ysdk.client(DiskServiceStub).List(ListDisksRequest(folder_id = CONF['folder_id'])).disks
  File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_interceptor.py", line 221, in __call__
    compression=compression)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_interceptor.py", line 257, in _with_call
    return call.result(), call
  File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_channel.py", line 333, in result
    raise self
  File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_interceptor.py", line 247, in continuation
    compression=new_compression)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_channel.py", line 837, in with_call
    return _end_unary_response_blocking(state, call, True, None)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Getting metadata from plugin failed with error: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/token (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc975e55a58>: Failed to establish a new connection: [Errno 24] Too many open files',))"
        debug_error_string = "{"created":"@1590184822.768147799","description":"Getting metadata from plugin failed with error: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/token (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc975e55a58>: Failed to establish a new connection: [Errno 24] Too many open files',))","file":"src/core/lib/security/credentials/plugin/plugin_credentials.cc","file_line":79,"grpc_status":14}"
>

Before the crash the grpc library also outputs error messages, e.g.:

E0522 22:00:22.710785667   14565 ev_epollex_linux.cc:1458]   pollset_set_add_pollset: {"created":"@1590184822.710768750","description":"Too many open files","errno":24,"file":"src/core/lib/iomgr/wakeup_fd_eventfd.cc","file_line":38,"os_error":"Too many open files","syscall":"eventfd"}
E0522 22:00:26.276082489   14563 ev_epollex_linux.cc:1306]   pollset_add_fd: {"created":"@1590184826.276050019","description":"pollset_transition_pollable_from_empty_to_fd","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":325,"referenced_errors":[{"created":"@1590184826.276048606","description":"get_fd_pollable","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":325,"referenced_errors":[{"created":"@1590184826.276041326","description":"Too many open files","errno":24,"file":"src/core/lib/iomgr/wakeup_fd_eventfd.cc","file_line":38,"os_error":"Too many open files","syscall":"eventfd"}]}]}
E0522 22:00:27.901743430   14560 ev_epollex_linux.cc:1458]   pollset_set_add_pollset: {"created":"@1590184827.901723028","description":"Too many open files","errno":24,"file":"src/core/lib/iomgr/wakeup_fd_eventfd.cc","file_line":38,"os_error":"Too many open files","syscall":"eventfd"}
E0522 22:00:29.869932962   14563 ev_epollex_linux.cc:1306]   pollset_add_fd: {"created":"@1590184829.869899748","description":"pollset_transition_pollable_from_empty_to_fd","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":325,"referenced_errors":[{"created":"@1590184829.869898650","description":"get_fd_pollable","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":325,"referenced_errors":[{"created":"@1590184829.869897060","description":"Too many open files","errno":24,"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":568,"os_error":"Too many open files","syscall":"epoll_create1"}]}]}
E0522 22:00:33.867603041   27147 ev_epollex_linux.cc:1408]   assertion failed: i != pss->pollset_count

This may be caused by a known problem in grpc. E.g. see grpc/grpc#15759 and related issues.

As a workaround, we tried setting the nofile OS limit to a very high value. This results in the following behavior: over the course of several days (or weeks), the average CPU load of the application grows (presumably because of an ever-growing number of open files) until it hits 100% and the app becomes completely unresponsive.
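
To check whether the number of open files really keeps growing, the process can report its own descriptor count next to the nofile limit. The following is only a minimal sketch, not part of the application described above: it assumes Linux (it reads `/proc/self/fd`), and the helper names `nofile_limits`, `open_fd_count`, and `fd_report` are made up for illustration.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(fd_dir="/proc/self/fd"):
    """Count the descriptors this process has open (Linux-specific)."""
    # Listing the directory briefly uses one descriptor itself, so the
    # figure is off by a constant; the trend over time is what matters.
    return len(os.listdir(fd_dir))


def fd_report():
    """Bundle the current descriptor count with the nofile limits."""
    soft, hard = nofile_limits()
    return {"open": open_fd_count(), "soft_limit": soft, "hard_limit": hard}


if __name__ == "__main__":
    print(fd_report())
```

Logging `fd_report()` periodically from the long-running process would show whether the count climbs steadily toward the soft limit.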

It should be noted that when using the AWS EC2 SDK/cloud for instance management in an otherwise identical app under a very similar load, no issues of this kind occur. This indicates that the problem is truly an issue in the YC SDK.

Contributor

l0kix2 commented Feb 11, 2022

Is the issue still relevant?
Can you provide a minimal example that reproduces the problem?

Author

amk43 commented Feb 17, 2022

Hi! I cannot test this on the latest version right now. We are currently using an old version (0.60.0) and the issue is still there.
I am pretty sure this is caused by a gRPC issue, which does not appear to be fixed. See e.g. grpc/grpc#20418.
I will get back to you if I can confirm this on an up-to-date version of yandexcloud.
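
One possible shape for the minimal reproduction requested above is a small harness that calls a single request in a loop and measures how many descriptors stay open afterwards. This is a sketch under assumptions, not a confirmed reproduction: it relies on the Linux-specific `/proc/self/fd`, the names `fd_growth` and `make_request` are hypothetical, and the demo uses a deliberately leaky stand-in rather than a real SDK call. In a real test, `make_request` would wrap the `List` call shown in the traceback.

```python
import os


def open_fd_count(fd_dir="/proc/self/fd"):
    """Count the descriptors this process has open (Linux-specific)."""
    return len(os.listdir(fd_dir))


def fd_growth(make_request, iterations=100):
    """Call make_request repeatedly; return the change in open descriptors."""
    before = open_fd_count()
    for _ in range(iterations):
        make_request()
    return open_fd_count() - before


if __name__ == "__main__":
    leaked = []

    def leaky_stand_in():
        # Hypothetical stand-in for an API call that never releases
        # its descriptor; replace with the real request to test the SDK.
        leaked.append(open(os.devnull))

    print("descriptor growth:", fd_growth(leaky_stand_in, iterations=10))
```

A growth figure that scales with `iterations` would point to a per-request leak, while a figure near zero would suggest the descriptors are lost elsewhere.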
