
[Demo] Nested epoll fd: avoid calling add_interest, because epoll_ctl takes a global kernel lock #388

Conversation

beef9999 (Collaborator) commented Mar 2, 2024

The cascading engine has a performance issue when used in multi-threaded programs.

Its wait_for_events first calls wait_for_fd, and then add_interest with the one-shot flag.

For the epoll engine, when the watched fd is itself an epoll fd, concurrent epoll_ctl calls from multiple threads compete to acquire a single global kernel mutex. See https://elixir.bootlin.com/linux/v5.15.125/source/fs/eventpoll.c#L2130

io_uring engine doesn't have this problem.

According to my observation, in a 24-thread program the lock acquisition can consume as much as 80% of CPU time, which is totally unacceptable.

The solution is to replace those epoll_ctl calls with multi-shot polling, so we need to keep track of the nested epoll fds.

After applying these demo changes, the osq_lock hotspot has disappeared. You can run this demo in your own environment.


The demo uses a fixed-size array (8 elements) instead of a map to store the fds. A comparison of lookup costs:

| Search time (ns)  | array | std::map | std::unordered_map |
|-------------------|-------|----------|--------------------|
| Result at index 1 | <1    | 2.8      | 9.6                |
| Result at index 4 | 1.8   | 4.6      | 9.7                |
| Result at index 8 | 3.8   | 5.6      | 9.7                |

The idea is to allow only a small number of nested epoll fds to be registered, and to use an array to reduce overhead on the I/O path.
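A sketch (names are hypothetical, not the demo's actual code) of such a fixed-capacity registry: at most 8 nested epoll fds, found by linear scan, which beats std::map and std::unordered_map at this size per the measurements above.

```cpp
#include <cstddef>

struct FdRegistry {
    static constexpr size_t kCapacity = 8;  // only a few nested fds allowed
    int fds[kCapacity];
    size_t count = 0;

    // Returns false when the registry is full.
    bool add(int fd) {
        if (count >= kCapacity) return false;
        fds[count++] = fd;
        return true;
    }

    // Linear scan on the I/O path; returns the slot index or -1.
    int find(int fd) const {
        for (size_t i = 0; i < count; i++)
            if (fds[i] == fd) return static_cast<int>(i);
        return -1;
    }

    // Swap-with-last removal; order is not preserved.
    bool remove(int fd) {
        int i = find(fd);
        if (i < 0) return false;
        fds[i] = fds[--count];
        return true;
    }
};
```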

Finally, this is just a demo. A formal patch is still needed.
