[Move/Example] Run tests in parallel #18802
## Description

Run the Move example tests in parallel, so that they don't time out and ruin everyone's day.

## Test plan

```
sui-framework-test$ cargo nextest run -- run_examples_move_unit_tests
```
futures::future::join_all(move_packages.into_iter().map(|p| {
    tokio::task::spawn(async move {
        check_package_builds(&p);
I played around with splitting this up, but it only shaved about a second off the run time, so I thought it would be clearer to do it as one step.
I don't see any async here, why use this over spawning threads?
In fact, this is an anti-pattern: executing synchronous work in an async thread pool.
Indeed, there is nothing async here. I ended up implementing it this way because it seemed like the easiest way to take advantage of a multi-threaded pool (to limit the number of threads being spawned). If you can point me to a better way of doing that, I'll do that instead.
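For reference, a bounded pool for blocking work can be put together with only the standard library. The following is a minimal sketch, not the code from this PR: `check_all` and its `max_workers` parameter are hypothetical names, and `check_package_builds` here is a placeholder standing in for the real function. A fixed number of scoped threads pull package indices from a shared atomic counter, so the thread count stays capped regardless of how many packages there are.

```rust
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stand-in for the PR's `check_package_builds`: blocking, synchronous work.
fn check_package_builds(path: &Path) -> bool {
    // A real check would build and test the package; this placeholder only inspects the path.
    !path.as_os_str().is_empty()
}

/// Run `check` over every package on at most `max_workers` OS threads.
/// Returns one result per package, in input order.
fn check_all<F>(packages: &[PathBuf], max_workers: usize, check: F) -> Vec<bool>
where
    F: Fn(&Path) -> bool + Sync,
{
    // Index of the next unclaimed package, shared by all workers.
    let next = AtomicUsize::new(0);
    // Never start more workers than there are packages, and always at least one.
    let workers = max_workers.clamp(1, packages.len().max(1));
    let mut results = vec![false; packages.len()];

    thread::scope(|s| {
        let mut handles = Vec::with_capacity(workers);
        for _ in 0..workers {
            handles.push(s.spawn(|| {
                let mut done = Vec::new();
                loop {
                    // Claim the next index; stop once the list is exhausted.
                    let i = next.fetch_add(1, Ordering::Relaxed);
                    let Some(package) = packages.get(i) else { break };
                    done.push((i, check(package)));
                }
                done
            }));
        }
        // Collect each worker's results back into input order.
        for handle in handles {
            for (i, ok) in handle.join().expect("worker panicked") {
                results[i] = ok;
            }
        }
    });
    results
}
```

Calling `check_all(&packages, 4, check_package_builds)` would run the checks on at most four threads. A natural choice for the cap is `std::thread::available_parallelism()`, though for blocking I/O a larger number may work better.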
https://docs.rs/rayon/latest/rayon/iter/index.html
Does this work?
I'm just dogscience-ing my way through life here, so I can be convinced that this is the right thing to do. My impression was that `rayon` was primarily intended to introduce parallelism into functional stream-based programs. It looks like it has a thread pool library, and we could use that, but I can't really make out what the moral difference is between abusing `rayon` for its thread pool implementation vs abusing `tokio`.
I'll admit that in this exact instance it's not a huge issue to abuse `tokio` like this (because nothing else is sharing this particular runtime), but it is a very bad anti-pattern, and if it were done in production code it could cause bad things to happen.
You can essentially break computation down into two camps: I/O-bound (async) and CPU-bound (sync) work. For I/O-bound tasks, the idea is that lots of independent tasks can operate concurrently on one or more threads, each relinquishing control of execution back to the scheduler when it needs to wait for some I/O operation to happen. CPU-bound tasks generally monopolize the thread they are scheduled on until the entire task is complete.
`rayon` is specifically designed to handle and schedule CPU-bound work, while `tokio` actually has two thread pools: a `blocking` pool for scheduling CPU-bound work and an `async` pool (where things go when they are `tokio::spawn`ed) for async/I/O-bound tasks.
If you schedule blocking/CPU-bound work on an async pool, then other async tasks could get stuck behind it, waiting a very long time until they're able to make any forward progress.
All of this aside, did you take a look at using `datatest-stable`, like we use for a number of other file-based tests, so that each Move example would be its own test, with little to no extra care needed from those who are working on or adding examples?
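To make the "each example is its own test" idea concrete, here is a rough standard-library sketch of the discovery step that a file-based harness performs. It does not use `datatest-stable`'s API. `find_move_packages` and `case_name` are hypothetical helpers, and the sketch assumes that a Move package is any directory containing a `Move.toml` manifest and that packages are not nested inside one another.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every directory under `root` that contains a `Move.toml`.
fn find_move_packages(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        if dir.join("Move.toml").is_file() {
            found.push(dir);
            continue; // Assumption: packages are not nested inside packages.
        }
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            if entry.file_type()?.is_dir() {
                pending.push(entry.path());
            }
        }
    }
    found.sort(); // Deterministic order, so case names are stable across runs.
    Ok(found)
}

/// Derive a per-package case name from its path relative to `root`.
fn case_name(root: &Path, package: &Path) -> String {
    package
        .strip_prefix(root)
        .unwrap_or(package)
        .to_string_lossy()
        .replace('\\', "/")
}
```

With discovery driven by the file system, adding a new example directory adds a new test case without touching any Rust test code, and a test runner that executes cases in parallel handles the scheduling.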
I think `datatest-stable` would be a great fit here. I will put up a separate PR for that, but will keep this one around in case the timeouts do cause a problem: I haven't had to set up a fresh data test before, so I don't know how long it will take me, and I don't want people to be blocked if it takes a little while.
The reason I wasn't too hot on `rayon` in this case was that although the test is not `async`, it's also not CPU bound: it's I/O bound but written to use blocking I/O. `rayon` really seems geared towards running a computation to produce a certain result, while these tests are purely running for their effects.
> although the test is not async, it's also not CPU bound -- it's I/O bound but written to use blocking I/O,

Yes, I would generally put "blocking I/O" in the same camp as CPU-bound tasks, as they block the current thread until the operation is complete rather than handing the thread off to something else that could make progress.
## Description

Use `datatest-stable` to find all the Move examples we might want to build and test, instead of stashing this away in a Rust test.

## Test plan

```
sui$ cargo nextest run -p sui-framework-tests --test move_tests
```

+ CI

Closes #18802

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

- [ ] Protocol:
- [ ] Nodes (Validators and Full nodes):
- [ ] Indexer:
- [ ] JSON-RPC:
- [ ] GraphQL:
- [ ] CLI:
- [ ] Rust SDK:
- [ ] REST API:
## Description

Run the Move example tests in parallel, so that they don't time out and ruin everyone's day.

## Test plan

Locally, the tests used to take about 180 seconds to run; after this change they take about 30 seconds.