[Move/Example] Run tests in parallel #18802

Closed

Member

amnn commented Jul 25, 2024

Description

Run the Move example tests in parallel, so that they don't time out and ruin everyone's day.

Test plan

sui-framework-test$ cargo nextest run -- run_examples_move_unit_tests

Locally, the tests used to take 180 seconds; after this change they take about 30 seconds.


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

amnn requested review from bmwill and a team July 25, 2024 17:23
amnn self-assigned this Jul 25, 2024

futures::future::join_all(move_packages.into_iter().map(|p| {
    tokio::task::spawn(async move {
        check_package_builds(&p);
Member Author

I played around with splitting this up, but it only shaved about a second off the run time, so I thought it would be clearer to do it as one step.

Contributor

I don't see any async here. Why use this over spawning threads?

Contributor

In fact, this is an anti-pattern: executing synchronous work in an async thread pool.

Member Author

Indeed, there is nothing async here. I ended up implementing it this way because it seemed like the easiest way to take advantage of a multi-threaded pool (to limit the number of threads being spawned). If you can point me to a better way of doing that, I'll do that instead.
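
For reference, one way to cap the number of threads without an async runtime is a small hand-rolled pool built on scoped threads. The following is only a sketch using the standard library: `run_bounded` and the package names are hypothetical and do not come from this PR, and the closure stands in for whatever the real per-package check does.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Run `check` on every item using at most `max_workers` OS threads.
/// Workers claim items by bumping a shared index, so no thread is spawned per item.
fn run_bounded<T, F>(items: &[T], max_workers: usize, check: F)
where
    T: Sync,
    F: Fn(&T) + Sync,
{
    let next = AtomicUsize::new(0);
    // Never start more workers than there are items, and always at least one.
    let workers = max_workers.clamp(1, items.len().max(1));

    // Scoped threads may borrow `items`, `next`, and `check`; the scope joins
    // every worker before returning and panics if any worker panicked.
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                match items.get(i) {
                    Some(item) => check(item),
                    None => break,
                }
            });
        }
    });
}

fn main() {
    // Hypothetical package names standing in for the Move example directories.
    let packages = ["basics", "coin", "nft", "kiosk", "token"];
    let checked = AtomicUsize::new(0);

    let workers = thread::available_parallelism().map_or(4, |n| n.get());
    run_bounded(&packages, workers, |_pkg| {
        // A real test would build the package and run its unit tests here.
        checked.fetch_add(1, Ordering::Relaxed);
    });

    assert_eq!(checked.load(Ordering::Relaxed), packages.len());
    println!("checked {} packages", packages.len());
}
```

Because a panic in any worker makes the scope panic, a failing check in one package would still fail the enclosing test.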

Contributor

Member Author

I'm just dogscience-ing my way through life here, so I can be convinced that this is the right thing to do. My impression was that rayon was primarily intended to introduce parallelism into functional stream-based programs. It looks like it has a thread pool library, and we could use that, but I can't really make out what the moral difference is between abusing rayon for its thread pool implementation vs abusing tokio.

Contributor

I'll admit that in this exact instance it's not a huge issue to abuse tokio like this (because nothing else is sharing this particular runtime), but it is a very bad anti-pattern, and if it were done in production code it could cause bad things to happen.

You can essentially break computation down into two camps: I/O-bound (async) and CPU-bound (sync) work. For I/O-bound tasks, the idea is that lots of independent tasks can run concurrently on a single thread (or several), relinquishing control back to the scheduler whenever a task needs to wait for some I/O operation. CPU-bound tasks generally monopolize the thread they are scheduled on until the entire task is complete.

rayon is specifically designed to handle and schedule CPU-bound work, while tokio actually has two thread pools: a blocking pool for scheduling CPU-bound work, and an async pool (where things go when spawned with tokio::spawn) for async/I/O-bound tasks.

If you schedule blocking/CPU-bound work on an async pool, then other async tasks could get stuck behind it, waiting a very long time until they are able to make any forward progress.
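
To make the "stuck behind" effect concrete, here is a toy model that uses only the standard library and not tokio itself. A single scheduler thread drains a job queue, standing in for an async worker. All names (`completion_order`, `blocking_work`) are made up for illustration, and the 200 ms sleep is an arbitrary stand-in for blocking work. In tokio, the analogous way to move work off the async pool is `tokio::task::spawn_blocking`.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send>;
type Log = Arc<Mutex<Vec<&'static str>>>;

/// Stand-in for blocking work (a blocking read, a long computation, ...).
fn blocking_work(log: &Log) {
    thread::sleep(Duration::from_millis(200));
    log.lock().unwrap().push("blocking");
}

/// Runs one blocking job and one quick job through a single scheduler thread
/// and returns the order in which they finished.
fn completion_order(offload_blocking: bool) -> Vec<&'static str> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let (tx, rx) = mpsc::channel::<Job>();

    // The lone "scheduler" thread: runs queued jobs one after another.
    let scheduler = thread::spawn(move || {
        for job in rx {
            job();
        }
    });

    let (done_tx, done_rx) = mpsc::channel::<thread::JoinHandle<()>>();
    let blocking_log = Arc::clone(&log);
    tx.send(Box::new(move || {
        if offload_blocking {
            // Hand the work to a dedicated thread and return immediately.
            let handle = thread::spawn(move || blocking_work(&blocking_log));
            done_tx.send(handle).unwrap();
        } else {
            // Block the scheduler thread itself.
            blocking_work(&blocking_log);
        }
    }))
    .unwrap();

    let quick_log = Arc::clone(&log);
    tx.send(Box::new(move || {
        quick_log.lock().unwrap().push("quick");
    }))
    .unwrap();

    drop(tx); // close the queue so the scheduler loop ends
    scheduler.join().unwrap();
    // Wait for the offloaded thread, if there was one.
    for handle in done_rx {
        handle.join().unwrap();
    }

    let order = log.lock().unwrap().clone();
    order
}

fn main() {
    println!("blocking on the scheduler: {:?}", completion_order(false));
    println!("offloaded to its own thread: {:?}", completion_order(true));
}
```

When the blocking job runs on the scheduler thread, the quick job cannot start until it finishes. When it is offloaded, the quick job no longer has to wait for it.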

All of this aside, did you take a look at using datatest-stable, like we do for a number of other file-based tests? Each Move example would then be its own test, with little to no extra care needed from those working on or adding examples.
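
As a rough illustration of the idea (one independently reported case per example directory), the sketch below does the discovery step by hand with the standard library. It does not use datatest-stable's API. `find_move_packages` and `check_package` are hypothetical names, and treating any directory that contains a `Move.toml` manifest as a package is an assumption about how the examples are laid out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every directory under `root` that contains a `Move.toml`.
fn find_move_packages(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut stack = vec![root.to_path_buf()];

    while let Some(dir) = stack.pop() {
        if dir.join("Move.toml").is_file() {
            found.push(dir.clone());
        }
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            }
        }
    }

    found.sort(); // fixed order, so results do not depend on directory iteration order
    Ok(found)
}

/// Hypothetical per-package check; a real one would build the package and run its tests.
fn check_package(package: &Path) -> Result<(), String> {
    if package.join("Move.toml").is_file() {
        Ok(())
    } else {
        Err(format!("{} has no Move.toml", package.display()))
    }
}

fn main() -> io::Result<()> {
    // Build a throwaway tree so the sketch runs anywhere.
    let root = std::env::temp_dir().join(format!("move-examples-{}", std::process::id()));
    for pkg in ["basics", "nft/kiosk"] {
        fs::create_dir_all(root.join(pkg))?;
        fs::write(root.join(pkg).join("Move.toml"), "[package]\n")?;
    }
    fs::create_dir_all(root.join("docs"))?; // not a package: no manifest

    // One independently reported case per discovered package.
    for package in find_move_packages(&root)? {
        let outcome = check_package(&package);
        println!("{}: {:?}", package.display(), outcome);
    }

    fs::remove_dir_all(&root)
}
```

A data-driven harness takes over this loop and registers each discovered path as a separate test, so adding a new example directory needs no change to the test code.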

Member Author

I think datatest-stable would be a great fit here. I will put up a separate PR for that, but I will keep this one around in case the timeouts do cause a problem: I haven't had to set up a fresh data test before, so I don't know how long it will take me, and I don't want people to be blocked if it takes a little while.

The reason I wasn't too hot on rayon in this case was that

  • although the test is not async, it's also not CPU bound -- it's I/O bound but written to use blocking I/O,
  • rayon really seems geared towards running a computation to produce a certain result, while these tests are purely running for their effects.

Member Author

#18813 -- thanks for the suggestion, @bmwill!

Contributor

> although the test is not async, it's also not CPU bound -- it's I/O bound but written to use blocking I/O,

Yes, I would generally put "blocking I/O" in the same camp as CPU-bound tasks, as they block the current thread until the operation is complete rather than handing the thread off so that something else can make progress.

amnn mentioned this pull request Jul 26, 2024
8 tasks
amnn closed this in #18813 Jul 26, 2024
amnn closed this in 3dd9ddd Jul 26, 2024
amnn deleted the amnn/fast-move-test branch August 4, 2024 13:12
suiwombat pushed a commit that referenced this pull request Sep 16, 2024
## Description
Use `datatest-stable` to find all the Move examples we might want to
build and test, instead of stashing this away in a rust test.

## Test plan

```
sui$ cargo nextest run -p sui-framework-tests --test move_tests
```

+ CI

Closes #18802 

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
- [ ] REST API: