
[Blog] AsyncBril #424

Open · wants to merge 1 commit into base: 2023fa
Conversation

evanmwilliams (Contributor):

Final project for CS 6120 with @emwangs and @he-andy

evanmwilliams (Contributor, Author):

Here's the link to our actual code: sampsyo/bril#305

https://github.com/he-andy/bril

sampsyo (Owner) left a comment:

This is a really interesting idea! I like how it resulted in a design with a separate heap for explicitly allocated atomics. I think it would be even more awesome if you had gotten some actual parallel computations going for the benchmarking effort, but the implementation itself is pretty cool.

I have one major IL design suggestion inline. I'd be interested in your thoughts on this alternative.

I also note that you changed your project entirely from the original proposal in #394. Can you say anything about why and how you decided to scrap your original idea?

# AsyncBril: Enhancing Bril with Asynchronous Programming and Thread-Style Features
By Andy He, Emily Wang, and Evan Williams
sampsyo (Owner):

Your names display at the top of the post when rendered, so no need to duplicate them here.

the challenges they raise for compiler developers in the modern era of
computing. In particular, we examined how the problem of semantics is
inherently embedded in shared-memory multithreading and came across a key idea
by Boehm: threads cannot be implemented as a library.
sampsyo (Owner):

Maybe it would be nice to make this a link to the paper?

## Design and Implementation

In this project, we added two main features to Bril: promises and atomic
primitive types (i.e. `atomicint`). Promises are a fundamental
sampsyo (Owner):

I'd recommend always using a comma after "e.g." and "i.e.". https://capra.cs.cornell.edu/styleguide/#egie


Although the interpreter's heap is not thread safe, calls to `alloc` are still
made atomically, so no thread can accidentally be assigned the same index
in `state.heap.memory` when `alloc` is called simultaneously from multiple threads.
sampsyo (Owner):

I think it makes sense to assume an audience that knows about Bril but not about the Rust implementation… so your reader probably doesn't know what state.heap.memory is. Maybe use a direct description of what this is (e.g., the interpreter data structures that implement the heap)?
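
For readers who don't know the interpreter internals, the property being described is that allocation hands out heap indices under mutual exclusion. Here is a minimal sketch of the idea, assuming a mutex-guarded backing vector; the names are illustrative stand-ins, not the interpreter's actual structures:

```rust
use std::sync::Mutex;

// Sketch only: `Heap` and `alloc` are hypothetical stand-ins for the
// interpreter's real heap data structures.
struct Heap {
    memory: Mutex<Vec<Option<i64>>>, // backing store for all allocations
}

impl Heap {
    fn alloc(&self, size: usize) -> usize {
        // Holding the lock serializes concurrent calls, so two threads
        // can never be handed the same base index.
        let mut mem = self.memory.lock().unwrap();
        let base = mem.len();
        mem.resize(base + size, None);
        base
    }
}
```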

### Promises

We have introduced two critical syntax features to Bril: `Promise<T>` and
`Resolve`, significantly enhancing its capabilities in threaded execution and
sampsyo (Owner):

Suggested change:
- `Resolve`, significantly enhancing its capabilities in threaded execution and
+ `resolve`, significantly enhancing its capabilities in threaded execution and
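
For intuition about the pair: a `Promise<T>` is a handle to a computation running concurrently, and resolving it blocks until the value is ready. A rough Rust analogue of the concept (a sketch for illustration only, not the Bril syntax) is spawning a thread and joining on its result:

```rust
use std::thread;

fn main() {
    // Roughly what creating a Promise<int> does: start the work
    // concurrently and keep a handle to the eventual value.
    let promise = thread::spawn(|| (1..=10).sum::<i64>());

    // ... the spawning thread is free to do other work here ...

    // Roughly what resolving does: block until the value is available.
    let value = promise.join().unwrap();
    println!("{value}"); // prints 55
}
```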

thread has finished executing the critical section, it uses another atomic
operation to set the lock flag back to its unlocked state.

Here's an example of how atomics work in Bril:
sampsyo (Owner):

Do you think you could include a direct listing of all the new operations you added? Just a simple bulleted list could help complement the in-context use in the example.

Comment on lines +220 to +221
`res2`. This swap operation is particularly useful for updating a shared
variable while simultaneously retrieving its old value. Another `loadatomic`
sampsyo (Owner):

I'm not sure what you mean by "particularly useful" here… this seems like the only thing it is useful for?
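
For readers who haven't met an atomic swap before: it writes a new value and returns the previous one in a single indivisible step. In Rust terms (an analogue for illustration, not the interpreter's code):

```rust
use std::sync::atomic::{AtomicI64, Ordering};

fn main() {
    let shared = AtomicI64::new(7);
    // One indivisible step: store 42 and get back whatever was there.
    let old = shared.swap(42, Ordering::SeqCst);
    assert_eq!(old, 7);
    assert_eq!(shared.load(Ordering::SeqCst), 42);
}
```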

The `@release_lock` function releases the lock. It sets the lock to `0` using
an atomic swap operation, indicating that the lock is free. The use of atomic
operations ensures that these lock acquire and release operations are
thread-safe and prevent race conditions in concurrent environments.
sampsyo (Owner):

Really cool! I like your suite of atomic operations, and I like this demo of how to use them to build up to a more usable synchronization construct.
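
The construction being praised here is the classic test-and-set spinlock. For readers who want the shape of it outside Bril, here is a compact Rust analogue (a sketch of the same pattern, not the post's actual listing):

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// Acquire: atomically swap in 1 (locked). If the old value was 0, the
// lock was free and the caller now owns it; otherwise spin and retry.
fn acquire_lock(lock: &AtomicI64) {
    while lock.swap(1, Ordering::Acquire) != 0 {
        std::hint::spin_loop();
    }
}

// Release: swap the lock back to 0 (unlocked), mirroring @release_lock.
fn release_lock(lock: &AtomicI64) {
    lock.swap(0, Ordering::Release);
}
```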

Comment on lines +284 to +286
the second thread it performs 1000000, and so on and so forth. One good example
of when a benchmark like this would be used is in multiplying two large matrices
together. Below, we show the execution time in milliseconds against the number
sampsyo (Owner):

I think I would delete the sentence about the matrix multiply… I don't think your microbenchmark/stress-test has much in common with a GEMM. 😃 It's fine for it just to be itself, i.e., a contrived microbenchmark that does nothing more than see how fast you can issue atomics.

Comment on lines +291 to +292
We observe that from 1 thread to 5 threads, we see substantial performance
improvements. This makes sense, as we are taking direct advantage of
sampsyo (Owner):

One thing I don't quite understand about this benchmark: how does it change as you scale up the number of threads? Is the amount of work (total increments) held constant, so when you have N threads you are incrementing the counter, like, 2000000/N times?

evanmwilliams (Contributor, Author):

@sampsyo For sure, there were two main reasons we pivoted from the original idea.

The first is that we weren't confident in our ability to implement the relooper in the time we had for the project. We read the paper and it was pretty gnarly to understand, and without the relooper the translations become pretty trivial, so maybe the scope of the project wasn't well suited for us.

The other reason is that we were much more interested in concurrency (especially after the last lecture). We wanted to do something related to performance and lost interest in our original idea. We thought the async idea was a cool way to explore this. We probably should've let you know before we pivoted - in all honesty, we switched up pretty late and were heads-down on development, so it slipped our minds - sorry about that!

sampsyo (Owner) commented Dec 18, 2023:

Cool; thanks for the extra detail!
