[Blog] AsyncBril #424
base: 2023fa
Here's the link to our actual code: sampsyo/bril#305
This is a really interesting idea! I like how it resulted in a design with a separate heap for explicitly allocated atomics. I think it would be even more awesome if you had gotten some actual parallel computations going for the benchmarking effort, but the implementation itself is pretty cool.
I have one major IL design suggestion inline. I'd be interested in your thoughts on this alternative.
I also note that you changed your project entirely from the original proposal in #394. Can you say anything about why and how you decided to scrap your original idea?
""" | ||
+++
# AsyncBril: Enhancing Bril with Asynchronous Programming and Thread-Style Features
By Andy He, Emily Wang, and Evan Williams
Your names display at the top of the post when rendered, so no need to duplicate them here.
the challenges they raise for compiler developers in the modern era of
computing. In particular, we examined how the problem of semantics is
inherently embedded in shared-memory multithreading and came across a key idea
by Boehm: threads cannot be implemented as a library.
Maybe it would be nice to make this a link to the paper?
## Design and Implementation

In this project, we added two main features to Bril: promises and atomic
primitive types (i.e. `atomicint`). Promises are a fundamental
I'd recommend always using a comma after "e.g." and "i.e.". https://capra.cs.cornell.edu/styleguide/#egie
Although the interpreter's heap is not thread safe, calls to `alloc` are still
made atomically, so no thread can accidentally be assigned the same index
in `state.heap.memory` when `alloc` is called simultaneously between threads.
I think it makes sense to assume an audience that knows about Bril but not about the Rust implementation… so your reader probably doesn't know what `state.heap.memory` is. Maybe use a direct description of what this is (e.g., the interpreter data structures that implement the heap)?
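To make the "atomic `alloc`" claim concrete for readers who have not seen the interpreter, here is a minimal Rust sketch. It is not the interpreter's code: the `Heap` type, its `memory` field, and `concurrent_alloc_indices` are hypothetical names, and a `Mutex` is just one assumed way to make "pick the next index and reserve it" a single indivisible step. Only the allocation path is modeled; reads and writes to allocated regions are left out.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical heap: each allocation is one entry in a growable vector,
/// and the entry's position is the index handed back to the program.
pub struct Heap {
    memory: Mutex<Vec<Vec<i64>>>,
}

impl Heap {
    pub fn new() -> Self {
        Heap { memory: Mutex::new(Vec::new()) }
    }

    /// Reserve a zeroed region of `size` cells and return its index.
    /// Holding the mutex for both the push and the length read means two
    /// threads can never observe the same "next index".
    pub fn alloc(&self, size: usize) -> usize {
        let mut memory = self.memory.lock().unwrap();
        memory.push(vec![0; size]);
        memory.len() - 1
    }
}

/// Allocate from several threads at once and return every index handed out,
/// sorted, so a caller can check that none was assigned twice.
pub fn concurrent_alloc_indices(threads: usize, per_thread: usize) -> Vec<usize> {
    let heap = Arc::new(Heap::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let heap = Arc::clone(&heap);
            thread::spawn(move || {
                (0..per_thread).map(|_| heap.alloc(1)).collect::<Vec<usize>>()
            })
        })
        .collect();
    let mut all: Vec<usize> = handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect();
    all.sort();
    all
}
```

If allocation were not atomic, two threads could both read the same length before either pushed, and the sorted list would contain a repeated index.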
### Promises

We have introduced two critical syntax features to Bril: `Promise<T>` and
`Resolve`, significantly enhancing its capabilities in threaded execution and
Suggested change:
- `Resolve`, significantly enhancing its capabilities in threaded execution and
+ `resolve`, significantly enhancing its capabilities in threaded execution and
thread has finished executing the critical section, it uses another atomic
operation to set the lock flag back to its unlocked state.

Here's an example of how atomics work in Bril:
Do you think you could include a direct listing of all the new operations you added? Just a simple bulleted list could help complement the in-context use in the example.
`res2`. This swap operation is particularly useful for updating a shared
variable while simultaneously retrieving its old value. Another `loadatomic`
I'm not sure what you mean by "particularly useful" here… this seems like the only thing it is useful for?
The `@release_lock` function releases the lock. It sets the lock to `0` using
an atomic swap operation, indicating that the lock is free. The use of atomic
operations ensures that these lock acquire and release operations are
thread-safe and prevent race conditions in concurrent environments.
Really cool! I like your suite of atomic operations, and I like this demo of how to use them to build up to a more usable synchronization construct.
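For readers following along outside Bril, the lock-from-swap pattern discussed here can be sketched in Rust. This is an illustration rather than a translation of the post's code: the excerpt shows only that release swaps in `0`, so the acquire loop (spin on swapping in `1` until the old value was `0`) is an assumption, and `SpinLock` and `locked_count` are hypothetical names.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical spinlock over one atomic integer: 0 means free, 1 means held.
pub struct SpinLock {
    flag: AtomicI64,
}

impl SpinLock {
    pub fn new() -> Self {
        SpinLock { flag: AtomicI64::new(0) }
    }

    /// Swap in 1 and inspect the old value. If it was already 1, another
    /// thread holds the lock, so yield and try again.
    pub fn acquire(&self) {
        while self.flag.swap(1, Ordering::Acquire) != 0 {
            thread::yield_now();
        }
    }

    /// Swap in 0 to mark the lock free, mirroring the release step above.
    pub fn release(&self) {
        self.flag.swap(0, Ordering::Release);
    }
}

/// Each thread performs a deliberately non-atomic load-then-store increment.
/// Without the lock, updates could be lost; with it, the total is exact.
pub fn locked_count(threads: usize, per_thread: i64) -> i64 {
    let lock = Arc::new(SpinLock::new());
    let counter = Arc::new(AtomicI64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    lock.acquire();
                    let v = counter.load(Ordering::Relaxed);
                    counter.store(v + 1, Ordering::Relaxed);
                    lock.release();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

The counter is stored in an atomic type only so the sketch stays in safe Rust; the separate load and store are what the lock protects.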
the second thread it performs 1000000, and so on and so forth. One good example
of when a benchmark like this would be used is in multiplying two large matrices
together. Below, we show the execution time in milliseconds against the number
I think I would delete the sentence about the matrix multiply… I don't think your microbenchmark/stress-test has much in common with a GEMM. 😃 It's fine for it just to be itself, i.e., a contrived microbenchmark that does nothing more than see how fast you can issue atomics.
We observe that from 1 thread to 5 threads, we see substantial performance
improvements. This makes sense, as we are taking direct advantage of
One thing I don't quite understand about this benchmark: how does it change as you scale up the number of threads? Is the amount of work (total increments) held constant, so when you have N threads you are incrementing the counter, like, 2000000/N times?
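To pin down what "total work held constant" would look like, here is a small Rust sketch of that scheme. It is an assumption about one possible benchmark design, not a description of what the post actually measured, and `fixed_total_increments` is a hypothetical name. A fixed number of increments is divided across N threads, with any remainder spread over the first few.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

/// Split `total` atomic increments across `threads` workers so the overall
/// amount of work stays the same no matter how many threads run.
/// Returns the final counter value.
pub fn fixed_total_increments(total: i64, threads: i64) -> i64 {
    let counter = Arc::new(AtomicI64::new(0));
    let base = total / threads;
    let extra = total % threads;
    let handles: Vec<_> = (0..threads)
        .map(|i| {
            let counter = Arc::clone(&counter);
            // The first `extra` threads take one more increment each, so the
            // shares always sum to exactly `total`.
            let share = base + if i < extra { 1 } else { 0 };
            thread::spawn(move || {
                for _ in 0..share {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

Under this scheme, any change in running time as N grows reflects parallelism and contention rather than a change in the amount of work.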
@sampsyo For sure, there were two main reasons we pivoted from the original idea. The first is that we weren't confident in our ability to implement re-looper with the time we had for the project. We read the paper and it was pretty gnarly to understand, and without re-looper the translations become pretty trivial, so maybe the scope of the project wasn't well suited for us. The other reason is that we were much more interested in concurrency (especially after the last lecture). We wanted to do something related to performance and lost interest in our original idea. We thought the Async idea was a cool way to explore this. We probably should've let you know before we pivoted - in all honesty we switched up pretty late and were heads down on development so it slipped our minds - sorry about that!
Cool; thanks for the extra detail!
Final project for CS 6120 with @emwangs and @he-andy