Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Blog] AsyncBril #424

Open
wants to merge 1 commit into
base: 2023fa
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
328 changes: 328 additions & 0 deletions content/blog/2023-12-11-AsyncBril.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,328 @@
+++
title = "AsyncBril: Enhancing Bril with Asynchronous Programming"
[[extra.authors]]
name = "Andy He"
[[extra.authors]]
name = "Emily Wang"
[[extra.authors]]
name = "Evan Williams"
[extra]
bio="""
Andy He is a junior studying CS and Math. He is particularly interested in systems and compilers development.

Emily Wang is an M.Eng student studying CS. She is very interested in systems, architecture, and compilers.

Evan Williams is an M.Eng student studying CS. He is broadly interested in distributed systems and compilers.
"""
+++
# AsyncBril: Enhancing Bril with Asynchronous Programming and Thread-Style Features
By Andy He, Emily Wang, and Evan Williams
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your names display at the top of the post when rendered, so no need to duplicate them here.


## Overview and Motivation

In the final lecture of CS 6120, we talked about concurrency and parallelism and
the challenges they raise for compiler developers in the modern era of
computing. In particular, we examined how the problem of semantics is
inherently embedded in shared-memory multithreading and came across a key idea
by Boehm: threads cannot be implemented as a library.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be nice to make this a link to the paper?


The end of Moore's law indicates a need for parallelism to sustain advances
in computing. Thus, our goal in this project was to implement asynchronous
programming features for Bril. Modern programming languages rely heavily on
asynchronous programming and multithreading to optimize performance and manage
complex tasks. We thought the best way to learn about how these features are
implemented in real languages was to try it for ourselves. Further, Bril's
easily extensible nature makes it ideal for experimentation. We thought that
future CS 6120 students would benefit from being able to use our library to
implement their own synchronization primitives and other asynchronous
programming features using our extension.


## Design and Implementation

In this projet, we added two main features to Bril: promises and atomic
primitive types (i.e. `atomicint`). Promises are a fundamental
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend always using a comma after "e.g." and "i.e.". https://capra.cs.cornell.edu/styleguide/#egie

part of asynchronous programming. They allow the language to handle operations
that might take an unknown amount of time to complete (like I/O operations)
without blocking the execution of the rest of the program. Atomic operations are
crucial for safe concurrency. They ensure that operations on shared data are
completed as indivisible steps, preventing race conditions where multiple
threads might try to modify the same data simultaneously.

To implement these new language features we directly modified the Rust
interpreter for Bril: `brilirs`. We chose to implement the feature in Rust for
a few reasons. First, the three of us are more experienced with Rust than we
are with TypeScript, so it seemed the natural choice for us. Second, we wanted
to take advantage of Rust's safe-concurrency features while implementing the
asynchronous extensions, since concurrency can be tricky to reason about.

We did not want to make major changes to the structure of `brilirs`, and wanted our changes to be as modular as possible, as to minimize the changes needed to extend `brilirs` in the future. A major goal of ours while designing and implementing this extension was that we wanted to ensure that future contributors did not have to understand our work to build features that do not require asynchronous programming. For instance, any type extension in Bril will automatically be supported by AsyncBril.

The heap is thread-unsafe, allowing multiple threads to concurrently read/write
to the heap without checking. This is done using Rust `unsafe` blocks.
Alternatively, we could've used a `Rw<T>` to lock/unlock the heap when threads
were modifying it, but we ultimately decided that it would be more performant
to not implement it this way and provide support for synchronization primitives,
allowing for more flexibility.

Although the interpreter's heap is not thread safe, calls to `alloc` are still
made atomically, so no thread can accidentally be assigned the same index
in `state.heap.memory` when `alloc` is called simultaneously between threads.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to assume an audience that knows about Bril but not about the Rust implementation… so your reader probably doesn't know what state.heap.memory is. Maybe use a direct description of what this is (e.g., the interpreter data structures that implement the heap)?


### Promises

We have introduced two critical syntax features to Bril: `Promise<T>` and
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We have introduced two critical syntax features to Bril: `Promise<T>` and
We have introduced two critical syntax features to Bril: `promise<T>` and

`Resolve`, significantly enhancing its capabilities in threaded execution and
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`Resolve`, significantly enhancing its capabilities in threaded execution and
`resolve`, significantly enhancing its capabilities in threaded execution and

asynchronous programming. The core of this implementation lies in how threads
are created and managed. Upon encountering a function defined with a
`Promise<T>` return type, a thread is immediately started. This allows for
concurrent function execution, enabling the language to handle complex,
asynchronous tasks more effectively. These threads have local stack and global
heap access, ensuring independent operation without interfering with each
other's state. However, they also access a global heap, which is crucial for
modifying pointers shared across threads.

Here's an example of how our notion of promises is implemented in Bril:
```
@main {
size: int = const 4;
size2: bool = const false;
prom: promise<int> = call @mod size;
prom2: promise<bool> = call @mod2 size2;
Comment on lines +90 to +91
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to do anything particular about this now, but here is a broader comment on the language design here. Unlike many async/await frameworks, you've elected to avoid any changes to function definitions and the call instruction; instead, the call instruction is "overloaded" to do two things: (1) call a function normally, and (2) spawn a new thread.

While this minimizes the surface area of the changes, it does create a bit of awkwardness in the final language. The big one I can see is that it's not possible to create or call "normal" functions that happen to manipulate promises. For example, the identity function on integer promises cannot be used:

@id(p: promise<int>): promise<int> {
  ret p;
}

In other words, because the way you define async functions is with their return type, it's not possible to write non-async functions that happen to use the promise<T> type in call cases.


Thinking about it a little bit, one possible alternative here would be to put all the async-ification into the call instruction. That is, in this alternative version, functions need not mark themselves as returning promise<T> in order to be called from a spawn. Instead, we introduce a special spawn instruction that works like call. Something like this:

@something(x: int): int {  // just a normal function!!
  y: int = add x x;
  ret y;
}

@main() {
  five: int = const 5;
  i: int = call @something five;  // normal call
  p: promise<int> = spawn @something five;  // async call
}

That is, the same function @something could be used in either kind of call (synchronous or asynchronous); we just distinguish based on the invocation rather than the definition.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think originally what I had in mind was that we could use a this as a drop in replacement for functions already implemented, so if we had a function f: int -> int, we could wrap it in a function f_promise: int -> promise. But I think your idea where we could have a dedicated async call function would've definitely been a cleaner approach to satisfy what I had in mind (and simpler to implement too, I think). Thank you for the suggestion!

x : int = resolve prom;
y : bool = resolve prom2;
print x;
print y;
}

@mod(r: int): promise<int> {
ret r;
}

@mod2(r: bool): promise<bool> {
ret r;
}
```

This Bril program exemplifies the newly implemented asynchronous features in
the language, particularly demonstrating the use of `Promise<T>` and the
`Resolve` mechanism. In the `@main` function, two promises are created: `prom`
Comment on lines +108 to +109
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I think you want to globally make Promise and Resolve lower-case to match the syntax…

and `prom2`. The `prom` is a promise of an integer (`promise<int>`) and is
assigned the result of calling the `@mod` function with an integer argument
(`size`). Similarly, `prom2` is a promise of a boolean (`promise<bool>`) and is
linked to the outcome of the `@mod2` function, which is called with a
boolean argument (`size2`).

These promises, once initialized, signify the start of asynchronous operations.
The `@mod` and `@mod2` functions, defined with return types `promise<int>` and
`promise<bool>` respectively, indicate that the function will be execute
asynchronously. Each function simply returns the passed argument, but in a
real-world scenario, these could represent more complex operations executed in
separate threads.

The use of the resolve keyword in the `@main` function is critical. `resolve`
is blocking and waits for the promise to be resolved to a value, synchronizes
the asynchronous function calls. The variables `x` and `y` are assigned the
resolved values of `prom` and `prom2`, respectively. The print statements at the
end of the `@main` function display these resolved values, showcasing how
asynchronous tasks can be integrated into the main flow of a Bril program.

The `Resolve` feature, synchronized with the promise mechanism, is instrumental
in managing the life cycle of asynchronous operations. It allows threads to
signal the completion of tasks and communicate results back to the main
execution flow.

Unresolved promises are automatically cleaned up when they go out of scope. This
is implemented through an atomic flag which indicates to all sub-threads to clean
up all resources used and abort. However, as this may cause undefined behavior
when interacting with the global heap, a warning is printed to indicate that a
promise was left unresolved. In general, all promises should be resolved.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be good to take a stance here and say "it is an error to leave a promise unresolved"? As it stands, I can't tell whether it's actually disallowed or just discouraged.


The introduction of these asynchronous capabilities necessitated a medium-sized
refactor of the `brilirs` interpreter, including parameterizing the types of
State to accommodate the new features. A significant enhancement is the use of
atomics in the Rust interpreter. The atomics ensure a consistent and thread-safe
mechanism to increment the dynamic count across all threads, which is crucial
for tracking and managing active threads. Overall, this implementation not only
enriches Bril with robust asynchronous programming and safe concurrency
management but also significantly extends its utility as a tool for learning
and experimenting with compiler design and parallel computing concepts.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could point out the exact parts of the rust stdlib that you used for these atomics? Do you literally mean the elementary types from std::sync::atomic? Maybe you could discuss why you chose this over std::sync::Mutex?



### Atomics

In the context of Bril, atomic primitives provide a way to perform certain
operations on integers in a way that is guaranteed to be atomic, meaning that
the operations are indivisible. This atomicity is crucial in multi-threaded
environments to prevent race conditions. Race conditions occur when two or more
threads access shared data and try to change it simultaneously. If one thread's
operations interleave with another's in an uncontrolled manner, it can lead to
inconsistent or unpredictable results. Atomics prevent this by ensuring that
operations like incrementing a value, setting it, or comparing and swapping it
happen as a single, uninterruptible step.

A particularly powerful application of atomic primitives is in implementing
mutexes (mutual exclusion locks). Mutexes are a foundational concept in
concurrent programming, used to prevent multiple threads from accessing a
critical section of code simultaneously. By using atomic primitives to implement
mutexes, you can ensure that only one thread can enter a critical section at a
time, thereby protecting shared resources from concurrent access.

In a typical mutex implementation using atomics, an atomic variable is used as
a lock flag. Before a thread enters a critical section, it uses atomic
operations to check and set this flag. If the flag indicates that the mutex
is already locked, the thread will wait (or "spin") until the flag changes.
This is often implemented using a compare-and-swap atomic operation, where the
flag is only set if it is currently in the expected unlocked state. Once the
thread has finished executing the critical section, it uses another atomic
operation to set the lock flag back to its unlocked state.

Here's an example of how atomics work in Bril:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think you could include a direct listing of all the new operations you added? Just a simple bulleted list could help complement the in-context use in the example.

```
@main {
one : int = const 1;
size : atomicint = newatomic one;
expect : int = const 1;
upd : int = const 2;
res : int = cas size expect upd;

three : int = const 3;
res1 : int = loadatomic size;
res2 : int = swapatomic size three;
res3 : int = loadatomic size;
print res;
print res1;
print res2;
print res3;
}
```

In our Bril program, we demonstrate the use of atomic operations to manage a
shared atomic integer. The program starts by initializing a regular integer
`one` with a value of 1, which is then used to create an atomic integer `size`
through the `newatomic` operation. The utilization of atomic integers is crucial
in concurrent programming environments, as it ensures that the integer can be
safely read and modified by multiple threads without causing race conditions.

We then perform a compare-and-swap (CAS) operation on the atomic integer `size`.
This atomic action is essential for achieving synchronization in multi-threaded
programs. The `cas` operation attempts to update `size` from its expected value
(`expect`, which is 1) to a new value (`upd`, which is 2). The success of this
operation, stored in `res`, hinges on whether the current value of `size`
matches the expected value, ensuring that the update occurs only if `size` has
not been altered by another thread. We print the result of this operation to
reflect its success or failure.

Furthermore, we employ the `loadatomic` operation to safely read the current
value of `size` into `res1`, ensuring a consistent and uncorrupted value.
Following this, we use the `swapatomic` operation, which atomically swaps
`size`'s value with `three` (3). The previous value of `size` is captured in
`res2`. This swap operation is particularly useful for updating a shared
variable while simultaneously retrieving its old value. Another `loadatomic`
Comment on lines +220 to +221
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by "particularly useful" here… this seems like the only thing it is useful for?

operation follows, storing the post-swap value of `size` in `res3`.

Finally, our program concludes by printing the results of these operations
(`res`, `res1`, `res2`, and `res3`). This output showcases the effects of
atomic operations on the shared atomic integer, providing a clear example of
how atomics can be used to manage shared state in a concurrent environment
without the need for complex and potentially cumbersome lock-based
synchronization mechanisms.

Elaborating more on how our atomics can implement a mutex lock and allow threads
to concurrently modify an array, here is another useful code snippet:
```
@acquire_lock(lock: atomicint) {
.loop:
res: int = loadatomic lock;
one: int = const 1;
eq: bool = eq one res;
br eq .loop .acquire_lock;
.acquire_lock:
upd: int = add res one;
old: int = cas lock res upd;
eq: bool = eq old res;
br eq .end .loop;
.end:
ret;
}

@release_lock(lock: atomicint) {
zero: int = const 0;
val: int = swapatomic lock zero;
}
```

The `@acquire_lock` function tries to acquire a lock on a shared resource
represented by `lock`, an atomic integer. The function first loads the current
value of `lock` atomically, and then loops to keep retrying to acquire the lock
if it's already being used by another thread. To attempt to acquire the lock, a
Compare-and-Swap (CAS) operation is used.

The `@release_lock` function releases the lock. It sets the lock to `0` using
an atomic swap operation, indicating that the lock is free. The use of atomic
operations ensures that these lock acquire and release operations are
thread-safe and prevent race conditions in concurrent environments.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool! I like your suite of atomic operations, and I like this demo of how to use them to build up to a more usable synchronization construct.


## Challenges
The original implementation of `brilirs` was understandably not designed with
multithreading in mind. As a result, we had to change some of the architecture
of the data structures and main function headers. To simplify the design,
`State` no longer has a parameterized lifetime, and instead uses an atomic
reference counter, `Arc<T>`, to hold the program and environment. Further, to
implement support for atomic ints, we needed a secondary state variable that
would all the instantiated atomics. We could not repurpose the heap for this, as
`Value` needs to derive the `Copy` trait, but this is not supported for Rust
`AtomicI64`. To remedy this, we added an additional data structure, modeled
after `Heap`, to store all atomics.

## Evaluation

To evaluate our extension, we primarily focused on two key attributes:
correctness and runtime. To evaluate the runtime gains from asynchronous
programming, consider the following simple benchmark: we use a loop that just
adds 1 to a variable. On the first thread it performs 2000000 iterations, on
the second thread it performs 1000000, and so on and so forth. One good example
of when a benchmark like this would be used is in multiplying two large matrices
together. Below, we show the execution time in miliseconds against the number
Comment on lines +284 to +286
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would delete the sentence about the matrix multiply… I don't think your microbenchmark/stress-test has much in common with a GEMM. 😃 It's fine for it just to be itself, i.e., a contrived microbenchmark does nothing more than seeing how fast you can issue atomics.

of threads used in the task:

![Benchmark](eval.jpg)

We observe that from 1 thread to 5 threads, we see substantial performance
improvements. This makes sense, as we are taking direct advantage of
Comment on lines +291 to +292
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I don't quite understand about this benchmark: how does it change as you scale up the number of threads? Is the amount of work (total increments) held constant, so when you have N threads you are incrementing the counter, like, 2000000/N times?

parallelization in our programs. From 5 threads to 50 threads however, there
is actually a slight increase in the execution time. This also makes sense,
as increased number of threads requires more careful resource allocation by
the operating system. Since our implementation requires locks on the global
heap, more threads means more actors vying for the locked resources,
causing a general slowdown in overall performance.

To evaluate the correctness of our programs we noted that it is not particularly
useful to use pre-existing Bril benchmarks, as none of them are written with
asynchronous features in mind. We instead took a different approach. We can look
at asynchronous Bril programs that are implemented *incorrectly* and then
observe the behavior that is different from the expected behavior. In other
words, concurrent array accesses with variables that are not atomic will lead
to undefined behavior. Likewise, when atomics are used correctly the result
of the program is deterministic and predictable. This can easily be observed
by taking the Bril program above used to showcase atomics and removing the
atomicity of the integer (i.e. make it an `int` instead of an `atomicint`).
We quickly see why the atomics are necessary, and how it guarantees the
correctness of the program.

Overall evaluating asynchronous programming features can be challenging, but
we believe that through our thorough testing within the interpreter and our
Bril benchmarks that we have created a robust extension that can be used by
many future Bril users.

## Conclusion

Concurrent programming is hard to understand, and it is important to begin
to understand how to implement concurrent programming features at the compilers
level. As we noted at the start, threads cannot be implemented as a library -
threading is inherently embedded into the semantics of the language. We gained
a lot of experience in this project with asynchronous programming and
concurrent programming features in Rust. Overall, we are excited to see this
extension be used in future iterations of CS 6120, and hope that more students
will build upon the work we've done to create many interesting optimizations
involving parallel programming in Bril.
Binary file added content/blog/eval.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.