diff --git a/docs/contributing/contributing.rst b/docs/contributing/contributing.rst
index 450a6439..ab3859a7 100644
--- a/docs/contributing/contributing.rst
+++ b/docs/contributing/contributing.rst
@@ -1,3 +1,5 @@
+.. _contributing:
+
 ============
 Contributing
 ============
diff --git a/docs/examples/deployment/_web-server.png b/docs/examples/deployment/_web-server.png
deleted file mode 100644
index 20b85af4..00000000
Binary files a/docs/examples/deployment/_web-server.png and /dev/null differ
diff --git a/docs/examples/deployment/aws.md b/docs/examples/deployment/aws.md
deleted file mode 100644
index 99869df3..00000000
--- a/docs/examples/deployment/aws.md
+++ /dev/null
@@ -1,101 +0,0 @@
-# AWS Lambda
-
-[AWS Lambda](https://aws.amazon.com/lambda/) is a serverless compute service from AWS.
-
-This example shows how to deploy a "hello-world" AWS Lambda with a simple Burr application.
-It is based on the official instructions: https://docs.aws.amazon.com/lambda/latest/dg/python-image.html#python-image-instructions
-
-Burr can be deployed within a Lambda function. This is a good option if you want to run your
-application in response to events, or if you want to run your application in a serverless environment.
-
-See the [repository on GitHub](https://github.com/DAGWorks-Inc/burr/tree/main/examples/deployment/aws/lambda) for the full code example and for step-by-step instructions on how to deploy a Burr application to AWS Lambda.
-
-## Prerequisites
-
-- **AWS CLI Setup**: Make sure the AWS CLI is set up on your machine. If you haven't done this yet, no worries! You can follow the [Quick Start guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html) for easy setup instructions.
-
-## Step-by-Step Guide
-
-### 1. Build the Docker image
-
-- **Build the Docker image to deploy to AWS ECR**:
-
-  ```shell
-  docker build --platform linux/amd64 -t aws-lambda-burr .
-  ```
-
-- **Test locally**:
-
-  Run the Docker container:
-
-  ```shell
-  docker run -p 9000:8080 aws-lambda-burr
-  ```
-
-  Send a test request to check that the Docker container handles it correctly:
-
-  ```shell
-  curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"body": {"number":3}}'
-  ```
-
-### 2. Create an AWS ECR repository
-
-Ensure the AWS account number (`111122223333`) is replaced with yours:
-
-- **Authenticate Docker to Amazon ECR**:
-
-  Retrieve an authentication token to authenticate your Docker client to your Amazon Elastic Container Registry (ECR):
-
-  ```shell
-  aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 111122223333.dkr.ecr.us-east-1.amazonaws.com
-  ```
-
-- **Create the ECR repository**:
-
-  ```shell
-  aws ecr create-repository --repository-name aws-lambda-burr \
-      --region us-east-1 \
-      --image-scanning-configuration scanOnPush=true \
-      --image-tag-mutability MUTABLE
-  ```
-
-### 3. Deploy the image to AWS ECR
-
-Ensure the AWS account number (`111122223333`) is replaced with yours:
-
-```shell
-docker tag aws-lambda-burr 111122223333.dkr.ecr.us-east-1.amazonaws.com/aws-lambda-burr:latest
-docker push 111122223333.dkr.ecr.us-east-1.amazonaws.com/aws-lambda-burr:latest
-```
-
-### 4. Create a simple AWS Lambda role
-
-Example of creating an AWS role for Lambda execution:
-
-```shell
-aws iam create-role \
-    --role-name lambda-ex \
-    --assume-role-policy-document '{"Version": "2012-10-17","Statement": [{ "Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
-```
-
-### 5. Create the AWS Lambda function
-
-Ensure the AWS account number (`111122223333`) is replaced with yours:
-
-```shell
-aws lambda create-function \
-    --function-name aws-lambda-burr \
-    --package-type Image \
-    --code ImageUri=111122223333.dkr.ecr.us-east-1.amazonaws.com/aws-lambda-burr:latest \
-    --role arn:aws:iam::111122223333:role/lambda-ex
-```
-
-### 6. Test the AWS Lambda function
-
-```shell
-aws lambda invoke \
-    --function-name aws-lambda-burr \
-    --cli-binary-format raw-in-base64-out \
-    --payload '{"body": {"number": 5}}' \
-    response.json
-```
diff --git a/docs/examples/deployment/index.rst b/docs/examples/deployment/index.rst
deleted file mode 100644
index 7fb0c272..00000000
--- a/docs/examples/deployment/index.rst
+++ /dev/null
@@ -1,9 +0,0 @@
-=============
-✈ Deployment
-=============
-
-.. toctree::
-   :maxdepth: 2
-
-   web-server
-   aws
diff --git a/docs/examples/deployment/infrastructure.rst b/docs/examples/deployment/infrastructure.rst
new file mode 100644
index 00000000..be4d7ecc
--- /dev/null
+++ b/docs/examples/deployment/infrastructure.rst
@@ -0,0 +1,8 @@
+-------------------------------------
+Provisioning Infrastructure/Deploying
+-------------------------------------
+
+Burr is not opinionated about the deployment method or cloud you use. Any method that can run a Python server will work
+(AWS, Vercel, etc...). Note that we aim to add more examples here -- see `this issue `_ to track progress!
+
+- `Deploying Burr in an AWS Lambda function `_
diff --git a/docs/examples/deployment/monitoring.rst b/docs/examples/deployment/monitoring.rst
new file mode 100644
index 00000000..4138b6af
--- /dev/null
+++ b/docs/examples/deployment/monitoring.rst
@@ -0,0 +1,24 @@
+------------------------
+Monitoring in Production
+------------------------
+
+Burr's telemetry UI is meant both for debugging and for running in production. It can consume `OpenTelemetry traces `_,
+and has a suite of useful capabilities for debugging Burr applications.
+
+It currently has two implementations:
+
+1. `Local (filesystem) tracking `_ (the default; for debugging, or for lower-scale production use-cases with a distributed file-system)
+2. `S3-based tracking `_ (meant for production use-cases)
+
+Each comes with an implementation of data storage on the server.
+
+To deploy these in production, you can follow these examples:
+
+1. `Burr + FastAPI + docker `_ by `Matthew Rideout `_. This contains a sample API + UI + tracking server, all bundled in one!
+2. `Docker compose + nginx proxy `_ by `Aditha Kaushik `_ for the email assistant example, demonstrating how to run the docker image with the tracking server.
+
+We also have a few open issues tracking documentation for deploying Burr's monitoring system in production:
+
+- `deploy on AWS `_
+- `deploy on GCP `_
+- `deploy on Azure `_
diff --git a/docs/examples/deployment/web-server.md b/docs/examples/deployment/web-server.md
deleted file mode 100644
index 495dad11..00000000
--- a/docs/examples/deployment/web-server.md
+++ /dev/null
@@ -1,224 +0,0 @@
-# Web service (FastAPI, Flask, Django, etc.)
-
-Burr is meant to run interactive apps. This means running it as part of a web service that
-responds to requests, manages state, and documents its capabilities.
-The interactive nature of Burr (moving in/out of programmatic control) means we want to think
-carefully about how to expose our Burr applications to the web. Burr makes it natural to integrate
-with a web server such as FastAPI.
-
-In this tutorial we will use the [email assistant example](https://github.com/DAGWorks-Inc/burr/tree/main/examples/email-assistant) as a walk-through.
-Our goal is to expose the email assistant in a web server that a UI can easily be built on top of.
-While we will not be building the UI here, we will link out to the final product for you to explore.
-
-## Email Assistant
-
-The email assistant is an example of a "human-in-the-loop" generative AI application. This means that
-it requires human assistance at multiple points to build a better product.
-
-### Running the example
-
-If you want to get a sense of how this looks, open the Burr UI:
-
-```bash
-pip install "burr[start]"
-burr
-```
-
-Then navigate to the email assistant at http://localhost:7241/demos/email-assistant.
-
-You can create a new "application" and watch it run through, with the telemetry on the right side.
-
-### Conceptual Model
-At a high level, the email assistant does the following:
-
-1. Accepts an email + instructions to respond
-2. Comes up with a set of clarifying questions (if the LLM deems them required)
-3. Using the answers to those questions, generates a draft
-4. Accepts feedback on that draft and generates another one, repeating until the user is happy
-5. Returns the final draft
-
-Due to the stochastic, often complex nature of LLMs, this has been shown to be one of the most promising
-applications -- a collaboration between humans and AI to quickly build high-quality responses.
-
-### Modeling with Burr
-This is a brief overview; for a more in-depth look at the email assistant, see the [email assistant example](https://github.com/DAGWorks-Inc/burr/tree/main/examples/email-assistant).
-To model our email assistant with Burr, we can use the following diagram:
-
-![Modeling](./_web-server.png)
-
-There are three points at which the user can interact:
-1. `process_input`: This is where the user inputs the email and instructions
-2. `clarify_instructions`: The LLM has created a set of clarification questions
-3. `process_feedback`: The user has provided feedback on the draft
-
-(3) repeats until the user is happy with the draft (in our implementation, this occurs when the feedback they provide is empty)
-
-Recall that we use the word "application" in Burr to refer to an instance of the process above
-(with persisted state).
-
-You can see the full application in [application.py](https://github.com/DAGWorks-Inc/burr/tree/main/examples/email-assistant/application.py).
-
-## Integrating in a web server
-
-For this example we will use [FastAPI](https://fastapi.tiangolo.com/) and [pydantic](https://docs.pydantic.dev/latest/),
-but it should work with any other web stack that uses Python.
-
-### Endpoints
-
-We construct the following endpoints:
-
-1. `POST /create`: This creates a new application and returns the ID
-2. `PUT /initialize_draft/{id}/`: This calls out to `process_input`, passing in the email and instructions
-3. `PUT /clarify_instructions/{id}`: This gives the user's answers back to the LLM
-4. `PUT /process_feedback/{id}`: This gives the user's feedback back to the LLM
-5. `GET /{id}/state`: This returns the current state of the application
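-
-Each of the `PUT` endpoints above accepts a small request body. As a minimal sketch of what those
-pydantic models might look like (the first two names are assumptions for illustration; only `Feedback`,
-with its `feedback` field, appears in the server code below):
-
-```python
-from typing import List
-
-import pydantic
-
-
-class DraftInput(pydantic.BaseModel):
-    """Body for `initialize_draft` -- the email and the instructions (hypothetical name)."""
-    email_to_respond: str
-    response_instructions: str
-
-
-class AnswersInput(pydantic.BaseModel):
-    """Body for `clarify_instructions` -- answers to the clarifying questions (hypothetical name)."""
-    answers: List[str]
-
-
-class Feedback(pydantic.BaseModel):
-    """Body for `process_feedback` -- feedback on the current draft."""
-    feedback: str
-```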
-
-The `GET` endpoint allows us to get the current state of the application -- this enables
-the user to reload if they quit the browser/get distracted. Each of these endpoints returns the full state of the application,
-which can be rendered on the frontend. Furthermore, it indicates the next API endpoint to
-call, which allows the UI to render the appropriate form and gather the right inputs.
-
-Using FastAPI + pydantic, this becomes very simple to implement. First, let's add a utility to
-get the `application` object. This uses a cached version or instantiates a new one:
-
-```python
-@functools.lru_cache(maxsize=128)
-def _get_application(project_id: str, app_id: str) -> Application:
-    # project_id is part of the cache key so applications are cached per project
-    app = email_assistant_application.application(app_id=app_id)
-    return app
-```
-
-All this does is call our function `application` in `email_assistant`, which
-recreates the application. We have not included the `create` function here,
-but it calls out to the same API.
-
-### Data Model
-
-Let's then define a pydantic model to represent the state, which the FastAPI endpoints will return:
-
-```python
-class EmailAssistantState(pydantic.BaseModel):
-    app_id: str
-    email_to_respond: Optional[str]
-    response_instructions: Optional[str]
-    questions: Optional[List[str]]
-    answers: Optional[List[str]]
-    drafts: List[str]
-    feedback_history: List[str]
-    final_draft: Optional[str]
-    # This stores the next step, which tells the frontend which endpoint to call
-    next_step: Literal["process_input", "clarify_instructions", "process_feedback", None]
-
-    @staticmethod
-    def from_app(app: Application):
-        # implementation left out; call app.state and translate it to this pydantic model
-        # we can use `app.get_next_action()` to get the next step and return it to the user
-        ...
-```
-
-### Execution
-
-Next, we can run through to the next step, starting from any point:
-
-```python
-def _run_through(project_id: str, app_id: Optional[str], inputs: Dict[str, Any]) -> EmailAssistantState:
-    email_assistant_app = _get_application(project_id, app_id)
-    email_assistant_app.run(  # we run this for its side effect and just get the state after it halts
-        halt_before=["clarify_instructions", "process_feedback"],
-        halt_after=["final_result"],
-        inputs=inputs,
-    )
-    return EmailAssistantState.from_app(email_assistant_app)
-```
-
-We `halt_before` the steps that require user input, and `halt_after`
-the final result. This allows us to get the state after each step.
-
-Finally, we can define our endpoints. For instance:
-
-```python
-@router.post("/provide_feedback/{project_id}/{app_id}")
-def provide_feedback(project_id: str, app_id: str, feedback: Feedback) -> EmailAssistantState:
-    return _run_through(project_id, app_id, dict(feedback=feedback.feedback))
-```
-
-This represents a simple but powerful architecture. We can continue calling these endpoints
-until we reach a "terminal" state, at which point we can always ask for the state.
-If we decide to add more input steps, we can simply modify the state machine.
-We are not required to hold state in the app (it is all delegated to Burr's persistence),
-so we can easily load up from any given point, allowing the user to wait seconds,
-minutes, hours, or even days before continuing.
-
-As the frontend simply renders based on the current state and the next step, it will always
-be correct, and the user can always pick up where they left off. With Burr's telemetry capabilities
-you can debug any state-related issues, ensuring a smooth user experience.
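-
-For completeness, the state endpoint is just a read -- a minimal sketch (the explicit
-`project_id`/`app_id` route shape mirrors the feedback endpoint above and is an assumption):
-
-```python
-@router.get("/{project_id}/{app_id}/state")
-def get_state(project_id: str, app_id: str) -> EmailAssistantState:
-    # no execution here -- just load the application and translate its state
-    return EmailAssistantState.from_app(_get_application(project_id, app_id))
-```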
-
-### Persistence
-
-Note that we never called out to databases -- it all just magically worked. This is because we decouple the persistence
-layer from the web call. The application will be persisted (to whatever database you want)
-by Burr's plugin capabilities -- read more [here](https://burr.dagworks.io/concepts/state-persistence/).
-This greatly reduces the amount you have to think about when developing. As Burr persistence is
-pluggable, you can write to your own database with whichever schema you prefer, customizing
-the schema for your project or using a generic one (state is just a JSON object -- you can easily serialize/deserialize it).
-
-### Additional concerns
-
-#### Scaling
-
-But [is this webscale](https://www.youtube.com/watch?v=b2F-DItXtZs)? As with anything, it depends on how you implement it.
-Two factors determine the scalability of this system:
-
-1. The database layer -- can the database support the volume of inputs/outputs?
-2. The compute layer -- can the server run fast enough to keep up with the users?
-
-For the database layer, it depends largely on the underlying database, as well as the
-schema you use. That said, Burr makes this easier due to the natural partitioning of the data
-by `application_id` and `partition_key` (the latter could be the user ID), making common
-operations (such as _give me all applications for user X_ and _give me the state of application Y_)
-simple if you index your state table on the application ID and `partition_key`.
-
-For the compute layer, you can simply scale horizontally. The only tricky aspect is ensuring state synchronization
-and locking. As we cached the application object, we could potentially get into a position
-in which the state is out of sync. To solve this, you can:
-
-1. Use a locking method (e.g., in the database) to ensure that only one server is running a given application at any point
-2. Use sticky sessions/sharding to ensure that a given user always hits the same server
-3. Handle forking/resolution of state at the persistence layer with a custom implementation
-
-Or possibly some combination of the above.
-
-#### Async
-
-While we implemented synchronous calls, you can easily make these async by using `async def` and `await` in the appropriate places,
-and using the `arun` method in Burr. Read more about async capabilities in [applications](https://burr.dagworks.io/concepts/state-machine/)
-and [actions](https://burr.dagworks.io/concepts/actions/).
-
-#### Streaming
-
-You can use streaming to send back the stream of the output at any given point. You do this by creating a
-[streaming action](https://burr.dagworks.io/concepts/streaming-actions/). You can then integrate with the
-streaming response in FastAPI to send back the stream of the output. You can do this with any steps
-(intermediate or final) in your application.
-
-#### Authentication/Data access
-
-While Burr does not operate at the data-access layer, this can easily be handled at the application layer.
-Any authentication system will tell you the user ID, which you can look up in your DB to determine
-access to your partition key, as sketched below.
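-
-For example, a FastAPI dependency can resolve and check the partition key (a minimal sketch;
-`get_current_user_id` and `user_owns_application` are hypothetical stand-ins for your auth
-system and your DB lookup):
-
-```python
-from fastapi import Depends, HTTPException
-
-
-def authorized_partition_key(
-    app_id: str,
-    user_id: str = Depends(get_current_user_id),  # hypothetical: provided by your auth system
-) -> str:
-    # hypothetical DB lookup: does this user have access to this application?
-    if not user_owns_application(user_id, app_id):
-        raise HTTPException(status_code=403, detail="Not your application")
-    return user_id  # use the authenticated user ID as the partition key
-```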
-
-## Wrap-up
-
-In this tutorial we showed how to integrate Burr into a web server. We used FastAPI and pydantic
-to create a simple but powerful API that allows users to interact with the email assistant, leveraging
-Burr's persistence capabilities to ensure that the user can always pick up where they left off.
-
-At a high level, the real value of representing your application as a state machine (as Burr does)
-is that everything becomes easier to reason about. You don't have to conceptually model state persistence,
-dataflow, and the web infrastructure in one piece -- they can all be built separately.
-
-In the future we will be automating this process, allowing you to generate a FastAPI app from a Burr application.
-
-For now, you can find the resources for the current implementation:
-- [application.py](https://github.com/DAGWorks-Inc/burr/tree/main/examples/email-assistant/application.py)
-- [server.py](https://github.com/DAGWorks-Inc/burr/tree/main/examples/email-assistant/server.py)
-- [ui](https://github.com/DAGWorks-Inc/burr/tree/main/telemetry/ui/src/examples/EmailAssistant.tsx) -- this uses [React Query](https://tanstack.com/query/latest/docs/framework/react/overview) to call the API and [React](https://react.dev/) to render the state.
diff --git a/docs/examples/deployment/web-server.rst b/docs/examples/deployment/web-server.rst
new file mode 100644
index 00000000..a139db49
--- /dev/null
+++ b/docs/examples/deployment/web-server.rst
@@ -0,0 +1,22 @@
+--------------------
+Burr in a web server
+--------------------
+
+We largely use `FastAPI `_ as our web server, but Burr can work with any Python-friendly server framework
+(`django `_, `flask `_, etc...).
+
+To run Burr in a FastAPI server, see the following examples:
+
+- `Human-in-the-loop FastAPI server `_ (`TDS blog post `__)
+- `OpenAI-compatible agent with FastAPI `_
+- `Streaming server using SSE + FastAPI `_ (`TDS blog post `__)
+- `Use typed state with pydantic + FastAPI `_
+
+Connecting to a database
+------------------------
+
+To connect Burr to a database, you can use one of the provided persisters, or build your own:
+
+- :ref:`Documentation on persistence `
+- :ref:`Set of available persisters `
+- `Simple chatbot intro with persistence to SQLite `_
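+
+As a minimal sketch of wiring a persister into an application (the class and parameter names below
+follow Burr's persistence documentation, but treat them as assumptions and check the links above
+for exact signatures):
+
+.. code-block:: python
+
+    from burr.core import ApplicationBuilder, State, action
+    from burr.core.persistence import SQLLitePersister
+
+    @action(reads=[], writes=["counter"])
+    def increment(state: State) -> State:
+        return state.update(counter=state.get("counter", 0) + 1)
+
+    persister = SQLLitePersister(db_path="./state.db", table_name="burr_state")
+    persister.initialize()  # create the backing table if it does not exist
+
+    app = (
+        ApplicationBuilder()
+        .with_actions(increment)
+        .with_transitions(("increment", "increment"))
+        # load prior state for this app_id if it exists, else start fresh
+        .initialize_from(
+            persister,
+            resume_at_next_action=True,
+            default_state={"counter": 0},
+            default_entrypoint="increment",
+        )
+        .with_state_persister(persister)  # save state after every step
+        .with_identifiers(app_id="example-app", partition_key="example-user")
+        .build()
+    )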