Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.
Stable:

```shell
pip install FastServeAI
```

Latest:

```shell
pip install git+https://github.com/gradsflow/fastserve-ai.git@main
```
YouTube: How to serve your own GPT-like LLM in 1 minute with FastServe.
To serve a custom model, implement the `handle` method of `FastServe`, which processes a batch of inputs and returns the responses as a list.
```python
from typing import List

from fastserve import FastServe
from fastserve.utils import BaseRequest  # import path may vary by version

class MyModelServing(FastServe):
    def __init__(self):
        super().__init__(batch_size=2, timeout=0.1)
        self.model = create_model(...)  # placeholder for your own model factory

    def handle(self, batch: List[BaseRequest]) -> List[float]:
        inputs = [b.request for b in batch]
        response = self.model(inputs)
        return response

app = MyModelServing()
app.run_server()
```
Run the script above in a terminal, and it will launch a FastAPI server for your custom model.
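Under the hood, a server configured with `batch_size=2, timeout=0.1` collects incoming requests into batches of up to `batch_size`, waiting at most `timeout` seconds before flushing a partial batch to `handle`. A minimal, self-contained sketch of that dynamic-batching idea (illustrative only; `DynamicBatcher` is a hypothetical name, not FastServe's actual internals):

```python
import queue
import time
from typing import Any, List

class DynamicBatcher:
    """Collect items into batches of up to `batch_size`,
    waiting at most `timeout` seconds for a batch to fill."""

    def __init__(self, batch_size: int = 2, timeout: float = 0.1):
        self.batch_size = batch_size
        self.timeout = timeout
        self._queue: "queue.Queue[Any]" = queue.Queue()

    def submit(self, item: Any) -> None:
        self._queue.put(item)

    def next_batch(self) -> List[Any]:
        # Block until the first item arrives, then give the rest of
        # the batch at most `timeout` seconds to show up.
        batch = [self._queue.get()]
        deadline = time.monotonic() + self.timeout
        while len(batch) < self.batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self._queue.get(timeout=remaining))
            except queue.Empty:
                break
        return batch

batcher = DynamicBatcher(batch_size=2, timeout=0.1)
for item in ("a", "b", "c"):
    batcher.submit(item)
print(batcher.next_batch())  # ['a', 'b']: a full batch
print(batcher.next_batch())  # ['c']: a partial batch, flushed after the timeout
```

Batching like this trades a small amount of per-request latency (at most `timeout`) for much better throughput on models that process batches efficiently, which is why `handle` receives a list rather than a single request.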
To deploy to a Lightning AI Studio:

```shell
python fastserve.deploy.lightning --filename main.py \
    --user LIGHTNING_USERNAME \
    --teamspace LIGHTNING_TEAMSPACE \
    --machine "CPU"  # T4, A10G or A10G_X_4
```
Install in editable mode:

```shell
git clone https://github.com/gradsflow/fastserve-ai.git
cd fastserve-ai
pip install -e .
```
Create a new branch:

```shell
git checkout -b <new-branch>
```

Make your changes, then commit and open a PR.