
Openai triton server #104

Draft · wants to merge 66 commits into main

Commits (66)
acdaac1
updating with torch compile option for baseline
nnshah1 Mar 5, 2024
452c70f
updated to build model on initialization
nnshah1 Mar 5, 2024
e518923
moving torch tensorrt import
nnshah1 Mar 6, 2024
90a2266
updating dependencies
nnshah1 Apr 27, 2024
c25db8f
updates for meetup tutorials
nnshah1 Apr 29, 2024
3885dbf
updated with tutorial repl
nnshah1 Apr 29, 2024
4bec62b
updating to volume mount local directory
nnshah1 Apr 29, 2024
9b5ce1f
update to install 24.03 version of genai-perf
nnshah1 Apr 30, 2024
25692bd
updates to make versions configurable
nnshah1 Apr 30, 2024
49b526e
update with repl
nnshah1 Apr 30, 2024
53e12b9
updates with partial openai api support
nnshah1 May 4, 2024
2b27882
more elaborate openai server
nnshah1 May 4, 2024
0c8ee72
adding in initial chat completions
nnshah1 May 4, 2024
884278f
updated to exclude openai.yml taken directly https://raw.githubuserco…
nnshah1 May 5, 2024
72135a5
updated to exclude external unmodified api spec from pre-commit
nnshah1 May 5, 2024
85c9c06
moving generated files into subdirectories
nnshah1 May 5, 2024
058a6cb
initial readme
nnshah1 May 5, 2024
0414fa1
updates
nnshah1 May 5, 2024
f96e550
update to make repl installation optional
nnshah1 May 5, 2024
32b9a7a
updating to 24.04 base image for stable diffusion
nnshah1 May 5, 2024
c78767d
update with script for fastapi code gen
nnshah1 May 5, 2024
7a061ca
initial check in for codegen. use this as base for showing steps to m…
nnshah1 May 5, 2024
9b98479
added in models api
nnshah1 May 5, 2024
5c9acd2
updated with model info call
nnshah1 May 5, 2024
b051b5a
remove delete
nnshah1 May 6, 2024
67b1ab6
remove delete
nnshah1 May 6, 2024
e952195
adding support for completions api
nnshah1 May 6, 2024
680499c
disabled logging
nnshah1 May 6, 2024
de62259
incremental updates for chat completions
nnshah1 May 7, 2024
2509c34
updates for chat completion api
nnshah1 May 7, 2024
73b3ce0
update with assistant strings
nnshah1 May 7, 2024
57b5ac2
updates for trt-llm - note code is not portable between vllm and
nnshah1 May 8, 2024
b1ec52f
adding stop words
nnshah1 May 8, 2024
c1dfd0a
updates to make vllm and trt-llm work in same code base
nnshah1 May 8, 2024
e759096
updates to support both trtlm and vllm
nnshah1 May 8, 2024
25de682
adding transformer utilities to get tokenizer - may need further tweaks
nnshah1 May 8, 2024
52dc3f4
updated to return string unless decode exception
nnshah1 May 8, 2024
2b42bee
updated with metrics
nnshah1 May 8, 2024
e53e763
deleting unused files
nnshah1 May 8, 2024
0774e84
adding basic support for arg parsing
nnshah1 May 8, 2024
7228d9e
update to list both triton id and source id
nnshah1 May 9, 2024
f7181a0
matched version in route
nnshah1 May 9, 2024
e0109e6
renaming application
nnshah1 May 9, 2024
20e14a9
adding developer tool
nnshah1 May 9, 2024
7237a79
update typo
nnshah1 May 9, 2024
31a5765
updated
nnshah1 May 9, 2024
afec851
updated
nnshah1 May 9, 2024
cc72ee1
updated tag for genaiperf
nnshah1 May 14, 2024
3d96c2d
Merge branch 'nnshah1-meetup-04-2024' of https://github.com/triton-in…
nnshah1 May 14, 2024
f1ed67f
updated readme with known issues / limitations
nnshah1 May 14, 2024
be5bb9d
updating with pytorch base example
nnshah1 May 19, 2024
73244cd
Merge branch 'nnshah1-meetup-04-2024' of https://github.com/triton-in…
nnshah1 May 19, 2024
cc4d886
updating genai perf tag
nnshah1 May 19, 2024
41ef140
updated to build pytorch without torchvision
nnshah1 May 22, 2024
3573aaa
upgrade to 24.05
nnshah1 Jun 4, 2024
3556d99
update run to 24.05
nnshah1 Jun 4, 2024
246a237
update to exclude none in setting vllm request
nnshah1 Jun 4, 2024
b1653d7
Fix the parameter to tensor conversion in TRTLLM FastAPI implementati…
tanmayv25 Jul 25, 2024
0eb2bd4
moving to 24.06 as base
nnshah1 Jul 30, 2024
ce39d40
Merge branch 'nnshah1-meetup-04-2024' of https://github.com/triton-in…
nnshah1 Jul 30, 2024
d70d377
Merge branch 'nnshah1-meetup-04-2024' into openai-triton-server
nnshah1 Aug 2, 2024
db86092
updating to remove unnecessary parts
nnshah1 Aug 2, 2024
45bf9a6
updated to not rely on triton cli internals
nnshah1 Aug 2, 2024
1cb3932
typo
nnshah1 Aug 2, 2024
8402989
updated to remove pip install
nnshah1 Aug 2, 2024
c00bf15
updated for style
nnshah1 Aug 2, 2024
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -53,7 +53,7 @@ repos:
- id: codespell
additional_dependencies: [tomli]
args: ["--toml", "pyproject.toml"]
-exclude: (?x)^(.*stemmer.*|.*stop_words.*|^CHANGELOG.md$)
+exclude: (?x)^(.*stemmer.*|.*stop_words.*|^CHANGELOG.md$)||^Triton_Inference_Server_Python_API/examples/fastapi/api-spec/openai.yml$
# More details about these pre-commit hooks here:
# https://pre-commit.com/hooks.html
- repo: https://github.com/pre-commit/pre-commit-hooks
@@ -65,7 +65,7 @@ repos:
- id: check-json
- id: check-toml
- id: check-yaml
-exclude: ^Deployment/Kubernetes/[^/]+/chart/templates/.+$
+exclude: ^Deployment/Kubernetes/[^/]+/chart/templates/.+$|^Triton_Inference_Server_Python_API/examples/fastapi/api-spec/openai.yml$
- id: check-shebang-scripts-are-executable
- id: end-of-file-fixer
types_or: [c, c++, cuda, proto, textproto, java, python]
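The added exclude alternative is an anchored literal path. A quick way to sanity-check what it matches — approximating pre-commit's Python regex with an ERE via `grep -E`, which behaves the same for this simple anchored pattern — is:

```shell
# Candidate paths piped through the new exclude alternative; only the
# vendored OpenAPI spec should match.
PATTERN='^Triton_Inference_Server_Python_API/examples/fastapi/api-spec/openai.yml$'

printf '%s\n' \
  'Triton_Inference_Server_Python_API/examples/fastapi/api-spec/openai.yml' \
  'Triton_Inference_Server_Python_API/examples/fastapi/main.py' \
  'docs/openai.yml' \
  | grep -E "$PATTERN"
# prints only: Triton_Inference_Server_Python_API/examples/fastapi/api-spec/openai.yml
```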
46 changes: 25 additions & 21 deletions Triton_Inference_Server_Python_API/build.sh
@@ -30,18 +30,19 @@ RUN_PREFIX=
BUILD_MODELS=

# Frameworks
-declare -A FRAMEWORKS=(["DIFFUSION"]=1 ["TRT_LLM"]=2 ["IDENTITY"]=3)
+declare -A FRAMEWORKS=(["DIFFUSION"]=1 ["TRT_LLM"]=2 ["IDENTITY"]=3 ["VLLM"]=4)
DEFAULT_FRAMEWORK=IDENTITY

SOURCE_DIR=$(dirname "$(readlink -f "$0")")
DOCKERFILE=${SOURCE_DIR}/docker/Dockerfile


# Base Images
-BASE_IMAGE=nvcr.io/nvidia/tritonserver
-BASE_IMAGE_TAG_IDENTITY=24.01-py3
-BASE_IMAGE_TAG_DIFFUSION=24.01-py3
-BASE_IMAGE_TAG_TRT_LLM=24.01-trtllm-python-py3
+BASE_IMAGE_DEFAULT=nvcr.io/nvidia/tritonserver
+BASE_IMAGE_TAG_IDENTITY=24.06-py3
+BASE_IMAGE_TAG_DIFFUSION=24.06-py3
+BASE_IMAGE_TAG_TRT_LLM=24.06-trtllm-python-py3
+BASE_IMAGE_TAG_VLLM=24.06-vllm-python-py3

get_options() {
while :; do
@@ -61,7 +62,7 @@ get_options() {
--build-models)
BUILD_MODELS=TRUE
;;
-  --base)
+  --base-image)
if [ "$2" ]; then
BASE_IMAGE=$2
shift
@@ -135,10 +136,20 @@ get_options() {
BASE_IMAGE_TAG=BASE_IMAGE_TAG_${FRAMEWORK}
BASE_IMAGE_TAG=${!BASE_IMAGE_TAG}
fi

if [ -z $BASE_IMAGE ]; then
BASE_IMAGE=BASE_IMAGE_${FRAMEWORK}
BASE_IMAGE=${!BASE_IMAGE}
fi

if [ -z $BASE_IMAGE ]; then
BASE_IMAGE=${BASE_IMAGE_DEFAULT}
fi

fi

if [ -z "$TAG" ]; then
-TAG="triton-python-api:r24.01"
+TAG="triton-python-api:r24.06"

if [[ $FRAMEWORK == "TRT_LLM" ]]; then
TAG+="-trt-llm"
@@ -148,6 +159,10 @@ get_options() {
TAG+="-diffusion"
fi

if [[ $FRAMEWORK == "VLLM" ]]; then
TAG+="-vllm"
fi

fi

}
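The per-framework defaults above rely on Bash indirect expansion (`${!name}`): the script first composes a variable *name* from `$FRAMEWORK`, then dereferences it. A minimal standalone sketch of the same pattern, with tag values copied from the script:

```shell
# Select a base-image tag by framework name using indirect expansion.
BASE_IMAGE_TAG_IDENTITY=24.06-py3
BASE_IMAGE_TAG_TRT_LLM=24.06-trtllm-python-py3
BASE_IMAGE_TAG_VLLM=24.06-vllm-python-py3

FRAMEWORK=VLLM

# Step 1: build the variable name, e.g. "BASE_IMAGE_TAG_VLLM".
BASE_IMAGE_TAG=BASE_IMAGE_TAG_${FRAMEWORK}
# Step 2: ${!var} expands to the value of the variable whose NAME is stored in var.
BASE_IMAGE_TAG=${!BASE_IMAGE_TAG}

echo "$BASE_IMAGE_TAG"   # 24.06-vllm-python-py3
```

This is why the tag variables must follow the strict `BASE_IMAGE_TAG_<FRAMEWORK>` naming convention: the lookup is purely by constructed name.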
@@ -186,7 +201,7 @@ get_options "$@"

if [[ $FRAMEWORK == DIFFUSION ]]; then
BASE_IMAGE="tritonserver"
-BASE_IMAGE_TAG="r24.01-diffusion"
+BASE_IMAGE_TAG="r24.06-diffusion"
fi

# BUILD RUN TIME IMAGE
@@ -208,7 +223,7 @@ if [[ $FRAMEWORK == DIFFUSION ]]; then
set -x
fi
$RUN_PREFIX mkdir -p backend/diffusion
-$RUN_PREFIX $SOURCE_DIR/../Popular_Models_Guide/StableDiffusion/build.sh --framework diffusion --tag tritonserver:r24.01-diffusion
+$RUN_PREFIX $SOURCE_DIR/../Popular_Models_Guide/StableDiffusion/build.sh --framework diffusion --tag $BASE_IMAGE:$BASE_IMAGE_TAG $NO_CACHE
$RUN_PREFIX cp $SOURCE_DIR/../Popular_Models_Guide/StableDiffusion/backend/diffusion/model.py backend/diffusion/model.py
$RUN_PREFIX mkdir -p diffusion-models/stable_diffusion_1_5/1
$RUN_PREFIX cp $SOURCE_DIR/../Popular_Models_Guide/StableDiffusion/diffusion-models/stable_diffusion_1_5/config.pbtxt diffusion-models/stable_diffusion_1_5/config.pbtxt
@@ -231,25 +246,14 @@ $RUN_PREFIX docker build -f $DOCKERFILE $BUILD_OPTIONS $BUILD_ARGS -t $TAG $SOUR
{ set +x; } 2>/dev/null


-if [[ $FRAMEWORK == TRT_LLM ]]; then
-if [ -z "$RUN_PREFIX" ]; then
-set -x
-fi
-
-$RUN_PREFIX docker build -f $SOURCE_DIR/docker/Dockerfile.trt-llm-engine-builder $BUILD_OPTIONS $BUILD_ARGS -t trt-llm-engine-builder $SOURCE_DIR $NO_CACHE
-
-{ set +x; } 2>/dev/null
-
-fi;

if [[ $FRAMEWORK == IDENTITY ]] || [[ $BUILD_MODELS == TRUE ]]; then

if [[ $FRAMEWORK == DIFFUSION ]]; then
if [ -z "$RUN_PREFIX" ]; then
set -x
fi

-$RUN_PREFIX docker run --rm -it -v $PWD:/workspace $TAG /bin/bash -c "/workspace/scripts/stable_diffusion/build_models.sh --model stable_diffusion_1_5"
+$RUN_PREFIX docker run --gpus all --rm -it -v $PWD:/workspace $TAG /bin/bash -c "/workspace/scripts/stable_diffusion/build_models.sh --model stable_diffusion_1_5"

{ set +x; } 2>/dev/null
fi
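The `FRAMEWORKS` associative array at the top of build.sh doubles as the allow-list for `--framework`. A sketch of that validation pattern, assuming the same array (the `validate_framework` helper name is illustrative, not from the script):

```shell
# Validate a user-supplied framework name against the known set.
declare -A FRAMEWORKS=(["DIFFUSION"]=1 ["TRT_LLM"]=2 ["IDENTITY"]=3 ["VLLM"]=4)

validate_framework() {
    local framework=${1^^}   # normalize case, e.g. vllm -> VLLM (bash 4+)
    if [[ -z "${FRAMEWORKS[$framework]}" ]]; then
        # ${!FRAMEWORKS[*]} expands to the array's keys
        echo "ERROR: unknown framework '$1'; choose one of: ${!FRAMEWORKS[*]}" >&2
        return 1
    fi
    echo "$framework"
}

validate_framework vllm            # VLLM
validate_framework caffe || true   # prints an error to stderr, continues
```

Membership in the associative array is an O(1) lookup, which is why the script stores frameworks as keys rather than scanning a plain list.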
29 changes: 29 additions & 0 deletions Triton_Inference_Server_Python_API/deps/requirements_trt_llm.txt
@@ -0,0 +1,29 @@
# Copyright 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

fastapi==0.111.0
openai==1.26.0
pydantic==2.7.1
28 changes: 28 additions & 0 deletions Triton_Inference_Server_Python_API/deps/requirements_vllm.txt
@@ -0,0 +1,28 @@
# Copyright 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

openai==1.26.0
vllm[all]==0.4.1
37 changes: 29 additions & 8 deletions Triton_Inference_Server_Python_API/docker/Dockerfile
@@ -25,37 +25,58 @@
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver
-ARG BASE_IMAGE_TAG=24.01-py3
+ARG BASE_IMAGE_TAG=24.06-py3
ARG FRAMEWORK=DIFFUSION

FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} as triton-python-api

RUN apt-get update; apt-get install -y gdb

COPY ./deps/requirements.txt /tmp/requirements.txt

COPY ./deps/requirements_trt_llm.txt /tmp/requirements_trt_llm.txt

COPY ./deps/requirements_vllm.txt /tmp/requirements_vllm.txt

COPY ./deps/requirements_python_repl.txt /tmp/requirements_python_repl.txt

RUN pip install --timeout=2000 -r /tmp/requirements.txt

# Finish pyright install

RUN pyright --help

-COPY ./deps/tritonserver-2.41.0.dev0-py3-none-any.whl /tmp/tritonserver-2.41.0.dev0-py3-none-any.whl

+RUN find /opt/tritonserver/python -maxdepth 1 -type f -name \
+    "tritonserver-*.whl" | xargs -I {} pip3 install --force-reinstall --upgrade {}[all]

-RUN pip3 show tritonserver 1>/dev/null || \
-    if [ $? != 0 ]; then \
-    pip3 install /tmp/tritonserver-2.41.0.dev0-py3-none-any.whl[all] ;\
-    fi
ARG TRITON_CLI_TAG="0.0.8"

RUN pip install git+https://github.com/triton-inference-server/triton_cli.git@${TRITON_CLI_TAG}

ARG GENAI_PERF_TAG="r24.05"

RUN pip install "git+https://github.com/triton-inference-server/client.git@${GENAI_PERF_TAG}#subdirectory=src/c++/perf_analyzer/genai-perf"

ARG INCLUDE_PYTHON_REPL

RUN if [[ "$INCLUDE_PYTHON_REPL" != "" ]] ; then pip install --timeout=2000 -r /tmp/requirements_python_repl.txt ; fi

ARG FRAMEWORK=DIFFUSION

RUN if [[ "$FRAMEWORK" == "TRT_LLM" ]] ; then pip install --timeout=2000 -r /tmp/requirements_trt_llm.txt ; fi

RUN if [[ "$FRAMEWORK" == "VLLM" ]] ; then pip install --timeout=2000 -r /tmp/requirements_vllm.txt ; fi

RUN ln -sf /bin/bash /bin/sh

COPY . /workspace

ARG RUN_TESTS=FALSE

-RUN if [[ "$RUN_TESTS" == "TRUE" ]] ; then cd /tmp && git clone -b r23.12-python-api https://github.com/triton-inference-server/core.git && cp -rf /tmp/core/python/test /workspace/deps/ ; fi
+RUN if [[ "$RUN_TESTS" == "TRUE" ]] ; then cd /tmp && git clone -b r24.04 https://github.com/triton-inference-server/core.git && cp -rf /tmp/core/python/test /workspace/deps/ ; fi

RUN if [[ "$RUN_TESTS" == "TRUE" ]] ; then pytest /workspace/deps ; fi

ARG INCLUDE_EMACS

RUN if [[ "$INCLUDE_EMACS" == "TRUE" ]] ; then apt-get install -y emacs ; fi
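The Dockerfile keys every optional install off build args, so one Dockerfile serves all frameworks. That FRAMEWORK-to-requirements mapping can be exercised outside Docker; the helper name below is illustrative, and the trailing `docker build` line is a hypothetical invocation:

```shell
# The same FRAMEWORK -> requirements-file selection the Dockerfile's
# conditional RUN steps apply, runnable locally for a quick check.
pick_requirements() {
    case "$1" in
        TRT_LLM) echo deps/requirements_trt_llm.txt ;;
        VLLM)    echo deps/requirements_vllm.txt ;;
        *)       echo "" ;;   # DIFFUSION / IDENTITY add no extra file
    esac
}

pick_requirements VLLM      # deps/requirements_vllm.txt
pick_requirements IDENTITY  # (empty)

# Hypothetical build invocation passing the arg through:
# docker build -f docker/Dockerfile --build-arg FRAMEWORK=VLLM \
#   -t triton-python-api:r24.06-vllm .
```

Because `ARG FRAMEWORK` is declared both before and after `FROM`, the value survives into the build stage where the conditional `RUN` steps consume it.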
@@ -10,4 +10,10 @@
**/*onnx*
**/*engine*
**/*pytorch_model*
-**/*.pth*
+**/*.pth*
+**/*.pt
+**/*.models/*
+**/*.model-store/*
+**/*.model.*/*
+**/*.cache/*
+**/*.libtorch_model_store/*