diff --git a/model-deployment/containers/nim/Dockerfile b/model-deployment/containers/nim/Dockerfile
index f3cc9967..76f85c83 100644
--- a/model-deployment/containers/nim/Dockerfile
+++ b/model-deployment/containers/nim/Dockerfile
@@ -2,9 +2,10 @@ FROM nvcr.io/nim/meta/llama3-8b-instruct:latest
 USER root
 COPY start.sh /opt/nim/start.sh
+RUN chmod +x /opt/nim/start.sh
 
 ENV NIM_CACHE_PATH /opt/ds/model/deployed_model
-ENV NIM_SERVER_PORT 8080
+ENV NIM_SERVER_PORT 8000
 
 EXPOSE ${NIM_SERVER_PORT}
 
 ENTRYPOINT [ "/bin/bash", "--login", "-c"]
diff --git a/model-deployment/containers/nim/README.md b/model-deployment/containers/nim/README.md
index 00fbe257..8c37de9a 100644
--- a/model-deployment/containers/nim/README.md
+++ b/model-deployment/containers/nim/README.md
@@ -73,6 +73,7 @@ This file will be available to container on location `/opt/ds/model/deployed_mod
   * Key: `MODEL_DEPLOY_PREDICT_ENDPOINT`, Value: `/v1/completions`
   * Key: `MODEL_DEPLOY_HEALTH_ENDPOINT`, Value: `/v1/health/ready`
   * Key: `NGC_API_KEY_FILE`, Value: `/opt/ds/model/deployed_model/token`
+  * Key: `SHM_SIZE`, Value: `5g`
 * Under `Models` click on the `Select` button and select the Model Catalog entry we created earlier
 * Under `Compute` and then `Specialty and previous generation` select the `VM.GPU.A10.1` instance
 * Under `Networking` choose the `Custom Networking` option and bring the VCN and subnet, which allows Internet access.
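For reviewers, a minimal local smoke test of these changes might look like the sketch below. It only exercises what the diff touches: the executable `start.sh`, the new server port `8000`, the `SHM_SIZE` of `5g`, and the cache path / token location the README already documents. The image tag, cache directory name, and GPU flags are assumptions for illustration, not part of this PR.

```bash
# Build the updated image (the tag is an arbitrary assumption).
docker build -t nim-odsc-smoke-test .

# Stage a local cache directory and the NGC token the container expects
# (NGC_API_KEY_FILE points at /opt/ds/model/deployed_model/token per the README).
mkdir -p ./nim_cache
echo -n "$NGC_API_KEY" > ./nim_cache/token

# Run with the new port (8000) and the shared-memory size the README now configures.
# The explicit /opt/nim/start.sh argument assumes the bash -c entrypoint shown in the Dockerfile.
docker run --rm --gpus all --shm-size=5g \
  -p 8000:8000 \
  -e NGC_API_KEY_FILE=/opt/ds/model/deployed_model/token \
  -v "$PWD/nim_cache:/opt/ds/model/deployed_model" \
  nim-odsc-smoke-test /opt/nim/start.sh

# In another shell, probe the health endpoint the deployment will use.
curl http://localhost:8000/v1/health/ready
```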