Using vLLM to deploy LLM as an API to accelerate inference #100
Comments
Hi, can I know if it is possible to run it with Ollama and host the LLM locally?
I found that comfyui_omost shows a way to accelerate inference with TGI (text generation inference).
Good idea! Could you kindly share the code?
Based on practical tests, deploying omost-llama-3-8b on an A100 using torch==2.3.0+cu118, vllm==0.5.0.post1+cu118, and xformers==0.0.26.post1+cu118 works well. If you want to speed up the process, you can refer to this setup.
vllm: https://docs.vllm.ai/en/stable/getting_started/quickstart.html
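For reference, here is a minimal sketch of what such a deployment could look like, following the vLLM quickstart linked above: the model is served through vLLM's OpenAI-compatible server and then queried over HTTP. The model id `lllyasviel/omost-llama-3-8b`, the port, and the sampling parameters are assumptions for illustration; adjust them to your checkpoint path and environment.

```python
# Sketch only (assumptions: model id "lllyasviel/omost-llama-3-8b", server on localhost:8000).
#
# 1) Start vLLM's OpenAI-compatible API server, for example:
#    python -m vllm.entrypoints.openai.api_server \
#        --model lllyasviel/omost-llama-3-8b --dtype bfloat16 --port 8000
#
# 2) Query it from any HTTP client:
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "lllyasviel/omost-llama-3-8b",
        "messages": [
            {"role": "user", "content": "generate an image of a cat on a windowsill at sunset"},
        ],
        "max_tokens": 1024,   # assumed budget for the generated canvas code
        "temperature": 0.6,   # assumed sampling temperature
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```

Because the server speaks the OpenAI API format, the same endpoint can also be queried with the `openai` Python client by pointing its base URL at `http://localhost:8000/v1`.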