Use with langchain? #898
Replies: 4 comments 5 replies
-
Hi, have a look at this notebook.
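  For reference, a minimal sketch of the basic pattern that notebook covers (assumed current `langchain_community` API; model name and parameters are placeholders):

  ```python
  # Wrap vLLM's offline engine as a LangChain LLM via the VLLM class.
  from langchain_community.llms import VLLM

  llm = VLLM(
      model="mosaicml/mpt-7b",   # any HF model supported by vLLM (placeholder)
      trust_remote_code=True,    # needed for some models, e.g. MPT
      max_new_tokens=128,
      temperature=0.8,
  )

  print(llm.invoke("What is the capital of France?"))
  ```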
-
  Hi @mspronesti, does this LangChain-VLLM support quantized models? The vllm-project already supports quantized models (AWQ format) as shown in #1032. However, when I use the same approach and just pass `quantization="awq"` to your LangChain-VLLM, it does not seem to work and just shows OOM.

  ```python
  model_path = "/home/quadrep/toan/projects/LLMs/weights/vicuna-33B-AWQ"
  ```
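  One way this is commonly wired up (a sketch, not from the thread; assumes the current `langchain_community` wrapper, where `vllm_kwargs` is forwarded to the underlying `vllm.LLM` engine):

  ```python
  # Pass AWQ quantization through LangChain's VLLM wrapper via vllm_kwargs.
  from langchain_community.llms import VLLM

  llm = VLLM(
      model="/home/quadrep/toan/projects/LLMs/weights/vicuna-33B-AWQ",  # path from the comment above
      max_new_tokens=256,
      vllm_kwargs={"quantization": "awq"},  # forwarded to vllm.LLM(...)
  )

  print(llm.invoke("What is the capital of France?"))
  ```

  Note that an OOM here may be unrelated to the wrapper: a 33B model still needs enough GPU memory even when AWQ-quantized.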
-
  The PR has been merged. I suppose we can mark this discussion as completed :)
-
  Hello, I'm using https://python.langchain.com/docs/integrations/llms/vllm#openai-compatible-server to communicate with a vLLM server:

  ```python
  llm = VLLMOpenAI(
      openai_api_key="EMPTY",
      openai_api_base="https://vllm_server_url/v1",
      model_name="mistralai/Mixtral-8x7B-Instruct-v0.1",
      model_kwargs={"stop": ["."]},
  )
  ```

  I got the following error:
-
  How can vLLM be used with LangChain?
  Run vLLM's OpenAI-compatible API server, then use LangChain's OpenAI-compatible class and point it at that instance.
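  A minimal sketch of that setup (assumed local server address and model; these are placeholders, not from the thread). Start the server first, e.g. `python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m`, then:

  ```python
  # Point LangChain's OpenAI-compatible wrapper at the local vLLM server.
  from langchain_community.llms import VLLMOpenAI

  llm = VLLMOpenAI(
      openai_api_key="EMPTY",                      # vLLM's server does not check API keys by default
      openai_api_base="http://localhost:8000/v1",  # assumed local server address
      model_name="facebook/opt-125m",              # must match the model the server was started with
  )

  print(llm.invoke("Rome is the capital of"))
  ```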