/v1/embeddings please #310
Replies: 14 comments 13 replies
-
Hi! Adding support to return embeddings is definitely on our roadmap. In addition, I believe the modifications needed to support embeddings are not very complicated, so this would be a very good first issue. If you're interested, feel free to contribute!
-
I looked into it, hoping to pick it up as a "good first issue", but did not find it straightforward to implement. I'm afraid any changes I made would just be hacks. If you have any pointers on how and where I could best add it, I'd be happy to give it a second look.
-
@zhuohan123 and @yuhai-china are you talking about a multilingual or a monolingual model?
-
@Vinno97 are you still working on it? I would love to help, since I'm interested in using it too.
-
No, I haven't come back to it. I had hoped I could just create a new endpoint that hooked into the model and returned the last hidden state, but I found that the LLMEngine was built so much around text generation that I didn't see myself adding embeddings into it easily and cleanly. But do give it a try! I must admit I spent less than an hour looking into it.
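For illustration only, here is a minimal sketch of what "returning the last hidden state" as an embedding could mean, assuming the engine exposed per-token hidden states and an attention mask; the function name and the choice of mean pooling are mine, not anything that exists in vLLM:

```python
# Hypothetical sketch, not vLLM code: turn a model's last hidden state into a
# fixed-size embedding with masked mean pooling.
import torch

def pool_last_hidden_state(last_hidden_state: torch.Tensor,
                           attention_mask: torch.Tensor) -> torch.Tensor:
    """last_hidden_state: [batch, seq_len, hidden]; attention_mask: [batch, seq_len]."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)   # sum over non-padding tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens per row
    return summed / counts                           # [batch, hidden]
```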
-
@yuhai-china @Vinno97 @bm777 Thanks for your interest in this. I previously misunderstood this API as returning the hidden states of the generated sequence, which would be easy. However, it turns out this API is for a completely different set of models (i.e., BERT-like embedding models). The current vLLM mainly focuses on autoregressive generation, and for embeddings neither paged attention nor continuous batching helps performance. Therefore, I think it's better to use other libraries for embeddings for now. In the future, when we extend the scope of vLLM, we will look into this again.
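As one example of the "other libraries" route, a standalone library like sentence-transformers can cover this today; a minimal sketch (the model name is only an example):

```python
# Minimal sketch of computing embeddings outside vLLM with sentence-transformers.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode([
    "vLLM is a fast inference engine.",
    "Embeddings map text to vectors.",
])
print(embeddings.shape)  # (2, 384) for this particular model
```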
-
Moving this issue to Discussions, as it's more of a longer-term plan.
-
Does anyone have recommendations for tools like vLLM for embedding models?
-
While waiting for this major vLLM feature, I created a very simple merged version of vLLM and HuggingFace Text Embeddings Inference to get one API with the full set of OpenAI endpoints (/v1/embeddings, /v1/chat/completions, ...): https://github.com/leoguillaume/VLLMEmbeddings
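For the curious, the general shape of such a merge can be as simple as a thin proxy that routes /v1/embeddings to a Text Embeddings Inference server and everything else to a vLLM server. The sketch below only illustrates that idea (ports and routing are assumptions), it is not the linked repo's actual code:

```python
# Hypothetical routing sketch: forward /v1/embeddings to a TEI server and all
# other /v1/* calls to a vLLM OpenAI-compatible server.
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

VLLM_URL = "http://localhost:8000"  # assumed vLLM OpenAI-compatible server
TEI_URL = "http://localhost:8080"   # assumed Text Embeddings Inference server

app = FastAPI()

@app.post("/v1/{path:path}")
async def proxy(path: str, request: Request):
    # Route embeddings to TEI, everything else (chat, completions, ...) to vLLM.
    upstream = TEI_URL if path == "embeddings" else VLLM_URL
    body = await request.json()
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{upstream}/v1/{path}", json=body, timeout=120)
    return JSONResponse(status_code=resp.status_code, content=resp.json())
```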
-
This looks like it's been implemented! #3734
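For later readers, a quick usage sketch against a vLLM OpenAI-compatible server started with an embedding model; the model name, port, and startup command below are assumptions for illustration, not a statement of exactly which models are supported:

```python
# Assumes a vLLM OpenAI-compatible server is already running locally with an
# embedding model, e.g. something like:
#   python -m vllm.entrypoints.openai.api_server --model intfloat/e5-mistral-7b-instruct
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",   # illustrative model name
    input=["Hello world", "vLLM embeddings"],
)
print(len(resp.data), len(resp.data[0].embedding))
```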
-
Hi all, as output of the generate() method, I would also like to get the hidden_states associated with the generated sequences. As far as I searched, this wasn't available; has it been implemented now?
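In the meantime, one workaround outside vLLM is plain Hugging Face transformers, whose generate() can return hidden states; a minimal sketch (the model name is only an example, and generation will be slower than vLLM):

```python
# Sketch using Hugging Face transformers (not vLLM): collect the hidden states
# of generated tokens via return_dict_in_generate / output_hidden_states.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # illustrative model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=8,
    return_dict_in_generate=True,
    output_hidden_states=True,
)
# out.hidden_states is a tuple with one entry per generation step; each entry
# is a tuple of per-layer tensors. Take the last layer's vector for the newest
# token at each step.
last_layer_per_step = [step[-1][:, -1, :] for step in out.hidden_states]
gen_hidden = torch.stack(last_layer_per_step, dim=1)  # [batch, new_tokens, hidden]
print(gen_hidden.shape)
```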
-
Have you found an alternative method?
-
So is it supported now?
-
Is it supported? If possible, could this be added to the vllm_worker of FastChat? Thanks. https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/vllm_worker.py
-
When will the /v1/embeddings API be available?
Thank you