
Manual interruption of Ollama embedding during asynchronous thread execution #1595

viosay opened this issue Oct 24, 2024 · 3 comments

viosay commented Oct 24, 2024

When I make embedding requests through Ollama, they take a long time because of performance limitations on the server hosting Ollama, so I wrapped the calls in CompletableFuture to run them asynchronously. In some cases I need to cancel a running embedding request manually. Normally, interrupting the thread should also abort the request to Ollama, but things didn't go as expected: the embed method of OllamaApi uses RestClient, and after calling future.cancel() the underlying I/O operation (the HTTP request) does not respond to the interrupt signal, so the request keeps running even after the thread is interrupted.
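To make the failure mode concrete, here is a minimal sketch of the pattern I described (the class and method are hypothetical; `EmbeddingModel#embed(String)` returning `float[]` follows the Spring AI 1.0.0-M3 API):

```java
import java.util.concurrent.CompletableFuture;

import org.springframework.ai.embedding.EmbeddingModel;

class EmbeddingCancellationDemo {

    // Hypothetical reproduction of the pattern described above: the blocking
    // embed call (which goes through OllamaApi's RestClient) is moved onto a
    // worker thread via CompletableFuture.
    void demo(EmbeddingModel embeddingModel) {
        CompletableFuture<float[]> future = CompletableFuture.supplyAsync(
                () -> embeddingModel.embed("some long document"));

        // Later, when the result is no longer needed:
        future.cancel(true);
        // cancel() completes the future exceptionally, but per its Javadoc the
        // mayInterruptIfRunning flag has no effect for CompletableFuture, and
        // even an explicit Thread.interrupt() would not abort the blocking
        // socket I/O inside RestClient, so the HTTP request to the Ollama
        // server keeps running.
    }
}
```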

I noticed that OpenAiApi uses WebClient, and I'm wondering whether the embed method of OllamaApi could be enhanced to support requests via WebClient as well. I could then start the call asynchronously, keep the Disposable returned by the subscription, and call disposable.dispose() when I need to cancel. WebClient should respond to the cancellation properly and abort the request to the Ollama service; a sketch of this idea follows.
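A minimal sketch of the idea, assuming a WebClient-based call against Ollama's /api/embed endpoint (the request/response records below are hypothetical stand-ins for the real payload types, since OllamaApi currently only exposes the blocking RestClient variant):

```java
import java.util.List;

import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.Disposable;
import reactor.core.publisher.Mono;

class ReactiveEmbeddingSketch {

    // Hypothetical stand-ins mirroring Ollama's embed request/response payloads.
    record EmbedRequest(String model, String input) {}
    record EmbedResponse(List<float[]> embeddings) {}

    private final WebClient webClient = WebClient.create("http://localhost:11434");

    Disposable embedAsync() {
        Mono<EmbedResponse> result = webClient.post()
                .uri("/api/embed")
                .bodyValue(new EmbedRequest("shaw/dmeta-embedding-zh", "some long document"))
                .retrieve()
                .bodyToMono(EmbedResponse.class);

        // subscribe() hands back a Disposable; calling dispose() cancels the
        // subscription, and the reactive HTTP client aborts the in-flight
        // exchange instead of letting it run to completion.
        return result.subscribe(
                response -> System.out.println("embeddings: " + response.embeddings().size()),
                Throwable::printStackTrace);
    }
}
```

Cancellation then becomes a single disposable.dispose() call, which propagates through the reactive pipeline down to the underlying connection.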

This is just my perspective. I'm not sure if it's correct, but I will try to validate it.


tzolov commented Oct 24, 2024

@viosay, before jumping to conclusions, can you please share some context?
What Spring AI version are you using?
Which Ollama embedding model have you configured?


viosay commented Oct 24, 2024


Sorry, my description was indeed lacking. I am using Spring AI 1.0.0-M3, and the embedding models I am running on Ollama (shaw/dmeta-embedding-zh, 893379029/piccolo-large-zh-v2, and viosay/conan-embedding-v1) all show this issue, so it should not be model-specific.


tzolov commented Oct 25, 2024

Thank you for the update @viosay,
I see what you are trying to achieve, but I'm not convinced that using WebClient for a non-streaming endpoint is the right solution. Let me think about it.
