graphrag can't index using mistral large 123B with exllamav2 #582
I don't really know what graphrag is or what sorts of requests it's sending. I take it this is with TabbyAPI? The reason the requests show up as cancelled would likely be that connections are closed by the frontend before they finish streaming. But I have no idea if that's intentional or not. Could also be a timeout.

If you want actual concurrency for 25 requests you need a cache large enough to accommodate that, i.e. 25x the length of each prompt+max_new_tokens. Otherwise the requests that can't fit in the cache are scheduled for sequential inference instead.

So what could be happening is that graphrag sends 25 requests, Tabby can fit 20 of them in the cache, they start streaming right away, but the last 5 will appear to stall and maybe the frontend just gives up on them? Just a guess.
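The cache-sizing arithmetic in the reply above can be sketched as follows. The helper names and the example numbers (2048-token prompts, 1024 new tokens, a 65536-token cache) are hypothetical, not taken from TabbyAPI's actual configuration:

```python
def required_cache_tokens(concurrent_requests: int,
                          prompt_tokens: int,
                          max_new_tokens: int) -> int:
    # Total KV-cache capacity needed for all requests to run truly in parallel:
    # each request occupies (prompt + generated tokens) worth of cache.
    return concurrent_requests * (prompt_tokens + max_new_tokens)

def max_parallel_slots(cache_tokens: int,
                       prompt_tokens: int,
                       max_new_tokens: int) -> int:
    # How many requests of this shape fit in the cache at once;
    # anything beyond this is scheduled sequentially instead.
    return cache_tokens // (prompt_tokens + max_new_tokens)

# Hypothetical numbers for illustration only:
print(required_cache_tokens(25, 2048, 1024))   # 76800 tokens needed for full concurrency
print(max_parallel_slots(65536, 2048, 1024))   # 21 requests fit; the rest queue
```

With numbers like these, a batch of 25 requests would see only the first ~21 stream immediately while the remainder wait for a slot to free up.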
graphrag is configured with concurrent_requests: 25 and timeout: 180.
After 180 s, only 5 chat completions finish; the rest time out, with many errors like:
ERROR: Chat completion 2a725977bfa24ff5ad768d0f0cf563d7 cancelled by user.
ERROR: Chat completion b1854e0f70b14a6c906310c3e5a7a7c6 cancelled by user.
ERROR: Chat completion 9aa757db07ba407dab480a61dcd1f44a cancelled by user.
ERROR: Chat completion faab131c61564578b653c5cda80494fe cancelled by user.
ERROR: Chat completion ce1e6fe36f1a4166935bbe211188cbf1 cancelled by user.
Why does this happen, and how can I fix it?
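The pattern described in the reply (some requests stream right away, the rest stall until the client's timeout fires and cancels them) can be reproduced with a toy queueing simulation. Everything here is hypothetical: a semaphore stands in for cache slots, `asyncio.sleep` stands in for generation time, and the numbers (20 slots, 25 requests) mirror the guess in the comment, not measured values:

```python
import asyncio

async def generate(sem: asyncio.Semaphore, duration: float) -> str:
    # A request holds a "cache slot" (the semaphore) for its whole generation;
    # requests that can't get a slot wait, mimicking sequential scheduling.
    async with sem:
        await asyncio.sleep(duration)
        return "done"

async def run(n_requests: int, cache_slots: int,
              gen_time: float, timeout: float) -> tuple[int, int]:
    sem = asyncio.Semaphore(cache_slots)
    tasks = [asyncio.create_task(generate(sem, gen_time))
             for _ in range(n_requests)]
    # The client waits up to `timeout`, then gives up on whatever is unfinished,
    # which the server logs as "cancelled by user".
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()
    return len(done), len(pending)

# 25 requests, 20 slots: the first 20 finish within the timeout,
# the queued 5 cannot, so the client cancels them.
completed, cancelled = asyncio.run(run(25, 20, gen_time=0.1, timeout=0.15))
print(completed, cancelled)  # 20 5
```

Under this model the fixes are the ones the comment implies: enlarge the cache so all 25 requests fit concurrently, lower graphrag's concurrency to match the slots available, or raise the client timeout enough to cover the queued requests.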