graphrag can't index using mistral large 123B with exllamav2 #582
I don't really know what graphrag is or what sorts of requests it's sending. I take it this is with TabbyAPI? The reason the requests show up as cancelled would likely be that connections are closed by the frontend before they finish streaming. But I have no idea if that's intentional or not. Could also be a timeout.

If you want actual concurrency for 25 requests you need a cache large enough to accommodate that, i.e. 25x the length of each prompt+max_new_tokens. Otherwise the requests that can't fit in the cache are scheduled for sequential inference instead.

So what could be happening is that graphrag sends 25 requests, Tabby can fit 20 of them in the cache, they start streaming right away, but the last 5 will appear to stall and maybe the frontend just gives up on them? Just a guess.
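The cache-sizing arithmetic in the reply above can be sketched as follows. The helper names and the example numbers (2048-token prompts, 1024 new tokens, a 65536-token cache) are hypothetical, not taken from TabbyAPI's actual configuration:

```python
def required_cache_tokens(concurrent_requests: int,
                          prompt_tokens: int,
                          max_new_tokens: int) -> int:
    # Total KV-cache capacity needed for all requests to run truly in parallel:
    # each request occupies (prompt + generated tokens) worth of cache.
    return concurrent_requests * (prompt_tokens + max_new_tokens)

def max_parallel_slots(cache_tokens: int,
                       prompt_tokens: int,
                       max_new_tokens: int) -> int:
    # How many requests of this shape fit in the cache at once;
    # anything beyond this is scheduled sequentially instead.
    return cache_tokens // (prompt_tokens + max_new_tokens)

# Hypothetical numbers for illustration only:
print(required_cache_tokens(25, 2048, 1024))   # 76800 tokens needed for full concurrency
print(max_parallel_slots(65536, 2048, 1024))   # 21 requests fit; the rest queue
```

With numbers like these, a batch of 25 requests would see only the first ~21 stream immediately while the remainder wait for a slot to free up.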
graphrag is configured with concurrent_requests: 25 and timeout: 180.
After 180 s, only 5 chat completions finish; the rest time out, with many errors like:
ERROR: Chat completion 2a725977bfa24ff5ad768d0f0cf563d7 cancelled by user.
ERROR: Chat completion b1854e0f70b14a6c906310c3e5a7a7c6 cancelled by user.
ERROR: Chat completion 9aa757db07ba407dab480a61dcd1f44a cancelled by user.
ERROR: Chat completion faab131c61564578b653c5cda80494fe cancelled by user.
ERROR: Chat completion ce1e6fe36f1a4166935bbe211188cbf1 cancelled by user.
Why does this happen, and how can I fix it?
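The pattern described in the reply (some requests stream right away, the rest stall until the client's timeout fires and cancels them) can be reproduced with a toy queueing simulation. Everything here is hypothetical: a semaphore stands in for cache slots, `asyncio.sleep` stands in for generation time, and the numbers (20 slots, 25 requests) mirror the guess in the comment, not measured values:

```python
import asyncio

async def generate(sem: asyncio.Semaphore, duration: float) -> str:
    # A request holds a "cache slot" (the semaphore) for its whole generation;
    # requests that can't get a slot wait, mimicking sequential scheduling.
    async with sem:
        await asyncio.sleep(duration)
        return "done"

async def run(n_requests: int, cache_slots: int,
              gen_time: float, timeout: float) -> tuple[int, int]:
    sem = asyncio.Semaphore(cache_slots)
    tasks = [asyncio.create_task(generate(sem, gen_time))
             for _ in range(n_requests)]
    # The client waits up to `timeout`, then gives up on whatever is unfinished,
    # which the server logs as "cancelled by user".
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()
    return len(done), len(pending)

# 25 requests, 20 slots: the first 20 finish within the timeout,
# the queued 5 cannot, so the client cancels them.
completed, cancelled = asyncio.run(run(25, 20, gen_time=0.1, timeout=0.15))
print(completed, cancelled)  # 20 5
```

Under this model the fixes are the ones the comment implies: enlarge the cache so all 25 requests fit concurrently, lower graphrag's concurrency to match the slots available, or raise the client timeout enough to cover the queued requests.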