-
The non-streaming chat/completions API from OpenAI returns token usage in the response:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```

This is great when you have different users and want to set limits depending on their usage. Is there a recommended way to do the same with the Vercel AI SDK?
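For reference, reading that field with the official openai Node SDK (non-streaming) looks roughly like this; the model and messages are placeholders:

```ts
import OpenAI from 'openai'

const client = new OpenAI()

const completion = await client.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Hello!' }],
})

// usage is only present on non-streaming responses
console.log(completion.usage)
// => { prompt_tokens: 9, completion_tokens: 12, total_tokens: 21 }
```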
-
One potential option might be to track prompt tokens by using a tokenizer library before starting the stream or in the onStart callback. For the completion tokens, the streaming APIs return one token at a time, so you can track this in the onToken callback. From a quick search, dqbd/tiktoken seems to support Vercel's Edge Runtime.
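A rough sketch of that approach in a route handler, using `@dqbd/tiktoken` for the prompt and the older `OpenAIStream` callbacks from the `ai` package for the completion (model name and persistence step are placeholders, not a definitive implementation):

```ts
import OpenAI from 'openai'
import { OpenAIStream, StreamingTextResponse } from 'ai'
import { encoding_for_model } from '@dqbd/tiktoken'

const openai = new OpenAI()

export async function POST(req: Request) {
  const { messages } = await req.json()

  // Prompt tokens can be counted up front with a tokenizer.
  const enc = encoding_for_model('gpt-3.5-turbo')
  const promptTokens = messages.reduce(
    (sum: number, m: { content: string }) => sum + enc.encode(m.content).length,
    0,
  )
  enc.free() // wasm encoders must be freed explicitly

  let completionTokens = 0

  const response = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages,
    stream: true,
  })

  const stream = OpenAIStream(response, {
    onToken: async () => {
      completionTokens += 1 // one callback per streamed token
    },
    onCompletion: async () => {
      // persist promptTokens + completionTokens per user here
      console.log({ promptTokens, completionTokens })
    },
  })

  return new StreamingTextResponse(stream)
}
```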
-
Great feedback. We want to add this to the playground too.
-
I use https://github.com/dqbd/tiktoken for our production application. I've noticed that it gets very slow as the number of tokens you are counting goes up, so if you try and count in the
-
We built this for this exact use case with David's tiktokenizer package and his help: https://tiktokenizer.vercel.app/
-
Apparently, OpenAI already has a feature for this, but it's disabled (see https://community.openai.com/t/usage-info-in-api-responses/18862/3).
Maybe someone can convince them to enable it with a flag or something.
-
We've done a bit of research here and every tokenizer is too large (generally due to wasm) for us to include by default with the SDK. Our recommendation going forward will be to use your tokenizer of choice paired with the
-
Makes sense. To be honest, my hope was that Vercel could convince OpenAI to add the usage info to streaming responses.
-
Maybe not the right place to ask, but how can we access the stop_reason when streaming with the new v4 SDKs?
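If "v4" refers to the openai Node SDK, one way is to read `finish_reason` off the streamed chunks; the final chunk carries it. A sketch (not specific to the Vercel AI SDK, model and prompt are placeholders):

```ts
import OpenAI from 'openai'

const client = new OpenAI()

const stream = await client.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
})

let finishReason: string | null = null
for await (const chunk of stream) {
  // earlier chunks have finish_reason: null; the last one has e.g. "stop" or "length"
  finishReason = chunk.choices[0]?.finish_reason ?? finishReason
}

console.log(finishReason)
```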
-
How does everyone approach this?
-
OpenAI should just provide this in the stream response. Following this thread to see how people are doing it in the meantime.
-
@pomber what solution did you settle on?
-
I have been using https://www.npmjs.com/package/@dqbd/tiktoken for months but have run into issues since the 16k context window was released for GPT-3.5. If the prompt context is long enough, tiktoken takes a noticeable time to run, and it takes longer as the prompt grows. I have some ideas to optimize this.
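One possible mitigation (not necessarily what the poster has in mind): cache per-message token counts so only new messages get re-encoded on each request. A sketch using `@dqbd/tiktoken`:

```ts
import { encoding_for_model } from '@dqbd/tiktoken'

const enc = encoding_for_model('gpt-3.5-turbo')
const cache = new Map<string, number>() // message content -> token count

// Chat history is mostly unchanged between requests, so re-encoding
// only unseen messages keeps tokenization cost roughly constant.
function countTokensCached(messages: { content: string }[]) {
  let total = 0
  for (const m of messages) {
    let n = cache.get(m.content)
    if (n === undefined) {
      n = enc.encode(m.content).length
      cache.set(m.content, n)
    }
    total += n
  }
  return total
}
```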
-
Found this quite useful if you want clean token counting: https://github.com/Cainier/gpt-tokens
-
streamText returns a `usage` promise which will give you the token usage once the stream has finished:

```ts
import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'

const result = await streamText({
  model: openai('gpt-4o'),
  messages,
  temperature: 0,
})

const stream = result.toAIStream({
  async onFinal(completion: string) {
    // resolve the usage tokens on completion, then save them per user, etc.
    const tokenCount = await result.usage
  },
})
```

`result.usage` resolves to an object with the token counts.
-
`streamText` has an `onFinish` callback (starting with `v3.1.15`) that sends `usage` (among other things).
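A minimal sketch of that callback, assuming the `@ai-sdk/openai` provider and a placeholder prompt:

```ts
import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'

const result = await streamText({
  model: openai('gpt-4o'),
  messages: [{ role: 'user', content: 'Hello!' }],
  // called once streaming completes, with usage and finishReason among other fields
  onFinish({ usage, finishReason }) {
    // usage: { promptTokens, completionTokens, totalTokens }
    console.log(usage, finishReason)
  },
})

// consume the stream so onFinish actually fires
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}
```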