This feels like a bug to me, but it may be expected behavior. Reporting it in case it is in fact a problem.
I was inspecting the /actuator/metrics/gen_ai.client.token.usage endpoint and found that the total number of tokens used was 5336.
So then I pivoted on the "gen_ai.operation.name" tag to find out how many of those are for "chat" (as opposed to "embedding"). I got 5206.
Then I wondered how many of those are input tokens. So I applied the "gen_ai.token.type" tag along with "gen_ai.operation.name" to find out that there were 2562 input tokens. Great! So then should I assume that all of the other tokens are output tokens? That is, should I expect that there were 5206-2562=2644 output tokens?
No. When I asked for chat output tokens (by changing the "gen_ai.token.type" tag to "output"), I got back only 41.
So... I was scratching my head a little before realizing that the original total of 5206 chat tokens is actually the sum of input + output + total tokens. In effect, the total number of tokens given without using the "gen_ai.token.type" tag is double what it should be. The actual total should be 2603.
Again, this may be expected behavior. I get it that if I ask for "gen_ai.token.type" with "total", I will get back 2603, which is the actual total. But it's a bit misleading that asking for "gen_ai.operation.name" of "chat" without also specifying "gen_ai.token.type" gives me double the actual count of tokens.
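To make the double counting concrete, here is a minimal plain-Java sanity check using only the numbers reported above (the class name is mine, not from Spring AI): the untagged "chat" figure is exactly input + output + total, which is twice the true total because total already equals input + output.

```java
// Sanity check of the figures above: the untagged "chat" count (5206)
// is the sum of all three gen_ai.token.type series, i.e. double the real total.
public class TokenUsageCheck {
    public static void main(String[] args) {
        long input = 2562;   // gen_ai.token.type = "input"
        long output = 41;    // gen_ai.token.type = "output"
        long total = 2603;   // gen_ai.token.type = "total" (= input + output)

        long untagged = input + output + total; // what the untagged query sums
        System.out.println(untagged);           // 5206
        System.out.println(untagged == 2 * total); // true, hence "double"
    }
}
```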
I get why it might feel strange, but the behavior you're seeing is expected - Spring AI's metrics handler simply reports what the model provides without additional processing.
The total token count comes directly from the model API response, even though it may seem redundant given we have input and output counts. While it's unclear why model APIs include an explicit total, we need to preserve that information in our metrics.
Regarding metrics organization: Unlike JMX's hierarchical structure, time-series databases like Prometheus use labels/tags for aggregation, which is why we keep all usage metrics grouped together rather than splitting them out.
Moving the total usage metric outside the usage metrics would feel strange for a TSDB.
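Given that design, the label-based aggregation a TSDB performs can be mimicked with a plain map keyed by tag value (this is a sketch of the semantics, not Spring AI or Micrometer code; the tag values and counts are the ones quoted in this thread). Summing every gen_ai.token.type series double counts, so a correct query either selects the "total" label or excludes it and sums the rest:

```java
import java.util.Map;

// Sketch of tag-based aggregation: one series per gen_ai.token.type label,
// as a time-series database like Prometheus would store them.
public class TagAggregation {
    public static void main(String[] args) {
        Map<String, Long> chatTokensByType = Map.of(
                "input", 2562L,
                "output", 41L,
                "total", 2603L);

        // Summing all labels (the untagged actuator query) double counts:
        long allLabels = chatTokensByType.values().stream()
                .mapToLong(Long::longValue).sum();
        System.out.println(allLabels); // 5206

        // Correct total: either select the "total" label...
        long selectTotal = chatTokensByType.get("total");
        // ...or exclude it and sum the remaining series.
        long excludeTotal = chatTokensByType.entrySet().stream()
                .filter(e -> !e.getKey().equals("total"))
                .mapToLong(Map.Entry::getValue).sum();
        System.out.println(selectTotal == excludeTotal); // true: both give 2603
    }
}
```

In Prometheus itself, the analogous fix is to filter on the label in the query rather than aggregating across every series of the meter.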