This feels like a bug to me, but it may be expected behavior. Reporting it in case it is in fact a problem.
I was inspecting the /actuator/metrics/gen_ai.client.token.usage endpoint and found that the total number of tokens used was 5336.
So then I pivoted on the "gen_ai.operation.name" tag to find out how many of those are for "chat" (as opposed to "embedding"). I got 5206.
Then I wondered how many of those are input tokens. So I applied the "gen_ai.token.type" tag along with "gen_ai.operation.name" to find out that there were 2562 input tokens. Great! So then should I assume that all of the other tokens are output tokens? That is, should I expect that there were 5206-2562=2644 output tokens?
No. When I asked for chat output tokens (by changing the "gen_ai.token.type" tag to "output"), I got back only 41.
So... I was scratching my head a little before realizing that the original total of 5206 chat tokens is actually the sum of input + output + total tokens. In effect, the total number of tokens given without using the "gen_ai.token.type" tag is double what it should be. The actual total should be 2603.
Again, this may be expected behavior. I get it that if I ask for "gen_ai.token.type" with "total", I will get back 2603, which is the actual total. But it's a bit misleading that asking for "gen_ai.operation.name" of "chat" without also specifying "gen_ai.token.type" gives me double the actual count of tokens.
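To make the double counting concrete, here is a minimal plain-Java sanity check using only the numbers reported above (the class name is mine, not from Spring AI): the untagged "chat" figure is exactly input + output + total, which is twice the true total because total already equals input + output.

```java
// Sanity check of the figures above: the untagged "chat" count (5206)
// is the sum of all three gen_ai.token.type series, i.e. double the real total.
public class TokenUsageCheck {
    public static void main(String[] args) {
        long input = 2562;   // gen_ai.token.type = "input"
        long output = 41;    // gen_ai.token.type = "output"
        long total = 2603;   // gen_ai.token.type = "total" (= input + output)

        long untagged = input + output + total; // what the untagged query sums
        System.out.println(untagged);           // 5206
        System.out.println(untagged == 2 * total); // true, hence "double"
    }
}
```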
I get why it might feel strange, but the behavior you're seeing is expected - Spring AI's metrics handler simply reports what the model provides without additional processing.
The total token count comes directly from the model API response, even though it may seem redundant given we have input and output counts. While it's unclear why model APIs include an explicit total, we need to preserve that information in our metrics.
Regarding metrics organization: Unlike JMX's hierarchical structure, time-series databases like Prometheus use labels/tags for aggregation, which is why we keep all usage metrics grouped together rather than splitting them out.
Moving the total usage metric outside the usage metrics would feel strange for a TSDB.
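Given that design, the label-based aggregation a TSDB performs can be mimicked with a plain map keyed by tag value (this is a sketch of the semantics, not Spring AI or Micrometer code; the tag values and counts are the ones quoted in this thread). Summing every gen_ai.token.type series double counts, so a correct query either selects the "total" label or excludes it and sums the rest:

```java
import java.util.Map;

// Sketch of tag-based aggregation: one series per gen_ai.token.type label,
// as a time-series database like Prometheus would store them.
public class TagAggregation {
    public static void main(String[] args) {
        Map<String, Long> chatTokensByType = Map.of(
                "input", 2562L,
                "output", 41L,
                "total", 2603L);

        // Summing all labels (the untagged actuator query) double counts:
        long allLabels = chatTokensByType.values().stream()
                .mapToLong(Long::longValue).sum();
        System.out.println(allLabels); // 5206

        // Correct total: either select the "total" label...
        long selectTotal = chatTokensByType.get("total");
        // ...or exclude it and sum the remaining series.
        long excludeTotal = chatTokensByType.entrySet().stream()
                .filter(e -> !e.getKey().equals("total"))
                .mapToLong(Map.Entry::getValue).sum();
        System.out.println(selectTotal == excludeTotal); // true: both give 2603
    }
}
```

In Prometheus itself, the analogous fix is to filter on the label in the query rather than aggregating across every series of the meter.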