# Usage Statistics
Track token consumption and performance metrics from AI provider responses. All providers return normalized usage statistics for consistent cost tracking and monitoring.
> **Monitor Usage in Production:** See Instrumentation to monitor usage statistics in real time using ActiveSupport::Notifications.
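A minimal sketch of what such a subscriber might look like — the event name `"llm.request"` and the `payload[:response]` key are hypothetical placeholders; use the actual event names and payload shape documented on the Instrumentation page:

```ruby
require "active_support/notifications"

# Hypothetical event name and payload shape -- substitute the
# actual values from the Instrumentation docs.
ActiveSupport::Notifications.subscribe("llm.request") do |_name, _start, _finish, _id, payload|
  usage = payload[:response].usage
  puts "LLM usage: #{usage.input_tokens} in / #{usage.output_tokens} out"
end
```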
## Accessing Usage
Get usage statistics from any response:
```ruby
# Normalized fields (available across all providers)
response.usage.input_tokens
response.usage.output_tokens
response.usage.total_tokens
```

## Common Fields
These fields work across all providers:
```ruby
usage = response.usage

# All providers support these
usage.input_tokens   # Tokens in the prompt/input
usage.output_tokens  # Tokens in the completion/output
usage.total_tokens   # Total tokens used (auto-calculated if not provided)
```
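Because these three fields are normalized, code that aggregates usage needs no provider-specific branches. A small sketch, where `responses` stands in for any collection of prior API responses:

```ruby
# Accumulate totals across a batch of responses --
# the same code works for every provider.
totals = Hash.new(0)

responses.each do |response|
  totals[:input]  += response.usage.input_tokens
  totals[:output] += response.usage.output_tokens
end

puts "batch: #{totals[:input]} in / #{totals[:output]} out"
```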
## Provider-Specific Fields

Access advanced metrics when available:
```ruby
usage = response.usage

# OpenAI-specific fields
usage.cached_tokens     # Prompt tokens served from cache
usage.reasoning_tokens  # Tokens used for reasoning (o1 models)
usage.audio_tokens      # Tokens for audio input/output
```

```ruby
usage = response.usage

# Anthropic-specific fields
usage.cached_tokens          # Tokens read from cache
usage.cache_creation_tokens  # Tokens written to cache
usage.service_tier           # "standard" or "prioritized"
```

```ruby
usage = response.usage

# Ollama-specific fields
usage.duration_ms                           # Total request duration in ms
usage.provider_details[:tokens_per_second]  # Generation throughput
```
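These fields are only populated by the provider that emits them; the cost-tracking example later on this page assumes they are nil everywhere else, so guard before using them:

```ruby
usage = response.usage

# cached_tokens is assumed to be nil for providers that don't report it
if usage.cached_tokens
  puts "#{usage.cached_tokens} prompt tokens served from cache"
end
```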
## Provider Details

Raw provider data is preserved in provider_details:
```ruby
usage = response.usage

# Access raw provider-specific data
usage.provider_details
# Contains: prompt_tokens_details, completion_tokens_details, etc.
```

```ruby
usage = response.usage

# Ollama provides detailed timing metrics
usage.provider_details[:load_duration_ms]         # Model load time
usage.provider_details[:prompt_eval_duration_ms]  # Prompt processing time
usage.provider_details[:eval_duration_ms]         # Generation time
```
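These timings also let you derive throughput yourself. A sketch combining the normalized output count with the Ollama timing fields above, guarded since the fields are assumed absent on other providers:

```ruby
usage = response.usage
eval_ms = usage.provider_details[:eval_duration_ms]

# Generation throughput: output tokens divided by generation time
if eval_ms&.positive?
  puts format("%.1f tokens/s", usage.output_tokens / (eval_ms / 1000.0))
end
```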
## Cost Tracking

Calculate costs using token counts:
```ruby
INPUT_PRICE_PER_TOKEN    = 0.00001
OUTPUT_PRICE_PER_TOKEN   = 0.00003
CACHE_DISCOUNT_PER_TOKEN = 0.000005

# Track usage per request
input_cost  = response.usage.input_tokens * INPUT_PRICE_PER_TOKEN
output_cost = response.usage.output_tokens * OUTPUT_PRICE_PER_TOKEN
total_cost  = input_cost + output_cost

# Account for cached tokens (reduced cost)
if response.usage.cached_tokens
  cache_savings = response.usage.cached_tokens * CACHE_DISCOUNT_PER_TOKEN
  total_cost -= cache_savings
end
```

> **Monitor costs in production:** Use Instrumentation to automatically track costs across all requests.
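For repeated use, the same arithmetic folds into a small helper. A minimal sketch — the default prices are the illustrative constants from above, not real rates:

```ruby
# Returns the estimated cost for one response.
# Prices are illustrative; substitute your model's actual rates.
def estimated_cost(usage,
                   input_price: 0.00001,
                   output_price: 0.00003,
                   cache_discount: 0.000005)
  cost = usage.input_tokens * input_price +
         usage.output_tokens * output_price
  cost -= usage.cached_tokens * cache_discount if usage.cached_tokens
  cost
end

estimated_cost(response.usage)
```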
## Embeddings Usage
Embedding responses have zero output tokens:
```ruby
# Embeddings only consume input tokens
response.usage.input_tokens   # Tokens in the vectorized text
response.usage.output_tokens  # Always 0 for embeddings
response.usage.total_tokens   # Same as input_tokens
```
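Since only input tokens are billed, embedding cost tracking reduces to a single multiplication. With an illustrative rate:

```ruby
# Illustrative embedding rate -- substitute your model's actual price
EMBED_PRICE_PER_TOKEN = 0.0000001

embedding_cost = response.usage.input_tokens * EMBED_PRICE_PER_TOKEN
```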
## Field Mapping

How provider fields map to normalized names:
| Provider | input_tokens | output_tokens | total_tokens |
|---|---|---|---|
| OpenAI Chat | prompt_tokens | completion_tokens | total_tokens |
| OpenAI Embed | prompt_tokens | 0 | total_tokens |
| OpenAI Responses | input_tokens | output_tokens | total_tokens |
| Anthropic | input_tokens | output_tokens | calculated |
| Ollama | prompt_eval_count | eval_count | calculated |
| OpenRouter | prompt_tokens | completion_tokens | total_tokens |
> **Note:** total_tokens is automatically calculated as input_tokens + output_tokens when not provided by the provider.
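Conceptually, that fallback behaves like the sketch below (an assumed behavior of the normalization layer, not the library's actual code):

```ruby
# Providers marked "calculated" in the table fall back to summing the parts
def normalized_total(raw_total, input_tokens, output_tokens)
  raw_total || (input_tokens + output_tokens)
end
```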