Usage Statistics

Track token consumption and performance metrics from AI provider responses. All providers return normalized usage statistics for consistent cost tracking and monitoring.

Monitor Usage in Production

See Instrumentation to monitor usage statistics in real time using ActiveSupport::Notifications.
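
For example, subscribe to the relevant notification and log usage as each request completes. A minimal sketch; the event name below is hypothetical, so check the Instrumentation guide for the names actually emitted:

ruby
require "active_support/notifications"

# Hypothetical event name: see the Instrumentation guide for real ones
ActiveSupport::Notifications.subscribe("request.llm") do |event|
  usage = event.payload[:usage]
  puts "tokens=#{usage&.total_tokens} duration=#{event.duration.round(1)}ms"
end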

Accessing Usage

Get usage statistics from any response:

ruby
# Normalized fields (available across all providers)
response.usage.input_tokens
response.usage.output_tokens
response.usage.total_tokens

Common Fields

These fields work across all providers:

ruby
usage = response.usage

# All providers support these
usage.input_tokens    # Tokens in the prompt/input
usage.output_tokens   # Tokens in the completion/output
usage.total_tokens    # Total tokens used (auto-calculated if not provided)

Provider-Specific Fields

Access advanced metrics when available:

ruby
usage = response.usage

# OpenAI-specific fields
usage.cached_tokens      # Prompt tokens served from cache
usage.reasoning_tokens   # Tokens used for reasoning (o1 models)
usage.audio_tokens       # Tokens for audio input/output

ruby
usage = response.usage

# Anthropic-specific fields
usage.cached_tokens             # Tokens read from cache
usage.cache_creation_tokens     # Tokens written to cache
usage.service_tier              # "standard" or "prioritized"

ruby
usage = response.usage

# Ollama-specific fields
usage.duration_ms                             # Total request duration in ms
usage.provider_details[:tokens_per_second]    # Generation throughput
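
These fields are only populated when the underlying provider reports them. Assuming unreported fields return nil, guard before using them:

ruby
usage = response.usage

# Assumption: fields a provider doesn't report come back as nil
if (cached = usage.cached_tokens)
  puts "#{cached} prompt tokens served from cache"
end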

Provider Details

Raw provider data preserved in provider_details:

ruby
usage = response.usage

# Access raw provider-specific data
usage.provider_details
# Contains: prompt_tokens_details, completion_tokens_details, etc.

ruby
usage = response.usage

# Ollama provides detailed timing metrics
usage.provider_details[:load_duration_ms]         # Model load time
usage.provider_details[:prompt_eval_duration_ms]  # Prompt processing time
usage.provider_details[:eval_duration_ms]         # Generation time
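
Because the examples above use symbol keys, provider_details can reasonably be treated as a Hash; a quick sketch for dumping everything it contains:

ruby
usage = response.usage

# Assuming provider_details is a symbol-keyed Hash, log or persist
# the raw metrics wholesale
usage.provider_details.each do |key, value|
  puts "#{key}: #{value}"
end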

Cost Tracking

Calculate costs using token counts:

ruby
INPUT_PRICE_PER_TOKEN = 0.00001
OUTPUT_PRICE_PER_TOKEN = 0.00003
CACHE_DISCOUNT_PER_TOKEN = 0.000005

# Track usage per request
input_cost = response.usage.input_tokens * INPUT_PRICE_PER_TOKEN
output_cost = response.usage.output_tokens * OUTPUT_PRICE_PER_TOKEN
total_cost = input_cost + output_cost

# Account for cached tokens (reduced cost)
if response.usage.cached_tokens
  cache_savings = response.usage.cached_tokens * CACHE_DISCOUNT_PER_TOKEN
  total_cost -= cache_savings
end
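
One way to package this calculation for reuse (the per-token rates are illustrative, as above):

ruby
# Illustrative rates: substitute your model's actual pricing
def request_cost(usage, input_price: 0.00001, output_price: 0.00003,
                 cache_discount: 0.000005)
  cost = usage.input_tokens * input_price +
         usage.output_tokens * output_price
  cost -= usage.cached_tokens * cache_discount if usage.cached_tokens
  cost
end

puts format("$%.6f", request_cost(response.usage))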

Monitor costs in production: Use Instrumentation to automatically track costs across all requests.

Embeddings Usage

Embedding responses have zero output tokens:

ruby
# Embeddings only consume input tokens
response.usage.input_tokens   # Tokens in the vectorized text
response.usage.output_tokens  # Always 0 for embeddings
response.usage.total_tokens   # Same as input_tokens
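
This makes embedding cost a single multiplication; a sketch with an illustrative (not real) rate:

ruby
EMBED_PRICE_PER_TOKEN = 0.0000001  # illustrative rate, not a real price

cost = response.usage.input_tokens * EMBED_PRICE_PER_TOKEN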

Field Mapping

How provider fields map to normalized names:

Provider            input_tokens        output_tokens       total_tokens
OpenAI Chat         prompt_tokens       completion_tokens   total_tokens
OpenAI Embed        prompt_tokens       0                   total_tokens
OpenAI Responses    input_tokens        output_tokens       total_tokens
Anthropic           input_tokens        output_tokens       calculated
Ollama              prompt_eval_count   eval_count          calculated
OpenRouter          prompt_tokens       completion_tokens   total_tokens

Note: total_tokens is automatically calculated as input_tokens + output_tokens when not provided by the provider.
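
In practice this means the identity below holds for the providers in the "calculated" rows:

ruby
usage = response.usage

# Derived total for providers that omit it (Anthropic, Ollama)
usage.total_tokens == usage.input_tokens + usage.output_tokens  # => true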