Ollama Provider
The Ollama provider enables local LLM inference using the Ollama platform. Run models like Llama 3, Mistral, and Gemma on your own hardware without sending data to external APIs, which makes it well suited to privacy-sensitive applications and local development.
Configuration
Basic Setup
Configure Ollama in your agent:
class OllamaAgent < ApplicationAgent
layout "agent"
generate_with :ollama, model: "gemma3:latest", instructions: "You're a basic Ollama agent."
end
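With the agent configured, you can run a generation the same way the test near the end of this guide does:
# Build a prompt context and generate synchronously
response = OllamaAgent.with(message: "Hello").prompt_context.generate_now
response.message.content # the model's reply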
Configuration File
Set up Ollama in config/active_agent.yml:
ollama: &ollama
  service: "Ollama"
  access_token: ""
  host: "http://localhost:11434"
  model: "gemma3:latest"
  temperature: 0.7

development:
  ollama:
    <<: *ollama
Environment Variables
Configure via environment:
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama3
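One way to use these variables is to reference them from active_agent.yml (or from environment config, as shown in the Development Workflow section below); a sketch, assuming the file is run through ERB like other Rails YAML configuration:
ollama:
  service: "Ollama"
  host: <%= ENV.fetch("OLLAMA_HOST", "http://localhost:11434") %>
  model: <%= ENV.fetch("OLLAMA_MODEL", "llama3") %>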
Installing Ollama
macOS/Linux
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama service
ollama serve
# Pull a model
ollama pull llama3
Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama pull llama3
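Either way, you can verify the server is reachable before pointing an agent at it:
# Should return JSON listing the locally installed models
curl http://localhost:11434/api/tags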
Supported Models
Popular Models
- llama3 - Meta's Llama 3 (8B, 70B)
- mistral - Mistral 7B
- gemma - Google's Gemma (2B, 7B)
- codellama - Code-specialized Llama
- mixtral - Mixture of experts model
- phi - Microsoft's Phi-2
- neural-chat - Intel's fine-tuned model
- qwen - Alibaba's Qwen models
List Available Models
class OllamaAdmin < ApplicationAgent
generate_with :ollama
def list_models
# Get list of installed models
response = HTTParty.get("#{ollama_host}/api/tags")
response["models"]
end
private
def ollama_host
Rails.configuration.active_agent.dig(:ollama, :host) || "http://localhost:11434"
end
end
Features
Local Inference
Run models completely offline:
class PrivateDataAgent < ApplicationAgent
generate_with :ollama, model: "llama3"
def process_sensitive_data
@data = params[:sensitive_data]
# Data never leaves your infrastructure
prompt instructions: "Process this confidential information"
end
end
Model Switching
Easily switch between models:
class MultiModelAgent < ApplicationAgent
  def code_review
    # Use a code-specialized model.
    # Note: generate_with reconfigures the agent class itself, so switching here
    # affects every subsequent request to this class; prefer separate agent
    # classes per model if requests can run concurrently.
    self.class.generate_with :ollama, model: "codellama"
    @code = params[:code]
    prompt
  end

  def general_chat
    # Use a general-purpose model
    self.class.generate_with :ollama, model: "llama3"
    @message = params[:message]
    prompt
  end
end
Custom Models
Use fine-tuned or custom models:
class CustomModelAgent < ApplicationAgent
generate_with :ollama, model: "my-custom-model:latest"
before_action :ensure_model_exists
private
def ensure_model_exists
# Check if model is available
models = fetch_available_models
unless models.include?(generation_provider.model)
raise "Model #{generation_provider.model} not found. Run: ollama pull #{generation_provider.model}"
end
end
end
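The fetch_available_models helper is not defined in the snippet above; a minimal sketch, reusing the /api/tags call from the OllamaAdmin example earlier (the helper name is otherwise hypothetical):
def fetch_available_models
  host = Rails.configuration.active_agent.dig(:ollama, :host) || "http://localhost:11434"
  # /api/tags returns every locally installed model
  HTTParty.get("#{host}/api/tags")["models"].map { |m| m["name"] }
end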
Structured Output
Ollama can generate JSON-formatted responses through careful prompting and model selection. While Ollama doesn't have native structured output like OpenAI, many models can reliably produce JSON when properly instructed.
Approach
To get structured output from Ollama:
- Choose the right model - Models like Llama 3, Mixtral, and Mistral are good at following formatting instructions
- Use clear prompts - Explicitly request JSON format in your instructions
- Set low temperature - Use values like 0.1-0.3 for more consistent formatting
- Parse and validate - Always validate the response as it may not be valid JSON
Example Approach
class OllamaAgent < ApplicationAgent
generate_with :ollama,
model: "llama3",
temperature: 0.1 # Lower temperature for consistency
def extract_with_json_prompt
prompt(
instructions: <<~INST,
You must respond ONLY with valid JSON.
Extract the key information and format as:
{"field1": "value", "field2": "value"}
No explanation, just the JSON object.
INST
message: params[:text]
)
end
end
# Usage - parse with error handling
response = OllamaAgent.with(text: "Some text to extract from").extract_with_json_prompt.generate_now
begin
  data = JSON.parse(response.message.content)
rescue JSON::ParserError
  # Handle malformed JSON (retry, fall back, or surface an error)
end
Best Practices
- Model Selection: Test different models to find which works best for your use case
- Prompt Engineering: Be very explicit about JSON requirements
- Validation: Always validate and handle parsing errors
- Local Processing: Ideal for sensitive data that must stay on-premise
Limitations
- No guaranteed JSON output like OpenAI's strict mode
- Quality varies significantly by model
- May require multiple attempts or fallback logic
- Complex schemas may be challenging
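Fallback logic can be as simple as a bounded retry around the parse; a hedged sketch using the agent and action from the example above (the helper name is hypothetical):
def extract_json(text, attempts: 3)
  attempts.times do
    response = OllamaAgent.with(text: text).extract_with_json_prompt.generate_now
    begin
      return JSON.parse(response.message.content)
    rescue JSON::ParserError
      next # re-prompt; the low temperature keeps retries fairly consistent
    end
  end
  nil # caller decides what to do when no valid JSON was produced
end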
For reliable structured output, consider using OpenAI or OpenRouter providers. For local processing requirements where Ollama is necessary, implement robust validation and error handling.
See the Structured Output guide for more on these patterns.
Streaming Responses
Stream responses for better UX:
class StreamingOllamaAgent < ApplicationAgent
generate_with :ollama,
model: "llama3",
stream: true
on_message_chunk do |chunk|
# Handle streaming chunks
Rails.logger.info "Chunk: #{chunk}"
broadcast_to_client(chunk)
end
def chat
prompt(message: params[:message])
end
end
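broadcast_to_client is not part of ActiveAgent; a minimal sketch of what it could look like with turbo-rails (the stream and target names are hypothetical):
def broadcast_to_client(chunk)
  # Append each streamed chunk to a Turbo Stream the browser subscribes to
  Turbo::StreamsChannel.broadcast_append_to(
    "ollama_chat",
    target: "response",
    html: ERB::Util.html_escape(chunk)
  )
end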
Embeddings Support
Generate embeddings locally using Ollama's embedding models. See the Embeddings Framework Documentation for comprehensive coverage.
Basic Embedding Generation
provider = ActiveAgent::GenerationProvider::OllamaProvider.new(@config)
prompt = ActiveAgent::ActionPrompt::Prompt.new(
message: ActiveAgent::ActionPrompt::Message.new(content: "Generate an embedding for this text"),
instructions: "You are an embedding test agent"
)
response = provider.embed(prompt)
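The embedding vector comes back on response.message.content as an array of floats, which is exactly what the provider tests below assert:
vector = response.message.content
vector.all? { |v| v.is_a?(Numeric) } # => true
vector.length                        # 768 dimensions for nomic-embed-text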
Connection Required
Ollama must be running locally. If you see connection errors, start Ollama with:
ollama serve
Available Embedding Models
- nomic-embed-text - High-quality text embeddings (768 dimensions)
- mxbai-embed-large - Large embedding model (1024 dimensions)
- all-minilm - Lightweight embeddings (384 dimensions)
Pull Embedding Models
# Install embedding models
ollama pull nomic-embed-text
ollama pull mxbai-embed-large
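To make one of these the default for embed calls, set the embedding_model key in your Ollama configuration (the same key used by the provider tests below); a sketch:
ollama: &ollama
  service: "Ollama"
  model: "gemma3:latest"
  embedding_model: "nomic-embed-text"
  host: "http://localhost:11434"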
Error Handling
The provider raises descriptive errors when the Ollama service is not available. The provider's test suite below exercises this behavior alongside the embedding functionality shown above:
require "test_helper"
require "openai"
require "active_agent/action_prompt"
require "active_agent/generation_provider/ollama_provider"
class OllamaProviderTest < ActiveSupport::TestCase
setup do
@config = {
"service" => "Ollama",
"model" => "gemma3:latest",
"host" => "http://localhost:11434",
"api_version" => "v1",
"embedding_model" => "nomic-embed-text"
}
@provider = ActiveAgent::GenerationProvider::OllamaProvider.new(@config)
@prompt = ActiveAgent::ActionPrompt::Prompt.new(
message: ActiveAgent::ActionPrompt::Message.new(content: "Test content for embedding"),
instructions: "You are a test agent"
)
end
test "initializes with correct configuration" do
assert_equal "gemma3:latest", @provider.instance_variable_get(:@model_name)
assert_equal "http://localhost:11434", @provider.instance_variable_get(:@host)
assert_equal "v1", @provider.instance_variable_get(:@api_version)
client = @provider.instance_variable_get(:@client)
assert_instance_of OpenAI::Client, client
end
test "uses default values when config values not provided" do
minimal_config = {
"service" => "Ollama",
"model" => "llama2:latest"
}
provider = ActiveAgent::GenerationProvider::OllamaProvider.new(minimal_config)
assert_equal "http://localhost:11434", provider.instance_variable_get(:@host)
assert_equal "v1", provider.instance_variable_get(:@api_version)
end
test "embeddings_parameters returns correct structure" do
params = @provider.send(:embeddings_parameters, input: "Test text", model: "nomic-embed-text")
assert_equal "nomic-embed-text", params[:model]
assert_equal "Test text", params[:input]
end
test "embeddings_parameters uses config embedding_model when available" do
params = @provider.send(:embeddings_parameters, input: "Test text")
assert_equal "nomic-embed-text", params[:model]
assert_equal "Test text", params[:input]
end
test "embeddings_parameters uses prompt message content by default" do
@provider.instance_variable_set(:@prompt, @prompt)
params = @provider.send(:embeddings_parameters)
assert_equal "nomic-embed-text", params[:model]
assert_equal "Test content for embedding", params[:input]
end
test "embeddings_response creates proper response object" do
mock_response = {
"embedding" => [ 0.1, 0.2, 0.3, 0.4, 0.5 ],
"model" => "nomic-embed-text",
"created" => 1234567890
}
request_params = {
model: "nomic-embed-text",
input: "Test text"
}
@provider.instance_variable_set(:@prompt, @prompt)
response = @provider.send(:embeddings_response, mock_response, request_params)
assert_instance_of ActiveAgent::GenerationProvider::Response, response
assert_equal @prompt, response.prompt
assert_instance_of ActiveAgent::ActionPrompt::Message, response.message
assert_equal [ 0.1, 0.2, 0.3, 0.4, 0.5 ], response.message.content
assert_equal "assistant", response.message.role
assert_equal mock_response, response.raw_response
assert_equal request_params, response.raw_request
end
test "embed method works with Ollama provider" do
VCR.use_cassette("ollama_provider_embed") do
# region ollama_provider_embed
provider = ActiveAgent::GenerationProvider::OllamaProvider.new(@config)
prompt = ActiveAgent::ActionPrompt::Prompt.new(
message: ActiveAgent::ActionPrompt::Message.new(content: "Generate an embedding for this text"),
instructions: "You are an embedding test agent"
)
response = provider.embed(prompt)
# endregion ollama_provider_embed
assert_not_nil response
assert_instance_of ActiveAgent::GenerationProvider::Response, response
assert_not_nil response.message.content
assert_kind_of Array, response.message.content
assert response.message.content.all? { |val| val.is_a?(Numeric) }
doc_example_output(response)
rescue Errno::ECONNREFUSED, Net::OpenTimeout, Net::ReadTimeout => e
skip "Ollama is not running locally: #{e.message}"
end
end
test "embed method provides helpful error when Ollama not running" do
# Configure with a bad port to simulate Ollama not running
# Disable VCR for this test to allow actual connection failure
VCR.turn_off!
WebMock.allow_net_connect!
bad_config = @config.merge("host" => "http://localhost:99999")
provider = ActiveAgent::GenerationProvider::OllamaProvider.new(bad_config)
prompt = ActiveAgent::ActionPrompt::Prompt.new(
message: ActiveAgent::ActionPrompt::Message.new(content: "Test embedding"),
instructions: "Test agent"
)
error = assert_raises(ActiveAgent::GenerationProvider::Base::GenerationProviderError) do
provider.embed(prompt)
end
assert_match(/Unable to connect to Ollama at http:\/\/localhost:99999/, error.message)
assert_match(/Please ensure Ollama is running/, error.message)
assert_match(/ollama serve/, error.message)
ensure
VCR.turn_on!
WebMock.disable_net_connect!
end
test "inherits from OpenAIProvider" do
assert ActiveAgent::GenerationProvider::OllamaProvider < ActiveAgent::GenerationProvider::OpenAIProvider
end
test "overrides embeddings methods from parent class" do
# Verify that OllamaProvider has its own implementation of these methods
assert @provider.respond_to?(:embeddings_parameters, true)
assert @provider.respond_to?(:embeddings_response, true)
# Verify the methods are defined in OllamaProvider, not just inherited
ollama_methods = ActiveAgent::GenerationProvider::OllamaProvider.instance_methods(false)
assert_includes ollama_methods, :embeddings_parameters
assert_includes ollama_methods, :embeddings_response
end
test "handles Ollama-specific embedding format" do
# Test native Ollama format
ollama_response = {
"embedding" => [ 0.1, 0.2, 0.3 ],
"model" => "nomic-embed-text"
}
@provider.instance_variable_set(:@prompt, @prompt)
response = @provider.send(:embeddings_response, ollama_response)
assert_equal [ 0.1, 0.2, 0.3 ], response.message.content
end
test "handles OpenAI-compatible embedding format from Ollama" do
# Test OpenAI-compatible format that newer Ollama versions return
openai_format_response = {
"data" => [
{
"embedding" => [ 0.4, 0.5, 0.6 ],
"object" => "embedding"
}
],
"model" => "nomic-embed-text",
"object" => "list"
}
@provider.instance_variable_set(:@prompt, @prompt)
response = @provider.send(:embeddings_response, openai_format_response)
assert_equal [ 0.4, 0.5, 0.6 ], response.message.content
end
end
This ensures developers get clear feedback about connection issues.
For more embedding patterns and examples, see the Embeddings Documentation.
Provider-Specific Parameters
Model Parameters
- model - Model name (e.g., "llama3", "mistral")
- embedding_model - Embedding model name (e.g., "nomic-embed-text")
- temperature - Controls randomness (0.0 to 1.0)
- top_p - Nucleus sampling
- top_k - Top-k sampling
- num_predict - Maximum tokens to generate
- stop - Stop sequences
- seed - For reproducible outputs
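These map onto generate_with options; a brief sketch combining a few of them (values are illustrative, and depending on your provider version some parameters may need to go inside an options: hash, as shown under Advanced Options below):
class TunedOllamaAgent < ApplicationAgent
  generate_with :ollama,
    model: "llama3",
    temperature: 0.3,  # lower randomness for predictable output
    top_p: 0.9,        # nucleus sampling
    num_predict: 512,  # cap generated tokens
    seed: 42           # reproducible outputs
end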
System Configuration
- host - Ollama server URL (default: http://localhost:11434)
- timeout - Request timeout in seconds
- keep_alive - Keep model loaded in memory
Advanced Options
class AdvancedOllamaAgent < ApplicationAgent
generate_with :ollama,
model: "llama3",
options: {
num_ctx: 4096, # Context window size
num_gpu: 1, # Number of GPUs to use
num_thread: 8, # Number of threads
repeat_penalty: 1.1, # Penalize repetition
mirostat: 2, # Mirostat sampling
mirostat_tau: 5.0, # Mirostat tau parameter
mirostat_eta: 0.1 # Mirostat learning rate
}
end
Performance Optimization
Model Loading
Keep models in memory for faster responses:
class FastOllamaAgent < ApplicationAgent
generate_with :ollama,
model: "llama3",
keep_alive: "5m" # Keep model loaded for 5 minutes
def quick_response
@query = params[:query]
prompt
end
end
Hardware Acceleration
Configure GPU usage:
class GPUAgent < ApplicationAgent
generate_with :ollama,
model: "llama3",
options: {
num_gpu: -1, # Use all available GPUs
main_gpu: 0 # Primary GPU index
}
end
Quantization
Use quantized models for better performance:
# Pull quantized versions
ollama pull llama3:8b-q4_0 # 4-bit quantization
ollama pull llama3:8b-q5_1 # 5-bit quantization
class EfficientAgent < ApplicationAgent
# Use quantized model for faster inference
generate_with :ollama, model: "llama3:8b-q4_0"
end
Error Handling
Handle Ollama-specific errors:
class RobustOllamaAgent < ApplicationAgent
  generate_with :ollama, model: "llama3"

  rescue_from Faraday::ConnectionFailed do |error|
    Rails.logger.error "Ollama connection failed: #{error.message}"
    render_ollama_setup_instructions
  end

  rescue_from ActiveAgent::GenerationError do |error|
    if error.message.include?("model not found")
      pull_missing_model
      raise error # re-raise so the caller can retry now that the model is pulled
    else
      raise error
    end
  end

  private

  def pull_missing_model
    # Ruby's `retry` keyword is only valid inside a rescue clause, so this
    # helper just pulls the model; the caller retries the request afterwards.
    system("ollama pull #{generation_provider.model}")
  end

  def render_ollama_setup_instructions
    "Ollama is not running. Start it with: ollama serve"
  end
end
Testing
Test with Ollama locally:
class OllamaAgentTest < ActiveSupport::TestCase
setup do
skip "Ollama not available" unless ollama_available?
end
test "generates response with local model" do
response = OllamaAgent.with(
message: "Hello"
).prompt_context.generate_now
assert_not_nil response.message.content
doc_example_output(response)
end
private
def ollama_available?
response = Net::HTTP.get_response(URI("http://localhost:11434/api/tags"))
response.code == "200"
rescue
false
end
end
Development Workflow
Local Development Setup
# config/environments/development.rb
Rails.application.configure do
config.active_agent = {
ollama: {
host: ENV['OLLAMA_HOST'] || 'http://localhost:11434',
model: ENV['OLLAMA_MODEL'] || 'llama3',
options: {
num_ctx: 4096,
temperature: 0.7
}
}
}
end
Docker Compose Setup
# docker-compose.yml
version: '3.8'
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
volumes:
  ollama_data:
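If your Rails app runs in the same Compose project, point it at the service name instead of localhost; a sketch of an additional entry under services: (the app service itself is hypothetical, and the GPU reservation above requires the NVIDIA Container Toolkit):
  app:
    build: .
    environment:
      OLLAMA_HOST: http://ollama:11434
    depends_on:
      - ollama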
Best Practices
- Pre-pull models - Download models before first use
- Monitor memory usage - Large models require significant RAM
- Use appropriate models - Balance size and capability
- Keep models loaded - Use keep_alive for frequently used models
- Implement fallbacks - Handle connection failures gracefully
- Use quantization - Reduce memory usage and increase speed
- Test locally - Ensure models work before deployment
Ollama-Specific Considerations
Privacy First
class PrivacyFirstAgent < ApplicationAgent
generate_with :ollama, model: "llama3"
def process_pii
@personal_data = params[:personal_data]
# Data stays local - no external API calls
Rails.logger.info "Processing PII locally with Ollama"
prompt instructions: "Process this data privately"
end
end
Model Management
class ModelManager
def self.ensure_model(model_name)
models = list_models
unless models.include?(model_name)
pull_model(model_name)
end
end
def self.list_models
response = HTTParty.get("http://localhost:11434/api/tags")
response["models"].map { |m| m["name"] }
end
def self.pull_model(model_name)
system("ollama pull #{model_name}")
end
def self.delete_model(model_name)
HTTParty.delete("http://localhost:11434/api/delete",
body: { name: model_name }.to_json,
headers: { 'Content-Type' => 'application/json' }
)
end
end
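A natural place to call this is at boot, so the first request never fails on a missing model; a minimal sketch (the initializer path is just a suggestion):
# config/initializers/ollama_models.rb
Rails.application.config.after_initialize do
  # Pull the default model up front instead of on the first request
  ModelManager.ensure_model("llama3") if Rails.env.development?
end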
Deployment Considerations
# Ensure Ollama is available in production
class ApplicationAgent < ActiveAgent::Base
before_action :ensure_ollama_available, if: :using_ollama?
private
def using_ollama?
generation_provider.is_a?(ActiveAgent::GenerationProvider::OllamaProvider)
end
def ensure_ollama_available
HTTParty.get("#{ollama_host}/api/tags")
rescue => e
raise "Ollama is not available: #{e.message}"
end
def ollama_host
Rails.configuration.active_agent.dig(:ollama, :host)
end
end
Related Documentation
- Embeddings Framework - Complete guide to embeddings
- Generation Provider Overview
- OpenAI Provider - Cloud-based alternative with more models
- Configuration Guide
- Ollama Documentation
- Ollama Model Library - Available models including embedding models
- OpenRouter Provider - For cloud alternative