
Ollama Provider

The Ollama provider enables local LLM inference using the Ollama platform. Run models such as Llama 3, Mistral, and Gemma on your own hardware without sending data to external APIs, making it well suited to privacy-sensitive applications and local development.

Configuration

Basic Setup

Configure Ollama in your agent:

ruby
class OllamaAgent < ApplicationAgent
  layout "agent"
  generate_with :ollama, model: "gemma3:latest", instructions: "You're a basic Ollama agent."
end
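
A hedged usage sketch, following the .with(...).prompt_context.generate_now flow used in the testing example later in this guide:

ruby
# Assumes the Ollama server is running locally and the configured model is pulled
response = OllamaAgent.with(message: "Hello!").prompt_context.generate_now
response.message.content # => the model's reply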

Configuration File

Set up Ollama in config/active_agent.yml:

yaml
ollama: &ollama
  service: "Ollama"
  access_token: ""
  host: "http://localhost:11434"
  model: "gemma3:latest"
  temperature: 0.7
Elsewhere in the file, reuse the anchor wherever the same settings are needed, for example in an environment-specific section:

yaml
ollama:
  <<: *ollama

Environment Variables

Configure via environment:

bash
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama3
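
If config/active_agent.yml is rendered with ERB (as Rails YAML configuration files typically are), these variables can be wired into the provider settings; a sketch:

yaml
ollama: &ollama
  service: "Ollama"
  host: <%= ENV.fetch("OLLAMA_HOST", "http://localhost:11434") %>
  model: <%= ENV.fetch("OLLAMA_MODEL", "llama3") %>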

Installing Ollama

macOS/Linux

bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
ollama serve

# Pull a model
ollama pull llama3

Docker

bash
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama pull llama3
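
Either way, verify the server is reachable before pointing ActiveAgent at it by querying the tags endpoint:

bash
curl http://localhost:11434/api/tags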

Supported Models

  • llama3 - Meta's Llama 3 (8B, 70B)
  • mistral - Mistral 7B
  • gemma - Google's Gemma (2B, 7B)
  • codellama - Code-specialized Llama
  • mixtral - Mixture of experts model
  • phi - Microsoft's Phi-2
  • neural-chat - Intel's fine-tuned model
  • qwen - Alibaba's Qwen models
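
Any installed model can be selected per agent through the model: option, for example:

ruby
class MistralAgent < ApplicationAgent
  generate_with :ollama, model: "mistral"
end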

List Available Models

ruby
class OllamaAdmin < ApplicationAgent
  generate_with :ollama
  
  def list_models
    # Get list of installed models
    response = HTTParty.get("#{ollama_host}/api/tags")
    response["models"]
  end
  
  private
  
  def ollama_host
    Rails.configuration.active_agent.dig(:ollama, :host) || "http://localhost:11434"
  end
end

Features

Local Inference

Run models completely offline:

ruby
class PrivateDataAgent < ApplicationAgent
  generate_with :ollama, model: "llama3"
  
  def process_sensitive_data
    @data = params[:sensitive_data]
    # Data never leaves your infrastructure
    prompt instructions: "Process this confidential information"
  end
end
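
Invoked with the same .with(...).action.generate_now flow as any other agent action (record here is a stand-in for wherever your sensitive data lives):

ruby
response = PrivateDataAgent.with(sensitive_data: record.notes).process_sensitive_data.generate_now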

Model Switching

Easily switch between models:

ruby
class MultiModelAgent < ApplicationAgent
  def code_review
    # Use the code-specialized model.
    # Note: generate_with sets class-level configuration, so switching here
    # affects every subsequent request handled by this agent class.
    self.class.generate_with :ollama, model: "codellama"
    @code = params[:code]
    prompt
  end

  def general_chat
    # Use the general-purpose model
    self.class.generate_with :ollama, model: "llama3"
    @message = params[:message]
    prompt
  end
end
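
Because generate_with stores class-level configuration, switching it inside an action changes shared state; an alternative sketch is one agent class per model:

ruby
class CodeReviewAgent < ApplicationAgent
  generate_with :ollama, model: "codellama"
end

class GeneralChatAgent < ApplicationAgent
  generate_with :ollama, model: "llama3"
end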

Custom Models

Use fine-tuned or custom models:

ruby
class CustomModelAgent < ApplicationAgent
  generate_with :ollama, model: "my-custom-model:latest"
  
  before_action :ensure_model_exists
  
  private
  
  def ensure_model_exists
    # Check that the configured model has been pulled locally
    models = fetch_available_models
    unless models.include?(generation_provider.model)
      raise "Model #{generation_provider.model} not found. Run: ollama pull #{generation_provider.model}"
    end
  end
  
  def fetch_available_models
    # Query Ollama's tags endpoint for installed model names (requires the httparty gem)
    response = HTTParty.get("http://localhost:11434/api/tags")
    response["models"].map { |m| m["name"] }
  end
end

Structured Output

Ollama can generate JSON-formatted responses through careful prompting and model selection. While Ollama doesn't have native structured output like OpenAI, many models can reliably produce JSON when properly instructed.

Approach

To get structured output from Ollama:

  1. Choose the right model - Models like Llama 3, Mixtral, and Mistral are good at following formatting instructions
  2. Use clear prompts - Explicitly request JSON format in your instructions
  3. Set low temperature - Use values like 0.1-0.3 for more consistent formatting
  4. Parse and validate - Always validate the response as it may not be valid JSON

Example Approach

ruby
class OllamaAgent < ApplicationAgent
  generate_with :ollama,
    model: "llama3",
    temperature: 0.1  # Lower temperature for consistency
  
  def extract_with_json_prompt
    prompt(
      instructions: <<~INST,
        You must respond ONLY with valid JSON.
        Extract the key information and format as:
        {"field1": "value", "field2": "value"}
        No explanation, just the JSON object.
      INST
      message: params[:text]
    )
  end
end

# Usage - parse with error handling (text_to_extract is your input string)
response = OllamaAgent.with(text: text_to_extract).extract_with_json_prompt.generate_now
begin
  data = JSON.parse(response.message.content)
rescue JSON::ParserError
  # Handle malformed JSON (retry, fall back, or surface an error)
end

Best Practices

  1. Model Selection: Test different models to find which works best for your use case
  2. Prompt Engineering: Be very explicit about JSON requirements
  3. Validation: Always validate and handle parsing errors
  4. Local Processing: Ideal for sensitive data that must stay on-premise

Limitations

  • No guaranteed JSON output like OpenAI's strict mode
  • Quality varies significantly by model
  • May require multiple attempts or fallback logic
  • Complex schemas may be challenging

For reliable structured output, consider using OpenAI or OpenRouter providers. For local processing requirements where Ollama is necessary, implement robust validation and error handling.
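
A minimal retry sketch along those lines (the helper name and attempt count are illustrative):

ruby
def extract_json_with_retries(text, attempts: 3)
  attempts.times do
    response = OllamaAgent.with(text: text).extract_with_json_prompt.generate_now
    return JSON.parse(response.message.content)
  rescue JSON::ParserError
    next # malformed JSON - try again
  end
  nil # give up after repeated failures; the caller decides the fallback
end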

See the Structured Output guide for more information about structured output patterns.

Streaming Responses

Stream responses for better UX:

ruby
class StreamingOllamaAgent < ApplicationAgent
  generate_with :ollama, 
    model: "llama3",
    stream: true
  
  on_message_chunk do |chunk|
    # Handle streaming chunks as they arrive
    Rails.logger.info "Chunk: #{chunk}"
    broadcast_to_client(chunk) # application-defined helper, e.g. an ActionCable broadcast
  end
  
  def chat
    prompt(message: params[:message])
  end
end

Embeddings Support

Generate embeddings locally using Ollama's embedding models. See the Embeddings Framework Documentation for comprehensive coverage.

Basic Embedding Generation

ruby
# Config mirrors the provider test setup shown later in this guide
config = { "service" => "Ollama", "model" => "gemma3:latest",
           "host" => "http://localhost:11434", "embedding_model" => "nomic-embed-text" }

provider = ActiveAgent::GenerationProvider::OllamaProvider.new(config)
prompt = ActiveAgent::ActionPrompt::Prompt.new(
  message: ActiveAgent::ActionPrompt::Message.new(content: "Generate an embedding for this text"),
  instructions: "You are an embedding test agent"
)

response = provider.embed(prompt)
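
The embedding vector is returned as the message content, an array of floats:

ruby
vector = response.message.content
vector.first(3) # => e.g. [0.1, 0.2, 0.3]
vector.size     # dimension depends on the embedding model (768 for nomic-embed-text)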

Connection Required

Ollama must be running locally. If you see connection errors, start Ollama with:

bash
ollama serve

Available Embedding Models

  • nomic-embed-text - High-quality text embeddings (768 dimensions)
  • mxbai-embed-large - Large embedding model (1024 dimensions)
  • all-minilm - Lightweight embeddings (384 dimensions)
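
To make one of these the default, set embedding_model in the provider configuration, matching the key used in the test configuration below:

yaml
ollama: &ollama
  service: "Ollama"
  host: "http://localhost:11434"
  model: "gemma3:latest"
  embedding_model: "nomic-embed-text"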

Pull Embedding Models

bash
# Install embedding models
ollama pull nomic-embed-text
ollama pull mxbai-embed-large

Error Handling

Ollama provides helpful error messages when the service is not available. The provider test suite below exercises configuration defaults, embedding parameters and responses, and the connection-failure error path that produces those messages:

ruby
require "test_helper"
require "openai"
require "active_agent/action_prompt"
require "active_agent/generation_provider/ollama_provider"

class OllamaProviderTest < ActiveSupport::TestCase
  setup do
    @config = {
      "service" => "Ollama",
      "model" => "gemma3:latest",
      "host" => "http://localhost:11434",
      "api_version" => "v1",
      "embedding_model" => "nomic-embed-text"
    }
    @provider = ActiveAgent::GenerationProvider::OllamaProvider.new(@config)

    @prompt = ActiveAgent::ActionPrompt::Prompt.new(
      message: ActiveAgent::ActionPrompt::Message.new(content: "Test content for embedding"),
      instructions: "You are a test agent"
    )
  end

  test "initializes with correct configuration" do
    assert_equal "gemma3:latest", @provider.instance_variable_get(:@model_name)
    assert_equal "http://localhost:11434", @provider.instance_variable_get(:@host)
    assert_equal "v1", @provider.instance_variable_get(:@api_version)

    client = @provider.instance_variable_get(:@client)
    assert_instance_of OpenAI::Client, client
  end

  test "uses default values when config values not provided" do
    minimal_config = {
      "service" => "Ollama",
      "model" => "llama2:latest"
    }
    provider = ActiveAgent::GenerationProvider::OllamaProvider.new(minimal_config)

    assert_equal "http://localhost:11434", provider.instance_variable_get(:@host)
    assert_equal "v1", provider.instance_variable_get(:@api_version)
  end

  test "embeddings_parameters returns correct structure" do
    params = @provider.send(:embeddings_parameters, input: "Test text", model: "nomic-embed-text")

    assert_equal "nomic-embed-text", params[:model]
    assert_equal "Test text", params[:input]
  end

  test "embeddings_parameters uses config embedding_model when available" do
    params = @provider.send(:embeddings_parameters, input: "Test text")

    assert_equal "nomic-embed-text", params[:model]
    assert_equal "Test text", params[:input]
  end

  test "embeddings_parameters uses prompt message content by default" do
    @provider.instance_variable_set(:@prompt, @prompt)
    params = @provider.send(:embeddings_parameters)

    assert_equal "nomic-embed-text", params[:model]
    assert_equal "Test content for embedding", params[:input]
  end

  test "embeddings_response creates proper response object" do
    mock_response = {
      "embedding" => [ 0.1, 0.2, 0.3, 0.4, 0.5 ],
      "model" => "nomic-embed-text",
      "created" => 1234567890
    }

    request_params = {
      model: "nomic-embed-text",
      input: "Test text"
    }

    @provider.instance_variable_set(:@prompt, @prompt)
    response = @provider.send(:embeddings_response, mock_response, request_params)

    assert_instance_of ActiveAgent::GenerationProvider::Response, response
    assert_equal @prompt, response.prompt
    assert_instance_of ActiveAgent::ActionPrompt::Message, response.message
    assert_equal [ 0.1, 0.2, 0.3, 0.4, 0.5 ], response.message.content
    assert_equal "assistant", response.message.role
    assert_equal mock_response, response.raw_response
    assert_equal request_params, response.raw_request
  end

  test "embed method works with Ollama provider" do
    VCR.use_cassette("ollama_provider_embed") do
      # region ollama_provider_embed
      provider = ActiveAgent::GenerationProvider::OllamaProvider.new(@config)
      prompt = ActiveAgent::ActionPrompt::Prompt.new(
        message: ActiveAgent::ActionPrompt::Message.new(content: "Generate an embedding for this text"),
        instructions: "You are an embedding test agent"
      )

      response = provider.embed(prompt)
      # endregion ollama_provider_embed

      assert_not_nil response
      assert_instance_of ActiveAgent::GenerationProvider::Response, response
      assert_not_nil response.message.content
      assert_kind_of Array, response.message.content
      assert response.message.content.all? { |val| val.is_a?(Numeric) }

      doc_example_output(response)
    rescue Errno::ECONNREFUSED, Net::OpenTimeout, Net::ReadTimeout => e
      skip "Ollama is not running locally: #{e.message}"
    end
  end

  test "embed method provides helpful error when Ollama not running" do
    # Configure with a bad port to simulate Ollama not running
    # Disable VCR for this test to allow actual connection failure
    VCR.turn_off!
    WebMock.allow_net_connect!

    bad_config = @config.merge("host" => "http://localhost:99999")
    provider = ActiveAgent::GenerationProvider::OllamaProvider.new(bad_config)
    prompt = ActiveAgent::ActionPrompt::Prompt.new(
      message: ActiveAgent::ActionPrompt::Message.new(content: "Test embedding"),
      instructions: "Test agent"
    )

    error = assert_raises(ActiveAgent::GenerationProvider::Base::GenerationProviderError) do
      provider.embed(prompt)
    end

    assert_match(/Unable to connect to Ollama at http:\/\/localhost:99999/, error.message)
    assert_match(/Please ensure Ollama is running/, error.message)
    assert_match(/ollama serve/, error.message)
  ensure
    VCR.turn_on!
    WebMock.disable_net_connect!
  end

  test "inherits from OpenAIProvider" do
    assert ActiveAgent::GenerationProvider::OllamaProvider < ActiveAgent::GenerationProvider::OpenAIProvider
  end

  test "overrides embeddings methods from parent class" do
    # Verify that OllamaProvider has its own implementation of these methods
    assert @provider.respond_to?(:embeddings_parameters, true)
    assert @provider.respond_to?(:embeddings_response, true)

    # Verify the methods are defined in OllamaProvider, not just inherited
    ollama_methods = ActiveAgent::GenerationProvider::OllamaProvider.instance_methods(false)
    assert_includes ollama_methods, :embeddings_parameters
    assert_includes ollama_methods, :embeddings_response
  end

  test "handles Ollama-specific embedding format" do
    # Test native Ollama format
    ollama_response = {
      "embedding" => [ 0.1, 0.2, 0.3 ],
      "model" => "nomic-embed-text"
    }

    @provider.instance_variable_set(:@prompt, @prompt)
    response = @provider.send(:embeddings_response, ollama_response)

    assert_equal [ 0.1, 0.2, 0.3 ], response.message.content
  end

  test "handles OpenAI-compatible embedding format from Ollama" do
    # Test OpenAI-compatible format that newer Ollama versions return
    openai_format_response = {
      "data" => [
        {
          "embedding" => [ 0.4, 0.5, 0.6 ],
          "object" => "embedding"
        }
      ],
      "model" => "nomic-embed-text",
      "object" => "list"
    }

    @provider.instance_variable_set(:@prompt, @prompt)
    response = @provider.send(:embeddings_response, openai_format_response)

    assert_equal [ 0.4, 0.5, 0.6 ], response.message.content
  end
end

This ensures developers get clear feedback about connection issues.

For more embedding patterns and examples, see the Embeddings Documentation.

Provider-Specific Parameters

Model Parameters

  • model - Model name (e.g., "llama3", "mistral")
  • embedding_model - Embedding model name (e.g., "nomic-embed-text")
  • temperature - Controls randomness (0.0 to 1.0)
  • top_p - Nucleus sampling
  • top_k - Top-k sampling
  • num_predict - Maximum tokens to generate
  • stop - Stop sequences
  • seed - For reproducible outputs

System Configuration

  • host - Ollama server URL (default: http://localhost:11434)
  • timeout - Request timeout in seconds
  • keep_alive - Keep model loaded in memory
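
As a sketch, these sit alongside the model settings in config/active_agent.yml (values are illustrative):

yaml
ollama: &ollama
  service: "Ollama"
  host: "http://localhost:11434"
  model: "llama3"
  timeout: 120
  keep_alive: "5m"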

Advanced Options

ruby
class AdvancedOllamaAgent < ApplicationAgent
  generate_with :ollama,
    model: "llama3",
    options: {
      num_ctx: 4096,      # Context window size
      num_gpu: 1,         # Number of GPUs to use
      num_thread: 8,      # Number of threads
      repeat_penalty: 1.1, # Penalize repetition
      mirostat: 2,        # Mirostat sampling
      mirostat_tau: 5.0,  # Mirostat tau parameter
      mirostat_eta: 0.1   # Mirostat learning rate
    }
end

Performance Optimization

Model Loading

Keep models in memory for faster responses:

ruby
class FastOllamaAgent < ApplicationAgent
  generate_with :ollama,
    model: "llama3",
    keep_alive: "5m"  # Keep model loaded for 5 minutes
  
  def quick_response
    @query = params[:query]
    prompt
  end
end

Hardware Acceleration

Configure GPU usage:

ruby
class GPUAgent < ApplicationAgent
  generate_with :ollama,
    model: "llama3",
    options: {
      num_gpu: -1,  # Use all available GPUs
      main_gpu: 0   # Primary GPU index
    }
end

Quantization

Use quantized models for better performance:

bash
# Pull quantized versions
ollama pull llama3:8b-q4_0  # 4-bit quantization
ollama pull llama3:8b-q5_1  # 5-bit quantization
ruby
class EfficientAgent < ApplicationAgent
  # Use quantized model for faster inference
  generate_with :ollama, model: "llama3:8b-q4_0"
end

Error Handling

Handle Ollama-specific errors:

ruby
class RobustOllamaAgent < ApplicationAgent
  generate_with :ollama, model: "llama3"
  
  rescue_from Faraday::ConnectionFailed do |error|
    Rails.logger.error "Ollama connection failed: #{error.message}"
    render_ollama_setup_instructions
  end
  
  rescue_from ActiveAgent::GenerationError do |error|
    if error.message.include?("model not found")
      pull_missing_model
    else
      raise
    end
  end
  
  private
  
  def pull_missing_model
    # `retry` is only valid inside a rescue clause, so pull the model here
    # and let the caller re-run the generation once the pull completes
    system("ollama pull #{generation_provider.model}")
    Rails.logger.info "Pulled #{generation_provider.model}; retry the request"
  end
  
  def render_ollama_setup_instructions
    "Ollama is not running. Start it with: ollama serve"
  end
end

Testing

Test with Ollama locally:

ruby
class OllamaAgentTest < ActiveSupport::TestCase
  setup do
    skip "Ollama not available" unless ollama_available?
  end
  
  test "generates response with local model" do
    response = OllamaAgent.with(
      message: "Hello"
    ).prompt_context.generate_now
    
    assert_not_nil response.message.content
    doc_example_output(response)
  end
  
  private
  
  def ollama_available?
    response = Net::HTTP.get_response(URI("http://localhost:11434/api/tags"))
    response.code == "200"
  rescue
    false
  end
end

Development Workflow

Local Development Setup

ruby
# config/environments/development.rb
Rails.application.configure do
  config.active_agent = {
    ollama: {
      host: ENV['OLLAMA_HOST'] || 'http://localhost:11434',
      model: ENV['OLLAMA_MODEL'] || 'llama3',
      options: {
        num_ctx: 4096,
        temperature: 0.7
      }
    }
  }
end

Docker Compose Setup

yaml
# docker-compose.yml
version: '3.8'
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:
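
Then start the service and pull a model inside the container:

bash
docker compose up -d ollama
docker compose exec ollama ollama pull llama3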

Best Practices

  1. Pre-pull models - Download models before first use
  2. Monitor memory usage - Large models require significant RAM
  3. Use appropriate models - Balance size and capability
  4. Keep models loaded - Use keep_alive for frequently used models
  5. Implement fallbacks - Handle connection failures gracefully
  6. Use quantization - Reduce memory usage and increase speed
  7. Test locally - Ensure models work before deployment

Ollama-Specific Considerations

Privacy First

ruby
class PrivacyFirstAgent < ApplicationAgent
  generate_with :ollama, model: "llama3"
  
  def process_pii
    @personal_data = params[:personal_data]
    
    # Data stays local - no external API calls
    Rails.logger.info "Processing PII locally with Ollama"
    
    prompt instructions: "Process this data privately"
  end
end

Model Management

ruby
class ModelManager
  def self.ensure_model(model_name)
    models = list_models
    unless models.include?(model_name)
      pull_model(model_name)
    end
  end
  
  def self.list_models
    response = HTTParty.get("http://localhost:11434/api/tags")
    response["models"].map { |m| m["name"] }
  end
  
  def self.pull_model(model_name)
    system("ollama pull #{model_name}")
  end
  
  def self.delete_model(model_name)
    HTTParty.delete("http://localhost:11434/api/delete", 
      body: { name: model_name }.to_json,
      headers: { 'Content-Type' => 'application/json' }
    )
  end
end
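
For example, ensuring a model is present at boot (the initializer location is just one option):

ruby
# config/initializers/ollama.rb
ModelManager.ensure_model("llama3") if Rails.env.production?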

Deployment Considerations

ruby
# Ensure Ollama is available in production
class ApplicationAgent < ActiveAgent::Base
  before_action :ensure_ollama_available, if: :using_ollama?
  
  private
  
  def using_ollama?
    generation_provider.is_a?(ActiveAgent::GenerationProvider::OllamaProvider)
  end
  
  def ensure_ollama_available
    HTTParty.get("#{ollama_host}/api/tags")
  rescue => e
    raise "Ollama is not available: #{e.message}"
  end
  
  def ollama_host
    Rails.configuration.active_agent.dig(:ollama, :host) || "http://localhost:11434"
  end
end