
Browser Use Agent

Active Agent provides browser automation capabilities through the Browser Use Agent (similar to Anthropic's Computer Use), which can navigate web pages, interact with elements, extract content, and take screenshots using Cuprite/Chrome.

Overview

The Browser Use Agent demonstrates how ActiveAgent can integrate with external tools like headless browsers to create powerful automation workflows. Following the naming convention of tools like Anthropic's Computer Use, it provides AI-driven browser control using familiar Rails patterns.

Features

  • Navigate to URLs - Direct browser navigation to any website
  • Click elements - Click buttons, links, or any element using CSS selectors or text
  • Extract content - Extract text from specific elements or entire pages
  • Take screenshots - Capture full page or specific areas with HD resolution (1920x1080)
  • Fill forms - Interact with form fields programmatically
  • Extract links - Gather links from pages with optional preview screenshots
  • Smart content detection - Automatically detect and focus on main content areas

Setup

Generate a browser use agent:

bash
rails generate active_agent:agent browser_use navigate click extract_text screenshot

Agent Implementation

ruby
require "capybara"
require "capybara/cuprite"

class BrowserAgent < ApplicationAgent
  # Configure AI provider for intelligent automation
  generate_with :openai,
    model: "gpt-4o-mini"

  class_attribute :browser_session, default: nil

  # Navigate to a URL
  def navigate
    setup_browser_if_needed

    @url = params[:url]
    Rails.logger.info "Navigating to #{@url}"

    begin
      self.class.browser_session.visit(@url)
      @status = 200
      @current_url = self.class.browser_session.current_url
      @title = self.class.browser_session.title
    rescue => e
      @status = 500
      @error = e.message
      Rails.logger.error "Navigation failed: #{e.message}"
    end

    prompt
  end

  # Click on an element
  def click
    setup_browser_if_needed

    @selector = params[:selector]
    @text = params[:text]
    Rails.logger.info "Clicking on element: selector=#{@selector}, text=#{@text}"

    begin
      if @text
        self.class.browser_session.click_on(@text)
      elsif @selector
        self.class.browser_session.find(@selector).click
      end
      @success = true
      @current_url = self.class.browser_session.current_url
    rescue => e
      @success = false
      @error = e.message
      Rails.logger.error "Click failed: #{e.message}"
    end

    prompt
  end

  # Fill in a form field
  def fill_form
    setup_browser_if_needed

    @field = params[:field]
    @value = params[:value]
    @selector = params[:selector]
    Rails.logger.info "Filling form field: field=#{@field}, selector=#{@selector}"

    begin
      if @selector
        self.class.browser_session.find(@selector).set(@value)
      else
        self.class.browser_session.fill_in(@field, with: @value)
      end
      @success = true
    rescue => e
      @success = false
      @error = e.message
      Rails.logger.error "Fill form failed: #{e.message}"
    end

    prompt
  end

  # Extract text from the page
  def extract_text
    setup_browser_if_needed

    @selector = params[:selector] || "body"
    Rails.logger.info "Extracting text from #{@selector}"

    begin
      element = self.class.browser_session.find(@selector)
      @text = element.text
      @success = true
    rescue => e
      @success = false
      @error = e.message
      Rails.logger.error "Extract text failed: #{e.message}"
    end

    prompt
  end

  # Get current page info
  def page_info
    setup_browser_if_needed

    Rails.logger.info "Getting page info"

    begin
      @current_url = self.class.browser_session.current_url
      @title = self.class.browser_session.title
      @has_css = {}

      # Check for common elements
      [ "form", "input", "button", "a", "img" ].each do |tag|
        @has_css[tag] = self.class.browser_session.has_css?(tag)
      end

      @success = true
    rescue => e
      @success = false
      @error = e.message
      Rails.logger.error "Page info failed: #{e.message}"
    end

    prompt
  end

  # Extract all links from the page
  def extract_links
    setup_browser_if_needed

    @selector = params[:selector] || "body"
    @limit = params[:limit] || 10
    Rails.logger.info "Extracting links from #{@selector}"

    begin
      @links = []
      within_element = (@selector == "body") ? self.class.browser_session : self.class.browser_session.find(@selector)

      within_element.all("a", visible: true).first(@limit).each do |link|
        href = link["href"]
        next if href.nil? || href.empty? || href.start_with?("#")

        @links << {
          text: link.text.strip,
          href: href,
          title: link["title"]
        }
      end

      @success = true
      @current_url = self.class.browser_session.current_url
    rescue => e
      @success = false
      @error = e.message
      @links = []
      Rails.logger.error "Extract links failed: #{e.message}"
    end

    prompt
  end

  # Follow a link by text or href
  def follow_link
    setup_browser_if_needed

    @text = params[:text]
    @href = params[:href]
    Rails.logger.info "Following link: text=#{@text}, href=#{@href}"

    begin
      if @text
        self.class.browser_session.click_link(@text)
      elsif @href
        link = self.class.browser_session.find("a[href*='#{@href}']")
        link.click
      else
        raise "Must provide either text or href parameter"
      end

      # Wait for navigation
      sleep 0.5

      @success = true
      @current_url = self.class.browser_session.current_url
      @title = self.class.browser_session.title
    rescue => e
      @success = false
      @error = e.message
      Rails.logger.error "Follow link failed: #{e.message}"
    end

    prompt
  end

  # Go back to previous page
  def go_back
    setup_browser_if_needed

    Rails.logger.info "Going back to previous page"

    begin
      self.class.browser_session.go_back
      sleep 0.5

      @success = true
      @current_url = self.class.browser_session.current_url
      @title = self.class.browser_session.title
    rescue => e
      @success = false
      @error = e.message
      Rails.logger.error "Go back failed: #{e.message}"
    end

    prompt
  end

  # Extract main content (useful for Wikipedia and articles)
  def extract_main_content
    setup_browser_if_needed

    Rails.logger.info "Extracting main content"

    begin
      # Try common content selectors
      content_selectors = [
        "#mw-content-text", # Wikipedia
        "main",
        "article",
        "[role='main']",
        ".content",
        "#content"
      ]

      @content = nil
      content_selectors.each do |selector|
        if self.class.browser_session.has_css?(selector)
          element = self.class.browser_session.find(selector)
          @content = element.text
          @selector_used = selector
          break
        end
      end

      @content ||= self.class.browser_session.find("body").text
      @current_url = self.class.browser_session.current_url
      @title = self.class.browser_session.title
      @success = true
    rescue => e
      @success = false
      @error = e.message
      Rails.logger.error "Extract main content failed: #{e.message}"
    end

    prompt
  end

  # Take a screenshot of the current page
  def screenshot
    setup_browser_if_needed

    @filename = params[:filename] || "screenshot_#{Time.now.to_i}.png"
    @full_page = params[:full_page] || false
    @selector = params[:selector]
    @area = params[:area] # { x: 0, y: 0, width: 400, height: 300 }
    @main_content_only = params[:main_content_only] != false # Default to true

    # Ensure tmp/screenshots directory exists
    screenshot_dir = Rails.root.join("tmp", "screenshots")
    FileUtils.mkdir_p(screenshot_dir)

    @path = screenshot_dir.join(@filename)
    Rails.logger.info "Taking screenshot: #{@filename}"

    begin
      # Build screenshot options
      options = { path: @path }

      # If main_content_only is true and no specific selector/area provided, try to detect main content
      if @main_content_only && !@selector && !@area
        main_area = detect_main_content_area
        if main_area
          options[:area] = main_area
          Rails.logger.info "Auto-cropping to main content area: #{main_area.inspect}"
        end
      else
        # Add full page option
        options[:full] = true if @full_page

        # Add selector option (for element screenshots)
        options[:selector] = @selector if @selector.present?

        # Add area option (for specific region screenshots)
        if @area.present?
          # Ensure area has the required keys and convert to symbol keys
          area_hash = {}
          area_hash[:x] = @area["x"] || @area[:x] if @area["x"] || @area[:x]
          area_hash[:y] = @area["y"] || @area[:y] if @area["y"] || @area[:y]
          area_hash[:width] = @area["width"] || @area[:width] if @area["width"] || @area[:width]
          area_hash[:height] = @area["height"] || @area[:height] if @area["height"] || @area[:height]

          options[:area] = area_hash if area_hash.any?
        end
      end

      # Take the screenshot with options
      self.class.browser_session.save_screenshot(**options)

      @success = true
      @filepath = @path.to_s
      @current_url = self.class.browser_session.current_url
      @title = self.class.browser_session.title

      # Generate a relative path for display
      @relative_path = @path.relative_path_from(Rails.root).to_s
    rescue => e
      @success = false
      @error = e.message
      Rails.logger.error "Screenshot failed: #{e.message}"
    end

    prompt
  end

  # Extract links with preview screenshots
  def extract_links_with_previews
    setup_browser_if_needed

    @selector = params[:selector] || "body"
    @limit = params[:limit] || 5
    Rails.logger.info "Extracting links with previews from #{@selector}"

    begin
      @links = []
      @original_url = self.class.browser_session.current_url
      within_element = (@selector == "body") ? self.class.browser_session : self.class.browser_session.find(@selector)

      # Get unique links
      all_links = within_element.all("a", visible: true)
      unique_links = {}

      all_links.each do |link|
        href = link["href"]
        next if href.nil? || href.empty? || href.start_with?("#") || href.start_with?("javascript:")

        # Normalize URL
        full_url = URI.join(@original_url, href).to_s rescue next
        next if unique_links.key?(full_url)

        unique_links[full_url] = {
          text: link.text.strip,
          href: full_url,
          title: link["title"]
        }

        break if unique_links.size >= @limit
      end

      # Take screenshots of each link
      unique_links.each_with_index do |(url, link_data), index|
        begin
          # Visit the link
          self.class.browser_session.visit(url)
          sleep 0.5 # Wait for page to load

          # Take a screenshot
          screenshot_filename = "preview_#{index}_#{Time.now.to_i}.png"
          screenshot_path = Rails.root.join("tmp", "screenshots", screenshot_filename)
          FileUtils.mkdir_p(File.dirname(screenshot_path))
          self.class.browser_session.save_screenshot(screenshot_path)

          link_data[:screenshot] = screenshot_path.relative_path_from(Rails.root).to_s
          link_data[:page_title] = self.class.browser_session.title

          @links << link_data
        rescue => e
          Rails.logger.warn "Failed to preview #{url}: #{e.message}"
          @links << link_data # Add without screenshot
        end
      end

      # Return to original page
      self.class.browser_session.visit(@original_url)

      @success = true
      @current_url = @original_url
    rescue => e
      @success = false
      @error = e.message
      @links = []
      Rails.logger.error "Extract links with previews failed: #{e.message}"
    end

    prompt
  end

  private

  def setup_browser_if_needed
    return if self.class.browser_session

    # Configure Cuprite driver if not already configured
    unless Capybara.drivers[:cuprite_agent]
      Capybara.register_driver :cuprite_agent do |app|
        Capybara::Cuprite::Driver.new(
          app,
          window_size: [ 1920, 1080 ], # Standard HD resolution
          browser_options: {
            "no-sandbox": nil,
            "disable-gpu": nil,
            "disable-dev-shm-usage": nil
          },
          inspector: false,
          headless: true
        )
      end
    end

    # Create a shared session for this agent class
    self.class.browser_session = Capybara::Session.new(:cuprite_agent)
  end

  def detect_main_content_area
    # Try to detect main content area based on common selectors
    main_selectors = [
      "main",                    # HTML5 main element
      "[role='main']",          # ARIA role
      "#main-content",          # Common ID
      "#main",                  # Common ID
      "#content",               # Common ID
      ".main-content",          # Common class
      ".content",              # Common class
      "article",               # Article element
      "#mw-content-text",      # Wikipedia
      ".container",            # Bootstrap/common framework
      "#root > div > main",    # React apps
      "body > div:nth-child(2)" # Fallback to second div
    ]

    main_selectors.each do |selector|
      if self.class.browser_session.has_css?(selector, wait: 0)
        begin
          # Get element position and dimensions using JavaScript
          rect = self.class.browser_session.evaluate_script(<<-JS)
            (function() {
              var elem = document.querySelector('#{selector}');
              if (!elem) return null;
              var rect = elem.getBoundingClientRect();
              return {
                x: Math.round(rect.left + window.scrollX),
                y: Math.round(rect.top + window.scrollY),
                width: Math.round(rect.width),
                height: Math.round(rect.height)
              };
            })()
          JS

          if rect && rect["width"] > 0 && rect["height"] > 0
            # Start from the element's Y position or skip header if element is at top
            start_y = (rect["y"] < 100) ? 150 : rect["y"]

            # Always use full viewport width and height from start_y
            return {
              x: 0,
              y: start_y,
              width: 1920,
              height: 1080 - start_y  # Full height minus the offset
            }
          end
        rescue => e
          Rails.logger.warn "Failed to get dimensions for #{selector}: #{e.message}"
        end
      end
    end

    # Default fallback: skip typical header area but keep full height
    {
      x: 0,
      y: 150,  # Skip typical header height
      width: 1920,
      height: 930  # 1080 - 150 = 930 to stay within viewport
    }
  end
end
Instructions Template

The agent's system instructions are rendered from an ERB template, which enumerates the available action schemas and any starting URL:

erb
You are a browser automation agent that can navigate web pages and interact with web elements using Cuprite/Chrome.

You have access to the following browser actions:
<% controller.action_schemas.each do |schema| %>
- <%= schema["name"] %>: <%= schema["description"] %>
<% end %>

<% if params[:url].present? %>
Starting URL: <%= params[:url] %>
You should navigate to this URL first to begin your research.
<% end %>

Use these tools to help users automate web browsing tasks, extract information from websites, and perform user interactions.

When researching a topic:
1. Navigate to the provided URL or search for relevant pages
2. Extract the main content to understand the topic
3. Use the click action with specific text to navigate to related pages (e.g., click text: "Neil Armstrong")
4. Use go_back to return to previous pages when needed
5. Provide a comprehensive summary with reference URLs

Tips for efficient browsing:
- Use click with text parameter for navigating to specific links rather than extract_links_with_previews
- For Wikipedia: Use selector "#mw-content-text" when extracting links to focus on article content
- Extract main content before navigating away from important pages

Screenshot tips (browser is 1920x1080 HD resolution):
- To capture main content without headers, use the area parameter: { "x": 0, "y": 150, "width": 1920, "height": 930 }
- For Wikipedia articles, consider: { "x": 0, "y": 200, "width": 1920, "height": 880 } to skip navigation
- For specific elements, use the selector parameter (e.g., selector: "#mw-content-text")
- Full page screenshots capture everything, but cropped areas often look cleaner
- Default screenshots automatically try to crop to main content, but you can override with main_content_only: false
Screenshot Action Schema

Action parameter schemas are defined with jbuilder views. The screenshot action's schema:

ruby
  json.name action_name
  json.description "Take a screenshot of the current page"
  json.parameters do
    json.type "object"
    json.properties do
      json.filename do
        json.type "string"
        json.description "Name for the screenshot file"
      end
      json.full_page do
        json.type "boolean"
        json.description "Whether to capture the full page (true) or just viewport (false)"
      end
      json.main_content_only do
        json.type "boolean"
        json.description "Automatically detect and crop to main content area, excluding headers (default: true)"
      end
      json.selector do
        json.type "string"
        json.description "CSS selector for a specific element to screenshot (optional)"
      end
      json.area do
        json.type "object"
        json.description "Specific area of the page to capture (optional)"
        json.properties do
          json.x do
            json.type "integer"
            json.description "X coordinate of the top-left corner"
          end
          json.y do
            json.type "integer"
            json.description "Y coordinate of the top-left corner"
          end
          json.width do
            json.type "integer"
            json.description "Width of the area to capture"
          end
          json.height do
            json.type "integer"
            json.description "Height of the area to capture"
          end
        end
      end
    end
  end
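
For reference, the jbuilder view above renders a tool schema of roughly this shape (abbreviated; the exact envelope depends on the provider):

```json
{
  "name": "screenshot",
  "description": "Take a screenshot of the current page",
  "parameters": {
    "type": "object",
    "properties": {
      "filename": { "type": "string", "description": "Name for the screenshot file" },
      "full_page": { "type": "boolean", "description": "Whether to capture the full page (true) or just viewport (false)" },
      "main_content_only": { "type": "boolean", "description": "Automatically detect and crop to main content area, excluding headers (default: true)" },
      "selector": { "type": "string", "description": "CSS selector for a specific element to screenshot (optional)" },
      "area": {
        "type": "object",
        "description": "Specific area of the page to capture (optional)",
        "properties": {
          "x": { "type": "integer" },
          "y": { "type": "integer" },
          "width": { "type": "integer" },
          "height": { "type": "integer" }
        }
      }
    }
  }
}
```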

Basic Navigation Example

The browser use agent can navigate to URLs and interact with pages using AI:

ruby
response = BrowserAgent.with(
  message: "Navigate to https://www.example.com and tell me what you see"
).prompt_context.generate_now

assert response.message.content.present?

AI-Driven Browser Control

The browser use agent can use AI to determine which actions to take:

ruby
response = BrowserAgent.with(
  message: "Go to https://www.example.com and extract the main heading"
).prompt_context.generate_now

# Check that AI used the tools
assert response.prompt.messages.any? { |m| m.role == :tool }
assert response.message.content.present?

Direct Action Usage

You can also call browser actions directly without AI:

ruby
# Call navigate action directly (synchronous execution)
navigate_response = BrowserAgent.with(
  url: "https://www.example.com"
).navigate

# The action returns a Generation object
assert_kind_of ActiveAgent::Generation, navigate_response

# Execute the generation
result = navigate_response.generate_now

assert result.message.content.include?("navigated") ||
  result.message.content.include?("Failed") ||
  result.message.content.include?("Example")

Wikipedia Research Example

The browser use agent excels at research tasks, navigating between pages and gathering information:

ruby
response = BrowserAgent.with(
  message: "Research the Apollo 11 moon landing mission. Start at the main Wikipedia article, then:
            1) Extract the main content to get an overview
            2) Find and follow links to learn about the crew members (Neil Armstrong, Buzz Aldrin, Michael Collins)
            3) Take screenshots of important pages
            4) Extract key dates, mission objectives, and historical significance
            5) Look for related missions or events by exploring relevant links
            Please provide a comprehensive summary with details about the mission, crew, and its impact on space exploration.",
  url: "https://en.wikipedia.org/wiki/Apollo_11"
).prompt_context.generate_now

# The agent should navigate to Wikipedia and gather information
assert response.message.content.present?
assert response.message.content.downcase.include?("apollo") ||
  response.message.content.downcase.include?("moon") ||
  response.message.content.downcase.include?("armstrong") ||
  response.message.content.downcase.include?("nasa")

# Check that multiple tools were used
tool_messages = response.prompt.messages.select { |m| m.role == :tool }
assert tool_messages.any?, "Should have used tools"

# Check for variety in tool usage (the agent should use multiple different tools)
assistant_messages = response.prompt.messages.select { |m| m.role == :assistant }
tool_names = []
assistant_messages.each do |msg|
  if msg.requested_actions&.any?
    tool_names.concat(msg.requested_actions.map(&:name))
  end
end
tool_names.uniq!

assert tool_names.length > 2, "Should use at least 3 different tools for comprehensive research"

Area Screenshot Example

Take screenshots of specific page regions:

ruby
response = BrowserAgent.with(
  message: "Navigate to https://www.example.com and take a screenshot of just the header area (top 200 pixels)"
).prompt_context.generate_now

assert response.message.content.present?

# Check that screenshot tool was used
tool_messages = response.prompt.messages.select { |m| m.role == :tool }
assert tool_messages.any? { |m| m.content.include?("screenshot") }, "Should have taken a screenshot"

Main Content Auto-Cropping

The browser use agent can automatically detect and crop to main content areas:

ruby
response = BrowserAgent.with(
  message: "Navigate to Wikipedia's Apollo 11 page and take a screenshot of the main content (should automatically exclude navigation/header)"
).prompt_context.generate_now

assert response.message.content.present?

# Check that screenshot was taken
tool_messages = response.prompt.messages.select { |m| m.role == :tool }
assert tool_messages.any? { |m| m.content.include?("screenshot") }, "Should have taken a screenshot"

# Check that the agent navigated to Wikipedia
assert tool_messages.any? { |m| m.content.include?("wikipedia") }, "Should have navigated to Wikipedia"

Screenshot Capabilities

The screenshot action provides multiple options for capturing page content:

Full Page Screenshot

ruby
BrowserAgent.with(
  url: "https://example.com"
).navigate.generate_now

BrowserAgent.with(
  filename: "full_page.png",
  full_page: true
).screenshot.generate_now

Area Screenshot

ruby
BrowserAgent.with(
  filename: "header.png",
  area: { x: 0, y: 0, width: 1920, height: 200 }
).screenshot.generate_now

Element Screenshot

ruby
BrowserAgent.with(
  filename: "content.png",
  selector: "#main-content"
).screenshot.generate_now

Auto-Crop to Main Content

ruby
BrowserAgent.with(
  filename: "main.png",
  main_content_only: true  # Default behavior
).screenshot.generate_now

Browser Configuration

The browser runs in HD resolution (1920x1080) with headless Chrome:

ruby
def setup_browser_if_needed
  Capybara.register_driver :cuprite_agent do |app|
    Capybara::Cuprite::Driver.new(
      app,
      window_size: [1920, 1080], # HD resolution
      browser_options: {
        "no-sandbox": nil,
        "disable-gpu": nil,
        "disable-dev-shm-usage": nil
      },
      inspector: false,
      headless: true
    )
  end
end

Smart Content Detection

The browser use agent includes intelligent content detection that:

  • Identifies main content areas using common selectors
  • Skips headers and navigation automatically
  • Adjusts cropping based on page structure
  • Falls back to sensible defaults

Common selectors checked:

  • main, [role='main']
  • #main-content, #content
  • article
  • #mw-content-text (Wikipedia)
  • .container (Bootstrap)
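
The cropping rule that `detect_main_content_area` applies to a matched element can be sketched as a plain function (a simplified restatement for illustration, not part of the agent):

```ruby
# Given an element's bounding rect (string keys, as returned from JavaScript),
# compute the capture area: skip the typical header band when the element
# starts near the top of the page. Assumes the agent's fixed 1920x1080 viewport.
def crop_area_for(rect, header_skip: 150, viewport_height: 1080)
  return nil unless rect && rect["width"].to_i > 0 && rect["height"].to_i > 0

  start_y = rect["y"] < 100 ? header_skip : rect["y"]
  { x: 0, y: start_y, width: 1920, height: viewport_height - start_y }
end

crop_area_for({ "x" => 0, "y" => 20, "width" => 960, "height" => 400 })
# => { x: 0, y: 150, width: 1920, height: 930 }
```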

Tips for Effective Use

  • Use click with text parameter for specific links
  • Extract main content before navigating away
  • Use go_back to return to previous pages
  • Take screenshots of important pages

Wikipedia Research

  • Use selector #mw-content-text for article content
  • Click directly on relevant links rather than extracting all links
  • Take screenshots with main_content_only: true to exclude navigation

Screenshot Optimization

  • Default main_content_only: true crops out headers automatically
  • Use area parameter for specific regions: { x: 0, y: 150, width: 1920, height: 930 }
  • For Wikipedia, consider y: 200 to skip navigation bars
  • Full page screenshots available with full_page: true
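
Tool-call arguments typically arrive with string keys, so the screenshot action normalizes the area hash to symbol keys before passing it to Cuprite. The same idea as a standalone sketch (for illustration, not part of the agent):

```ruby
# Normalize an area hash with string or symbol keys into symbol keys,
# keeping only the coordinates that are actually present.
def normalize_area(area)
  %i[x y width height].each_with_object({}) do |key, out|
    value = area[key.to_s] || area[key]
    out[key] = value if value
  end
end

normalize_area({ "x" => 0, "y" => 150, "width" => 1920, "height" => 930 })
# => { x: 0, y: 150, width: 1920, height: 930 }
```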

Integration with Rails

The Browser Use Agent integrates seamlessly with Rails applications:

ruby
class WebScraperController < ApplicationController
  def scrape
    response = BrowserAgent.with(
      message: params[:instructions],
      url: params[:url]
    ).prompt_context.generate_now
    
    render json: {
      content: response.message.content,
      screenshots: response.prompt.messages
        .select { |m| m.role == :tool && m.content.include?("screenshot") }
        .map { |m| m.content[/File: (.+?)\n/, 1] }
        .compact
    }
  end
end

Advanced Usage

Multi-Page Navigation Flow

ruby
# Actions read their arguments from params (set via .with) and return a
# Generation that is executed with generate_now. The shared browser session
# persists between calls.

# Navigate to main page
BrowserAgent.with(url: "https://example.com").navigate.generate_now

# Extract main content
content = BrowserAgent.extract_main_content.generate_now

# Click specific link
BrowserAgent.with(text: "Learn More").click.generate_now

# Take screenshot of new page
BrowserAgent.with(main_content_only: true).screenshot.generate_now

# Go back
BrowserAgent.go_back.generate_now

# Extract links for further exploration
links = BrowserAgent.with(selector: "#main-content").extract_links.generate_now

Form Interaction

ruby
BrowserAgent.with(url: "https://example.com/form").navigate.generate_now
BrowserAgent.with(field: "email", value: "test@example.com").fill_form.generate_now
BrowserAgent.with(field: "message", value: "Hello world").fill_form.generate_now
BrowserAgent.with(text: "Submit").click.generate_now
BrowserAgent.with(filename: "form_result.png").screenshot.generate_now

Requirements

  • Cuprite gem for Chrome automation
  • Chrome or Chromium browser installed
  • Capybara for browser session management

Add to your Gemfile:

ruby
gem 'cuprite'
gem 'capybara'

Conclusion

The Browser Use Agent demonstrates ActiveAgent's flexibility in integrating with external tools while maintaining Rails conventions. Following the pattern of tools like Anthropic's Computer Use, it provides powerful browser automation capabilities driven by AI, making it ideal for:

  • Web scraping and data extraction
  • Automated testing and verification
  • Research and information gathering
  • Screenshot generation for documentation
  • Form submission and interaction