Browser Use Agent
Active Agent provides browser automation capabilities through the Browser Use Agent (similar to Anthropic's Computer Use), which can navigate web pages, interact with elements, extract content, and take screenshots using Cuprite/Chrome.
Overview
The Browser Use Agent demonstrates how ActiveAgent can integrate with external tools like headless browsers to create powerful automation workflows. Following the naming convention of tools like Anthropic's Computer Use, it provides AI-driven browser control using familiar Rails patterns.
Features
- Navigate to URLs - Direct browser navigation to any website
- Click elements - Click buttons, links, or any element using CSS selectors or text
- Extract content - Extract text from specific elements or entire pages
- Take screenshots - Capture full page or specific areas with HD resolution (1920x1080)
- Fill forms - Interact with form fields programmatically
- Extract links - Gather links from pages with optional preview screenshots
- Smart content detection - Automatically detect and focus on main content areas
Setup
Generate a browser use agent:
rails generate active_agent:agent browser_use navigate click extract_text screenshot
Agent Implementation
require "capybara"
require "capybara/cuprite"
class BrowserAgent < ApplicationAgent
# Configure AI provider for intelligent automation
generate_with :openai,
model: "gpt-4o-mini"
class_attribute :browser_session, default: nil
# Navigate to a URL
def navigate
setup_browser_if_needed
@url = params[:url]
Rails.logger.info "Navigating to #{@url}"
begin
self.class.browser_session.visit(@url)
@status = 200
@current_url = self.class.browser_session.current_url
@title = self.class.browser_session.title
rescue => e
@status = 500
@error = e.message
Rails.logger.error "Navigation failed: #{e.message}"
end
prompt
end
# Click on an element
def click
setup_browser_if_needed
@selector = params[:selector]
@text = params[:text]
Rails.logger.info "Clicking on element: selector=#{@selector}, text=#{@text}"
begin
if @text
self.class.browser_session.click_on(@text)
elsif @selector
self.class.browser_session.find(@selector).click
end
@success = true
@current_url = self.class.browser_session.current_url
rescue => e
@success = false
@error = e.message
Rails.logger.error "Click failed: #{e.message}"
end
prompt
end
# Fill in a form field
def fill_form
setup_browser_if_needed
@field = params[:field]
@value = params[:value]
@selector = params[:selector]
Rails.logger.info "Filling form field: field=#{@field}, selector=#{@selector}"
begin
if @selector
self.class.browser_session.find(@selector).set(@value)
else
self.class.browser_session.fill_in(@field, with: @value)
end
@success = true
rescue => e
@success = false
@error = e.message
Rails.logger.error "Fill form failed: #{e.message}"
end
prompt
end
# Extract text from the page
def extract_text
setup_browser_if_needed
@selector = params[:selector] || "body"
Rails.logger.info "Extracting text from #{@selector}"
begin
element = self.class.browser_session.find(@selector)
@text = element.text
@success = true
rescue => e
@success = false
@error = e.message
Rails.logger.error "Extract text failed: #{e.message}"
end
prompt
end
# Get current page info
def page_info
setup_browser_if_needed
Rails.logger.info "Getting page info"
begin
@current_url = self.class.browser_session.current_url
@title = self.class.browser_session.title
@has_css = {}
# Check for common elements
[ "form", "input", "button", "a", "img" ].each do |tag|
@has_css[tag] = self.class.browser_session.has_css?(tag)
end
@success = true
rescue => e
@success = false
@error = e.message
Rails.logger.error "Page info failed: #{e.message}"
end
prompt
end
# Extract all links from the page
def extract_links
setup_browser_if_needed
@selector = params[:selector] || "body"
@limit = params[:limit] || 10
Rails.logger.info "Extracting links from #{@selector}"
begin
@links = []
within_element = (@selector == "body") ? self.class.browser_session : self.class.browser_session.find(@selector)
within_element.all("a", visible: true).first(@limit).each do |link|
href = link["href"]
next if href.nil? || href.empty? || href.start_with?("#")
@links << {
text: link.text.strip,
href: href,
title: link["title"]
}
end
@success = true
@current_url = self.class.browser_session.current_url
rescue => e
@success = false
@error = e.message
@links = []
Rails.logger.error "Extract links failed: #{e.message}"
end
prompt
end
# Follow a link by text or href
def follow_link
setup_browser_if_needed
@text = params[:text]
@href = params[:href]
Rails.logger.info "Following link: text=#{@text}, href=#{@href}"
begin
if @text
self.class.browser_session.click_link(@text)
elsif @href
link = self.class.browser_session.find("a[href*='#{@href}']")
link.click
else
raise "Must provide either text or href parameter"
end
# Wait for navigation
sleep 0.5
@success = true
@current_url = self.class.browser_session.current_url
@title = self.class.browser_session.title
rescue => e
@success = false
@error = e.message
Rails.logger.error "Follow link failed: #{e.message}"
end
prompt
end
# Go back to previous page
def go_back
setup_browser_if_needed
Rails.logger.info "Going back to previous page"
begin
self.class.browser_session.go_back
sleep 0.5
@success = true
@current_url = self.class.browser_session.current_url
@title = self.class.browser_session.title
rescue => e
@success = false
@error = e.message
Rails.logger.error "Go back failed: #{e.message}"
end
prompt
end
# Extract main content (useful for Wikipedia and articles)
def extract_main_content
setup_browser_if_needed
Rails.logger.info "Extracting main content"
begin
# Try common content selectors
content_selectors = [
"#mw-content-text", # Wikipedia
"main",
"article",
"[role='main']",
".content",
"#content"
]
@content = nil
content_selectors.each do |selector|
if self.class.browser_session.has_css?(selector)
element = self.class.browser_session.find(selector)
@content = element.text
@selector_used = selector
break
end
end
@content ||= self.class.browser_session.find("body").text
@current_url = self.class.browser_session.current_url
@title = self.class.browser_session.title
@success = true
rescue => e
@success = false
@error = e.message
Rails.logger.error "Extract main content failed: #{e.message}"
end
prompt
end
# Take a screenshot of the current page
def screenshot
setup_browser_if_needed
@filename = params[:filename] || "screenshot_#{Time.now.to_i}.png"
@full_page = params[:full_page] || false
@selector = params[:selector]
@area = params[:area] # { x: 0, y: 0, width: 400, height: 300 }
@main_content_only = params[:main_content_only] != false # Default to true
# Ensure tmp/screenshots directory exists
screenshot_dir = Rails.root.join("tmp", "screenshots")
FileUtils.mkdir_p(screenshot_dir)
@path = screenshot_dir.join(@filename)
Rails.logger.info "Taking screenshot: #{@filename}"
begin
# Build screenshot options
options = { path: @path }
# If main_content_only is true and no specific selector/area provided, try to detect main content
if @main_content_only && !@selector && !@area
main_area = detect_main_content_area
if main_area
options[:area] = main_area
Rails.logger.info "Auto-cropping to main content area: #{main_area.inspect}"
end
else
# Add full page option
options[:full] = true if @full_page
# Add selector option (for element screenshots)
options[:selector] = @selector if @selector.present?
# Add area option (for specific region screenshots)
if @area.present?
# Ensure area has the required keys and convert to symbol keys
area_hash = {}
area_hash[:x] = @area["x"] || @area[:x] if @area["x"] || @area[:x]
area_hash[:y] = @area["y"] || @area[:y] if @area["y"] || @area[:y]
area_hash[:width] = @area["width"] || @area[:width] if @area["width"] || @area[:width]
area_hash[:height] = @area["height"] || @area[:height] if @area["height"] || @area[:height]
options[:area] = area_hash if area_hash.any?
end
end
# Take the screenshot with options
self.class.browser_session.save_screenshot(**options)
@success = true
@filepath = @path.to_s
@current_url = self.class.browser_session.current_url
@title = self.class.browser_session.title
# Generate a relative path for display
@relative_path = @path.relative_path_from(Rails.root).to_s
rescue => e
@success = false
@error = e.message
Rails.logger.error "Screenshot failed: #{e.message}"
end
prompt
end
# Extract links with preview screenshots
def extract_links_with_previews
setup_browser_if_needed
@selector = params[:selector] || "body"
@limit = params[:limit] || 5
Rails.logger.info "Extracting links with previews from #{@selector}"
begin
@links = []
@original_url = self.class.browser_session.current_url
within_element = (@selector == "body") ? self.class.browser_session : self.class.browser_session.find(@selector)
# Get unique links
all_links = within_element.all("a", visible: true)
unique_links = {}
all_links.each do |link|
href = link["href"]
next if href.nil? || href.empty? || href.start_with?("#") || href.start_with?("javascript:")
# Normalize URL
full_url = URI.join(@original_url, href).to_s rescue next
next if unique_links.key?(full_url)
unique_links[full_url] = {
text: link.text.strip,
href: full_url,
title: link["title"]
}
break if unique_links.size >= @limit
end
# Take screenshots of each link
unique_links.each_with_index do |(url, link_data), index|
begin
# Visit the link
self.class.browser_session.visit(url)
sleep 0.5 # Wait for page to load
# Take a screenshot
screenshot_filename = "preview_#{index}_#{Time.now.to_i}.png"
screenshot_path = Rails.root.join("tmp", "screenshots", screenshot_filename)
FileUtils.mkdir_p(File.dirname(screenshot_path))
self.class.browser_session.save_screenshot(screenshot_path)
link_data[:screenshot] = screenshot_path.relative_path_from(Rails.root).to_s
link_data[:page_title] = self.class.browser_session.title
@links << link_data
rescue => e
Rails.logger.warn "Failed to preview #{url}: #{e.message}"
@links << link_data # Add without screenshot
end
end
# Return to original page
self.class.browser_session.visit(@original_url)
@success = true
@current_url = @original_url
rescue => e
@success = false
@error = e.message
@links = []
Rails.logger.error "Extract links with previews failed: #{e.message}"
end
prompt
end
private
def setup_browser_if_needed
return if self.class.browser_session
# Configure Cuprite driver if not already configured
unless Capybara.drivers[:cuprite_agent]
Capybara.register_driver :cuprite_agent do |app|
Capybara::Cuprite::Driver.new(
app,
window_size: [ 1920, 1080 ], # Standard HD resolution
browser_options: {
"no-sandbox": nil,
"disable-gpu": nil,
"disable-dev-shm-usage": nil
},
inspector: false,
headless: true
)
end
end
# Create a shared session for this agent class
self.class.browser_session = Capybara::Session.new(:cuprite_agent)
end
def detect_main_content_area
# Try to detect main content area based on common selectors
main_selectors = [
"main", # HTML5 main element
"[role='main']", # ARIA role
"#main-content", # Common ID
"#main", # Common ID
"#content", # Common ID
".main-content", # Common class
".content", # Common class
"article", # Article element
"#mw-content-text", # Wikipedia
".container", # Bootstrap/common framework
"#root > div > main", # React apps
"body > div:nth-child(2)" # Fallback to second div
]
main_selectors.each do |selector|
if self.class.browser_session.has_css?(selector, wait: 0)
begin
# Get element position and dimensions using JavaScript
rect = self.class.browser_session.evaluate_script(<<-JS)
(function() {
var elem = document.querySelector('#{selector}');
if (!elem) return null;
var rect = elem.getBoundingClientRect();
return {
x: Math.round(rect.left + window.scrollX),
y: Math.round(rect.top + window.scrollY),
width: Math.round(rect.width),
height: Math.round(rect.height)
};
})()
JS
if rect && rect["width"] > 0 && rect["height"] > 0
# Start from the element's Y position or skip header if element is at top
start_y = (rect["y"] < 100) ? 150 : rect["y"]
# Always use full viewport width and height from start_y
return {
x: 0,
y: start_y,
width: 1920,
height: 1080 - start_y # Full height minus the offset
}
end
rescue => e
Rails.logger.warn "Failed to get dimensions for #{selector}: #{e.message}"
end
end
end
# Default fallback: skip typical header area but keep full height
{
x: 0,
y: 150, # Skip typical header height
width: 1920,
height: 930 # 1080 - 150 = 930 to stay within viewport
}
end
end
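The href filtering used by extract_links and extract_links_with_previews can be exercised without a browser. Below is a minimal pure-Ruby sketch of the same rules; the helper name normalize_links is illustrative, not part of the agent:

```ruby
require "uri"

# Mirrors the link filtering above: skip empty, fragment-only, and
# javascript: hrefs, resolve each href against the current page URL,
# and keep at most `limit` unique absolute URLs.
def normalize_links(base_url, hrefs, limit: 5)
  unique = {}
  hrefs.each do |href|
    next if href.nil? || href.empty? || href.start_with?("#") || href.start_with?("javascript:")
    full_url = URI.join(base_url, href).to_s rescue next
    next if unique.key?(full_url)
    unique[full_url] = true
    break if unique.size >= limit
  end
  unique.keys
end

normalize_links(
  "https://en.wikipedia.org/wiki/Apollo_11",
  ["#cite_note-1", "/wiki/Neil_Armstrong", "javascript:void(0)", "/wiki/Neil_Armstrong", "/wiki/Buzz_Aldrin"]
)
# => ["https://en.wikipedia.org/wiki/Neil_Armstrong", "https://en.wikipedia.org/wiki/Buzz_Aldrin"]
```

Deduplicating on the resolved absolute URL (rather than the raw href) is what prevents the agent from previewing the same page twice when a link appears in both relative and absolute form.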
Instructions Template
The instructions prompt tells the model which browser actions are available, rendering one line per action schema:
You are a browser automation agent that can navigate web pages and interact with web elements using Cuprite/Chrome.
You have access to the following browser actions:
<% controller.action_schemas.each do |schema| %>
- <%= schema["name"] %>: <%= schema["description"] %>
<% end %>
<% if params[:url].present? %>
Starting URL: <%= params[:url] %>
You should navigate to this URL first to begin your research.
<% end %>
Use these tools to help users automate web browsing tasks, extract information from websites, and perform user interactions.
When researching a topic:
1. Navigate to the provided URL or search for relevant pages
2. Extract the main content to understand the topic
3. Use the click action with specific text to navigate to related pages (e.g., click text: "Neil Armstrong")
4. Use go_back to return to previous pages when needed
5. Provide a comprehensive summary with reference URLs
Tips for efficient browsing:
- Use click with text parameter for navigating to specific links rather than extract_links_with_previews
- For Wikipedia: Use selector "#mw-content-text" when extracting links to focus on article content
- Extract main content before navigating away from important pages
Screenshot tips (browser is 1920x1080 HD resolution):
- To capture main content without headers, use the area parameter: { "x": 0, "y": 150, "width": 1920, "height": 930 }
- For Wikipedia articles, consider: { "x": 0, "y": 200, "width": 1920, "height": 880 } to skip navigation
- For specific elements, use the selector parameter (e.g., selector: "#mw-content-text")
- Full page screenshots capture everything, but cropped areas often look cleaner
- Default screenshots automatically try to crop to main content, but you can override with main_content_only: false
Screenshot Action Schema
Each action also has a JSON view that describes its parameters to the model. The screenshot action's schema, for example:
json.name action_name
json.description "Take a screenshot of the current page"
json.parameters do
json.type "object"
json.properties do
json.filename do
json.type "string"
json.description "Name for the screenshot file"
end
json.full_page do
json.type "boolean"
json.description "Whether to capture the full page (true) or just viewport (false)"
end
json.main_content_only do
json.type "boolean"
json.description "Automatically detect and crop to main content area, excluding headers (default: true)"
end
json.selector do
json.type "string"
json.description "CSS selector for a specific element to screenshot (optional)"
end
json.area do
json.type "object"
json.description "Specific area of the page to capture (optional)"
json.properties do
json.x do
json.type "integer"
json.description "X coordinate of the top-left corner"
end
json.y do
json.type "integer"
json.description "Y coordinate of the top-left corner"
end
json.width do
json.type "integer"
json.description "Width of the area to capture"
end
json.height do
json.type "integer"
json.description "Height of the area to capture"
end
end
end
end
end
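Tool-call arguments arrive as parsed JSON, so the area hash may carry string keys rather than symbols. The normalization the screenshot action performs can be sketched in isolation; normalize_area is an illustrative name:

```ruby
# Accept string or symbol keys and emit a symbol-keyed hash containing
# only the coordinates that were actually provided.
def normalize_area(area)
  %i[x y width height].each_with_object({}) do |key, out|
    value = area[key.to_s] || area[key]
    out[key] = value if value
  end
end

normalize_area({ "x" => 0, "y" => 150, "width" => 1920, "height" => 930 })
# => { x: 0, y: 150, width: 1920, height: 930 }
```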
Basic Navigation Example
The browser use agent can navigate to URLs and interact with pages using AI:
response = BrowserAgent.with(
message: "Navigate to https://www.example.com and tell me what you see"
).prompt_context.generate_now
assert response.message.content.present?
Navigation Response Example
AI-Driven Browser Control
The browser use agent can use AI to determine which actions to take:
response = BrowserAgent.with(
message: "Go to https://www.example.com and extract the main heading"
).prompt_context.generate_now
# Check that AI used the tools
assert response.prompt.messages.any? { |m| m.role == :tool }
assert response.message.content.present?
AI Browser Response Example
Direct Action Usage
You can also call browser actions directly without AI:
# Call navigate action directly (synchronous execution)
navigate_response = BrowserAgent.with(
url: "https://www.example.com"
).navigate
# The action returns a Generation object
assert_kind_of ActiveAgent::Generation, navigate_response
# Execute the generation
result = navigate_response.generate_now
assert result.message.content.include?("navigated") || result.message.content.include?("Failed") || result.message.content.include?("Example")
Direct Action Response Example
Wikipedia Research Example
The browser use agent excels at research tasks, navigating between pages and gathering information:
response = BrowserAgent.with(
message: "Research the Apollo 11 moon landing mission. Start at the main Wikipedia article, then:
1) Extract the main content to get an overview
2) Find and follow links to learn about the crew members (Neil Armstrong, Buzz Aldrin, Michael Collins)
3) Take screenshots of important pages
4) Extract key dates, mission objectives, and historical significance
5) Look for related missions or events by exploring relevant links
Please provide a comprehensive summary with details about the mission, crew, and its impact on space exploration.",
url: "https://en.wikipedia.org/wiki/Apollo_11"
).prompt_context.generate_now
# The agent should navigate to Wikipedia and gather information
assert response.message.content.present?
assert response.message.content.downcase.include?("apollo") ||
response.message.content.downcase.include?("moon") ||
response.message.content.downcase.include?("armstrong") ||
response.message.content.downcase.include?("nasa")
# Check that multiple tools were used
tool_messages = response.prompt.messages.select { |m| m.role == :tool }
assert tool_messages.any?, "Should have used tools"
# Check for variety in tool usage (the agent should use multiple different tools)
assistant_messages = response.prompt.messages.select { |m| m.role == :assistant }
tool_names = []
assistant_messages.each do |msg|
if msg.requested_actions&.any?
tool_names.concat(msg.requested_actions.map(&:name))
end
end
tool_names.uniq!
assert tool_names.length > 2, "Should use at least 3 different tools for comprehensive research"
Wikipedia Research Response Example
Area Screenshot Example
Take screenshots of specific page regions:
response = BrowserAgent.with(
message: "Navigate to https://www.example.com and take a screenshot of just the header area (top 200 pixels)"
).prompt_context.generate_now
assert response.message.content.present?
# Check that screenshot tool was used
tool_messages = response.prompt.messages.select { |m| m.role == :tool }
assert tool_messages.any? { |m| m.content.include?("screenshot") }, "Should have taken a screenshot"
Area Screenshot Response Example
Main Content Auto-Cropping
The browser use agent can automatically detect and crop to main content areas:
response = BrowserAgent.with(
message: "Navigate to Wikipedia's Apollo 11 page and take a screenshot of the main content (should automatically exclude navigation/header)"
).prompt_context.generate_now
assert response.message.content.present?
# Check that screenshot was taken
tool_messages = response.prompt.messages.select { |m| m.role == :tool }
assert tool_messages.any? { |m| m.content.include?("screenshot") }, "Should have taken a screenshot"
# Check that the agent navigated to Wikipedia
assert tool_messages.any? { |m| m.content.include?("wikipedia") }, "Should have navigated to Wikipedia"
Main Content Crop Response Example
Screenshot Capabilities
The screenshot action provides multiple options for capturing page content:
Because agent actions read their arguments from params, each example passes its options through with and executes the returned generation with generate_now:
Full Page Screenshot
BrowserAgent.with(
url: "https://example.com"
).navigate.generate_now
BrowserAgent.with(
filename: "full_page.png",
full_page: true
).screenshot.generate_now
Area Screenshot
BrowserAgent.with(
filename: "header.png",
area: { x: 0, y: 0, width: 1920, height: 200 }
).screenshot.generate_now
Element Screenshot
BrowserAgent.with(
filename: "content.png",
selector: "#main-content"
).screenshot.generate_now
Auto-Crop to Main Content
BrowserAgent.with(
filename: "main.png",
main_content_only: true # Default behavior
).screenshot.generate_now
Browser Configuration
The browser runs in HD resolution (1920x1080) with headless Chrome:
def setup_browser_if_needed
Capybara.register_driver :cuprite_agent do |app|
Capybara::Cuprite::Driver.new(
app,
window_size: [1920, 1080], # HD resolution
browser_options: {
"no-sandbox": nil,
"disable-gpu": nil,
"disable-dev-shm-usage": nil
},
inspector: false,
headless: true
)
end
end
Smart Content Detection
The browser use agent includes intelligent content detection that:
- Identifies main content areas using common selectors
- Skips headers and navigation automatically
- Adjusts cropping based on page structure
- Falls back to sensible defaults
Common selectors checked:
- main, [role='main']
- #main-content, #content
- article
- #mw-content-text (Wikipedia)
- .container (Bootstrap)
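Once an element is located, the crop itself is simple arithmetic over its bounding rectangle. A pure-Ruby sketch of the computation detect_main_content_area applies (crop_area is an illustrative name):

```ruby
VIEWPORT_WIDTH  = 1920
VIEWPORT_HEIGHT = 1080

# If the detected element starts near the top of the page, it likely sits
# under a header, so skip a typical header height (150px) instead; always
# capture the full viewport width from that offset down.
def crop_area(element_y)
  start_y = element_y < 100 ? 150 : element_y
  { x: 0, y: start_y, width: VIEWPORT_WIDTH, height: VIEWPORT_HEIGHT - start_y }
end

crop_area(0)   # => { x: 0, y: 150, width: 1920, height: 930 }
crop_area(240) # => { x: 0, y: 240, width: 1920, height: 840 }
```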
Tips for Effective Use
Navigation Best Practices
- Use click with the text parameter for specific links
- Extract main content before navigating away
- Use go_back to return to previous pages
- Take screenshots of important pages
Wikipedia Research
- Use selector #mw-content-text for article content
- Click directly on relevant links rather than extracting all links
- Take screenshots with main_content_only: true to exclude navigation
Screenshot Optimization
- The default main_content_only: true crops out headers automatically
- Use the area parameter for specific regions: { x: 0, y: 150, width: 1920, height: 930 }
- For Wikipedia, consider y: 200 to skip navigation bars
- Full page screenshots are available with full_page: true
Integration with Rails
The Browser Use Agent integrates seamlessly with Rails applications:
class WebScraperController < ApplicationController
def scrape
response = BrowserAgent.with(
message: params[:instructions],
url: params[:url]
).prompt_context.generate_now
render json: {
content: response.message.content,
screenshots: response.prompt.messages
.select { |m| m.role == :tool && m.content.include?("screenshot") }
.map { |m| m.content[/File: (.+?)\n/, 1] }
}
end
end
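The screenshot-path extraction in the controller above is just a regex over tool message content; it assumes the screenshot view renders a line like "File: tmp/screenshots/name.png". The same logic in isolation (screenshot_paths is an illustrative helper):

```ruby
# Pull "File: ..." paths out of tool messages that mention a screenshot.
def screenshot_paths(contents)
  contents
    .select { |content| content.downcase.include?("screenshot") }
    .filter_map { |content| content[/File: (.+?)\n/, 1] }
end

screenshot_paths([
  "Successfully navigated to https://example.com",
  "Screenshot saved\nFile: tmp/screenshots/main_1700000000.png\n"
])
# => ["tmp/screenshots/main_1700000000.png"]
```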
Advanced Usage
Multi-Page Navigation Flow
Actions read their arguments from params and share a single browser session (a class_attribute), so state carries across calls:
# Navigate to the main page
BrowserAgent.with(url: "https://example.com").navigate.generate_now
# Extract the main content
content = BrowserAgent.extract_main_content.generate_now
# Click a specific link by its text
BrowserAgent.with(text: "Learn More").click.generate_now
# Take a screenshot of the new page
BrowserAgent.with(main_content_only: true).screenshot.generate_now
# Go back to the previous page
BrowserAgent.go_back.generate_now
# Extract links for further exploration
links = BrowserAgent.with(selector: "#main-content").extract_links.generate_now
Form Interaction
BrowserAgent.with(url: "https://example.com/form").navigate.generate_now
BrowserAgent.with(field: "email", value: "test@example.com").fill_form.generate_now
BrowserAgent.with(field: "message", value: "Hello world").fill_form.generate_now
BrowserAgent.with(text: "Submit").click.generate_now
BrowserAgent.with(filename: "form_result.png").screenshot.generate_now
Requirements
- Cuprite gem for Chrome automation
- Chrome or Chromium browser installed
- Capybara for browser session management
Add to your Gemfile:
gem 'cuprite'
gem 'capybara'
Conclusion
The Browser Use Agent demonstrates ActiveAgent's flexibility in integrating with external tools while maintaining Rails conventions. Following the pattern of tools like Anthropic's Computer Use, it provides powerful browser automation capabilities driven by AI, making it ideal for:
- Web scraping and data extraction
- Automated testing and verification
- Research and information gathering
- Screenshot generation for documentation
- Form submission and interaction