Browser Use

This example is available on GitHub: examples/01_standalone_sdk/15_browser_use.py

The BrowserToolSet integration enables your agent to interact with web pages through automated browser control. Built on top of browser-use, it provides capabilities for navigating websites, clicking elements, filling forms, and extracting content - all through natural language instructions.

examples/01_standalone_sdk/15_browser_use.py

import os

from pydantic import SecretStr

from openhands.sdk import (
    LLM,
    Agent,
    Conversation,
    Event,
    LLMConvertibleEvent,
    get_logger,
)
from openhands.sdk.tool import Tool
from openhands.tools.browser_use import BrowserToolSet
from openhands.tools.file_editor import FileEditorTool
from openhands.tools.terminal import TerminalTool


logger = get_logger(__name__)

# Configure LLM
api_key = os.getenv("LLM_API_KEY")
assert api_key is not None, "LLM_API_KEY environment variable is not set."
model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
base_url = os.getenv("LLM_BASE_URL")
llm = LLM(
    usage_id="agent",
    model=model,
    base_url=base_url,
    api_key=SecretStr(api_key),
)

# Tools
cwd = os.getcwd()
tools = [
    Tool(
        name=TerminalTool.name,
    ),
    Tool(name=FileEditorTool.name),
    Tool(name=BrowserToolSet.name),
]

# If you need fine-grained browser control, you can manually register individual browser
# tools by creating a BrowserToolExecutor and providing factories that return customized
# Tool instances before constructing the Agent.

# Agent
agent = Agent(llm=llm, tools=tools)

llm_messages = []  # collect raw LLM messages


def conversation_callback(event: Event):
    if isinstance(event, LLMConvertibleEvent):
        llm_messages.append(event.to_llm_message())


conversation = Conversation(
    agent=agent, callbacks=[conversation_callback], workspace=cwd
)

conversation.send_message(
    "Could you go to https://openhands.dev/ blog page and summarize main "
    "points of the latest blog?"
)
conversation.run()

print("=" * 100)
print("Conversation finished. Got the following LLM messages:")
for i, message in enumerate(llm_messages):
    print(f"Message {i}: {str(message)[:200]}")

Running the Example

export LLM_API_KEY="your-api-key"
cd agent-sdk
uv run python examples/01_standalone_sdk/15_browser_use.py

How It Works

The example demonstrates combining multiple tools to create a capable web research agent:

BrowserToolSet: Provides automated browser control for web interaction
FileEditorTool: Allows the agent to read and write files if needed
BashTool: Enables command-line operations for additional functionality

The agent uses these tools to:

Navigate to specified URLs
Interact with web page elements (clicking, scrolling, etc.)
Extract and analyze content from web pages
Summarize information from multiple sources

In this example, the agent visits the openhands.dev blog, finds the latest blog post, and provides a summary of its main points.

Customization

For advanced use cases requiring only a subset of browser tools or custom configurations, you can manually register individual browser tools. Refer to the BrowserToolSet definition to see the available individual tools and create a BrowserToolExecutor with customized tool configurations before constructing the Agent. This gives you fine-grained control over which browser capabilities are exposed to the agent.

Next Steps

Custom Tools - Create specialized tools
MCP Integration - Connect external services

Guides

Architecture

API Reference

How It Works

Customization

Next Steps

Guides

Architecture

API Reference

Documentation Index

​How It Works

​Customization

​Next Steps

How It Works

Customization

Next Steps