Testing Overview

Erdo provides two complementary testing approaches that enable comprehensive validation while minimizing API costs: unit testing for logic validation and integration testing for end-to-end verification.

Testing Philosophy

Agent systems present unique testing challenges:
  • Complexity: Multiple execution paths through conditional logic and result handlers
  • State Validation: Ensuring data flows correctly through templates and steps
  • Coverage: Identifying and testing all possible paths through the agent
  • Integration Costs: LLM API calls in integration tests can accumulate
Erdo addresses these through:
  1. Declarative structure that enables automatic path enumeration
  2. Static analysis that validates templates before execution
  3. Intelligent caching for cost-effective integration testing
  4. Parallel execution for rapid feedback

Test Types Comparison

Erdo provides two distinct test types, each serving different purposes:
| Aspect | Unit Tests (erdo test) | Integration Tests (erdo agent-test) |
| --- | --- | --- |
| What it does | Validates agent structure & logic locally | Executes agents with real backend/LLM calls |
| Backend calls | ❌ None - pure static analysis | ✅ Yes - actual execution |
| Speed | ⚡ 2-5 seconds for full coverage | 🐢 Depends on LLM response time (or fast with replay cache) |
| Cost | 💰 Free - no API calls | 💰 First run costs API fees; replay mode caches for free reruns |
| Coverage | 🔍 All execution paths (2^n for n conditions) | 🎯 Specific test scenarios you write |
| Use case | Validate logic, catch template errors, rapid development | End-to-end validation, integration verification |
| When to run | Every save/commit - instant feedback | Before deployment, CI/CD, regression testing |
| Test files | Any agent Python file | Functions with agent_test_* prefix |
| Command | erdo test my_agent.py | erdo agent-test tests/test_my_agent.py |
Quick Decision Guide:
  • Use Unit Tests when you want fast feedback on agent structure, template syntax, and execution paths
  • Use Integration Tests when you need to validate actual LLM behavior and end-to-end functionality
  • Use Both for comprehensive coverage: unit tests catch structure issues instantly, integration tests validate real behavior

Unit Testing

Local Validation Only: Unit tests run entirely on your machine with NO backend or LLM calls. They’re free, fast, and perfect for rapid development iteration. See the Unit Testing Guide for complete details.

Overview

Unit tests validate agent structure and logic through static analysis and path enumeration:
erdo test my_agent.py

What Gets Tested

Execution Path Enumeration
Because Erdo agents are declarative, the CLI can automatically identify all execution paths (a small illustration follows this list):
  • All conditional branches (success/error handlers)
  • ITERATE_OVER loops with different data structures
  • Step dependencies and ordering
  • Handler combinations
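
To see why coverage scales as 2^n (as in the comparison table above), here is a self-contained Python illustration; it uses only the standard library, not the Erdo API:

from itertools import product

# Each independent branch point (e.g., a success/error handler pair)
# doubles the number of distinct execution paths: 2**n paths for n points.
branch_points = ["analysis", "store", "notify"]
paths = list(product(["success", "error"], repeat=len(branch_points)))
assert len(paths) == 2 ** len(branch_points)  # 8 paths for 3 branch points

for outcomes in paths:
    print(" → ".join(f"{step} [{outcome}]" for step, outcome in zip(branch_points, outcomes)))

Erdo derives the real path set from your declarative agent definition automatically; you never enumerate paths by hand.
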
Template Validation
Templates are validated against actual data structures (a rough sketch of missing-key detection follows the list):
  • State field availability and access patterns
  • Type compatibility
  • Parameter hydration with test data
  • Missing key detection
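
As a rough illustration of missing-key detection, the snippet below checks a Python format-style template against simulated state. This is plain Python, not Erdo's actual validator, and the {field} placeholder syntax is an assumption:

import string

def find_missing_keys(template: str, state: dict) -> set:
    # Collect every {field} the template references, then report the ones
    # the simulated state does not provide.
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    return fields - state.keys()

template = "Summarize {query} using {dataset}"
print(find_missing_keys(template, {"query": "Q4 sales"}))  # {'dataset'}
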
State Management
State flow is simulated through execution (pictured in the sketch after this list):
  • Step output accumulation
  • Data transformations
  • Context availability at each step
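
For intuition, you can picture state accumulation as each step's output being merged into a shared context. This is a simplification for illustration, not Erdo's implementation:

state = {"query": "Analyze Q4 sales"}  # initial parameters

def run_step(name, output):
    # Each step's output becomes available to templates in later steps.
    state[name] = output

run_step("analysis", {"confidence": 0.95, "summary": "sample summary"})
run_step("store_high_confidence", {"stored": True})
print(state["analysis"]["confidence"])  # later steps read earlier outputs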

Example Output

🔍 Testing Python agents in my_agent.py...
🤖 Found 1 agents to test:
  • data_analyzer: Analyzes data and generates insights

📊 Generating execution paths...

✅ Path 1: analysis → store_high_confidence → send_notification
✅ Path 2: analysis → request_review → notify_team
✅ Path 3: analysis [error] → retry → notify_failure

✅ All 12 execution paths tested successfully!

Running Unit Tests

# Test all agents in current directory
erdo test

# Test specific agent
erdo test agents/data_analyzer.py

# Provide custom test data
erdo test agents/data_analyzer.py --data test_scenarios.json

# Watch mode for development
erdo test agents/data_analyzer.py --watch

Test Data Generation

Erdo automatically generates test data from your parameter definitions:
from erdo import Agent, ParameterDefinition, ParameterType  # import path assumed

agent = Agent(
    name="analyzer",
    parameter_definitions=[
        ParameterDefinition(
            name="Query",
            key="query",
            type=ParameterType.STRING,
            is_required=True
        )
    ]
)

# erdo test will generate: {"query": "sample_query"}
You can also provide custom test data:
// test_data.json
{
  "query": "Analyze Q4 sales trends",
  "dataset": {
    "id": "sales_2024_q4",
    "name": "Q4 Sales Data",
    "type": "file"
  }
}
erdo test my_agent.py --data test_data.json

Integration Testing

Live Agent Execution: Integration tests execute your agents with real backend and LLM calls. Use replay mode to cache responses and minimize API costs. See the Integration Testing Guide for complete details on modes and best practices.

Overview

Integration tests execute agents with real LLM calls. They support three modes (live, replay, manual) for different testing needs:
from erdo import invoke
from erdo.test import text_contains

def agent_test_sales_analysis():
    """Test sales analysis with actual LLM execution."""
    response = invoke(
        "data-analyst",
        messages=[{"role": "user", "content": "Analyze Q4 sales"}],
        datasets=["sales_2024_q4"],
        mode="replay"
    )
    
    assert response.success
    assert text_contains(str(response.result), "revenue")

Replay Mode

Replay mode intelligently caches LLM responses:
First Execution
  1. Executes agent with real LLM calls
  2. Caches responses locally
  3. Test runs with actual API behavior
Subsequent Executions
  1. Returns cached responses without API calls
  2. Tests run in milliseconds instead of seconds
  3. Identical behavior to first execution
Cache Invalidation
  • Automatic when agent definition changes
  • Manual refresh with --refresh flag
  • Per-test cache isolation
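
As a sketch of the determinism replay mode gives you, the test below issues the same request twice and expects identical results. It assumes identical requests map to the same cache entry, and it reuses the agent name and message from the earlier example:

from erdo import invoke

def agent_test_replay_is_deterministic():
    """Cached replay responses make reruns repeatable."""
    messages = [{"role": "user", "content": "Analyze Q4 sales"}]

    first = invoke("data-analyst", messages=messages, mode="replay")
    second = invoke("data-analyst", messages=messages, mode="replay")

    assert first.success and second.success
    # After the first (paid) execution, both calls are served from the
    # cache, so the results should match exactly.
    assert str(first.result) == str(second.result)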

Test Modes

Integration tests support three execution modes, selected via the mode argument to invoke:
  • live: real backend and LLM calls on every run; use sparingly for critical-path validation
  • replay: real execution on the first run, cached responses on reruns; recommended for most tests
  • manual: no LLM calls; returns the mocked responses you supply via manual_mocks, useful for deterministic scenarios such as error handling
The Cost-Effective Testing section below shows all three modes in use.

Running Integration Tests

# Run all tests in parallel
erdo agent-test tests/test_my_agent.py

# Verbose output for debugging
erdo agent-test tests/test_my_agent.py --verbose

# Refresh cached responses
erdo agent-test tests/test_my_agent.py --refresh

# Control parallelism
erdo agent-test tests/test_my_agent.py -j 8

Test Helpers

from erdo.test import (
    text_contains,
    text_equals,
    text_matches,
    json_path_equals,
    json_path_exists
)

def agent_test_analysis():
    response = invoke("analyst", messages=[...], mode="replay")
    
    # Text assertions
    result = str(response.result)
    assert text_contains(result, "insights", case_sensitive=False)
    assert text_matches(result, r"\d+ recommendations")
    
    # JSON assertions
    assert json_path_exists(response.result, "analysis.summary")
    assert json_path_equals(response.result, "analysis.confidence", 0.95)

Test Organization

File Structure

my_project/
├── agents/
│   ├── data_analyzer.py
│   └── report_generator.py
├── tests/
│   ├── test_data_analyzer.py
│   └── test_report_generator.py
└── test_data/
    ├── scenarios.json
    └── fixtures.json

Naming Convention

# Integration tests: agent_test_* prefix
def agent_test_csv_analysis():
    """Test CSV data analysis."""
    pass

def agent_test_error_handling():
    """Test error handling behavior."""
    pass

# Helper functions: no prefix requirement
def load_test_data():
    """Load test data fixture."""
    pass

Best Practices

Test Coverage Strategy

  1. Unit Tests: Validate all execution paths
  2. Integration Tests: Test critical user journeys
  3. Edge Cases: Handle error conditions and boundary cases
  4. Regression Tests: Prevent known issues from recurring (see the sketch below)
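
For item 4, a regression test can pin down a previously broken boundary case. The sketch below replays an empty-dataset scenario; the scenario, agent name, and expected wording are hypothetical:

from erdo import invoke
from erdo.test import text_contains

def agent_test_regression_empty_dataset():
    """Regression: the agent should respond gracefully when no datasets are attached."""
    response = invoke(
        "data-analyst",
        messages=[{"role": "user", "content": "Analyze the attached data"}],
        datasets=[],  # boundary case: no datasets attached (hypothetical)
        mode="replay",
    )

    assert response.success
    assert text_contains(str(response.result), "no data", case_sensitive=False)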

Cost-Effective Testing

# Good: Most tests use replay mode
def agent_test_standard_query():
    response = invoke("agent", messages=[...], mode="replay")
    assert response.success

# Strategic: Use live mode sparingly for critical paths
def agent_test_production_validation():
    response = invoke("agent", messages=[...], mode="live")
    assert response.success

# Efficient: Use manual mode for deterministic scenarios
def agent_test_error_handling():
    response = invoke(
        "agent",
        messages=[...],
        mode="manual",
        manual_mocks={"llm.message": {"status": "error", "error": "Test error"}}
    )
    assert not response.success

Continuous Integration

# .github/workflows/test.yml
name: Test Agents

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install Erdo CLI
        run: |
          brew install erdoai/tap/erdo
          erdo login --token ${{ secrets.ERDO_TOKEN }}
      
      - name: Unit Tests
        run: erdo test agents/
      
      - name: Integration Tests
        run: erdo agent-test tests/

Performance Characteristics

Unit Tests

  • Duration: 2-5 seconds for comprehensive path coverage
  • Cost: No API calls
  • Coverage: All execution paths automatically tested

Integration Tests with Replay Mode

  • First Run: Standard API call duration + caching overhead
  • Subsequent Runs: Milliseconds (cached responses)
  • Cost: API costs only on first run or cache refresh
  • Parallelism: tests run in parallel (tune with -j), often cutting suite time by an order of magnitude

Integration Tests with Live Mode

  • Duration: Standard API call duration per test
  • Cost: API costs per execution
  • Use Case: Strategic validation only

Next Steps

  • Unit Testing Guide: complete details on local unit testing
  • Integration Testing Guide: test modes, caching, and best practices