Testing Overview

Erdo provides two complementary testing approaches that enable comprehensive validation while minimizing API costs: unit testing for logic validation and integration testing for end-to-end verification.

Testing Philosophy

Agent systems present unique testing challenges:
  • Complexity: Multiple execution paths through conditional logic and result handlers
  • State Validation: Ensuring data flows correctly through templates and steps
  • Coverage: Identifying and testing all possible paths through the agent
  • Integration Costs: LLM API calls in integration tests can accumulate
Erdo addresses these through:
  1. Declarative structure that enables automatic path enumeration
  2. Static analysis that validates templates before execution
  3. Intelligent caching for cost-effective integration testing
  4. Parallel execution for rapid feedback

Test Types Comparison

Erdo provides two distinct test types, each serving different purposes:
| Aspect | Unit Tests (erdo test) | Integration Tests (erdo agent-test) |
| --- | --- | --- |
| What it does | Validates agent structure & logic locally | Executes agents with real backend/LLM calls |
| Backend calls | ❌ None - pure static analysis | ✅ Yes - actual execution |
| Speed | ⚡ 2-5 seconds for full coverage | 🐢 Depends on LLM response time (or fast with replay cache) |
| Cost | 💰 Free - no API calls | 💰 First run costs API fees; replay mode caches for free reruns |
| Coverage | 🔍 All execution paths (2^n for n conditions) | 🎯 Specific test scenarios you write |
| Use case | Validate logic, catch template errors, rapid development | End-to-end validation, integration verification |
| When to run | Every save/commit - instant feedback | Before deployment, CI/CD, regression testing |
| Test files | Any agent Python file | Functions with agent_test_* prefix |
| Command | erdo test my_agent.py | erdo agent-test tests/test_my_agent.py |
Quick Decision Guide:
  • Use Unit Tests when you want fast feedback on agent structure, template syntax, and execution paths
  • Use Integration Tests when you need to validate actual LLM behavior and end-to-end functionality
  • Use Both for comprehensive coverage: unit tests catch structure issues instantly, integration tests validate real behavior

Unit Testing

Local Validation Only: Unit tests run entirely on your machine with NO backend or LLM calls. They’re free, fast, and perfect for rapid development iteration. See the Unit Testing Guide for complete details.

Overview

Unit tests validate agent structure and logic through static analysis and path enumeration:
erdo test my_agent.py

What Gets Tested

Execution Path Enumeration
Because Erdo agents are declarative, the CLI can automatically identify all execution paths (a small illustration follows this list):
  • All conditional branches (success/error handlers)
  • ITERATE_OVER loops with different data structures
  • Step dependencies and ordering
  • Handler combinations
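
To see why coverage scales as 2^n (as in the comparison table above), here is a self-contained Python illustration; it uses only the standard library, not the Erdo API:

from itertools import product

# Each independent branch point (e.g., a success/error handler pair)
# doubles the number of distinct execution paths: 2**n paths for n points.
branch_points = ["analysis", "store", "notify"]
paths = list(product(["success", "error"], repeat=len(branch_points)))
assert len(paths) == 2 ** len(branch_points)  # 8 paths for 3 branch points

for outcomes in paths:
    print(" → ".join(f"{step} [{outcome}]" for step, outcome in zip(branch_points, outcomes)))

Erdo derives the real path set from your declarative agent definition automatically; you never enumerate paths by hand.
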
Template Validation
Templates are validated against actual data structures (a rough sketch of missing-key detection follows the list):
  • State field availability and access patterns
  • Type compatibility
  • Parameter hydration with test data
  • Missing key detection
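
As a rough illustration of missing-key detection, the snippet below checks a Python format-style template against simulated state. This is plain Python, not Erdo's actual validator, and the {field} placeholder syntax is an assumption:

import string

def find_missing_keys(template: str, state: dict) -> set:
    # Collect every {field} the template references, then report the ones
    # the simulated state does not provide.
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    return fields - state.keys()

template = "Summarize {query} using {dataset}"
print(find_missing_keys(template, {"query": "Q4 sales"}))  # {'dataset'}
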
State Management
State flow is simulated through execution (pictured in the sketch after this list):
  • Step output accumulation
  • Data transformations
  • Context availability at each step
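
For intuition, you can picture state accumulation as each step's output being merged into a shared context. This is a simplification for illustration, not Erdo's implementation:

state = {"query": "Analyze Q4 sales"}  # initial parameters

def run_step(name, output):
    # Each step's output becomes available to templates in later steps.
    state[name] = output

run_step("analysis", {"confidence": 0.95, "summary": "sample summary"})
run_step("store_high_confidence", {"stored": True})
print(state["analysis"]["confidence"])  # later steps read earlier outputs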

Example Output

🔍 Testing Python agents in my_agent.py...
🤖 Found 1 agents to test:
  • data_analyzer: Analyzes data and generates insights

📊 Generating execution paths...

✅ Path 1: analysis → store_high_confidence → send_notification
✅ Path 2: analysis → request_review → notify_team
✅ Path 3: analysis [error] → retry → notify_failure

✅ All 12 execution paths tested successfully!

Running Unit Tests

# Test all agents in current directory
erdo test

# Test specific agent
erdo test agents/data_analyzer.py

# Provide custom test data
erdo test agents/data_analyzer.py --data test_scenarios.json

# Watch mode for development
erdo test agents/data_analyzer.py --watch

Test Data Generation

Erdo automatically generates test data from your parameter definitions:
from erdo import Agent, ParameterDefinition, ParameterType  # import path assumed

agent = Agent(
    name="analyzer",
    parameter_definitions=[
        ParameterDefinition(
            name="Query",
            key="query",
            type=ParameterType.STRING,
            is_required=True
        )
    ]
)

# erdo test will generate: {"query": "sample_query"}
You can also provide custom test data:
// test_data.json
{
  "query": "Analyze Q4 sales trends",
  "dataset": {
    "id": "sales_2024_q4",
    "name": "Q4 Sales Data",
    "type": "file"
  }
}
erdo test my_agent.py --data test_data.json

Integration Testing

Live Agent Execution: Integration tests execute your agents with real backend and LLM calls. Use replay mode to cache responses and minimize API costs. See the Integration Testing Guide for complete details on modes and best practices.

Overview

Integration tests execute agents with real LLM calls. They support three modes (live, replay, manual) for different testing needs:
from erdo import invoke
from erdo.test import text_contains

def agent_test_sales_analysis():
    """Test sales analysis with actual LLM execution."""
    response = invoke(
        "data-analyst",
        messages=[{"role": "user", "content": "Analyze Q4 sales"}],
        datasets=["sales_2024_q4"],
        mode="replay"
    )
    
    assert response.success
    assert text_contains(str(response.result), "revenue")

Replay Mode

Replay mode intelligently caches LLM responses:
First Execution
  1. Executes agent with real LLM calls
  2. Caches responses locally
  3. Test runs with actual API behavior
Subsequent Executions
  1. Returns cached responses without API calls
  2. Tests run in milliseconds instead of seconds
  3. Identical behavior to first execution
Cache Invalidation
  • Automatic when agent definition changes
  • Manual refresh with --refresh flag
  • Per-test cache isolation
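
As a sketch of the determinism replay mode gives you, the test below issues the same request twice and expects identical results. It assumes identical requests map to the same cache entry, and it reuses the agent name and message from the earlier example:

from erdo import invoke

def agent_test_replay_is_deterministic():
    """Cached replay responses make reruns repeatable."""
    messages = [{"role": "user", "content": "Analyze Q4 sales"}]

    first = invoke("data-analyst", messages=messages, mode="replay")
    second = invoke("data-analyst", messages=messages, mode="replay")

    assert first.success and second.success
    # After the first (paid) execution, both calls are served from the
    # cache, so the results should match exactly.
    assert str(first.result) == str(second.result)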

Test Modes

Integration tests support three execution modes, selected via the mode argument to invoke:
  • live: real backend and LLM calls on every run; use sparingly for critical-path validation
  • replay: real execution on the first run, cached responses on reruns; recommended for most tests
  • manual: no LLM calls; returns the mocked responses you supply via manual_mocks, useful for deterministic scenarios such as error handling
The Cost-Effective Testing section below shows all three modes in use.

Running Integration Tests

# Run all tests in parallel
erdo agent-test tests/test_my_agent.py

# Verbose output for debugging
erdo agent-test tests/test_my_agent.py --verbose

# Refresh cached responses
erdo agent-test tests/test_my_agent.py --refresh

# Control parallelism
erdo agent-test tests/test_my_agent.py -j 8

Test Helpers

from erdo.test import (
    text_contains,
    text_equals,
    text_matches,
    json_path_equals,
    json_path_exists
)

def agent_test_analysis():
    response = invoke("analyst", messages=[...], mode="replay")
    
    # Text assertions
    result = str(response.result)
    assert text_contains(result, "insights", case_sensitive=False)
    assert text_matches(result, r"\d+ recommendations")
    
    # JSON assertions
    assert json_path_exists(response.result, "analysis.summary")
    assert json_path_equals(response.result, "analysis.confidence", 0.95)

Test Organization

File Structure

my_project/
├── agents/
│   ├── data_analyzer.py
│   └── report_generator.py
├── tests/
│   ├── test_data_analyzer.py
│   └── test_report_generator.py
└── test_data/
    ├── scenarios.json
    └── fixtures.json

Naming Convention

# Integration tests: agent_test_* prefix
def agent_test_csv_analysis():
    """Test CSV data analysis."""
    pass

def agent_test_error_handling():
    """Test error handling behavior."""
    pass

# Helper functions: no prefix requirement
def load_test_data():
    """Load test data fixture."""
    pass

Best Practices

Test Coverage Strategy

  1. Unit Tests: Validate all execution paths
  2. Integration Tests: Test critical user journeys
  3. Edge Cases: Handle error conditions and boundary cases
  4. Regression Tests: Prevent known issues from recurring (see the sketch below)
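
For item 4, a regression test can pin down a previously broken boundary case. The sketch below replays an empty-dataset scenario; the scenario, agent name, and expected wording are hypothetical:

from erdo import invoke
from erdo.test import text_contains

def agent_test_regression_empty_dataset():
    """Regression: the agent should respond gracefully when no datasets are attached."""
    response = invoke(
        "data-analyst",
        messages=[{"role": "user", "content": "Analyze the attached data"}],
        datasets=[],  # boundary case: no datasets attached (hypothetical)
        mode="replay",
    )

    assert response.success
    assert text_contains(str(response.result), "no data", case_sensitive=False)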

Cost-Effective Testing

# Good: Most tests use replay mode
def agent_test_standard_query():
    response = invoke("agent", messages=[...], mode="replay")
    assert response.success

# Strategic: Use live mode sparingly for critical paths
def agent_test_production_validation():
    response = invoke("agent", messages=[...], mode="live")
    assert response.success

# Efficient: Use manual mode for deterministic scenarios
def agent_test_error_handling():
    response = invoke(
        "agent",
        messages=[...],
        mode="manual",
        manual_mocks={"llm.message": {"status": "error", "error": "Test error"}}
    )
    assert not response.success

Continuous Integration

# .github/workflows/test.yml
name: Test Agents

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install Erdo CLI
        run: |
          brew install erdoai/tap/erdo
          erdo login --token ${{ secrets.ERDO_TOKEN }}
      
      - name: Unit Tests
        run: erdo test agents/
      
      - name: Integration Tests
        run: erdo agent-test tests/

Performance Characteristics

Unit Tests

  • Duration: 2-5 seconds for comprehensive path coverage
  • Cost: No API calls
  • Coverage: All execution paths automatically tested

Integration Tests with Replay Mode

  • First Run: Standard API call duration + caching overhead
  • Subsequent Runs: Milliseconds (cached responses)
  • Cost: API costs only on first run or cache refresh
  • Parallelism: tests run in parallel (tune with -j), often cutting suite time by an order of magnitude

Integration Tests with Live Mode

  • Duration: Standard API call duration per test
  • Cost: API costs per execution
  • Use Case: Strategic validation only

Next Steps

  • Unit Testing Guide: complete details on local unit testing
  • Integration Testing Guide: test modes, caching, and best practices