Agent Testing

Erdo provides a simple, fast testing pattern for agents using the invoke() function with parallel execution.

Quick Start

1. Write Test Functions

Create a Python file with agent_test_* functions:
from erdo import invoke
from erdo.test import text_contains

def agent_test_basic_query():
    """Test basic agent invocation."""
    response = invoke(
        "data-question-answerer",
        messages=[{"role": "user", "content": "What were Q4 sales?"}],
        datasets=["sales-q4-2024"],
        mode="replay",  # Free after first run!
    )

    # Assert invocation succeeded
    assert response.success, f"Invocation failed: {response.error}"

    # Assert on the result
    result_text = str(response.result)
    assert text_contains(result_text, "sales", case_sensitive=False)

2. Run Tests

# Run all tests in parallel
erdo agent-test tests/test_my_agent.py

# Verbose output
erdo agent-test tests/test_my_agent.py --verbose

# Limit parallel jobs
erdo agent-test tests/test_my_agent.py -j 4

3. See Results

Discovering tests in tests/test_my_agent.py...
Found 15 tests

Running tests in parallel...

======================================================================
AGENT TEST RESULTS
======================================================================

✅ agent_test_csv_sales_total (0.45s)
✅ agent_test_csv_product_breakdown (0.52s)
✅ agent_test_postgres_customer_count (0.38s)
❌ agent_test_invalid_dataset (0.21s)

----------------------------------------------------------------------
Total: 15 | Passed: 14 | Failed: 1 | Duration: 2.3s
----------------------------------------------------------------------

Benefits

  • Parallel Execution: Tests run concurrently, reducing test suite duration from minutes to seconds.
  • Efficient Caching: Replay mode caches LLM responses after the first execution, eliminating API costs for subsequent runs.
  • Simple Pattern: Name functions with the agent_test_* prefix and use invoke(). No complex test framework required.
  • Clear Results: Clean summary with pass/fail counts, timing information, and detailed error output when needed.

Test Modes

Control how bot actions are executed in tests:
Mode   | Description        | API Calls      | Use Case
live   | Real LLM execution | Every test run | Integration tests requiring fresh data
replay | Cached responses   | First run only | Most tests, CI/CD pipelines
manual | Developer mocks    | None           | Deterministic tests, offline development

Live Mode (Default)

Executes with real LLM calls:
response = invoke("my-agent", messages=[...])  # mode="live" is default
Use for: Integration tests requiring current model behavior or fresh data

Replay Mode

Caches LLM responses for efficient testing:
def agent_test_basic():
    response = invoke("my-agent", messages=[...], mode="replay")
    assert response.success
Cache behavior:
  • First run: Executes live and caches response
  • Subsequent runs: Returns cached response without API calls
  • Auto-invalidates when agent definition changes
Use for: Most integration tests, CI/CD pipelines, regression testing

Replay Mode with Refresh

Force cache refresh in replay mode:
def agent_test_with_fresh_data():
    # Bypass cache to get fresh response
    response = invoke(
        "my-agent",
        messages=[...],
        mode={"mode": "replay", "refresh": True}
    )
    assert response.success
Use for:
  • Updating cached responses after agent modifications
  • Validating current model behavior
  • Refreshing test fixtures

Manual Mode

Developer-provided mock responses:
def agent_test_with_mocks():
    response = invoke(
        "my-agent",
        messages=[...],
        mode="manual",
        manual_mocks={
            "llm.message": {
                "status": "success",
                "output": {"content": "Mocked response"}
            }
        }
    )
    assert response.success
    assert "Mocked" in str(response.result)
Use for:
  • Testing error handling with controlled failures
  • Deterministic test scenarios
  • Offline development
  • Fast execution without API dependencies
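For instance, here is a minimal sketch of a deterministic test built on the manual_mocks shape shown above; the agent key "my-agent" and the mocked content are illustrative, not part of a real agent:
def agent_test_deterministic_reply():
    """The mocked LLM reply is fixed, so this test never varies between runs."""
    response = invoke(
        "my-agent",
        messages=[{"role": "user", "content": "Summarize the report"}],
        mode="manual",
        manual_mocks={
            # Same mock shape as documented above; the content is a test fixture
            "llm.message": {
                "status": "success",
                "output": {"content": "Revenue grew 12% quarter over quarter."}
            }
        },
    )

    assert response.success
    # The assertion can be exact because the reply is fully controlled
    assert "12%" in str(response.result)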

Test Helpers

Import assertion helpers from erdo.test:
from erdo.test import (
    text_contains,      # Check if text contains substring
    text_equals,        # Check exact match
    text_matches,       # Check regex pattern
    json_path_equals,   # Check JSON path value
    json_path_exists,   # Check if JSON path exists
    has_dataset,        # Check if dataset is present
)

text_contains

Check if text contains a substring:
result_text = str(response.result)

assert text_contains(result_text, "expected")
assert text_contains(result_text, "EXPECTED", case_sensitive=False)

text_equals

Check exact text match:
assert text_equals(result_text, "exact match")
assert text_equals(result_text, "EXACT MATCH", case_sensitive=False)

text_matches

Check regex pattern:
import re

assert text_matches(result_text, r"\d+ customers")
assert text_matches(result_text, r"^hello", re.IGNORECASE)

json_path_equals

Check JSON path values:
data = {"user": {"name": "Alice", "age": 30}}

assert json_path_equals(data, "user.name", "Alice")
assert json_path_equals(data, "user.age", 30)

json_path_exists

Check if JSON path exists:
assert json_path_exists(data, "user.name")
assert not json_path_exists(data, "user.email")

has_dataset

Check if dataset is present in response:
assert has_dataset(response)
assert has_dataset(response, dataset_id="abc-123")
assert has_dataset(response, dataset_key="sales_data")
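Putting it together, a sketch of has_dataset inside a full test; the agent key and dataset name reuse the examples from this page and are illustrative:
def agent_test_attaches_dataset():
    """Verify the invocation attaches a result dataset to the response."""
    response = invoke(
        "data-question-answerer",
        messages=[{"role": "user", "content": "Export October sales as a table"}],
        datasets=["sales-q4-2024"],
        mode="replay",
    )

    assert response.success
    assert has_dataset(response), "Expected the agent to attach a result dataset"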

Complete Example

"""
Agent tests for data question answerer.

To run:
  erdo agent-test tests/test_data_question_answerer.py
"""

from erdo import invoke
from erdo.test import text_contains, text_equals

# Agent key constant
AGENT_KEY = "data-question-answerer"

def agent_test_csv_sales_total():
    """Test CSV sales total aggregation."""
    response = invoke(
        AGENT_KEY,
        messages=[{"role": "user", "content": "What were total sales in October?"}],
        datasets=["sales-q4-2024"],
        mode="replay",
    )

    assert response.success
    result_text = str(response.result)
    assert text_contains(result_text, "October", case_sensitive=False)
    assert text_contains(result_text, "total", case_sensitive=False)


def agent_test_csv_product_breakdown():
    """Test CSV product category breakdown."""
    response = invoke(
        AGENT_KEY,
        messages=[{"role": "user", "content": "Show sales by product category"}],
        datasets=["sales-q4-2024"],
        mode="replay",
    )

    assert response.success
    result_text = str(response.result)
    assert text_contains(result_text, "category", case_sensitive=False)


def agent_test_with_parameters():
    """Test passing custom parameters."""
    response = invoke(
        AGENT_KEY,
        messages=[{"role": "user", "content": "Analyze the sales data"}],
        datasets=["sales-q4-2024"],
        parameters={
            "analysis_type": "trend",
            "time_period": "weekly"
        },
        mode="replay",
    )

    assert response.success
    result_text = str(response.result)
    assert text_contains(result_text, "trend", case_sensitive=False) or \
           text_contains(result_text, "weekly", case_sensitive=False)


def agent_test_multi_dataset():
    """Test with multiple datasets."""
    response = invoke(
        AGENT_KEY,
        messages=[{
            "role": "user",
            "content": "Is there a correlation between traffic and sales?"
        }],
        datasets=["sales-q4-2024", "ga-main-property"],
        mode="replay",
    )

    assert response.success
    result_text = str(response.result)
    assert text_contains(result_text, "correlation", case_sensitive=False)


def agent_test_empty_result():
    """Test handling of empty results."""
    response = invoke(
        AGENT_KEY,
        messages=[{
            "role": "user",
            "content": "Show products costing over $1,000,000"
        }],
        datasets=["sales-q4-2024"],
        mode="replay",
    )

    assert response.success
    result_text = str(response.result)
    # Should gracefully indicate no results
    assert text_contains(result_text, "no", case_sensitive=False)

Best Practices

1. Use Descriptive Names

# Good - clear purpose
def agent_test_csv_sales_aggregation():
    ...

# Bad - unclear
def agent_test_test1():
    ...

2. Choose the Right Mode

# Recommended: replay mode for most tests
response = invoke("my-agent", messages=[...], mode="replay")

# Strategic: manual mode for deterministic scenarios
response = invoke("my-agent", messages=[...], mode="manual",
                  manual_mocks={"llm.message": {...}})

# Occasional: refresh when updating cache
response = invoke("my-agent", messages=[...], mode={"mode": "replay", "refresh": True})

# Selective: live mode for critical validations
response = invoke("my-agent", messages=[...])  # mode="live" incurs API costs
Mode selection guide:
  • Replay mode: Most integration tests and CI/CD
  • Manual mode: Deterministic test scenarios and error handling
  • Refresh: Updating cached responses after agent changes
  • Live mode: Strategic validation with current model behavior

3. Add Good Error Messages

# Good
assert response.success, f"Invocation failed: {response.error}"
assert text_contains(result, "sales"), "Response should mention sales"

# Bad
assert response.success
assert "sales" in result

4. Test Edge Cases

def agent_test_empty_dataset():
    """Test with no datasets attached."""
    response = invoke("my-agent", messages=[...], datasets=[], mode="replay")
    assert response.success

def agent_test_long_message():
    """Test with a long message."""
    long_text = "Lorem ipsum " * 100
    response = invoke(
        "my-agent",
        messages=[{"role": "user", "content": long_text}],
        mode="replay",
    )
    assert response.success

def agent_test_special_characters():
    """Test with special characters."""
    response = invoke(
        "my-agent",
        messages=[{"role": "user", "content": "Test: $100 & 50% @ #tag"}],
        mode="replay"
    )
    assert response.success

5. Group Related Tests

# CSV Tests
def agent_test_csv_basic():
    ...

def agent_test_csv_aggregation():
    ...

def agent_test_csv_filtering():
    ...

# Database Tests
def agent_test_postgres_query():
    ...

def agent_test_postgres_join():
    ...

CLI Reference

# Run all tests in a file
erdo agent-test tests/test_my_agent.py

# Verbose output (show full error traces)
erdo agent-test tests/test_my_agent.py --verbose

# Limit parallel jobs
erdo agent-test tests/test_my_agent.py -j 4

# Refresh cached responses (force re-execution in replay mode)
erdo agent-test tests/test_my_agent.py --refresh

# Combine flags
erdo agent-test tests/test_my_agent.py --refresh --verbose -j 8

# Help
erdo agent-test --help
CLI Flags:
  • -v, --verbose: Show detailed error traces
  • -j, --jobs <N>: Number of parallel jobs (default: auto)
  • -r, --refresh: Force refresh cached responses in replay mode
Refresh Flag Behavior: The --refresh flag forces all tests using mode="replay" to bypass cache and execute with live LLM calls. Use this when:
  • Agent logic has changed and cached responses need updating
  • Validating behavior with current model versions
  • Refreshing test fixtures after significant changes
Note: Tests using mode="live" or mode="manual" are unaffected by the refresh flag.

Python API

You can also run tests programmatically:
from erdo.test import run_tests

# Run tests from a file
exit_code = run_tests(
    "tests/test_my_agent.py",
    verbose=True,
    max_workers=4
)

# exit_code is 0 if all tests passed, 1 if any failed
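For example, a minimal runner script for CI, assuming run_tests accepts the keyword arguments shown above:
import sys

from erdo.test import run_tests

if __name__ == "__main__":
    # Propagate the result to the shell so CI fails the build on test failures
    sys.exit(run_tests("tests/test_my_agent.py", verbose=True, max_workers=4))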

Performance Characteristics

Sequential Execution

15 tests × 2s each = 30 seconds total

Parallel Execution with Replay Mode

15 tests in parallel = ~2-3 seconds total
  • First run: standard duration while responses are cached
  • Subsequent runs: near-instant, served from cache

Parallel execution combined with replay caching provides order-of-magnitude improvements in test suite duration.

Troubleshooting

No tests found

Make sure your test functions start with agent_test_:
# Correct
def agent_test_my_feature():
    ...

# Wrong - won't be discovered
def test_my_feature():
    ...

def my_test():
    ...

Tests failing with “Bot not found”

Make sure the agent is synced to the backend:
erdo sync-agent path/to/agent.py

Authentication errors

Login first:
erdo login
Or set environment variables:
export ERDO_ENDPOINT="https://api.erdo.ai"
export ERDO_AUTH_TOKEN="your-token"

Slow test execution

Use replay mode for efficient testing:
# Slower - makes API calls every execution
response = invoke("my-agent", messages=[...])

# Faster - caches responses after first run
response = invoke("my-agent", messages=[...], mode="replay")

Import errors

Make sure the SDK is installed:
cd erdo-agents
uv pip install -e ../erdo-python-sdk
