
Data Quality Checker Agent

The Data Quality Checker Agent validates data integrity, identifies quality issues, and checks that data meets the standards you specify. It scores data along four dimensions: completeness, validity, consistency, and accuracy.

Quick Start

from erdo.actions import bot

# Check data quality
result = bot.invoke(
    bot_name="data quality checker",
    parameters={
        "dataset": "customer_data.csv",
        "rules": ["completeness", "validity", "consistency"]
    }
)

Quality Dimensions

Completeness

Identifies missing values and null data

Validity

Validates data against format and domain rules

Consistency

Checks for logical consistency across records

Accuracy

Verifies data against reference sources
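To make one of these dimensions concrete, here is a minimal sketch of how a completeness score could be computed. It is an illustration only, not the agent's implementation: the function name `completeness` and the choice to treat `None` and empty strings as missing are assumptions.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping


def completeness(records: Iterable[Mapping[str, Any]], fields: list[str]) -> float:
    """Return the fraction of expected cells that hold a non-empty value.

    Hypothetical helper: None and "" both count as missing.
    """
    total = 0
    filled = 0
    for record in records:
        for field in fields:
            total += 1
            value = record.get(field)
            if value is not None and value != "":
                filled += 1
    # An empty dataset has nothing missing, so report it as fully complete.
    return filled / total if total else 1.0


rows = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 51},
    {"email": "c@example.com", "age": None},
    {"email": "d@example.com", "age": 27},
]

# 8 expected cells, 2 of them missing
score = completeness(rows, ["email", "age"])
print(score)
```

A score like this lands in the same 0 to 1 range as the per-dimension scores in the quality report, though the agent may weight fields or rows differently.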

Validation Rules

Built-in Rules

  • Email Format: Valid email address patterns
  • Phone Numbers: International phone number formats
  • Dates: Valid date ranges and formats
  • Numeric Ranges: Min/max value constraints
  • Text Patterns: Regex pattern matching
  • Referential Integrity: Foreign key constraints
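The checks behind a few of these rules can be approximated locally, which helps when deciding what a rule should accept before running the agent. The sketch below uses only the Python standard library. The function names and the simplified email pattern are assumptions for illustration, and the agent's built-in rules may be stricter or looser.

```python
import re
from datetime import date

# Deliberately simple pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid_email(value: str) -> bool:
    """Email Format: check the value against a basic address pattern."""
    return bool(EMAIL_RE.match(value))


def in_range(value: float, minimum: float, maximum: float) -> bool:
    """Numeric Ranges: inclusive min/max constraint."""
    return minimum <= value <= maximum


def is_valid_iso_date(value: str) -> bool:
    """Dates: accept only real calendar dates written as YYYY-MM-DD."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


print(is_valid_email("ana@example.com"))
print(in_range(130, 0, 120))
print(is_valid_iso_date("2024-02-30"))
```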

Custom Rules

# Define custom validation rules
custom_rules = {
    "customer_id": {
        "pattern": r"^CUST[0-9]{6}$",
        "required": True
    },
    "revenue": {
        "min_value": 0,
        "max_value": 1000000
    },
    "status": {
        "allowed_values": ["active", "inactive", "pending"]
    }
}
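To show what these rule keys plausibly mean, the following sketch evaluates the same dictionary against a single record in plain Python. The helper `check_record` is hypothetical and is not part of the erdo package. The semantics are assumptions as well: `pattern` must match the whole value, `min_value` and `max_value` are inclusive, and optional fields are skipped when absent.

```python
import re
from typing import Any, Dict, List

custom_rules = {
    "customer_id": {"pattern": r"^CUST[0-9]{6}$", "required": True},
    "revenue": {"min_value": 0, "max_value": 1000000},
    "status": {"allowed_values": ["active", "inactive", "pending"]},
}


def check_record(record: Dict[str, Any], rules: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, rule in rules.items():
        value = record.get(field)
        if value is None:
            # Only required fields are reported when missing.
            if rule.get("required"):
                problems.append(f"{field}: missing required value")
            continue
        if "pattern" in rule and not re.fullmatch(rule["pattern"], str(value)):
            problems.append(f"{field}: does not match pattern")
        if "min_value" in rule and value < rule["min_value"]:
            problems.append(f"{field}: below minimum")
        if "max_value" in rule and value > rule["max_value"]:
            problems.append(f"{field}: above maximum")
        if "allowed_values" in rule and value not in rule["allowed_values"]:
            problems.append(f"{field}: value not allowed")
    return problems


good = {"customer_id": "CUST000123", "revenue": 2500, "status": "active"}
bad = {"customer_id": "C-1", "revenue": -5, "status": "archived"}

print(check_record(good, custom_rules))
print(check_record(bad, custom_rules))
```

Running a handful of known-good and known-bad records through a local check like this is a cheap way to catch an overly strict pattern before it flags thousands of rows.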

Quality Report

{
  "overall_score": 0.87,
  "total_records": 10000,
  "issues_found": 1250,
  "dimensions": {
    "completeness": 0.92,
    "validity": 0.85,
    "consistency": 0.89,
    "accuracy": 0.84
  }
}
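A report in this shape is straightforward to inspect programmatically. The sketch below parses the sample above and ranks the dimensions from weakest to strongest. It assumes the report is available as a JSON string, and the source does not say how `bot.invoke` actually exposes it.

```python
import json

# The sample report from this page, embedded as a string for illustration.
report_json = """
{
  "overall_score": 0.87,
  "total_records": 10000,
  "issues_found": 1250,
  "dimensions": {
    "completeness": 0.92,
    "validity": 0.85,
    "consistency": 0.89,
    "accuracy": 0.84
  }
}
"""

report = json.loads(report_json)
dimensions = report["dimensions"]

# The lowest-scoring dimension is usually the best place to start fixing data.
weakest = min(dimensions, key=dimensions.get)

# Rank all dimensions from weakest to strongest.
ranked = sorted(dimensions.items(), key=lambda item: item[1])
for name, score in ranked:
    print(f"{name:<13}{score:.2f}")
```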

Configuration

# Standard quality assessment
result = bot.invoke(
    bot_name="data quality checker",
    parameters={
        "dataset": "data.csv"
    }
)

# Apply specific validation rules
result = bot.invoke(
    bot_name="data quality checker",
    parameters={
        "dataset": "customer_data.csv",
        "rules": {
            "email": {"format": "email", "required": True},
            "age": {"min": 0, "max": 120},
            "country": {"reference_table": "countries"}
        }
    }
)

# Set quality thresholds
result = bot.invoke(
    bot_name="data quality checker",
    parameters={
        "dataset": "sales_data.csv",
        "thresholds": {
            "completeness": 0.95,
            "validity": 0.90,
            "overall_score": 0.85
        },
        "fail_on_threshold": True
    }
)
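The page does not say how a failed threshold surfaces when `fail_on_threshold` is set (an exception, a status field, or something else). As a hedged sketch, the comparison itself can be reproduced locally against a report in the shape shown under Quality Report. The helper `failed_thresholds` is hypothetical, and treating a score below its threshold as a failure is an assumption.

```python
from typing import Any, Dict, Tuple


def failed_thresholds(
    report: Dict[str, Any], thresholds: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Map each failing metric to (actual score, required minimum)."""
    # Flatten the report so dimensions and overall_score share one lookup.
    scores = dict(report.get("dimensions", {}))
    scores["overall_score"] = report["overall_score"]

    failures = {}
    for name, minimum in thresholds.items():
        actual = scores.get(name)
        if actual is not None and actual < minimum:
            failures[name] = (actual, minimum)
    return failures


report = {
    "overall_score": 0.87,
    "dimensions": {
        "completeness": 0.92,
        "validity": 0.85,
        "consistency": 0.89,
        "accuracy": 0.84,
    },
}
thresholds = {"completeness": 0.95, "validity": 0.90, "overall_score": 0.85}

failures = failed_thresholds(report, thresholds)
for name, (actual, minimum) in failures.items():
    print(f"{name}: {actual:.2f} is below the required {minimum:.2f}")
```

With the sample report, completeness and validity miss their thresholds while the overall score passes, which shows why per-dimension thresholds are worth setting alongside an overall one.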

Integration Features

  • CI/CD Integration: Automated quality gates
  • Real-time Monitoring: Continuous quality assessment
  • Alert System: Notifications for quality degradation
  • Reporting Dashboard: Visual quality metrics
  • Data Lineage: Track quality across the data pipeline

Best Practices

  • Start with basic rules, then add complexity
  • Use business context for rule definitions
  • Balance strictness with practicality
  • Document rule rationale and exceptions