Quality checks are the core of Muster’s observability. Each check is a boolean assertion about an agent’s output — did the agent do what it was supposed to do?

How checks work

  1. Your agent emits checks via the SDK after each job
  2. Muster stores pass/fail results and aggregates them daily
  3. The Health Heatmap shows pass rates per check per agent
  4. Muster detects degradation trends automatically
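The aggregation in steps 2–3 can be sketched as follows. This is an illustrative model, not Muster's internals: the `CheckResult` type and `daily_pass_rates` helper are hypothetical, showing how per-job pass/fail results roll up into the per-check, per-day pass rates the Health Heatmap displays.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical stand-in for a stored check result (step 2).
@dataclass
class CheckResult:
    check_id: str
    passed: bool
    day: date

def daily_pass_rates(results: list[CheckResult]) -> dict[tuple[str, date], float]:
    """Aggregate raw pass/fail results into a per-check, per-day pass rate."""
    totals: dict[tuple[str, date], list[int]] = defaultdict(lambda: [0, 0])
    for r in results:
        key = (r.check_id, r.day)
        totals[key][0] += r.passed   # bool counts as 0/1
        totals[key][1] += 1
    return {key: passed / total for key, (passed, total) in totals.items()}

results = [
    CheckResult("subtotal_arithmetic", True, date(2024, 5, 1)),
    CheckResult("subtotal_arithmetic", False, date(2024, 5, 1)),
    CheckResult("subtotal_arithmetic", True, date(2024, 5, 2)),
]
rates = daily_pass_rates(results)
# May 1 has one pass out of two results; May 2 has one out of one.
```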

Check anatomy

quality.Check(
    check_id="subtotal_arithmetic",   # snake_case identifier
    severity="HIGH",                   # HIGH | MEDIUM | LOW
    passed=True,                       # bool — did it pass?
    expected="1234.56",                # Optional: what you expected
    actual="1234.56",                  # Optional: what you got
    delta="0.00",                      # Optional: numeric difference
    message="Subtotals match"          # Optional: free text
)
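The optional `expected`/`actual`/`delta` fields are typically computed from the job's own data. A minimal sketch for the subtotal check above — the `subtotal_check_fields` helper is hypothetical, not part of the SDK; it only builds the keyword arguments you would pass to `quality.Check`:

```python
from decimal import Decimal

def subtotal_check_fields(line_items: list[Decimal], reported_subtotal: Decimal) -> dict:
    """Build the field values for a subtotal arithmetic check (illustrative)."""
    expected = sum(line_items, Decimal("0"))
    delta = abs(expected - reported_subtotal)
    return {
        "check_id": "subtotal_arithmetic",
        "severity": "HIGH",
        "passed": delta == 0,
        "expected": str(expected),
        "actual": str(reported_subtotal),
        "delta": str(delta),
        "message": "Subtotals match" if delta == 0 else "Subtotal mismatch",
    }

fields = subtotal_check_fields(
    [Decimal("1000.00"), Decimal("234.56")], Decimal("1234.56")
)
# quality.Check(**fields)
```

Using `Decimal` rather than `float` keeps the string-formatted `expected`, `actual`, and `delta` values exact for money amounts.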

Severity levels

| Severity | Meaning | Example |
| --- | --- | --- |
| HIGH | Core business logic; failure is unacceptable | Arithmetic check on invoice totals |
| MEDIUM | Important quality signal; repeated failure needs investigation | Confidence score above threshold |
| LOW | Nice-to-have signal | Response style consistency |

Trend detection

Muster calculates trend direction over a rolling 30-day window:
| Trend | Meaning |
| --- | --- |
| Stable | Pass rate consistent within ±3% |
| Improving | Pass rate increasing |
| Degrading | Pass rate declining over 7+ days |
| Degrading Recent | Sharp decline in the last 7 days |
Degrading checks are highlighted in orange/red in the Health Heatmap and trigger anomaly events.
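A classifier along these lines can be sketched as below. Muster's exact thresholds are internal, so this is an approximation under stated assumptions: "sharp decline" is taken as a drop of more than 10 points within the last 7 days, and trend direction compares the last 7-day average against the preceding part of the 30-day window.

```python
def classify_trend(daily_rates: list[float]) -> str:
    """Classify a series of daily pass rates into the trend labels above.

    Illustrative sketch, not Muster's actual algorithm.
    """
    window = daily_rates[-30:]           # rolling 30-day window
    recent = window[-7:]                 # last 7 days
    earlier = window[:-7] or recent      # preceding days (fallback if <8 days)
    recent_avg = sum(recent) / len(recent)
    earlier_avg = sum(earlier) / len(earlier)
    change = recent_avg - earlier_avg
    if recent[0] - recent[-1] > 0.10:    # assumed "sharp decline" threshold
        return "Degrading Recent"
    if abs(change) <= 0.03:              # the ±3% band from the table
        return "Stable"
    return "Improving" if change > 0 else "Degrading"
```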

Industry benchmarks

Muster anonymously aggregates pass rates across all deployments to build industry benchmarks by check ID and agent category, so you can see how your agent compares to peers in the same category (for example, "your invoice processor's arithmetic check is in the bottom 20% of finance agents"). Benchmarking is opt-in and all shared data is anonymized; you can manage your participation in Settings → Benchmark.
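A peer comparison like "bottom 20%" is essentially a percentile rank of your pass rate among agents in the same category. How Muster buckets peers is internal; this hypothetical helper only illustrates the idea:

```python
def percentile_rank(your_rate: float, peer_rates: list[float]) -> float:
    """Fraction of peer agents whose pass rate is below yours (illustrative)."""
    if not peer_rates:
        return 1.0
    below = sum(1 for r in peer_rates if r < your_rate)
    return below / len(peer_rates)

peers = [0.99, 0.98, 0.97, 0.96, 0.95, 0.94, 0.93, 0.92, 0.91, 0.90]
rank = percentile_rank(0.905, peers)
# only 1 of 10 peers scores lower, placing this agent near the bottom
```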