
Flag Health

Monitor flag evaluation health with real-time metrics, error rate tracking, and auto-remediation.

Flaggr tracks evaluation metrics for every flag, giving you real-time visibility into how flags are performing. The health system detects anomalies — high error rates, sudden drops in evaluation volume, or latency spikes — and surfaces them as health statuses.

Health Statuses

Every flag has a computed health status based on its recent evaluation metrics:

Status   | Criteria                                            | What it means
healthy  | Error rate < 1%, normal evaluation volume           | Everything is working as expected
warning  | Error rate 1–5%, or evaluation volume anomaly       | Something may need attention
critical | Error rate > 5%, or zero evaluations when expected  | Likely a problem — investigate
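
As a rough sketch of how these thresholds could be applied in code, the TypeScript below maps a flag's recent metrics to a status. The HealthSummary shape and computeStatus helper are illustrative names, not Flaggr internals, and the evaluation-volume anomaly check is omitted for brevity.

// Minimal sketch of how the documented thresholds could map to a status.
// HealthSummary and computeStatus are illustrative names, not Flaggr internals.
type HealthStatus = "healthy" | "warning" | "critical";

interface HealthSummary {
  totalEvaluations: number;
  errorCount: number;
  trafficExpected: boolean; // true if the flag should currently be receiving evaluations
}

function computeStatus(s: HealthSummary): HealthStatus {
  // Zero evaluations when traffic is expected is treated as critical.
  if (s.totalEvaluations === 0) {
    return s.trafficExpected ? "critical" : "healthy";
  }
  const errorRate = s.errorCount / s.totalEvaluations;
  if (errorRate > 0.05) return "critical";
  if (errorRate >= 0.01) return "warning";
  return "healthy";
}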

Flag Health API

Get Health for a Single Flag

curl "/api/flags/checkout-v2/health?serviceId=svc-web&projectId=proj-123"

Response:

{
  "flagKey": "checkout-v2",
  "serviceId": "svc-web",
  "status": "healthy",
  "summary": {
    "totalEvaluations": 15420,
    "errorCount": 12,
    "errorRate": 0.0008,
    "avgLatencyMs": 2.3,
    "p95LatencyMs": 5.1,
    "p99LatencyMs": 8.7,
    "lastEvaluatedAt": "2026-02-19T14:32:01.000Z"
  },
  "reasonBreakdown": {
    "TARGETING_MATCH": 8200,
    "VARIANT": 4100,
    "DEFAULT": 3100,
    "ERROR": 12,
    "DISABLED": 8
  },
  "timeSeries": [
    {
      "timestamp": "2026-02-19T14:00:00.000Z",
      "evaluations": 450,
      "errors": 1,
      "avgLatencyMs": 2.1
    }
  ]
}
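
If you are consuming this endpoint from a script or dashboard, a minimal TypeScript fetch might look like the following sketch. The BASE_URL, the FlagHealth interface, and the absence of auth headers are assumptions; adjust them for your deployment.

// Illustrative client for the single-flag health endpoint shown above.
// BASE_URL and the response typing are assumptions, not part of an official SDK.
const BASE_URL = "https://flaggr.example.com";

interface FlagHealth {
  flagKey: string;
  serviceId: string;
  status: "healthy" | "warning" | "critical";
  summary: {
    totalEvaluations: number;
    errorCount: number;
    errorRate: number;
    avgLatencyMs: number;
    p95LatencyMs: number;
    p99LatencyMs: number;
    lastEvaluatedAt: string;
  };
}

async function getFlagHealth(flagKey: string, serviceId: string, projectId: string): Promise<FlagHealth> {
  const params = new URLSearchParams({ serviceId, projectId });
  const res = await fetch(`${BASE_URL}/api/flags/${flagKey}/health?${params}`);
  if (!res.ok) throw new Error(`Health request failed: ${res.status}`);
  return res.json();
}

// Usage: warn if the flag is not healthy.
getFlagHealth("checkout-v2", "svc-web", "proj-123").then((health) => {
  if (health.status !== "healthy") {
    console.warn(`Flag ${health.flagKey} is ${health.status}`);
  }
});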

Get Health for All Flags in a Service

curl "/api/services/svc-web/health?projectId=proj-123"

Response:

{
  "serviceId": "svc-web",
  "overallStatus": "warning",
  "flags": [
    { "flagKey": "checkout-v2", "status": "healthy", "evaluations": 15420, "errorRate": 0.0008 },
    { "flagKey": "search-v3", "status": "warning", "evaluations": 890, "errorRate": 0.034 },
    { "flagKey": "onboarding-flow", "status": "critical", "evaluations": 0, "errorRate": 0 }
  ]
}

The overallStatus reflects the worst status among all flags — if any flag is critical, the service status is critical.
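
A minimal sketch of that worst-status aggregation, assuming the statuses order as healthy < warning < critical:

// Sketch of the "worst status wins" aggregation described above.
// The severity ordering is an assumption based on the documented statuses.
type HealthStatus = "healthy" | "warning" | "critical";

const severity: Record<HealthStatus, number> = { healthy: 0, warning: 1, critical: 2 };

function overallStatus(flagStatuses: HealthStatus[]): HealthStatus {
  return flagStatuses.reduce(
    (worst, s) => (severity[s] > severity[worst] ? s : worst),
    "healthy" as HealthStatus
  );
}

// overallStatus(["healthy", "warning", "critical"]) === "critical"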

Evaluation Metrics Pipeline

Flaggr collects metrics for every flag evaluation:

Metric           | Description
Evaluation count | Total evaluations per time bucket
Error count      | Evaluations that returned the ERROR reason
Error rate       | Errors / total evaluations
Latency          | Evaluation duration in milliseconds
Reason breakdown | Count per evaluation reason (TARGETING_MATCH, VARIANT, DEFAULT, etc.)

Metrics are bucketed into 1-minute intervals and retained for 24 hours. Older data is automatically evicted.
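
As an illustration of that bucketing scheme (not Flaggr's actual implementation), the sketch below accumulates evaluations into per-minute buckets and evicts anything older than 24 hours:

// Illustrative 1-minute bucketing with 24-hour retention.
// The Bucket shape and recordEvaluation helper are assumptions for this sketch.
interface Bucket {
  timestamp: number;    // start of the minute, epoch ms
  evaluations: number;
  errors: number;
  latencySumMs: number; // used to derive avgLatencyMs
}

const BUCKET_MS = 60_000;
const RETENTION_MS = 24 * 60 * 60 * 1000;
const buckets = new Map<number, Bucket>();

function recordEvaluation(latencyMs: number, isError: boolean, now = Date.now()): void {
  const key = Math.floor(now / BUCKET_MS) * BUCKET_MS;
  const bucket = buckets.get(key) ?? { timestamp: key, evaluations: 0, errors: 0, latencySumMs: 0 };
  bucket.evaluations += 1;
  bucket.errors += isError ? 1 : 0;
  bucket.latencySumMs += latencyMs;
  buckets.set(key, bucket);

  // Evict buckets that have fallen outside the 24-hour retention window.
  for (const [k] of buckets) {
    if (k < now - RETENTION_MS) buckets.delete(k);
  }
}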

Time Series Data

The health API returns time-series data for visualization. Each data point represents one minute:

{
  "timeSeries": [
    { "timestamp": "2026-02-19T14:00:00Z", "evaluations": 450, "errors": 1, "avgLatencyMs": 2.1 },
    { "timestamp": "2026-02-19T14:01:00Z", "evaluations": 462, "errors": 0, "avgLatencyMs": 1.9 },
    { "timestamp": "2026-02-19T14:02:00Z", "evaluations": 448, "errors": 3, "avgLatencyMs": 4.2 }
  ]
}

Use this data to plot evaluation volume, error rates, and latency over time in dashboards.
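
For example, a small helper (illustrative, not part of any Flaggr SDK) can turn the timeSeries payload into a per-minute error-rate series for charting:

// Derive a per-minute error-rate series from the timeSeries payload above.
// The TimePoint shape mirrors the example response.
interface TimePoint {
  timestamp: string;
  evaluations: number;
  errors: number;
  avgLatencyMs: number;
}

function errorRateSeries(points: TimePoint[]): { timestamp: string; errorRate: number }[] {
  return points.map((p) => ({
    timestamp: p.timestamp,
    errorRate: p.evaluations > 0 ? p.errors / p.evaluations : 0,
  }));
}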

Percentile Calculations

The health API computes latency percentiles from the raw evaluation data:

Percentile | Description
p50        | Median latency — typical user experience
p95        | 95th percentile — most users are faster than this
p99        | 99th percentile — tail latency
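
The exact percentile method isn't specified here; a common nearest-rank calculation over raw latency samples looks like this sketch:

// Nearest-rank percentile over raw latency samples (one common approach,
// not necessarily the method Flaggr uses internally).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(rank, sorted.length) - 1];
}

const latencies = [2.1, 1.9, 4.2, 2.3, 8.7, 5.1, 2.0];
console.log(percentile(latencies, 50)); // p50 (median)
console.log(percentile(latencies, 95)); // p95
console.log(percentile(latencies, 99)); // p99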

Integrating Health Checks with Rollouts

Progressive rollout safety checks can reference the same metrics that the health API tracks:

{
  "safetyChecks": [
    { "type": "error_rate", "threshold": 0.05, "action": "rollback" },
    { "type": "latency_p99", "threshold": 500, "action": "pause" }
  ]
}

When you report health metrics to a rollout plan, the rollout engine compares them against your safety checks and automatically pauses or rolls back if thresholds are exceeded.
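
As a sketch of that comparison (the SafetyCheck and ReportedMetrics names are illustrative, not the rollout engine's actual types):

// Compare reported health metrics against the safety checks shown above.
interface SafetyCheck {
  type: "error_rate" | "latency_p99";
  threshold: number;
  action: "rollback" | "pause";
}

interface ReportedMetrics {
  errorRate: number;    // e.g. summary.errorRate from the health API
  p99LatencyMs: number; // e.g. summary.p99LatencyMs
}

function evaluateSafetyChecks(checks: SafetyCheck[], metrics: ReportedMetrics): "rollback" | "pause" | "continue" {
  for (const check of checks) {
    const value = check.type === "error_rate" ? metrics.errorRate : metrics.p99LatencyMs;
    if (value > check.threshold) return check.action;
  }
  return "continue";
}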

See Progressive Rollouts for details.

Monitoring Best Practices

  • Set up alerting for flags with critical or warning status (a minimal polling sketch follows this list)
  • Watch for zero-evaluation flags — a flag that stops being evaluated may indicate a deployment issue
  • Monitor error rate trends, not just absolute values — a spike matters more than a steady 0.1%
  • Review stale flags — flags that haven't been evaluated in days may be candidates for cleanup
  • Use the service health endpoint for dashboards — it gives you a single view of all flags
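
A minimal polling sketch for the alerting bullet above, assuming a hypothetical BASE_URL and using console.warn as a stand-in for a real alerting integration:

// Poll the service health endpoint and surface any non-healthy flags.
// BASE_URL and the alerting mechanism are assumptions for illustration.
const BASE_URL = "https://flaggr.example.com";

interface ServiceHealth {
  serviceId: string;
  overallStatus: "healthy" | "warning" | "critical";
  flags: { flagKey: string; status: string; evaluations: number; errorRate: number }[];
}

async function checkServiceHealth(serviceId: string, projectId: string): Promise<void> {
  const res = await fetch(`${BASE_URL}/api/services/${serviceId}/health?projectId=${projectId}`);
  const health: ServiceHealth = await res.json();

  for (const flag of health.flags) {
    if (flag.status !== "healthy") {
      // Replace with your alerting integration (PagerDuty, Slack, etc.).
      console.warn(`[${flag.status}] ${flag.flagKey}: errorRate=${flag.errorRate}, evaluations=${flag.evaluations}`);
    }
  }
}

// Run on an interval, e.g. every minute to match the 1-minute metrics buckets.
setInterval(() => checkServiceHealth("svc-web", "proj-123"), 60_000);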