
Flag Health

Monitor flag evaluation health with real-time metrics, error rate tracking, and auto-remediation.

Flaggr tracks evaluation metrics for every flag, giving you real-time visibility into how flags are performing. The health system detects anomalies — high error rates, sudden drops in evaluation volume, or latency spikes — and surfaces them as health statuses.

Health Statuses

Every flag has a computed health status based on its recent evaluation metrics:

Status   | Criteria                                            | What it means
healthy  | Error rate < 1%, normal evaluation volume           | Everything is working as expected
warning  | Error rate 1–5%, or evaluation volume anomaly       | Something may need attention
critical | Error rate > 5%, or zero evaluations when expected  | Likely a problem — investigate
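
As a rough sketch of how these thresholds could be applied in code, the TypeScript below maps a flag's recent metrics to a status. The HealthSummary shape and computeStatus helper are illustrative names, not Flaggr internals, and the evaluation-volume anomaly check is omitted for brevity.

// Minimal sketch of how the documented thresholds could map to a status.
// HealthSummary and computeStatus are illustrative names, not Flaggr internals.
type HealthStatus = "healthy" | "warning" | "critical";

interface HealthSummary {
  totalEvaluations: number;
  errorCount: number;
  trafficExpected: boolean; // true if the flag should currently be receiving evaluations
}

function computeStatus(s: HealthSummary): HealthStatus {
  // Zero evaluations when traffic is expected is treated as critical.
  if (s.totalEvaluations === 0) {
    return s.trafficExpected ? "critical" : "healthy";
  }
  const errorRate = s.errorCount / s.totalEvaluations;
  if (errorRate > 0.05) return "critical";
  if (errorRate >= 0.01) return "warning";
  return "healthy";
}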

Flag Health API

Get Health for a Single Flag

curl "/api/flags/checkout-v2/health?serviceId=svc-web&projectId=proj-123"

Response:

{
  "flagKey": "checkout-v2",
  "serviceId": "svc-web",
  "status": "healthy",
  "summary": {
    "totalEvaluations": 15420,
    "errorCount": 12,
    "errorRate": 0.0008,
    "avgLatencyMs": 2.3,
    "p95LatencyMs": 5.1,
    "p99LatencyMs": 8.7,
    "lastEvaluatedAt": "2026-02-19T14:32:01.000Z"
  },
  "reasonBreakdown": {
    "TARGETING_MATCH": 8200,
    "VARIANT": 4100,
    "DEFAULT": 3100,
    "ERROR": 12,
    "DISABLED": 8
  },
  "timeSeries": [
    {
      "timestamp": "2026-02-19T14:00:00.000Z",
      "evaluations": 450,
      "errors": 1,
      "avgLatencyMs": 2.1
    }
  ]
}
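
If you are consuming this endpoint from a script or dashboard, a minimal TypeScript fetch might look like the following sketch. The BASE_URL, the FlagHealth interface, and the absence of auth headers are assumptions; adjust them for your deployment.

// Illustrative client for the single-flag health endpoint shown above.
// BASE_URL and the response typing are assumptions, not part of an official SDK.
const BASE_URL = "https://flaggr.example.com";

interface FlagHealth {
  flagKey: string;
  serviceId: string;
  status: "healthy" | "warning" | "critical";
  summary: {
    totalEvaluations: number;
    errorCount: number;
    errorRate: number;
    avgLatencyMs: number;
    p95LatencyMs: number;
    p99LatencyMs: number;
    lastEvaluatedAt: string;
  };
}

async function getFlagHealth(flagKey: string, serviceId: string, projectId: string): Promise<FlagHealth> {
  const params = new URLSearchParams({ serviceId, projectId });
  const res = await fetch(`${BASE_URL}/api/flags/${flagKey}/health?${params}`);
  if (!res.ok) throw new Error(`Health request failed: ${res.status}`);
  return res.json();
}

// Usage: warn if the flag is not healthy.
getFlagHealth("checkout-v2", "svc-web", "proj-123").then((health) => {
  if (health.status !== "healthy") {
    console.warn(`Flag ${health.flagKey} is ${health.status}`);
  }
});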

Get Health for All Flags in a Service

curl "/api/services/svc-web/health?projectId=proj-123"

Response:

{
  "serviceId": "svc-web",
  "overallStatus": "warning",
  "flags": [
    { "flagKey": "checkout-v2", "status": "healthy", "evaluations": 15420, "errorRate": 0.0008 },
    { "flagKey": "search-v3", "status": "warning", "evaluations": 890, "errorRate": 0.034 },
    { "flagKey": "onboarding-flow", "status": "critical", "evaluations": 0, "errorRate": 0 }
  ]
}

The overallStatus reflects the worst status among all flags — if any flag is critical, the service status is critical.
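
A minimal sketch of that worst-status aggregation, assuming the statuses order as healthy < warning < critical:

// Sketch of the "worst status wins" aggregation described above.
// The severity ordering is an assumption based on the documented statuses.
type HealthStatus = "healthy" | "warning" | "critical";

const severity: Record<HealthStatus, number> = { healthy: 0, warning: 1, critical: 2 };

function overallStatus(flagStatuses: HealthStatus[]): HealthStatus {
  return flagStatuses.reduce(
    (worst, s) => (severity[s] > severity[worst] ? s : worst),
    "healthy" as HealthStatus
  );
}

// overallStatus(["healthy", "warning", "critical"]) === "critical"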

Evaluation Metrics Pipeline

Flaggr collects metrics for every flag evaluation:

Metric           | Description
Evaluation count | Total evaluations per time bucket
Error count      | Evaluations that returned the ERROR reason
Error rate       | Errors / total evaluations
Latency          | Evaluation duration in milliseconds
Reason breakdown | Count per evaluation reason (TARGETING_MATCH, VARIANT, DEFAULT, etc.)

Metrics are bucketed into 1-minute intervals and retained for 24 hours. Older data is automatically evicted.
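
As an illustration of that bucketing scheme (not Flaggr's actual implementation), the sketch below accumulates evaluations into per-minute buckets and evicts anything older than 24 hours:

// Illustrative 1-minute bucketing with 24-hour retention.
// The Bucket shape and recordEvaluation helper are assumptions for this sketch.
interface Bucket {
  timestamp: number;    // start of the minute, epoch ms
  evaluations: number;
  errors: number;
  latencySumMs: number; // used to derive avgLatencyMs
}

const BUCKET_MS = 60_000;
const RETENTION_MS = 24 * 60 * 60 * 1000;
const buckets = new Map<number, Bucket>();

function recordEvaluation(latencyMs: number, isError: boolean, now = Date.now()): void {
  const key = Math.floor(now / BUCKET_MS) * BUCKET_MS;
  const bucket = buckets.get(key) ?? { timestamp: key, evaluations: 0, errors: 0, latencySumMs: 0 };
  bucket.evaluations += 1;
  bucket.errors += isError ? 1 : 0;
  bucket.latencySumMs += latencyMs;
  buckets.set(key, bucket);

  // Evict buckets that have fallen outside the 24-hour retention window.
  for (const [k] of buckets) {
    if (k < now - RETENTION_MS) buckets.delete(k);
  }
}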

Time Series Data

The health API returns time-series data for visualization. Each data point represents one minute:

{
  "timeSeries": [
    { "timestamp": "2026-02-19T14:00:00Z", "evaluations": 450, "errors": 1, "avgLatencyMs": 2.1 },
    { "timestamp": "2026-02-19T14:01:00Z", "evaluations": 462, "errors": 0, "avgLatencyMs": 1.9 },
    { "timestamp": "2026-02-19T14:02:00Z", "evaluations": 448, "errors": 3, "avgLatencyMs": 4.2 }
  ]
}

Use this data to plot evaluation volume, error rates, and latency over time in dashboards.
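
For example, a small helper (illustrative, not part of any Flaggr SDK) can turn the timeSeries payload into a per-minute error-rate series for charting:

// Derive a per-minute error-rate series from the timeSeries payload above.
// The TimePoint shape mirrors the example response.
interface TimePoint {
  timestamp: string;
  evaluations: number;
  errors: number;
  avgLatencyMs: number;
}

function errorRateSeries(points: TimePoint[]): { timestamp: string; errorRate: number }[] {
  return points.map((p) => ({
    timestamp: p.timestamp,
    errorRate: p.evaluations > 0 ? p.errors / p.evaluations : 0,
  }));
}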

Percentile Calculations

The health API computes latency percentiles from the raw evaluation data:

Percentile | Description
p50        | Median latency — typical user experience
p95        | 95th percentile — most users are faster than this
p99        | 99th percentile — tail latency
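
The exact percentile method isn't specified here; a common nearest-rank calculation over raw latency samples looks like this sketch:

// Nearest-rank percentile over raw latency samples (one common approach,
// not necessarily the method Flaggr uses internally).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(rank, sorted.length) - 1];
}

const latencies = [2.1, 1.9, 4.2, 2.3, 8.7, 5.1, 2.0];
console.log(percentile(latencies, 50)); // p50 (median)
console.log(percentile(latencies, 95)); // p95
console.log(percentile(latencies, 99)); // p99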

Integrating Health Checks with Rollouts

Progressive rollout safety checks can reference the same metrics that the health API tracks:

{
  "safetyChecks": [
    { "type": "error_rate", "threshold": 0.05, "action": "rollback" },
    { "type": "latency_p99", "threshold": 500, "action": "pause" }
  ]
}

When you report health metrics to a rollout plan, the rollout engine compares them against your safety checks and automatically pauses or rolls back if thresholds are exceeded.
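
As a sketch of that comparison (the SafetyCheck and ReportedMetrics names are illustrative, not the rollout engine's actual types):

// Compare reported health metrics against the safety checks shown above.
interface SafetyCheck {
  type: "error_rate" | "latency_p99";
  threshold: number;
  action: "rollback" | "pause";
}

interface ReportedMetrics {
  errorRate: number;    // e.g. summary.errorRate from the health API
  p99LatencyMs: number; // e.g. summary.p99LatencyMs
}

function evaluateSafetyChecks(checks: SafetyCheck[], metrics: ReportedMetrics): "rollback" | "pause" | "continue" {
  for (const check of checks) {
    const value = check.type === "error_rate" ? metrics.errorRate : metrics.p99LatencyMs;
    if (value > check.threshold) return check.action;
  }
  return "continue";
}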

See Progressive Rollouts for details.

Monitoring Best Practices

  • Set up alerting for flags with critical or warning status (a minimal polling sketch follows this list)
  • Watch for zero-evaluation flags — a flag that stops being evaluated may indicate a deployment issue
  • Monitor error rate trends, not just absolute values — a spike matters more than a steady 0.1%
  • Review stale flags — flags that haven't been evaluated in days may be candidates for cleanup
  • Use the service health endpoint for dashboards — it gives you a single view of all flags
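
A minimal polling sketch for the alerting bullet above, assuming a hypothetical BASE_URL and using console.warn as a stand-in for a real alerting integration:

// Poll the service health endpoint and surface any non-healthy flags.
// BASE_URL and the alerting mechanism are assumptions for illustration.
const BASE_URL = "https://flaggr.example.com";

interface ServiceHealth {
  serviceId: string;
  overallStatus: "healthy" | "warning" | "critical";
  flags: { flagKey: string; status: string; evaluations: number; errorRate: number }[];
}

async function checkServiceHealth(serviceId: string, projectId: string): Promise<void> {
  const res = await fetch(`${BASE_URL}/api/services/${serviceId}/health?projectId=${projectId}`);
  const health: ServiceHealth = await res.json();

  for (const flag of health.flags) {
    if (flag.status !== "healthy") {
      // Replace with your alerting integration (PagerDuty, Slack, etc.).
      console.warn(`[${flag.status}] ${flag.flagKey}: errorRate=${flag.errorRate}, evaluations=${flag.evaluations}`);
    }
  }
}

// Run on an interval, e.g. every minute to match the 1-minute metrics buckets.
setInterval(() => checkServiceHealth("svc-web", "proj-123"), 60_000);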