# Monitoring Flags
This guide covers the observability tools built into Flaggr: health scoring, project-level dashboards, impact analysis before changes, evaluation debugging with trace, alerting, and real-time event streaming.
## Health Scoring
Every flag with evaluation data receives a health score from 0 (critical) to 100 (healthy). Scores are computed from four weighted dimensions.
### Scoring dimensions
| Dimension | Weight | What It Measures | Warning | Critical |
|---|---|---|---|---|
| Error rate | 35% | Fraction of evaluations that return errors | > 1% | > 5% |
| Evaluation latency | 25% | P99 evaluation time in milliseconds | > 50ms | > 200ms |
| Volume anomaly | 25% | Deviation from baseline evaluations/min | > 30% drop | > 70% drop |
| Staleness | 15% | Days since last evaluation | > 7 days | > 30 days |
### Health statuses
| Score | Status | Meaning |
|---|---|---|
| 80-100 | healthy | All dimensions within normal range |
| 40-79 | warning | One or more dimensions approaching threshold |
| 0-39 | critical | Significant issues detected |
| — | unknown | No evaluation data available |
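The scorer's internals aren't exposed by the API, but the two tables above pin down the weights and the status bands. Here is a minimal sketch, assuming the overall score is a simple weighted sum of the four per-dimension scores (each 0-100); the real scorer may combine dimensions differently:

```ts
// Sketch only: assumes the overall score is a weighted sum of the four
// per-dimension scores (each 0-100), using the weights from the table above.
interface DimensionScores {
  errorRate: number;         // weight 35%
  evaluationLatency: number; // weight 25%
  evaluationVolume: number;  // weight 25%
  staleness: number;         // weight 15%
}

function overallScore(d: DimensionScores): number {
  return Math.round(
    d.errorRate * 0.35 +
      d.evaluationLatency * 0.25 +
      d.evaluationVolume * 0.25 +
      d.staleness * 0.15
  );
}

// Status bands from the "Health statuses" table; null means no data.
function statusFor(score: number | null): "healthy" | "warning" | "critical" | "unknown" {
  if (score === null) return "unknown";
  if (score >= 80) return "healthy";
  if (score >= 40) return "warning";
  return "critical";
}
```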
### Viewing flag health
#### Via the API
```bash
# Single flag health
curl "https://flaggr.dev/api/flags/new-checkout-flow/health?serviceId=svc-abc123" \
  -H "Authorization: Bearer <token>"
```

Response:

```json
{
  "flagKey": "new-checkout-flow",
  "serviceId": "svc-abc123",
  "overallScore": 92,
  "status": "healthy",
  "dimensions": {
    "errorRate": { "score": 100, "value": 0, "threshold": { "warning": 0.01, "critical": 0.05 } },
    "evaluationLatency": { "score": 85, "value": 12.3, "threshold": { "warning": 50, "critical": 200 } },
    "evaluationVolume": { "score": 90, "value": 45.2, "baseline": 42.0, "deviationPercent": 7.6 },
    "staleness": { "score": 100, "value": 0.001, "daysSinceLastEval": 0.001 }
  },
  "lastUpdated": "2026-03-01T12:00:00.000Z"
}
```

Each dimension exposes its raw value and thresholds so you can see exactly why a score is what it is.
### Volume baselines
The health scorer automatically learns baselines from historical evaluation data. When a flag's evaluation volume drops significantly below its baseline, the volume anomaly score decreases (see the sketch after this list). This catches scenarios like:
- A deployment removed a flag check
- A service stopped sending evaluation requests
- A routing change diverted traffic away
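The exact anomaly model isn't documented, but the drop thresholds in the scoring-dimensions table are. A sketch of baseline-relative drop detection, assuming a simple percentage comparison:

```ts
// Sketch only: compares current evaluations/min against the learned
// baseline; drop thresholds come from the scoring-dimensions table
// (> 30% drop = warning, > 70% drop = critical).
function volumeAnomaly(currentPerMin: number, baselinePerMin: number) {
  if (baselinePerMin <= 0) return { dropPercent: 0, level: "ok" as const };
  const dropPercent = Math.max(0, (1 - currentPerMin / baselinePerMin) * 100);
  const level =
    dropPercent > 70 ? ("critical" as const) :
    dropPercent > 30 ? ("warning" as const) : ("ok" as const);
  return { dropPercent, level };
}

// Example: baseline 42/min, current 10/min -> ~76% drop, "critical".
console.log(volumeAnomaly(10, 42));
```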
## Project Health Dashboard
The project health dashboard aggregates health across all services and flags in a project.
### Accessing the dashboard
Navigate to Project > Health in the console, or query the API:
curl "https://flaggr.dev/api/projects/proj-123/health" \
-H "Authorization: Bearer <token>"Dashboard response
{
"overallScore": 87,
"status": "healthy",
"services": [
{
"serviceId": "svc-web",
"serviceName": "Web Frontend",
"overallScore": 92,
"status": "healthy",
"flagCount": 12,
"healthyCount": 10,
"warningCount": 2,
"criticalCount": 0
},
{
"serviceId": "svc-api",
"serviceName": "API Gateway",
"overallScore": 78,
"status": "warning",
"flagCount": 8,
"healthyCount": 5,
"warningCount": 2,
"criticalCount": 1
}
],
"topIssues": [
{
"flagKey": "experimental-parser",
"serviceId": "svc-api",
"score": 35,
"status": "critical",
"worstDimension": "errorRate"
}
],
"totalFlags": 20,
"totalEvaluations": 1456000,
"projectId": "proj-123",
"lastUpdated": "2026-03-01T12:00:00.000Z"
}What to look for
- **Top issues** — flags with the lowest health scores, sorted worst-first. The `worstDimension` field tells you which metric is causing the problem.
- **Service breakdown** — quickly spot which service has the most warnings or critical flags.
- **Total evaluations** — 24-hour evaluation volume across the project.
The dashboard auto-refreshes every 60 seconds.
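If you want the same view outside the console, the endpoint above can be polled on the same cadence. A sketch, assuming a long-lived Node.js or browser process with `fetch` available:

```ts
// Sketch only: polls the project health endpoint on the console's 60-second
// cadence and logs any flags that appear in topIssues.
async function watchProjectHealth(projectId: string, token: string) {
  const poll = async () => {
    const res = await fetch(`https://flaggr.dev/api/projects/${projectId}/health`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    const health = await res.json();
    for (const issue of health.topIssues ?? []) {
      console.warn(
        `[${issue.status}] ${issue.flagKey} on ${issue.serviceId}: ` +
          `score ${issue.score}, worst dimension ${issue.worstDimension}`
      );
    }
  };
  await poll();
  setInterval(poll, 60_000); // matches the dashboard's refresh interval
}
```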
## Impact Analysis
Before toggling or changing a flag, use impact analysis to understand how many users and requests would be affected.
### Querying impact
curl "https://flaggr.dev/api/flags/new-checkout-flow/impact?serviceId=svc-abc123&environment=production" \
-H "Authorization: Bearer <token>"Impact response
{
"flagKey": "new-checkout-flow",
"serviceId": "svc-abc123",
"environment": "production",
"totalEvaluations": 84200,
"estimatedUniqueUsers": 1250,
"currentValueBreakdown": { "true": 63150, "false": 21050 },
"affectedPercentage": 75,
"evaluationsPerMinute": 58.5,
"peakEvaluationsPerMinute": 142,
"riskLevel": "medium",
"summary": "~84.2K evaluations in last 24h. 58.5/min. ~75% of evaluations would change value.",
"windowMs": 86400000
}Risk levels
| Level | Criteria | Recommendation |
|---|---|---|
| `low` | < 10 evals/min and < 10K total | Safe to change directly |
| `medium` | 10-100 evals/min or 10K-100K total | Consider a gradual rollout |
| `high` | 100-1000 evals/min or > 100K total | Use staged rollout with monitoring |
| `critical` | > 1000 evals/min or > 5% error rate | Requires careful planning |
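The classification can be reproduced from the table. A sketch, with the tie-breaking order (critical checked first) as an assumption, since the API computes `riskLevel` server-side:

```ts
// Sketch only: reproduces the risk-level table above. Thresholds come from
// the table; checking the most severe level first is an assumption.
type RiskLevel = "low" | "medium" | "high" | "critical";

function riskLevel(evalsPerMin: number, total24h: number, errorRate: number): RiskLevel {
  if (evalsPerMin > 1000 || errorRate > 0.05) return "critical";
  if (evalsPerMin > 100 || total24h > 100_000) return "high";
  if (evalsPerMin >= 10 || total24h >= 10_000) return "medium";
  return "low";
}

// Example from the impact response above: 58.5/min, 84,200 total -> "medium".
console.log(riskLevel(58.5, 84_200, 0));
```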
### Using impact data
The impact preview appears in the flag detail panel above the toggle button. Before toggling a high-traffic flag:
- Check `evaluationsPerMinute` — how actively is this flag being evaluated?
- Review `currentValueBreakdown` — what percentage of users are getting each value?
- Consider `affectedPercentage` — for boolean flags, this shows what fraction would see a change
- If `riskLevel` is `high` or `critical`, use a gradual rollout instead of an instant toggle (see the guard sketch below)
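Wired together, those checks make a simple pre-change guard. A sketch using the impact endpoint shown above; the abort threshold is this guide's own recommendation, not an enforced API behavior:

```ts
// Sketch only: queries impact analysis before a toggle and refuses the
// instant toggle when the risk level calls for a gradual rollout.
async function safeToggleCheck(flagKey: string, serviceId: string, token: string) {
  const url =
    `https://flaggr.dev/api/flags/${flagKey}/impact` +
    `?serviceId=${serviceId}&environment=production`;
  const impact = await (
    await fetch(url, { headers: { Authorization: `Bearer ${token}` } })
  ).json();

  console.log(impact.summary); // e.g. "~84.2K evaluations in last 24h. ..."
  if (impact.riskLevel === "high" || impact.riskLevel === "critical") {
    throw new Error(
      `Refusing instant toggle: risk is ${impact.riskLevel} ` +
        `(${impact.evaluationsPerMinute} evals/min). Use a gradual rollout.`
    );
  }
}
```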
## Evaluation Debugging
The evaluation debugger provides an "EXPLAIN ANALYZE" view of flag evaluation — every decision point the evaluator walks through, with timing and match details.
### Debug endpoint
curl -X POST "https://flaggr.dev/api/flags/evaluate-debug" \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"flagKey": "new-checkout-flow",
"serviceId": "svc-abc123",
"environment": "production",
"defaultValue": false,
"context": {
"targetingKey": "user-12345",
"plan": "pro",
"country": "US"
}
}'Trace response
{
"flagKey": "new-checkout-flow",
"value": true,
"reason": "TARGETING_MATCH",
"trace": {
"flagKey": "new-checkout-flow",
"totalMs": 3,
"result": { "value": true, "reason": "TARGETING_MATCH" },
"steps": [
{
"type": "disabled_check",
"label": "Flag is enabled",
"matched": false,
"durationMs": 0,
"details": {}
},
{
"type": "override",
"label": "Override \"QA testers\" (priority 10)",
"matched": false,
"durationMs": 0,
"details": { "identifiers": ["user-qa-1", "user-qa-2"], "targetingKey": "user-12345" }
},
{
"type": "targeting_rule",
"label": "Rule \"beta-users\" matched",
"matched": true,
"durationMs": 0,
"details": {
"ruleId": "beta-users",
"conditions": [
{ "property": "plan", "operator": "in", "expected": ["pro", "enterprise"], "actual": "pro", "matched": true }
],
"value": true
}
}
]
}
}Trace step types
| Step Type | Description |
|---|---|
| `disabled_check` | Whether the flag is enabled or disabled |
| `override` | Per-user override check (one step per override) |
| `prerequisite` | Prerequisite flag evaluation result |
| `mutual_exclusion` | Mutual exclusion group constraint check |
| `experiment` | Running experiment variant assignment |
| `schedule_check` | Whether a rule's schedule window is active |
| `targeting_rule` | Targeting rule evaluation with condition details |
| `rollout_check` | Percentage rollout hash bucket check |
| `variant` | Weighted variant selection |
| `default` | Fell through to default value |
### Using the trace in the evaluate endpoint
You can also get a trace from the regular evaluate endpoint by adding `"_trace": true` to the request body:
curl -X POST "https://flaggr.dev/api/flags/evaluate" \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"flagKey": "new-checkout-flow",
"serviceId": "svc-abc123",
"defaultValue": false,
"context": { "targetingKey": "user-12345" },
"_trace": true
}'The response includes a _debug field with the trace. The trace is only returned when explicitly requested — it adds no overhead to normal evaluations.
### Debugging with the trace viewer
The dashboard includes a visual trace viewer component that renders the trace as a collapsible tree. Each step shows:
- Green/red icon indicating matched/not matched
- Step label describing the decision
- Expandable details with raw condition values, hash buckets, and timing
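Outside the dashboard, the same trace is easy to render in a terminal. A minimal console analogue of the viewer, using the trace shape from the response above:

```ts
// Sketch only: a console analogue of the dashboard's trace viewer, using
// the trace shape shown in the debug response above.
interface TraceStep {
  type: string;
  label: string;
  matched: boolean;
  durationMs: number;
  details: Record<string, unknown>;
}

function printTrace(trace: { flagKey: string; totalMs: number; steps: TraceStep[] }) {
  console.log(`${trace.flagKey} evaluated in ${trace.totalMs}ms`);
  for (const step of trace.steps) {
    const icon = step.matched ? "✔" : "✘"; // green/red icon in the dashboard
    console.log(`  ${icon} [${step.type}] ${step.label} (${step.durationMs}ms)`);
  }
}
```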
## Alerting
Flaggr includes a configurable alerting system that monitors evaluation metrics and notifies you when thresholds are breached.
### Alert channels
Alert channels define where notifications are sent. Supported channel types:
| Type | Description |
|---|---|
| `slack` | Slack webhook URL |
| `email` | Email address |
| `webhook` | Custom HTTP endpoint |
| `pagerduty` | PagerDuty integration key |
#### Creating a channel
curl -X POST "https://flaggr.dev/api/projects/proj-123/alert-channels" \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"name": "Engineering Slack",
"type": "slack",
"config": {
"webhookUrl": "https://hooks.slack.com/services/T.../B.../xxx"
},
"enabled": true
}'Testing a channel
curl -X POST "https://flaggr.dev/api/projects/proj-123/alert-channels/ch-123/test" \
-H "Authorization: Bearer <token>"Alert rules
Alert rules define what conditions trigger notifications.
#### Creating a rule
curl -X POST "https://flaggr.dev/api/projects/proj-123/alerts" \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"name": "High Error Rate",
"description": "Alert when evaluation error rate exceeds 1%",
"metric": "error_rate",
"operator": "above",
"threshold": 1,
"severity": "critical",
"channelIds": ["ch-123"],
"cooldownMinutes": 15,
"enabled": true
}'Alert delivery pipeline
When a rule fires, the alert passes through several reliability layers:
- **Deduplication** — a SHA-256 hash of `ruleId + time-bucket` prevents duplicate alerts within the cooldown window (sketched below)
- **Retry** — 3 attempts with exponential backoff (1s, 2s, 4s). Non-retryable 4xx errors fail immediately.
- **Circuit breaker** — 3 consecutive failures per channel open the circuit for 5 minutes, then half-open for a probe
- **Dead letter** — failed deliveries are stored (capped at 200 per project) for later inspection
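The deduplication step can be illustrated directly. A sketch of the dedup key, assuming the time bucket is sized to the rule's cooldown window; the exact bucket format is not documented:

```ts
// Sketch only: derives a deduplication key as described above, hashing the
// rule ID plus a time bucket. Sizing the bucket to the cooldown window is
// an assumption.
import { createHash } from "node:crypto";

function dedupKey(ruleId: string, cooldownMinutes: number, now = Date.now()): string {
  const bucket = Math.floor(now / (cooldownMinutes * 60_000));
  return createHash("sha256").update(`${ruleId}:${bucket}`).digest("hex");
}

// Two firings of the same rule inside one cooldown window hash identically,
// so the second is dropped as a duplicate.
```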
### Viewing alert firings
curl "https://flaggr.dev/api/projects/proj-123/alert-firings?limit=20" \
-H "Authorization: Bearer <token>"Remediation rules
For automated responses, configure remediation rules via the health scorer:
| Action | Description |
|---|---|
| `disable_flag` | Automatically disable a flag when health degrades |
| `rollback_to_version` | Roll back to a specific version snapshot |
| `reduce_rollout` | Reduce rollout percentage to limit blast radius |
| `alert_only` | Only send a notification (no automated action) |
Remediation rules support `cooldownMinutes` and `requireApproval` to prevent runaway automation.
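The remediation API itself isn't shown in this guide; purely as illustration, a rule combining the documented pieces might look like this (field names other than the action values and the two safety valves are hypothetical):

```ts
// Hypothetical remediation rule; only the action values (from the table)
// and the cooldownMinutes / requireApproval safety valves are documented.
const remediationRule = {
  name: "Auto-disable on critical health", // assumed field
  action: "disable_flag",                  // from the actions table above
  cooldownMinutes: 30,                     // documented safety valve
  requireApproval: true,                   // documented safety valve
};
```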
## Real-Time Streaming
Flag changes are broadcast in real time via Server-Sent Events (SSE) and Redis Pub/Sub.
### SSE endpoint
```bash
curl -N "https://flaggr.dev/api/flags/stream?serviceId=svc-abc123" \
  -H "Authorization: Bearer <token>"
```

Events are sent as standard SSE:

```
event: FLAG_UPDATED
data: {"key":"new-checkout-flow","enabled":true,"defaultValue":true,...}

event: FLAG_TOGGLED
data: {"key":"new-checkout-flow","enabled":false,"defaultValue":false,...}

event: FLAG_CREATED
data: {"key":"new-feature","enabled":false,...}

event: FLAG_DELETED
data: {"key":"old-feature","serviceId":"svc-abc123"}
```
### Event types
| Event | When |
|---|---|
| `FLAG_CREATED` | A new flag is created in this service |
| `FLAG_UPDATED` | A flag's configuration is changed |
| `FLAG_TOGGLED` | A flag is toggled on or off |
| `FLAG_DELETED` | A flag is deleted |
### Delivery guarantees
- **Primary**: Redis Pub/Sub delivers events to all connected SSE clients within 10-50ms
- **Fallback**: When Redis is unavailable, clients fall back to 5-second polling (see the sketch below)
- **Keepalive**: A ping comment (`: keepalive`) is sent every 30 seconds to keep connections alive
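The fallback path can be sketched as a plain polling loop. Note that the flag-listing endpoint used here is assumed for illustration; this guide only documents the stream endpoint:

```ts
// Sketch only: the 5-second polling fallback described above. The listing
// endpoint (GET /api/flags?serviceId=...) is an assumption, not a
// documented route.
async function pollFlags(
  serviceId: string,
  token: string,
  onFlags: (flags: unknown[]) => void
) {
  const poll = async () => {
    try {
      const res = await fetch(`https://flaggr.dev/api/flags?serviceId=${serviceId}`, {
        headers: { Authorization: `Bearer ${token}` },
      });
      onFlags(await res.json());
    } catch {
      // Keep polling; the stream resumes once Redis recovers.
    }
  };
  setInterval(poll, 5_000); // matches the documented 5-second fallback
}
```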
### Using SSE in your application
```js
// NOTE: the browser-native EventSource API does not accept custom headers.
// This example assumes the "eventsource" npm package (v1) in Node.js, or a
// browser polyfill that supports a headers option for authentication.
import EventSource from "eventsource";

const eventSource = new EventSource(
  "/api/flags/stream?serviceId=svc-abc123",
  { headers: { Authorization: `Bearer ${token}` } }
);

eventSource.addEventListener("FLAG_TOGGLED", (event) => {
  const flag = JSON.parse(event.data);
  console.log(`Flag ${flag.key} is now ${flag.enabled ? "on" : "off"}`);
  // Update local flag cache
});
```

### Connect / gRPC-Web streaming
For binary-efficient streaming, Flaggr also supports the Connect protocol at `/api/grpc-web/[...path]`. The Connect provider resolves flags via gRPC-Web and subscribes to real-time updates. See the Real-Time Updates guide for Connect setup.
## Audit Logging
Every flag mutation (create, update, toggle, delete) generates an audit log entry containing:
- Who — user ID, name, email
- What — the action type and changed fields
- When — timestamp
- Before/After — previous and new flag state
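The entry schema isn't reproduced in this guide; a hypothetical entry covering those four facets might look like:

```ts
// Hypothetical audit entry; field names are illustrative, not the actual
// Flaggr schema. It covers the four documented facets.
const auditEntry = {
  user: { id: "user-42", name: "Ada Lovelace", email: "ada@example.com" }, // who
  action: "toggle",                       // what: the action type
  changedFields: ["enabled"],             // what: changed fields
  timestamp: "2026-03-01T12:00:00.000Z",  // when
  before: { enabled: true },              // previous flag state
  after: { enabled: false },              // new flag state
};
```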
Audit logs are captured asynchronously (fire-and-forget) so they don't slow down API responses.
## Version History
Each flag change creates a version snapshot. Versions enable:
- Reviewing history — see what changed and when
- Rollback — restore a previous configuration
- Blame — who made each change
Versions record the full flag state, the action type (create, update, toggle), and the user who made the change. See the Versioning guide for rollback instructions.
## Putting It All Together
A typical monitoring workflow (see the sketch after this list):
- Before change — check impact analysis for the flag
- Make change — toggle or update the flag (use gradual rollout for high-risk changes)
- Monitor health — watch the flag's health score and the project dashboard
- Debug issues — if something goes wrong, use the evaluation debugger to trace a specific user's evaluation
- Get alerted — configure alert rules to notify you of degraded health
- Auto-remediate — set up remediation rules to automatically reduce blast radius
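As a sketch, the first three steps can be strung together with the endpoints documented above; the toggle call is a placeholder, since the mutation API lives in the Managing Flags guide:

```ts
// Sketch only: chains impact analysis, a flag change, and a health check.
async function guardedToggle(flagKey: string, serviceId: string, token: string) {
  const headers = { Authorization: `Bearer ${token}` };
  const base = "https://flaggr.dev/api/flags";

  // 1. Before change: check impact.
  const impact = await (
    await fetch(
      `${base}/${flagKey}/impact?serviceId=${serviceId}&environment=production`,
      { headers }
    )
  ).json();
  if (impact.riskLevel === "high" || impact.riskLevel === "critical") {
    throw new Error(`Use a gradual rollout: risk is ${impact.riskLevel}`);
  }

  // 2. Make change (placeholder; see the Managing Flags guide).
  await toggleFlag(flagKey, serviceId, token);

  // 3. Monitor health after the change.
  const health = await (
    await fetch(`${base}/${flagKey}/health?serviceId=${serviceId}`, { headers })
  ).json();
  console.log(`Post-toggle health: ${health.overallScore} (${health.status})`);
}

// Placeholder for the real mutation call; not part of this guide.
declare function toggleFlag(flagKey: string, serviceId: string, token: string): Promise<void>;
```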
## Related
- Creating Flags — initial flag setup
- Managing Flags — day-to-day operations
- Flag Health — deep dive into health scoring configuration
- Real-Time Updates — SSE and Connect streaming setup
- Troubleshooting — common issues and solutions