
Alerting

Built-in alert rules for error rates, latency, gRPC health, and cache availability

Flaggr includes built-in alert rules that monitor the health of the evaluation pipeline. Alerts are evaluated on every check and exposed via the /api/alerts endpoint.

Built-In Alert Rules

HighErrorRate

  • Severity: Critical
  • Condition: Error rate exceeds 1% of total requests
  • Description: Evaluation errors are occurring at an abnormal rate

Fires when the ratio of flaggr_evaluation_errors_total to flaggr_evaluations_total exceeds 0.01.

SlowP95Latency

  • Severity: Warning
  • Condition: P95 request latency exceeds 100ms
  • Description: Flag evaluations are slower than expected

Fires when the 95th percentile of flaggr_http_request_duration_seconds exceeds 100ms.

GrpcConnectionFailure

  • Severity: Warning
  • Condition: Active gRPC errors detected
  • Description: gRPC streaming connections are failing

Fires when gRPC connection errors are observed in the metrics.

CacheUnavailable

  • Severity: Warning
  • Condition: Cache hit rate drops below 50%
  • Description: Cache layer is degraded or unavailable

Fires when the ratio of flaggr_cache_operations_total{result="hit"} to total cache operations falls below 0.5.

Querying Alerts

Get All Alerts

GET /api/alerts
{
  "status": "ok",
  "firingCount": 0,
  "alerts": [
    {
      "name": "HighErrorRate",
      "status": "resolved",
      "severity": "critical",
      "description": "Error rate exceeds 1%",
      "value": 0.002,
      "threshold": 0.01
    },
    {
      "name": "SlowP95Latency",
      "status": "resolved",
      "severity": "warning",
      "description": "P95 latency exceeds 100ms",
      "value": 45,
      "threshold": 100
    },
    {
      "name": "GrpcConnectionFailure",
      "status": "resolved",
      "severity": "warning",
      "description": "gRPC connection errors detected",
      "value": 0,
      "threshold": 1
    },
    {
      "name": "CacheUnavailable",
      "status": "resolved",
      "severity": "warning",
      "description": "Cache hit rate below 50%",
      "value": 0.95,
      "threshold": 0.5
    }
  ],
  "evaluatedAt": "2025-07-20T10:00:00Z"
}
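
Each alert reports its current value alongside its threshold. To pull out only the alerts that are currently firing from this response, a jq one-liner like the following works (the flaggr.dev host is the same illustrative one used in the script further down):

curl -s "https://flaggr.dev/api/alerts" | jq -r '.alerts[] | select(.status == "firing") | .name'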

Get Firing Alerts Only

GET /api/alerts?firing=true

Returns only alerts with status: "firing". The top-level status field reflects overall health:

  • "ok" — no alerts firing
  • "alerting" — one or more alerts firing

Health Check Integration

The /api/health endpoint includes alert status. A load balancer or uptime monitor can check this endpoint and alert on "degraded" status.

GET /api/health
{
  "status": "healthy",
  ...
}

Status values: "healthy", "degraded".
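
A minimal probe for a load balancer or uptime monitor might look like the sketch below, which fails whenever the endpoint is unreachable or reports anything other than "healthy" (the flaggr.dev host is illustrative; substitute your own deployment URL):

# Fail the probe if Flaggr is unreachable or not healthy
STATUS=$(curl -sf https://flaggr.dev/api/health | jq -r '.status')
if [ "$STATUS" != "healthy" ]; then
  exit 1
fi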

Prometheus Metrics

All alert rules are derived from the Prometheus metrics exposed at /api/metrics. You can build custom alerts in Grafana, Datadog, or any Prometheus-compatible monitoring tool:

# Error rate
sum(rate(flaggr_evaluation_errors_total[5m])) / sum(rate(flaggr_evaluations_total[5m]))
 
# P95 latency
histogram_quantile(0.95, sum(rate(flaggr_http_request_duration_seconds_bucket[5m])) by (le))
 
# Cache hit rate
sum(rate(flaggr_cache_operations_total{result="hit"}[5m])) / sum(rate(flaggr_cache_operations_total[5m]))
 
# Active gRPC connections
flaggr_grpc_active_connections
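
To confirm these metric names in your own deployment before wiring up custom rules, you can grep the metrics endpoint directly (illustrative host, as above):

curl -s https://flaggr.dev/api/metrics | grep -E 'flaggr_(evaluations_total|evaluation_errors_total|http_request_duration_seconds|cache_operations_total|grpc_active_connections)'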

External Alerting

For production deployments, combine Flaggr's built-in alerts with your existing monitoring stack:

Grafana Cloud

Configure alert rules in Grafana using the Prometheus metrics. The Grafana dashboard includes pre-built panels for all key metrics.

Webhook Notifications

Use webhooks to send flag change events to Slack, PagerDuty, or your incident management tool.

Custom Health Check

Poll /api/alerts?firing=true from your monitoring system:

# Simple health check script
RESPONSE=$(curl -s "https://flaggr.dev/api/alerts?firing=true")
FIRING=$(echo "$RESPONSE" | jq '.firingCount')
if [ "$FIRING" -gt 0 ]; then
  echo "ALERT: $FIRING alerts firing"
  # Send notification
fi
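
To run the check on a schedule, a cron entry along these lines could work (the script path and one-minute interval are placeholders, not part of Flaggr):

* * * * * /usr/local/bin/flaggr-alert-check.sh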