Built-in alert rules for error rates, latency, gRPC health, and cache availability
Alerting
Flaggr includes built-in alert rules that monitor the health of the evaluation pipeline. Alerts are evaluated on every check and exposed via the /api/alerts endpoint.
Built-In Alert Rules
HighErrorRate
| Field | Value |
|---|---|
| Severity | Critical |
| Condition | Error rate exceeds 1% of total requests |
| Description | Evaluation errors are occurring at an abnormal rate |
Fires when the ratio of flaggr_evaluation_errors_total to flaggr_evaluations_total exceeds 0.01.
SlowP95Latency
| Field | Value |
|---|---|
| Severity | Warning |
| Condition | P95 request latency exceeds 100ms |
| Description | Flag evaluations are slower than expected |
Fires when the 95th percentile of flaggr_http_request_duration_seconds exceeds 100ms.
GrpcConnectionFailure
| Field | Value |
|---|---|
| Severity | Warning |
| Condition | Active gRPC errors detected |
| Description | gRPC streaming connections are failing |
Fires when gRPC connection errors are observed in the metrics.
CacheUnavailable
| Field | Value |
|---|---|
| Severity | Warning |
| Condition | Cache hit rate drops below 50% |
| Description | Cache layer is degraded or unavailable |
Fires when flaggr_cache_operations_total{result="hit"} / total cache operations falls below 0.5.
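To make these conditions concrete, the sketch below recomputes two of them (error rate and cache hit rate) from the raw counters exposed at /api/metrics. It is an illustrative approximation of the built-in checks, not their implementation: it compares counter totals rather than rates, the base URL is a placeholder, and the text-format parsing is deliberately simplistic.

```python
import urllib.request

BASE_URL = "http://localhost:3000"  # placeholder; point at your Flaggr instance


def sample_sum(text: str, prefix: str) -> float:
    """Sum every sample line whose metric name (plus labels) starts with prefix."""
    return sum(
        float(line.rsplit(" ", 1)[-1])
        for line in text.splitlines()
        if line.startswith(prefix)
    )


# Scrape the Prometheus text exposition once and reuse it for each ratio.
metrics = urllib.request.urlopen(f"{BASE_URL}/api/metrics").read().decode()

errors = sample_sum(metrics, "flaggr_evaluation_errors_total")
evaluations = sample_sum(metrics, "flaggr_evaluations_total")
cache_hits = sample_sum(metrics, 'flaggr_cache_operations_total{result="hit"')
cache_ops = sample_sum(metrics, "flaggr_cache_operations_total")

if evaluations and errors / evaluations > 0.01:
    print("error rate above 1% -- HighErrorRate would fire")
if cache_ops and cache_hits / cache_ops < 0.5:
    print("cache hit rate below 50% -- CacheUnavailable would fire")
```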
Querying Alerts
Get All Alerts
```
GET /api/alerts
```

```json
{
  "status": "ok",
  "firingCount": 0,
  "alerts": [
    {
      "name": "HighErrorRate",
      "status": "resolved",
      "severity": "critical",
      "description": "Error rate exceeds 1%",
      "value": 0.002,
      "threshold": 0.01
    },
    {
      "name": "SlowP95Latency",
      "status": "resolved",
      "severity": "warning",
      "description": "P95 latency exceeds 100ms",
      "value": 45,
      "threshold": 100
    },
    {
      "name": "GrpcConnectionFailure",
      "status": "resolved",
      "severity": "warning",
      "description": "gRPC connection errors detected",
      "value": 0,
      "threshold": 1
    },
    {
      "name": "CacheUnavailable",
      "status": "resolved",
      "severity": "warning",
      "description": "Cache hit rate below 50%",
      "value": 0.95,
      "threshold": 0.5
    }
  ],
  "evaluatedAt": "2025-07-20T10:00:00Z"
}
```

Get Firing Alerts Only
```
GET /api/alerts?firing=true
```

Returns only alerts with status: "firing". The top-level status field reflects overall health:

- "ok" — no alerts firing
- "alerting" — one or more alerts firing
Health Check Integration
The /api/health endpoint includes alert status. A load balancer or uptime monitor can check this endpoint and alert on "degraded" status.
```
GET /api/health
```

```json
{
  "status": "healthy",
  ...
}
```

Status values: "healthy", "degraded".
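For example, a polling monitor could treat any status other than "healthy" as a failed check. A minimal sketch, with the base URL as a placeholder:

```python
import json
import sys
import urllib.request

BASE_URL = "http://localhost:3000"  # placeholder; point at your Flaggr instance

with urllib.request.urlopen(f"{BASE_URL}/api/health") as resp:
    health = json.load(resp)

if health["status"] != "healthy":
    print(f"Flaggr health check reports '{health['status']}'", file=sys.stderr)
    sys.exit(1)  # non-zero exit tells the caller the check failed
```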
Prometheus Metrics
All alert rules are derived from the Prometheus metrics exposed at /api/metrics. You can build custom alerts in Grafana, Datadog, or any Prometheus-compatible monitoring tool:
```promql
# Error rate
sum(rate(flaggr_evaluation_errors_total[5m])) / sum(rate(flaggr_evaluations_total[5m]))

# P95 latency
histogram_quantile(0.95, sum(rate(flaggr_http_request_duration_seconds_bucket[5m])) by (le))

# Cache hit rate
sum(rate(flaggr_cache_operations_total{result="hit"}[5m])) / sum(rate(flaggr_cache_operations_total[5m]))

# Active gRPC connections
flaggr_grpc_active_connections
```

External Alerting
For production deployments, combine Flaggr's built-in alerts with your existing monitoring stack:
Grafana Cloud
Configure alert rules in Grafana using the Prometheus metrics. The Grafana dashboard includes pre-built panels for all key metrics.
Webhook Notifications
Use webhooks to send flag change events to Slack, PagerDuty, or your incident management tool.
Custom Health Check
Poll /api/alerts?firing=true from your monitoring system:
```bash
#!/usr/bin/env bash
# Simple health check script
RESPONSE=$(curl -s "https://flaggr.dev/api/alerts?firing=true")
FIRING=$(echo "$RESPONSE" | jq '.firingCount')
if [ "$FIRING" -gt 0 ]; then
  echo "ALERT: $FIRING alerts firing"
  # Send notification
fi
```
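The "# Send notification" step above is left to you. One way to fill it in is sketched below in Python, posting a summary of firing alerts to a Slack incoming webhook; the Flaggr URL and Slack webhook URL are placeholders, and Slack is just one option alongside PagerDuty or any other incident tool.

```python
import json
import urllib.request

FLAGGR_URL = "https://flaggr.dev"  # placeholder; point at your Flaggr instance
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder webhook URL

with urllib.request.urlopen(f"{FLAGGR_URL}/api/alerts?firing=true") as resp:
    report = json.load(resp)

if report["firingCount"] > 0:
    names = ", ".join(alert["name"] for alert in report["alerts"])
    payload = {"text": f"Flaggr: {report['firingCount']} alert(s) firing: {names}"}
    notification = urllib.request.Request(
        SLACK_WEBHOOK,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(notification)
```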
Related

- Caching Strategy — Cache health and circuit breaker
- Rate Limiting — Rate limit monitoring
- REST API Reference — Alert endpoint details