Rate Limiting
IP, token, and service-level rate limiting with Upstash Redis or in-memory backends
Flaggr applies rate limiting at three levels to protect the evaluation pipeline and ensure fair usage.
Rate Limits
| Level | Limit | Window | Scope |
|---|---|---|---|
| IP | 1,000 requests | 60 seconds | Per client IP address |
| Token | 5,000 requests | 60 seconds | Per API token |
| Service | 10,000 requests | 60 seconds | Per service ID |
All three limits are checked on the /api/flags/evaluate endpoint. A request must pass all applicable checks.
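For example, a single evaluation request like the sketch below is counted against every applicable scope: the client IP is always checked, the bearer token (when sent) is checked against the token limit, and the serviceId in the body is checked against the service limit. The token value and flag key here are placeholders.

```typescript
// Illustrative request; the token and flag key are placeholders.
const response = await fetch('/api/flags/evaluate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer YOUR_API_TOKEN', // counted against the token limit
  },
  // The serviceId below is counted against the service limit; the client IP
  // is taken from the connection and is always counted against the IP limit.
  body: JSON.stringify({ flagKey: 'checkout-v2', serviceId: 'web-app', defaultValue: false }),
})
```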
Response Headers
Every response from rate-limited endpoints includes these headers:
| Header | Description | Example |
|---|---|---|
| RateLimit-Limit | Maximum requests in window | 1000 |
| RateLimit-Remaining | Requests remaining | 842 |
| RateLimit-Reset | Unix timestamp when limit resets | 1721474460 |
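If you prefer to throttle proactively instead of reacting to 429s, you can read these headers on every response. The helper below is a minimal sketch (the function name is hypothetical; the header names are the ones listed above):

```typescript
// Wait out the current window when RateLimit-Remaining reaches zero.
async function waitIfExhausted(response: Response): Promise<void> {
  const remaining = Number(response.headers.get('RateLimit-Remaining') ?? '0')
  const resetAt = Number(response.headers.get('RateLimit-Reset') ?? '0') // Unix seconds

  if (remaining === 0 && resetAt > 0) {
    const waitMs = Math.max(0, resetAt * 1000 - Date.now())
    await new Promise((r) => setTimeout(r, waitMs))
  }
}
```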
Rate Limited Response
When any limit is exceeded, you receive:
```http
HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 1000
RateLimit-Remaining: 0
RateLimit-Reset: 1721474460
Retry-After: 45

{
  "error": "Rate limit exceeded",
  "message": "IP rate limit exceeded. Try again in 45 seconds."
}
```
How Limits Stack
The evaluation endpoint checks limits in order:
- IP check — always applied, uses the client's IP address
- Token check — applied when an Authorization: Bearer header is present
- Service check — applied when a serviceId is in the request body
If any check fails, the request is rejected with 429. The most restrictive limit determines the response headers.
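To illustrate the stacking behavior, the three checks can be modeled as a short-circuiting sequence in which the first failing scope produces the 429. This is a hedged sketch, not Flaggr's internal code; `checkLimit` is a hypothetical limiter function supplied by the caller.

```typescript
// Sketch of the check order described above. `checkLimit` is a hypothetical
// limiter, not a Flaggr API.
type LimitResult = { allowed: boolean; resetAt: number }
type CheckLimit = (scope: 'ip' | 'token' | 'service', key: string) => Promise<LimitResult>

async function enforceLimits(
  req: { ip: string; token?: string; serviceId?: string },
  checkLimit: CheckLimit,
) {
  const checks: Array<['ip' | 'token' | 'service', string | undefined]> = [
    ['ip', req.ip],             // always applied
    ['token', req.token],       // only when an Authorization: Bearer token is present
    ['service', req.serviceId], // only when a serviceId is in the request body
  ]
  for (const [scope, key] of checks) {
    if (!key) continue
    const result = await checkLimit(scope, key)
    if (!result.allowed) return { status: 429, scope, resetAt: result.resetAt }
  }
  return { status: 200 }
}
```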
Backend
Upstash Redis (recommended)
```bash
UPSTASH_REDIS_REST_URL=https://your-upstash.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-token
```
When Upstash is configured, rate limits are shared across all serverless instances. This is the recommended configuration for production.
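If you want to sanity-check the same credentials outside Flaggr, a minimal @upstash/ratelimit setup reads the two environment variables above. This is only an illustration of the backend; it is not Flaggr's internal limiter configuration, and the window here simply mirrors the IP limit from the table.

```typescript
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

// Redis.fromEnv() reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN.
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(1000, '60 s'), // 1,000 requests per 60 seconds
})

const { success, reset } = await ratelimit.limit('203.0.113.7') // key, e.g. a client IP
if (!success) {
  console.log(`Rate limited; window resets at ${new Date(reset).toISOString()}`)
}
```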
In-Memory (fallback)
When no Upstash configuration is found, rate limiting falls back to an in-memory store. This works for single-instance deployments and local development but does not share state across serverless instances.
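As a rough illustration of what the fallback does (a simplified fixed-window counter, not Flaggr's exact implementation), state lives in the process and is lost whenever the instance is recycled:

```typescript
// Simplified fixed-window, in-memory counter. Each serverless instance keeps
// its own map, so counts are not shared across instances.
const windows = new Map<string, { count: number; resetAt: number }>()

function allowRequest(key: string, limit: number, windowMs = 60_000): boolean {
  const now = Date.now()
  const entry = windows.get(key)
  if (!entry || now >= entry.resetAt) {
    windows.set(key, { count: 1, resetAt: now + windowMs })
    return true
  }
  entry.count += 1
  return entry.count <= limit
}
```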
Handling Rate Limits in Your Application
Exponential Backoff
```typescript
async function evaluateWithRetry(flagKey: string, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch('/api/flags/evaluate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ flagKey, serviceId: 'web-app', defaultValue: false }),
    })
    if (response.status !== 429) return response.json()

    // Honor Retry-After, then back off exponentially with each attempt.
    const retryAfter = parseInt(response.headers.get('Retry-After') || '1', 10)
    const delay = retryAfter * 1000 * Math.pow(2, attempt)
    await new Promise(r => setTimeout(r, delay))
  }
  throw new Error('Rate limited after max retries')
}
```
Use SDK Caching
The SDKs cache evaluations locally. With caching enabled, you'll make far fewer API calls:
```typescript
const client = createFlaggr({
  serviceId: 'web-app',
  cacheTtl: 10000, // Cache for 10 seconds
})
```
Use Bulk Evaluation
Instead of evaluating flags one at a time, use bulk evaluation to fetch multiple flags in a single request:
```http
POST /api/connect/bulk-evaluate

{
  "serviceId": "web-app",
  "flagKeys": ["checkout-v2", "dark-mode", "new-pricing"],
  "context": { "targetingKey": "user-123" }
}
```
Use Streaming
SSE and gRPC streaming provide flag values pushed from the server, avoiding repeated evaluation calls entirely:
```typescript
const provider = new FlaggrWebProvider({
  enableStreaming: true,
  serviceId: 'web-app',
})
```
Monitoring
Rate limit metrics are available via the Prometheus endpoint:
```
flaggr_http_request_duration_seconds{status="429"}  # Rate-limited requests
```
Check the Alerting page for built-in alerts that fire when error rates (including 429s) exceed thresholds.
Related
- REST API Reference — Endpoint-specific rate limits
- Caching Strategy — Reduce API calls with caching
- TypeScript SDK — SDK-level caching and retry