IP, token, and service-level rate limiting with Upstash Redis or in-memory backends

Rate Limiting

Flaggr applies rate limiting at three levels to protect the evaluation pipeline and ensure fair usage.

Rate Limits

Level     Limit             Window       Scope
IP        1,000 requests    60 seconds   Per client IP address
Token     5,000 requests    60 seconds   Per API token
Service   10,000 requests   60 seconds   Per service ID

All three limits are checked on the /api/flags/evaluate endpoint. A request must pass all applicable checks.

Response Headers

Every response from rate-limited endpoints includes these headers:

Header                Description                            Example
RateLimit-Limit       Maximum requests in the window         1000
RateLimit-Remaining   Requests remaining in the window       842
RateLimit-Reset       Unix timestamp when the limit resets   1721474460
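
A client can read these headers to pace itself before hitting the limit. Here is a minimal sketch; the request body mirrors the evaluate examples later on this page:

const response = await fetch('/api/flags/evaluate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ flagKey: 'checkout-v2', serviceId: 'web-app', defaultValue: false }),
})

const remaining = Number(response.headers.get('RateLimit-Remaining'))
const resetAt = Number(response.headers.get('RateLimit-Reset')) // Unix seconds

if (remaining < 10) {
  // Back off until the window resets rather than burning the last requests
  const waitMs = Math.max(0, resetAt * 1000 - Date.now())
  console.warn(`Close to the rate limit; window resets in ${waitMs} ms`)
}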

Rate Limited Response

When any limit is exceeded, you receive:

HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 1000
RateLimit-Remaining: 0
RateLimit-Reset: 1721474460
Retry-After: 45

{
  "error": "Rate limit exceeded",
  "message": "IP rate limit exceeded. Try again in 45 seconds."
}

How Limits Stack

The evaluation endpoint checks limits in order:

  1. IP check — always applied, uses the client's IP address
  2. Token check — applied when an Authorization: Bearer header is present
  3. Service check — applied when a serviceId is in the request body

If any check fails, the request is rejected with 429. The most restrictive limit determines the response headers.
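
In rough TypeScript, the check order looks like this. checkLimit and the request shape are illustrative, not Flaggr's actual internals:

// Hypothetical helper: returns { allowed, limit, remaining, reset } for a key/quota pair
declare function checkLimit(key: string, limit: number):
  Promise<{ allowed: boolean; limit: number; remaining: number; reset: number }>

async function enforceLimits(req: { ip: string; token?: string; serviceId?: string }) {
  // 1. IP check: always applied
  const ip = await checkLimit(`ip:${req.ip}`, 1000)
  if (!ip.allowed) return ip // caller turns this into a 429 with its headers

  // 2. Token check: only when a Bearer token was sent
  if (req.token) {
    const token = await checkLimit(`token:${req.token}`, 5000)
    if (!token.allowed) return token
  }

  // 3. Service check: only when the body carries a serviceId
  if (req.serviceId) {
    const service = await checkLimit(`service:${req.serviceId}`, 10000)
    if (!service.allowed) return service
  }

  return null // all checks passed
}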

Backend

Upstash Redis (recommended)

Set these environment variables to enable the Upstash Redis backend:

UPSTASH_REDIS_REST_URL=https://your-upstash.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-token

When Upstash is configured, rate limits are shared across all serverless instances. This is the recommended configuration for production.
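
For reference, a distributed sliding-window limiter over Upstash takes only a few lines with the @upstash/ratelimit package. This is a standalone sketch of the pattern, not Flaggr's exact implementation:

import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

// Redis.fromEnv() reads UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN
const limiter = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(1000, '60 s'), // 1,000 requests per 60 s, as in the IP limit above
})

const clientIp = '203.0.113.7' // example identifier
const { success, remaining, reset } = await limiter.limit(`ip:${clientIp}`)
if (!success) {
  // Reject with 429 and set the RateLimit-* headers described above
}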

In-Memory (fallback)

When no Upstash configuration is found, rate limiting falls back to an in-memory store. This works for single-instance deployments and local development but does not share state across serverless instances.
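
Conceptually, the fallback amounts to a per-process counter keyed by identifier and window, along these lines (a simplified sketch, not the actual store):

// Simplified fixed-window counter; state lives only in this process
const windows = new Map<string, { count: number; resetAt: number }>()

function allowRequest(key: string, limit: number, windowMs = 60_000): boolean {
  const now = Date.now()
  const entry = windows.get(key)

  // Start a fresh window on first sight or after the old one expires
  if (!entry || now >= entry.resetAt) {
    windows.set(key, { count: 1, resetAt: now + windowMs })
    return true
  }

  entry.count++
  return entry.count <= limit
}

Because the Map is per-process, two serverless instances would each grant the full quota, which is why Upstash is the recommended production backend.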

Handling Rate Limits in Your Application

Exponential Backoff

async function evaluateWithRetry(flagKey: string, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch('/api/flags/evaluate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ flagKey, serviceId: 'web-app', defaultValue: false }),
    })

    if (response.status !== 429) return response.json()

    // Honor Retry-After, then back off exponentially on repeated 429s
    const retryAfter = parseInt(response.headers.get('Retry-After') || '1', 10)
    const delay = retryAfter * 1000 * Math.pow(2, attempt)
    await new Promise(r => setTimeout(r, delay))
  }

  throw new Error('Rate limited after max retries')
}

Use SDK Caching

The SDKs cache evaluations locally. With caching enabled, you'll make far fewer API calls:

const client = createFlaggr({
  serviceId: 'web-app',
  cacheTtl: 10000, // Cache for 10 seconds
})

Use Bulk Evaluation

Instead of evaluating flags one at a time, use bulk evaluation to fetch multiple flags in a single request:

POST /api/connect/bulk-evaluate
 
{
  "serviceId": "web-app",
  "flagKeys": ["checkout-v2", "dark-mode", "new-pricing"],
  "context": { "targetingKey": "user-123" }
}
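
In TypeScript, the same call looks like this (a sketch; the response shape is omitted here):

const response = await fetch('/api/connect/bulk-evaluate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    serviceId: 'web-app',
    flagKeys: ['checkout-v2', 'dark-mode', 'new-pricing'],
    context: { targetingKey: 'user-123' },
  }),
})
const flags = await response.json() // one request instead of three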

Use Streaming

With SSE or gRPC streaming, the server pushes flag values to your client, avoiding repeated evaluation calls entirely:

const provider = new FlaggrWebProvider({
  enableStreaming: true,
  serviceId: 'web-app',
})

Monitoring

Rate limit metrics are available via the Prometheus endpoint:

flaggr_http_request_duration_seconds{status="429"}  # rate-limited requests
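
For example, the per-second rate of 429 responses over the last five minutes can be charted with a histogram _count query (assuming Prometheus's usual histogram suffixes on this metric):

sum(rate(flaggr_http_request_duration_seconds_count{status="429"}[5m]))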

Check the Alerting page for built-in alerts that fire when error rates (including 429s) exceed thresholds.