IP, token, and service-level rate limiting with Upstash Redis or in-memory backends

Rate Limiting

Flaggr applies rate limiting at three levels to protect the evaluation pipeline and ensure fair usage.

Rate Limits

Level     Limit             Window       Scope
IP        1,000 requests    60 seconds   Per client IP address
Token     5,000 requests    60 seconds   Per API token
Service   10,000 requests   60 seconds   Per service ID

All three limits are checked on the /api/flags/evaluate endpoint. A request must pass all applicable checks.

Response Headers

Every response from rate-limited endpoints includes these headers:

Header                Description                            Example
RateLimit-Limit       Maximum requests in the window         1000
RateLimit-Remaining   Requests remaining in the window       842
RateLimit-Reset       Unix timestamp when the limit resets   1721474460
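
A client can read these headers to pace itself before hitting the limit. Here is a minimal sketch; the request body mirrors the evaluate examples later on this page:

const response = await fetch('/api/flags/evaluate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ flagKey: 'checkout-v2', serviceId: 'web-app', defaultValue: false }),
})

const remaining = Number(response.headers.get('RateLimit-Remaining'))
const resetAt = Number(response.headers.get('RateLimit-Reset')) // Unix seconds

if (remaining < 10) {
  // Back off until the window resets rather than burning the last requests
  const waitMs = Math.max(0, resetAt * 1000 - Date.now())
  console.warn(`Close to the rate limit; window resets in ${waitMs} ms`)
}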

Rate Limited Response

When any limit is exceeded, you receive:

HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 1000
RateLimit-Remaining: 0
RateLimit-Reset: 1721474460
Retry-After: 45

{
  "error": "Rate limit exceeded",
  "message": "IP rate limit exceeded. Try again in 45 seconds."
}

How Limits Stack

The evaluation endpoint checks limits in order:

  1. IP check — always applied, uses the client's IP address
  2. Token check — applied when an Authorization: Bearer header is present
  3. Service check — applied when a serviceId is in the request body

If any check fails, the request is rejected with 429. The most restrictive limit determines the response headers.
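
In rough TypeScript, the check order looks like this. checkLimit and the request shape are illustrative, not Flaggr's actual internals:

// Hypothetical helper: returns { allowed, limit, remaining, reset } for a key/quota pair
declare function checkLimit(key: string, limit: number):
  Promise<{ allowed: boolean; limit: number; remaining: number; reset: number }>

async function enforceLimits(req: { ip: string; token?: string; serviceId?: string }) {
  // 1. IP check: always applied
  const ip = await checkLimit(`ip:${req.ip}`, 1000)
  if (!ip.allowed) return ip // caller turns this into a 429 with its headers

  // 2. Token check: only when a Bearer token was sent
  if (req.token) {
    const token = await checkLimit(`token:${req.token}`, 5000)
    if (!token.allowed) return token
  }

  // 3. Service check: only when the body carries a serviceId
  if (req.serviceId) {
    const service = await checkLimit(`service:${req.serviceId}`, 10000)
    if (!service.allowed) return service
  }

  return null // all checks passed
}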

Backend

Upstash Redis (recommended)

Set these environment variables to enable the Upstash Redis backend:

UPSTASH_REDIS_REST_URL=https://your-upstash.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-token

When Upstash is configured, rate limits are shared across all serverless instances. This is the recommended configuration for production.
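
For reference, a distributed sliding-window limiter over Upstash takes only a few lines with the @upstash/ratelimit package. This is a standalone sketch of the pattern, not Flaggr's exact implementation:

import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

// Redis.fromEnv() reads UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN
const limiter = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(1000, '60 s'), // 1,000 requests per 60 s, as in the IP limit above
})

const clientIp = '203.0.113.7' // example identifier
const { success, remaining, reset } = await limiter.limit(`ip:${clientIp}`)
if (!success) {
  // Reject with 429 and set the RateLimit-* headers described above
}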

In-Memory (fallback)

When no Upstash configuration is found, rate limiting falls back to an in-memory store. This works for single-instance deployments and local development but does not share state across serverless instances.
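
Conceptually, the fallback amounts to a per-process counter keyed by identifier and window, along these lines (a simplified sketch, not the actual store):

// Simplified fixed-window counter; state lives only in this process
const windows = new Map<string, { count: number; resetAt: number }>()

function allowRequest(key: string, limit: number, windowMs = 60_000): boolean {
  const now = Date.now()
  const entry = windows.get(key)

  // Start a fresh window on first sight or after the old one expires
  if (!entry || now >= entry.resetAt) {
    windows.set(key, { count: 1, resetAt: now + windowMs })
    return true
  }

  entry.count++
  return entry.count <= limit
}

Because the Map is per-process, two serverless instances would each grant the full quota, which is why Upstash is the recommended production backend.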

Handling Rate Limits in Your Application

Exponential Backoff

async function evaluateWithRetry(flagKey: string, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch('/api/flags/evaluate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ flagKey, serviceId: 'web-app', defaultValue: false }),
    })

    if (response.status !== 429) return response.json()

    // Honor Retry-After, then back off exponentially on repeated 429s
    const retryAfter = parseInt(response.headers.get('Retry-After') || '1', 10)
    const delay = retryAfter * 1000 * Math.pow(2, attempt)
    await new Promise(r => setTimeout(r, delay))
  }

  throw new Error('Rate limited after max retries')
}

Use SDK Caching

The SDKs cache evaluations locally. With caching enabled, you'll make far fewer API calls:

const client = createFlaggr({
  serviceId: 'web-app',
  cacheTtl: 10000, // Cache for 10 seconds
})

Use Bulk Evaluation

Instead of evaluating flags one at a time, use bulk evaluation to fetch multiple flags in a single request:

POST /api/connect/bulk-evaluate
 
{
  "serviceId": "web-app",
  "flagKeys": ["checkout-v2", "dark-mode", "new-pricing"],
  "context": { "targetingKey": "user-123" }
}
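
In TypeScript, the same call looks like this (a sketch; the response shape is omitted here):

const response = await fetch('/api/connect/bulk-evaluate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    serviceId: 'web-app',
    flagKeys: ['checkout-v2', 'dark-mode', 'new-pricing'],
    context: { targetingKey: 'user-123' },
  }),
})
const flags = await response.json() // one request instead of three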

Use Streaming

With SSE or gRPC streaming, the server pushes flag values to your client, avoiding repeated evaluation calls entirely:

const provider = new FlaggrWebProvider({
  enableStreaming: true,
  serviceId: 'web-app',
})

Monitoring

Rate limit metrics are available via the Prometheus endpoint:

flaggr_http_request_duration_seconds{status="429"}  # rate-limited requests
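
For example, the per-second rate of 429 responses over the last five minutes can be charted with a histogram _count query (assuming Prometheus's usual histogram suffixes on this metric):

sum(rate(flaggr_http_request_duration_seconds_count{status="429"}[5m]))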

Check the Alerting page for built-in alerts that fire when error rates (including 429s) exceed thresholds.