Caching Strategy
Three-tier caching architecture with L1 in-process, L2 Redis/KV, and L3 storage -- TTLs, invalidation, and circuit breakers.
Flaggr uses a three-tier caching architecture designed for sub-millisecond flag evaluations with bounded staleness: caches may briefly serve stale values, but the L3 source of truth is always consistent.
Cache Hierarchy
```
Request → L1 (In-Process LRU) → L2 (Redis / Vercel KV) → L3 (Firestore / File Storage)
                ~0.01ms                 ~1-5ms                      ~10-50ms
```
The hierarchy operates as a write-through, read-aside cache. On a read miss at any tier, the value is fetched from the next tier down and back-populated into all higher tiers. This means the first request for a cold flag pays the full L3 latency, but every subsequent request within the TTL window is served from L1 or L2.
```
  Instance A             Instance B
  ┌─────────┐            ┌─────────┐
  │ L1 LRU  │            │ L1 LRU  │    Per-instance
  │ 500 keys│            │ 500 keys│    memory cache
  └────┬────┘            └────┬────┘
       │                      │
       └──────────┬───────────┘
                  │
           ┌──────┴──────┐
           │  L2 Redis   │   Shared across all instances
           │  (circuit   │   Protected by circuit breaker
           │   breaker)  │
           └──────┬──────┘
                  │
           ┌──────┴──────┐
           │ L3 Firestore│   Source of truth
           │ / File Store│   Always consistent
           └─────────────┘
```
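As an illustration of the read-aside/back-populate flow, the sketch below wraps the tiered `cacheGet`/`cacheSet` helpers (documented under Cache Operations) around a hypothetical `loadFlagFromStorage` L3 fetcher. The import paths, the fetcher, and the exact call site are assumptions; the key shape and TTL follow the tables later on this page.

```typescript
// Sketch only: cacheGet/cacheSet are the tiered helpers described under
// "Cache Operations"; loadFlagFromStorage is a hypothetical L3 fetcher.
import { cacheGet, cacheSet } from './cache'      // import path assumed
import { loadFlagFromStorage } from './storage'   // hypothetical L3 accessor

async function getFlag(serviceId: string, flagKey: string, environment: string) {
  const key = `flag:${serviceId}:${flagKey}:${environment}`

  // L1 → L2: a hit at either tier is back-populated into the tiers above it
  const cached = await cacheGet(key, 'read')
  if (cached !== null) return cached

  // Cold path: pay the full L3 latency once, then repopulate L1 + L2
  const flag = await loadFlagFromStorage(serviceId, flagKey, environment)
  await cacheSet(key, flag, 60, 'read') // 60s flag TTL per the defaults table
  return flag
}
```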
L1 -- In-Process LRU
A bounded, per-instance LRU (Least Recently Used) cache. Provides sub-millisecond reads for hot flags.
- Latency: < 0.01ms
- Capacity: 500 entries per instance (configurable via `CACHE_L1_MAX_SIZE`)
- Scope: Per serverless instance / process
- Eviction: LRU when capacity is reached
- Consistency: Eventual (stale for up to TTL)
The L1 cache is an in-memory Map with LRU eviction. Because each serverless instance has its own L1, two instances may briefly return different values for the same flag during TTL windows. This is by design -- the consistency tradeoff buys sub-millisecond reads for the vast majority of evaluations.
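For reference, a bounded LRU with per-entry TTL can be built on a plain Map, which preserves insertion order. This is an illustrative sketch, not Flaggr's actual L1 implementation.

```typescript
// Illustrative bounded LRU with per-entry TTL, in the spirit of the L1 tier.
class LruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>()
  constructor(private maxSize = 500) {}

  get(key: string): V | null {
    const entry = this.entries.get(key)
    if (!entry) return null
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key) // expired: treat as a miss
      return null
    }
    // Re-insert to mark as most recently used (Map keeps insertion order)
    this.entries.delete(key)
    this.entries.set(key, entry)
    return entry.value
  }

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.delete(key)
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 })
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry (first key in insertion order)
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
  }
}
```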
L2 -- Redis / Vercel KV
Shared remote cache accessible across all instances. Protected by a circuit breaker.
- Latency: 1-5ms
- Scope: Global (shared across all instances)
- Backend priority:
  1. Direct Redis (`REDIS_URL` or `REDIS_READ_URL`/`REDIS_WRITE_URL`)
  2. Vercel KV (`KV_REST_API_URL`)
  3. In-memory Map (fallback for local dev)
When using Redis with read replicas, Flaggr routes reads to the replica and writes to the primary. This reduces load on the primary and improves read latency when the replica is geographically closer.
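A minimal sketch of that routing, assuming the ioredis client (the docs don't name the actual client library) and the environment variables listed under Configuration:

```typescript
import Redis from 'ioredis'

// Route reads to the replica and writes to the primary when both URLs are set;
// otherwise fall back to the single REDIS_URL for both roles.
const writeUrl = process.env.REDIS_WRITE_URL ?? process.env.REDIS_URL
const readUrl = process.env.REDIS_READ_URL ?? writeUrl

const writer = new Redis(writeUrl!)
const reader = readUrl === writeUrl ? writer : new Redis(readUrl!)

async function l2Get(key: string): Promise<string | null> {
  return reader.get(key) // replica (or primary if no replica is configured)
}

async function l2Set(key: string, value: string, ttlSeconds: number): Promise<void> {
  await writer.set(key, value, 'EX', ttlSeconds) // always the primary
}
```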
L3 -- Storage
The source of truth. Firestore in production, file-based storage for local development.
- Latency: 10-50ms
- Always consistent
Cache Planes
Flaggr separates caches into two planes with different consistency requirements:
| Plane | Prefix | Use Case | TTL Range |
|---|---|---|---|
| Read | read: | Flags, configs, projects | 2-5 min |
| Write | write: | Sessions, CSRF tokens, rate limits | 1-30 min |
The read plane tolerates brief staleness (a flag update might take up to 2 minutes to propagate everywhere). The write plane handles stateful resources where TTLs are dictated by security requirements (session duration, CSRF validity).
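A small sketch of how the plane prefix might be applied to keys (the helper name is illustrative, not part of Flaggr's API):

```typescript
// Illustrative key namespacing by plane; prefixes match the table above.
type CachePlane = 'read' | 'write'

function planeKey(plane: CachePlane, key: string): string {
  return `${plane}:${key}`
}

planeKey('read', 'flag:web-app:new-checkout:production')
// => 'read:flag:web-app:new-checkout:production'
planeKey('write', 'session:abc123')
// => 'write:session:abc123'
```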
Default TTLs
| Resource | L1 TTL | L2 TTL | Plane | Rationale |
|---|---|---|---|---|
| Projects | 300s (5 min) | 300s (5 min) | Read | Rarely change |
| Services | 180s (3 min) | 180s (3 min) | Read | Moderate change frequency |
| Flags | 60s (1 min) | 120s (2 min) | Read | Balance freshness vs. load |
| Flag rules / targeting | 60s (1 min) | 120s (2 min) | Read | Same as flags |
| User data | 300s (5 min) | 300s (5 min) | Read | Profile data, rarely changes |
| Sessions | 1800s (30 min) | 1800s (30 min) | Write | Security: bounded lifetime |
| CSRF tokens | 300s (5 min) | 300s (5 min) | Write | Single-use, short-lived |
| Rate limits | 60s (1 min) | 60s (1 min) | Write | Window-based, must expire |
L1 TTL is always less than or equal to L2 TTL. This ensures that when L1 expires and falls through to L2, L2 still has a valid entry. The L1 TTL for flags (60s) is intentionally shorter than the L2 TTL (120s): hot paths see fresher values, while the longer L2 TTL means expired L1 entries fall through only to L2 rather than all the way to L3.
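The same defaults, expressed as an illustrative configuration object (the names are hypothetical; the values come from the table above):

```typescript
// Default TTLs in seconds. Invariant: l1 <= l2, so an expired L1 entry
// can still fall through to a valid L2 entry instead of hitting L3.
const DEFAULT_TTLS = {
  project:   { l1: 300,  l2: 300,  plane: 'read'  },
  service:   { l1: 180,  l2: 180,  plane: 'read'  },
  flag:      { l1: 60,   l2: 120,  plane: 'read'  },
  user:      { l1: 300,  l2: 300,  plane: 'read'  },
  session:   { l1: 1800, l2: 1800, plane: 'write' },
  csrf:      { l1: 300,  l2: 300,  plane: 'write' },
  rateLimit: { l1: 60,   l2: 60,   plane: 'write' },
} as const
```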
Cache Operations
Read Path
```
cacheGet(key, plane?)
├─ Check L1 (LRU)
│  ├─ Hit  → return value
│  └─ Miss → Check L2
│     ├─ L2 Hit  → populate L1, return value
│     └─ L2 Miss → return null (caller fetches from L3)
```
Write Path
```
cacheSet(key, value, ttl, plane?)
├─ Write to L1
└─ Write to L2 (async, fire-and-forget)
```
Invalidation
```
cacheDel(key, plane?)
├─ Delete from L1
└─ Delete from L2

cacheDelPattern(pattern, plane?)
├─ Pattern-match and delete from L1
└─ Pattern-match and delete from L2
```
Cache Key Patterns
```
project:{projectId}
flags:service:{serviceId}
flag:{serviceId}:{flagKey}:{environment}
user:{userId}
session:{sessionId}
csrf:{token}
rate_limit:{identifier}
```
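The docs reference a `CacheKeys` helper (e.g. `CacheKeys.serviceFlags` under Invalidation Strategies). A sketch of builders matching the patterns above might look like this; every builder name other than `serviceFlags` is an assumption:

```typescript
// Illustrative key builders matching the documented patterns.
const CacheKeys = {
  project: (projectId: string) => `project:${projectId}`,
  serviceFlags: (serviceId: string) => `flags:service:${serviceId}`,
  flag: (serviceId: string, flagKey: string, environment: string) =>
    `flag:${serviceId}:${flagKey}:${environment}`,
  user: (userId: string) => `user:${userId}`,
  session: (sessionId: string) => `session:${sessionId}`,
  csrf: (token: string) => `csrf:${token}`,
  rateLimit: (identifier: string) => `rate_limit:${identifier}`,
}
```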
Circuit Breaker
The L2 cache is protected by a circuit breaker to prevent cascading failures when Redis/KV is unavailable.
States
```
CLOSED → (failures exceed threshold) → OPEN → (timeout expires) → HALF-OPEN
   ↑                                                                  │
   └───────────────────── (success in half-open) ────────────────────┘
```
- Closed: Normal operation. All L2 calls proceed.
- Open: L2 calls are skipped entirely. Falls back to stale L1 entries.
- Half-open: A single test request is sent to L2. If it succeeds, the circuit closes. If it fails, the circuit reopens.
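A compact sketch of that state machine, using the default thresholds from the configuration table below. This is illustrative, not Flaggr's implementation, and it does not limit concurrent half-open probes.

```typescript
// Minimal circuit breaker sketch: 5 consecutive failures open the circuit,
// a 30s timeout allows one half-open probe, success closes it again.
class CircuitBreaker {
  private failures = 0
  private openedAt = 0
  private state: 'closed' | 'open' | 'half-open' = 'closed'

  constructor(private failureThreshold = 5, private openTimeoutMs = 30_000) {}

  async exec<T>(l2Call: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt < this.openTimeoutMs) return fallback()
      this.state = 'half-open' // timeout expired: allow a probe request
    }
    try {
      const result = await l2Call()
      this.state = 'closed'
      this.failures = 0
      return result
    } catch {
      this.failures++
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open'
        this.openedAt = Date.now()
      }
      return fallback() // e.g. serve the (possibly stale) L1 entry
    }
  }
}
```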
Circuit Breaker Configuration
| Parameter | Default | Description |
|---|---|---|
| Failure threshold | 5 | Consecutive failures before opening |
| Open timeout | 30s | Time before transitioning to half-open |
| Half-open max attempts | 1 | Test requests before closing |
| Monitored exceptions | Timeout, ConnectionRefused | Errors that count as failures |
When the circuit breaker is open, evaluations continue using L1 cache entries (potentially stale) rather than failing. This ensures flag evaluations remain fast even during cache infrastructure outages. The flaggr_cache_circuit_breaker_state metric tracks the current state.
Invalidation Strategies
On Flag Mutation (Write-Through)
When a flag is created, updated, deleted, or toggled:
- Immediate (local): `cacheDel(CacheKeys.serviceFlags(serviceId))` on the handling instance
- Broadcast (cross-instance): Pub/Sub message triggers invalidation on all other instances (see the sketch below)
- TTL (safety net): Natural expiration ensures staleness is bounded even if broadcast fails
```
Flag toggle (instance A)
├─ Delete from L1 (instance A)     [0ms]
├─ Delete from L2 (Redis)          [1-3ms]
└─ Publish invalidation event      [2-5ms]
   ├─ Instance B receives event
   │  └─ Delete from L1 (instance B)   [0ms]
   └─ Instance C receives event
      └─ Delete from L1 (instance C)   [0ms]
```
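A sketch of the broadcast step over Redis pub/sub, assuming ioredis; the channel name and message shape are illustrative:

```typescript
import Redis from 'ioredis'

const CHANNEL = 'flaggr:invalidate' // assumed channel name
const publisher = new Redis(process.env.REDIS_URL!)
const subscriber = new Redis(process.env.REDIS_URL!) // pub/sub needs its own connection

// On the instance that handled the mutation:
async function broadcastInvalidation(key: string) {
  await publisher.publish(CHANNEL, JSON.stringify({ key }))
}

// On every instance, at startup:
async function listenForInvalidations(l1: { delete(key: string): void }) {
  await subscriber.subscribe(CHANNEL)
  subscriber.on('message', (_channel, message) => {
    const { key } = JSON.parse(message)
    l1.delete(key) // drop only the local L1 entry; L2 was already deleted
  })
}
```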
On-Demand
```typescript
// Clear all flags for a service
await cacheDelPattern('flags:service:web-app*')

// Clear everything for a plane
await cacheClear('read')
```
Stampede Prevention
When many instances simultaneously experience an L1 miss for the same key, they could all hit L2 or L3 at once (a cache stampede). Flaggr prevents this with:
- L2 read coalescing: Concurrent requests for the same key within a 10ms window are coalesced into a single L2 read
- Jittered TTLs: L1 TTLs are jittered by +/- 10% to prevent synchronized expiration across instances
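Both guards are small. The sketch below is illustrative: coalescing is simplified to the lifetime of the in-flight L2 read rather than a fixed 10ms window, and the function names are assumptions.

```typescript
// Coalesce concurrent L2 reads for the same key into one in-flight promise.
const inFlight = new Map<string, Promise<string | null>>()

function coalescedL2Get(key: string, l2Get: (k: string) => Promise<string | null>) {
  const pending = inFlight.get(key)
  if (pending) return pending // reuse the read already in progress

  const promise = l2Get(key).finally(() => inFlight.delete(key))
  inFlight.set(key, promise)
  return promise
}

// Jitter an L1 TTL by +/- 10% so instances don't expire in lockstep.
function jitteredTtl(ttlSeconds: number): number {
  const jitter = (Math.random() * 0.2 - 0.1) * ttlSeconds
  return Math.max(1, Math.round(ttlSeconds + jitter))
}
```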
Health Check
Monitor cache health per plane:
```typescript
const health = await cacheHealthCheck('read')
// { working: true, testTime: 2 }
```
Configuration
Redis (recommended for production)
```bash
REDIS_URL=redis://your-redis:6379
```
For read replicas:
```bash
REDIS_READ_URL=redis://read-replica:6379
REDIS_WRITE_URL=redis://primary:6379
```
Vercel KV
```bash
KV_REST_API_URL=https://your-kv.vercel-storage.com
KV_REST_API_TOKEN=your-token
```
In-Memory (local dev only)
No configuration needed. Falls back automatically when no Redis or KV URL is configured.
Tuning L1 Capacity
```bash
CACHE_L1_MAX_SIZE=1000  # Default: 500
```
Increase L1 capacity if your service evaluates more than ~500 unique flag+environment combinations. Monitor the `flaggr_cache_operations_total{tier="l1",result="eviction"}` metric to see if evictions are frequent.
Performance Characteristics
| Scenario | Latency | Cache Path |
|---|---|---|
| Hot flag (L1 hit) | < 0.1ms | L1 only |
| Warm flag (L2 hit) | 1-5ms | L1 miss → L2 hit → L1 populate |
| Cold flag (cache miss) | 10-50ms | L1 miss → L2 miss → L3 fetch → L1+L2 populate |
| L2 circuit open | < 0.1ms (stale) | L1 stale entry |
| Complete cache miss, L2 down | 10-50ms | Direct L3 fetch |
Evaluation Endpoint Caching
The /api/flags/evaluate endpoint uses server-side deduplication:
- If a request for the same flag arrives within the cache TTL, it returns the cached result
- The `_debug.cacheHit` field in the response indicates whether the result was served from cache
- `_debug.timings.cacheGet` shows cache lookup time in milliseconds
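For example, a client can inspect those debug fields after calling the endpoint. The request body shape and host below are assumptions; the `_debug` fields are the ones documented above.

```typescript
// Illustrative call to the evaluation endpoint.
const res = await fetch('https://your-flaggr-host/api/flags/evaluate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ flagKey: 'new-checkout', environment: 'production' }),
})
const result = await res.json()

console.log(result._debug?.cacheHit)          // true when served from cache
console.log(result._debug?.timings?.cacheGet) // cache lookup time in ms
```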
Best Practices
- Use Redis in production for shared cache across serverless instances
- Monitor cache hit rates via the `/api/metrics` Prometheus endpoint (`flaggr_cache_operations_total{operation,result}`)
- Don't over-cache -- the default TTLs are tuned for flag evaluation workloads
- Prefer SSE or gRPC streaming over aggressive cache TTLs for real-time flag updates
- Check the circuit breaker -- if you see stale evaluations, check Redis connectivity
- Size L1 appropriately -- too small means frequent evictions and L2 pressure; too large wastes memory in serverless environments
Related
- REST API Reference -- Evaluation endpoint caching
- Rate Limiting -- Rate limit caching
- Alerting -- Cache availability alerts