Caching Strategy
Three-tier caching architecture with L1 in-process, L2 Redis/KV, and L3 storage -- TTLs, invalidation, and circuit breakers.
Flaggr uses a three-tier caching architecture designed for sub-millisecond flag evaluations with bounded staleness: caches may briefly serve stale values, but the L3 source of truth is always consistent.
Cache Hierarchy
```
Request → L1 (In-Process LRU) → L2 (Redis / Vercel KV) → L3 (Firestore / File Storage)
                ~0.01ms                 ~1-5ms                      ~10-50ms
```
The hierarchy operates as a write-through, read-aside cache. On a read miss at any tier, the value is fetched from the next tier down and back-populated into all higher tiers. This means the first request for a cold flag pays the full L3 latency, but every subsequent request within the TTL window is served from L1 or L2.
```
  Instance A             Instance B
  ┌─────────┐            ┌─────────┐
  │ L1 LRU  │            │ L1 LRU  │    Per-instance
  │ 500 keys│            │ 500 keys│    memory cache
  └────┬────┘            └────┬────┘
       │                      │
       └──────────┬───────────┘
                  │
           ┌──────┴──────┐
           │  L2 Redis   │   Shared across all instances
           │  (circuit   │   Protected by circuit breaker
           │   breaker)  │
           └──────┬──────┘
                  │
           ┌──────┴──────┐
           │ L3 Firestore│   Source of truth
           │ / File Store│   Always consistent
           └─────────────┘
```
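As an illustration of the read-aside/back-populate flow, the sketch below wraps the tiered `cacheGet`/`cacheSet` helpers (documented under Cache Operations) around a hypothetical `loadFlagFromStorage` L3 fetcher. The import paths, the fetcher, and the exact call site are assumptions; the key shape and TTL follow the tables later on this page.

```typescript
// Sketch only: cacheGet/cacheSet are the tiered helpers described under
// "Cache Operations"; loadFlagFromStorage is a hypothetical L3 fetcher.
import { cacheGet, cacheSet } from './cache'      // import path assumed
import { loadFlagFromStorage } from './storage'   // hypothetical L3 accessor

async function getFlag(serviceId: string, flagKey: string, environment: string) {
  const key = `flag:${serviceId}:${flagKey}:${environment}`

  // L1 → L2: a hit at either tier is back-populated into the tiers above it
  const cached = await cacheGet(key, 'read')
  if (cached !== null) return cached

  // Cold path: pay the full L3 latency once, then repopulate L1 + L2
  const flag = await loadFlagFromStorage(serviceId, flagKey, environment)
  await cacheSet(key, flag, 60, 'read') // 60s flag TTL per the defaults table
  return flag
}
```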
L1 -- In-Process LRU
A bounded, per-instance LRU (Least Recently Used) cache. Provides sub-millisecond reads for hot flags.
- Latency: < 0.01ms
- Capacity: 500 entries per instance (configurable via `CACHE_L1_MAX_SIZE`)
- Scope: Per serverless instance / process
- Eviction: LRU when capacity is reached
- Consistency: Eventual (stale for up to TTL)
The L1 cache is an in-memory Map with LRU eviction. Because each serverless instance has its own L1, two instances may briefly return different values for the same flag during TTL windows. This is by design -- the consistency tradeoff buys sub-millisecond reads for the vast majority of evaluations.
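For reference, a bounded LRU with per-entry TTL can be built on a plain Map, which preserves insertion order. This is an illustrative sketch, not Flaggr's actual L1 implementation.

```typescript
// Illustrative bounded LRU with per-entry TTL, in the spirit of the L1 tier.
class LruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>()
  constructor(private maxSize = 500) {}

  get(key: string): V | null {
    const entry = this.entries.get(key)
    if (!entry) return null
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key) // expired: treat as a miss
      return null
    }
    // Re-insert to mark as most recently used (Map keeps insertion order)
    this.entries.delete(key)
    this.entries.set(key, entry)
    return entry.value
  }

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.delete(key)
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 })
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry (first key in insertion order)
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
  }
}
```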
L2 -- Redis / Vercel KV
Shared remote cache accessible across all instances. Protected by a circuit breaker.
- Latency: 1-5ms
- Scope: Global (shared across all instances)
- Backend priority:
  1. Direct Redis (`REDIS_URL` or `REDIS_READ_URL`/`REDIS_WRITE_URL`)
  2. Vercel KV (`KV_REST_API_URL`)
  3. In-memory Map (fallback for local dev)
When using Redis with read replicas, Flaggr routes reads to the replica and writes to the primary. This reduces load on the primary and improves read latency when the replica is geographically closer.
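A minimal sketch of that routing, assuming the ioredis client (the docs don't name the actual client library) and the environment variables listed under Configuration:

```typescript
import Redis from 'ioredis'

// Route reads to the replica and writes to the primary when both URLs are set;
// otherwise fall back to the single REDIS_URL for both roles.
const writeUrl = process.env.REDIS_WRITE_URL ?? process.env.REDIS_URL
const readUrl = process.env.REDIS_READ_URL ?? writeUrl

const writer = new Redis(writeUrl!)
const reader = readUrl === writeUrl ? writer : new Redis(readUrl!)

async function l2Get(key: string): Promise<string | null> {
  return reader.get(key) // replica (or primary if no replica is configured)
}

async function l2Set(key: string, value: string, ttlSeconds: number): Promise<void> {
  await writer.set(key, value, 'EX', ttlSeconds) // always the primary
}
```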
L3 -- Storage
The source of truth. Firestore in production, file-based storage for local development.
- Latency: 10-50ms
- Always consistent
Cache Planes
Flaggr separates caches into two planes with different consistency requirements:
| Plane | Prefix | Use Case | TTL Range |
|---|---|---|---|
| Read | read: | Flags, configs, projects | 2-5 min |
| Write | write: | Sessions, CSRF tokens, rate limits | 1-30 min |
The read plane tolerates brief staleness (a flag update might take up to 2 minutes to propagate everywhere). The write plane handles stateful resources where TTLs are dictated by security requirements (session duration, CSRF validity).
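A small sketch of how the plane prefix might be applied to keys (the helper name is illustrative, not part of Flaggr's API):

```typescript
// Illustrative key namespacing by plane; prefixes match the table above.
type CachePlane = 'read' | 'write'

function planeKey(plane: CachePlane, key: string): string {
  return `${plane}:${key}`
}

planeKey('read', 'flag:web-app:new-checkout:production')
// => 'read:flag:web-app:new-checkout:production'
planeKey('write', 'session:abc123')
// => 'write:session:abc123'
```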
Default TTLs
| Resource | L1 TTL | L2 TTL | Plane | Rationale |
|---|---|---|---|---|
| Projects | 300s (5 min) | 300s (5 min) | Read | Rarely change |
| Services | 180s (3 min) | 180s (3 min) | Read | Moderate change frequency |
| Flags | 60s (1 min) | 120s (2 min) | Read | Balance freshness vs. load |
| Flag rules / targeting | 60s (1 min) | 120s (2 min) | Read | Same as flags |
| User data | 300s (5 min) | 300s (5 min) | Read | Profile data, rarely changes |
| Sessions | 1800s (30 min) | 1800s (30 min) | Write | Security: bounded lifetime |
| CSRF tokens | 300s (5 min) | 300s (5 min) | Write | Single-use, short-lived |
| Rate limits | 60s (1 min) | 60s (1 min) | Write | Window-based, must expire |
L1 TTL is always less than or equal to L2 TTL. This ensures that when L1 expires and falls through to L2, L2 still has a valid entry. The L1 TTL for flags (60s) is intentionally shorter than the L2 TTL (120s): hot paths see fresher values, while the longer L2 TTL means expired L1 entries fall through only to L2 rather than all the way to L3.
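The same defaults, expressed as an illustrative configuration object (the names are hypothetical; the values come from the table above):

```typescript
// Default TTLs in seconds. Invariant: l1 <= l2, so an expired L1 entry
// can still fall through to a valid L2 entry instead of hitting L3.
const DEFAULT_TTLS = {
  project:   { l1: 300,  l2: 300,  plane: 'read'  },
  service:   { l1: 180,  l2: 180,  plane: 'read'  },
  flag:      { l1: 60,   l2: 120,  plane: 'read'  },
  user:      { l1: 300,  l2: 300,  plane: 'read'  },
  session:   { l1: 1800, l2: 1800, plane: 'write' },
  csrf:      { l1: 300,  l2: 300,  plane: 'write' },
  rateLimit: { l1: 60,   l2: 60,   plane: 'write' },
} as const
```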
Cache Operations
Read Path
```
cacheGet(key, plane?)
├─ Check L1 (LRU)
│  ├─ Hit  → return value
│  └─ Miss → Check L2
│     ├─ L2 Hit  → populate L1, return value
│     └─ L2 Miss → return null (caller fetches from L3)
```
Write Path
```
cacheSet(key, value, ttl, plane?)
├─ Write to L1
└─ Write to L2 (async, fire-and-forget)
```
Invalidation
```
cacheDel(key, plane?)
├─ Delete from L1
└─ Delete from L2

cacheDelPattern(pattern, plane?)
├─ Pattern-match and delete from L1
└─ Pattern-match and delete from L2
```
Cache Key Patterns
```
project:{projectId}
flags:service:{serviceId}
flag:{serviceId}:{flagKey}:{environment}
user:{userId}
session:{sessionId}
csrf:{token}
rate_limit:{identifier}
```
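The docs reference a `CacheKeys` helper (e.g. `CacheKeys.serviceFlags` under Invalidation Strategies). A sketch of builders matching the patterns above might look like this; every builder name other than `serviceFlags` is an assumption:

```typescript
// Illustrative key builders matching the documented patterns.
const CacheKeys = {
  project: (projectId: string) => `project:${projectId}`,
  serviceFlags: (serviceId: string) => `flags:service:${serviceId}`,
  flag: (serviceId: string, flagKey: string, environment: string) =>
    `flag:${serviceId}:${flagKey}:${environment}`,
  user: (userId: string) => `user:${userId}`,
  session: (sessionId: string) => `session:${sessionId}`,
  csrf: (token: string) => `csrf:${token}`,
  rateLimit: (identifier: string) => `rate_limit:${identifier}`,
}
```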
Circuit Breaker
The L2 cache is protected by a circuit breaker to prevent cascading failures when Redis/KV is unavailable.
States
```
CLOSED → (failures exceed threshold) → OPEN → (timeout expires) → HALF-OPEN
   ↑                                                                  │
   └───────────────────── (success in half-open) ────────────────────┘
```
- Closed: Normal operation. All L2 calls proceed.
- Open: L2 calls are skipped entirely. Falls back to stale L1 entries.
- Half-open: A single test request is sent to L2. If it succeeds, the circuit closes. If it fails, the circuit reopens.
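A compact sketch of that state machine, using the default thresholds from the configuration table below. This is illustrative, not Flaggr's implementation, and it does not limit concurrent half-open probes.

```typescript
// Minimal circuit breaker sketch: 5 consecutive failures open the circuit,
// a 30s timeout allows one half-open probe, success closes it again.
class CircuitBreaker {
  private failures = 0
  private openedAt = 0
  private state: 'closed' | 'open' | 'half-open' = 'closed'

  constructor(private failureThreshold = 5, private openTimeoutMs = 30_000) {}

  async exec<T>(l2Call: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt < this.openTimeoutMs) return fallback()
      this.state = 'half-open' // timeout expired: allow a probe request
    }
    try {
      const result = await l2Call()
      this.state = 'closed'
      this.failures = 0
      return result
    } catch {
      this.failures++
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open'
        this.openedAt = Date.now()
      }
      return fallback() // e.g. serve the (possibly stale) L1 entry
    }
  }
}
```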
Circuit Breaker Configuration
| Parameter | Default | Description |
|---|---|---|
| Failure threshold | 5 | Consecutive failures before opening |
| Open timeout | 30s | Time before transitioning to half-open |
| Half-open max attempts | 1 | Test requests before closing |
| Monitored exceptions | Timeout, ConnectionRefused | Errors that count as failures |
When the circuit breaker is open, evaluations continue using L1 cache entries (potentially stale) rather than failing. This ensures flag evaluations remain fast even during cache infrastructure outages. The flaggr_cache_circuit_breaker_state metric tracks the current state.
Invalidation Strategies
On Flag Mutation (Write-Through)
When a flag is created, updated, deleted, or toggled:
- Immediate (local): `cacheDel(CacheKeys.serviceFlags(serviceId))` on the handling instance
- Broadcast (cross-instance): Pub/Sub message triggers invalidation on all other instances (see the sketch below)
- TTL (safety net): Natural expiration ensures staleness is bounded even if broadcast fails
```
Flag toggle (instance A)
├─ Delete from L1 (instance A)     [0ms]
├─ Delete from L2 (Redis)          [1-3ms]
└─ Publish invalidation event      [2-5ms]
   ├─ Instance B receives event
   │  └─ Delete from L1 (instance B)   [0ms]
   └─ Instance C receives event
      └─ Delete from L1 (instance C)   [0ms]
```
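A sketch of the broadcast step over Redis pub/sub, assuming ioredis; the channel name and message shape are illustrative:

```typescript
import Redis from 'ioredis'

const CHANNEL = 'flaggr:invalidate' // assumed channel name
const publisher = new Redis(process.env.REDIS_URL!)
const subscriber = new Redis(process.env.REDIS_URL!) // pub/sub needs its own connection

// On the instance that handled the mutation:
async function broadcastInvalidation(key: string) {
  await publisher.publish(CHANNEL, JSON.stringify({ key }))
}

// On every instance, at startup:
async function listenForInvalidations(l1: { delete(key: string): void }) {
  await subscriber.subscribe(CHANNEL)
  subscriber.on('message', (_channel, message) => {
    const { key } = JSON.parse(message)
    l1.delete(key) // drop only the local L1 entry; L2 was already deleted
  })
}
```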
On-Demand
```typescript
// Clear all flags for a service
await cacheDelPattern('flags:service:web-app*')

// Clear everything for a plane
await cacheClear('read')
```
Stampede Prevention
When many instances simultaneously experience an L1 miss for the same key, they could all hit L2 or L3 at once (a cache stampede). Flaggr prevents this with:
- L2 read coalescing: Concurrent requests for the same key within a 10ms window are coalesced into a single L2 read
- Jittered TTLs: L1 TTLs are jittered by +/- 10% to prevent synchronized expiration across instances
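Both guards are small. The sketch below is illustrative: coalescing is simplified to the lifetime of the in-flight L2 read rather than a fixed 10ms window, and the function names are assumptions.

```typescript
// Coalesce concurrent L2 reads for the same key into one in-flight promise.
const inFlight = new Map<string, Promise<string | null>>()

function coalescedL2Get(key: string, l2Get: (k: string) => Promise<string | null>) {
  const pending = inFlight.get(key)
  if (pending) return pending // reuse the read already in progress

  const promise = l2Get(key).finally(() => inFlight.delete(key))
  inFlight.set(key, promise)
  return promise
}

// Jitter an L1 TTL by +/- 10% so instances don't expire in lockstep.
function jitteredTtl(ttlSeconds: number): number {
  const jitter = (Math.random() * 0.2 - 0.1) * ttlSeconds
  return Math.max(1, Math.round(ttlSeconds + jitter))
}
```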
Health Check
Monitor cache health per plane:
```typescript
const health = await cacheHealthCheck('read')
// { working: true, testTime: 2 }
```
Configuration
Redis (recommended for production)
```bash
REDIS_URL=redis://your-redis:6379
```
For read replicas:
```bash
REDIS_READ_URL=redis://read-replica:6379
REDIS_WRITE_URL=redis://primary:6379
```
Vercel KV
```bash
KV_REST_API_URL=https://your-kv.vercel-storage.com
KV_REST_API_TOKEN=your-token
```
In-Memory (local dev only)
No configuration needed. Falls back automatically when no Redis or KV URL is configured.
Tuning L1 Capacity
```bash
CACHE_L1_MAX_SIZE=1000  # Default: 500
```
Increase L1 capacity if your service evaluates more than ~500 unique flag+environment combinations. Monitor the `flaggr_cache_operations_total{tier="l1",result="eviction"}` metric to see if evictions are frequent.
Performance Characteristics
| Scenario | Latency | Cache Path |
|---|---|---|
| Hot flag (L1 hit) | < 0.1ms | L1 only |
| Warm flag (L2 hit) | 1-5ms | L1 miss → L2 hit → L1 populate |
| Cold flag (cache miss) | 10-50ms | L1 miss → L2 miss → L3 fetch → L1+L2 populate |
| L2 circuit open | < 0.1ms (stale) | L1 stale entry |
| Complete cache miss, L2 down | 10-50ms | Direct L3 fetch |
Evaluation Endpoint Caching
The /api/flags/evaluate endpoint uses server-side deduplication:
- If a request for the same flag arrives within the cache TTL, it returns the cached result
- The `_debug.cacheHit` field in the response indicates whether the result was served from cache
- `_debug.timings.cacheGet` shows cache lookup time in milliseconds
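For example, a client can inspect those debug fields after calling the endpoint. The request body shape and host below are assumptions; the `_debug` fields are the ones documented above.

```typescript
// Illustrative call to the evaluation endpoint.
const res = await fetch('https://your-flaggr-host/api/flags/evaluate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ flagKey: 'new-checkout', environment: 'production' }),
})
const result = await res.json()

console.log(result._debug?.cacheHit)          // true when served from cache
console.log(result._debug?.timings?.cacheGet) // cache lookup time in ms
```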
Best Practices
- Use Redis in production for shared cache across serverless instances
- Monitor cache hit rates via the `/api/metrics` Prometheus endpoint (`flaggr_cache_operations_total{operation,result}`)
- Don't over-cache -- the default TTLs are tuned for flag evaluation workloads
- Prefer SSE or gRPC streaming over aggressive cache TTLs for real-time flag updates
- Check the circuit breaker -- if you see stale evaluations, check Redis connectivity
- Size L1 appropriately -- too small means frequent evictions and L2 pressure; too large wastes memory in serverless environments
Related
- REST API Reference -- Evaluation endpoint caching
- Rate Limiting -- Rate limit caching
- Alerting -- Cache availability alerts