
Caching Strategy

Flaggr uses a three-tier caching architecture designed for sub-millisecond flag evaluations with bounded staleness: cached values may lag the source of truth by at most their TTL, while the storage tier remains fully consistent.

Cache Hierarchy

Request → L1 (In-Process LRU) → L2 (Redis / Vercel KV) → L3 (Firestore / File Storage)
              ~0.01ms               ~1-5ms                    ~10-50ms

The hierarchy behaves as a write-through cache on the write path and a look-aside cache on the read path. On a read miss, the value is fetched from the next tier down and back-populated into the higher tiers. This means the first request for a cold flag pays the full L3 latency, but every subsequent request within the TTL window is served from L1 or L2.

┌─────────────────────────────────────────────────────────┐
│  Instance A                Instance B                   │
│  ┌─────────┐              ┌─────────┐                   │
│  │ L1 LRU  │              │ L1 LRU  │  Per-instance     │
│  │ 500 keys│              │ 500 keys│  memory cache      │
│  └────┬────┘              └────┬────┘                   │
│       │                        │                        │
│       └──────────┬─────────────┘                        │
│                  │                                      │
│           ┌──────┴──────┐                               │
│           │  L2 Redis   │  Shared across all instances  │
│           │  (circuit   │  Protected by circuit breaker │
│           │   breaker)  │                               │
│           └──────┬──────┘                               │
│                  │                                      │
│           ┌──────┴──────┐                               │
│           │ L3 Firestore│  Source of truth               │
│           │ / File Store│  Always consistent             │
│           └─────────────┘                               │
└─────────────────────────────────────────────────────────┘

L1 -- In-Process LRU

A bounded, per-instance LRU (Least Recently Used) cache. Provides sub-millisecond reads for hot flags.

  • Latency: < 0.01ms
  • Capacity: 500 entries per instance (configurable via CACHE_L1_MAX_SIZE)
  • Scope: Per serverless instance / process
  • Eviction: LRU when capacity is reached
  • Consistency: Eventual (stale for up to TTL)

The L1 cache is an in-memory Map with LRU eviction. Because each serverless instance has its own L1, two instances may briefly return different values for the same flag during TTL windows. This is by design -- the consistency tradeoff buys sub-millisecond reads for the vast majority of evaluations.
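
As a sketch of how such a bounded LRU can be built, the following uses the insertion-order guarantee of a JavaScript Map. Only CACHE_L1_MAX_SIZE comes from the configuration documented below; the class and method names are illustrative, not Flaggr's actual internals.

// Illustrative L1 tier: a bounded Map with LRU eviction and per-entry TTL.
// Only CACHE_L1_MAX_SIZE is taken from the documented configuration; the rest is a sketch.
interface L1Entry<T> {
  value: T
  expiresAt: number // epoch milliseconds
}

class L1Cache<T = unknown> {
  private entries = new Map<string, L1Entry<T>>()

  constructor(private maxSize = Number(process.env.CACHE_L1_MAX_SIZE ?? 500)) {}

  get(key: string): T | null {
    const entry = this.entries.get(key)
    if (!entry) return null
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key) // expired entries count as misses
      return null
    }
    // Re-insert to mark as most recently used (Map preserves insertion order).
    this.entries.delete(key)
    this.entries.set(key, entry)
    return entry.value
  }

  set(key: string, value: T, ttlSeconds: number): void {
    if (this.entries.has(key)) this.entries.delete(key)
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 })
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry: the first key in insertion order.
      const oldest = this.entries.keys().next().value as string
      this.entries.delete(oldest)
    }
  }

  delete(key: string): void {
    this.entries.delete(key)
  }

  keys(): IterableIterator<string> {
    return this.entries.keys()
  }
}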

L2 -- Redis / Vercel KV

Shared remote cache accessible across all instances. Protected by a circuit breaker.

  • Latency: 1-5ms
  • Scope: Global (shared across all instances)
  • Backend priority:
    1. Direct Redis (REDIS_URL or REDIS_READ_URL / REDIS_WRITE_URL)
    2. Vercel KV (KV_REST_API_URL)
    3. In-memory Map (fallback for local dev)

When using Redis with read replicas, Flaggr routes reads to the replica and writes to the primary. This reduces load on the primary and improves read latency when the replica is geographically closer.
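
A minimal sketch of that routing, assuming the ioredis client; the Vercel KV and in-memory fallbacks from the backend priority list are omitted here, and the helper names are illustrative.

// Read/write splitting sketch, assuming the ioredis client.
// REDIS_READ_URL / REDIS_WRITE_URL are the documented env vars; Vercel KV and the
// in-memory Map fallback are omitted for brevity.
import Redis from 'ioredis'

const writeUrl = process.env.REDIS_WRITE_URL ?? process.env.REDIS_URL
const readUrl = process.env.REDIS_READ_URL ?? writeUrl

export const writeClient = writeUrl ? new Redis(writeUrl) : null // primary
export const readClient = readUrl ? new Redis(readUrl) : writeClient // replica when configured

export async function l2Get(key: string): Promise<string | null> {
  if (!readClient) return null // no Redis configured: L2 is skipped
  return readClient.get(key)
}

export async function l2Set(key: string, value: string, ttlSeconds: number): Promise<void> {
  if (!writeClient) return
  await writeClient.set(key, value, 'EX', ttlSeconds) // writes always go to the primary
}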

L3 -- Storage

The source of truth. Firestore in production, file-based storage for local development.

  • Latency: 10-50ms
  • Always consistent

Cache Planes

Flaggr separates caches into two planes with different consistency requirements:

Plane   Prefix   Use Case                             TTL Range
Read    read:    Flags, configs, projects             2-5 min
Write   write:   Sessions, CSRF tokens, rate limits   1-30 min

The read plane tolerates brief staleness (a flag update might take up to 2 minutes to propagate everywhere). The write plane handles stateful resources where TTLs are dictated by security requirements (session duration, CSRF validity).
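
In code, the plane typically shows up as a key prefix. A hypothetical helper, consistent with the prefixes in the table above:

// Illustrative plane prefixing: read and write keys never collide in L2.
type CachePlane = 'read' | 'write'

function planeKey(key: string, plane: CachePlane = 'read'): string {
  return `${plane}:${key}`
}

// planeKey('flags:service:web-app')   → 'read:flags:service:web-app'
// planeKey('session:abc123', 'write') → 'write:session:abc123'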

Default TTLs

Resource                 L1 TTL          L2 TTL          Plane   Rationale
Projects                 300s (5 min)    300s (5 min)    Read    Rarely change
Services                 180s (3 min)    180s (3 min)    Read    Moderate change frequency
Flags                    60s (1 min)     120s (2 min)    Read    Balance freshness vs. load
Flag rules / targeting   60s (1 min)     120s (2 min)    Read    Same as flags
User data                300s (5 min)    300s (5 min)    Read    Profile data, rarely changes
Sessions                 1800s (30 min)  1800s (30 min)  Write   Security: bounded lifetime
CSRF tokens              300s (5 min)    300s (5 min)    Write   Single-use, short-lived
Rate limits              60s (1 min)     60s (1 min)     Write   Window-based, must expire

L1 vs. L2 TTL

L1 TTL is always less than or equal to L2 TTL. This ensures that when L1 expires and falls through to L2, L2 still has a valid entry. The L1 TTL for flags (60s) is intentionally shorter than L2 (120s) to reduce staleness on hot paths without increasing L2 read load.

Cache Operations

Read Path

cacheGet(key, plane?)
├─ Check L1 (LRU)
│  ├─ Hit → return value
│  └─ Miss → Check L2
│     ├─ L2 Hit → populate L1, return value
│     └─ L2 Miss → return null (caller fetches from L3)
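
A sketch of that read path, reusing the hypothetical L1Cache, planeKey, and l2Get helpers from the earlier sketches; the JSON serialization and the fixed back-population TTL are assumptions.

// Read path sketch: check L1, fall through to L2, back-populate L1 on an L2 hit.
const l1 = new L1Cache()
const L1_BACKFILL_TTL_SECONDS = 60 // assumption: matches the default L1 TTL for flags

async function cacheGet<T>(key: string, plane: CachePlane = 'read'): Promise<T | null> {
  const fullKey = planeKey(key, plane)

  const fromL1 = l1.get(fullKey) as T | null
  if (fromL1 !== null) return fromL1 // L1 hit

  const fromL2 = await l2Get(fullKey)
  if (fromL2 !== null) {
    const value = JSON.parse(fromL2) as T
    l1.set(fullKey, value, L1_BACKFILL_TTL_SECONDS) // back-populate L1
    return value // L2 hit
  }

  return null // miss at both tiers: the caller fetches from L3 and calls cacheSet
}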

Write Path

cacheSet(key, value, ttl, plane?)
├─ Write to L1
└─ Write to L2 (async, fire-and-forget)
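
A matching sketch of the write path; swallowing L2 write errors here is an assumption about how fire-and-forget is handled.

// Write path sketch: synchronous L1 write, fire-and-forget L2 write.
async function cacheSet<T>(key: string, value: T, ttlSeconds: number, plane: CachePlane = 'read'): Promise<void> {
  const fullKey = planeKey(key, plane)
  l1.set(fullKey, value, ttlSeconds)
  // The L2 write is not awaited on the hot path; failures are handled by the circuit breaker.
  void l2Set(fullKey, JSON.stringify(value), ttlSeconds).catch(() => {})
}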

Invalidation

cacheDel(key, plane?)
├─ Delete from L1
└─ Delete from L2

cacheDelPattern(pattern, plane?)
├─ Pattern-match and delete from L1
└─ Pattern-match and delete from L2
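
A sketch of both invalidation helpers, again on top of the earlier hypothetical helpers; the glob-to-regex conversion and the SCAN batch size are illustrative details.

// Invalidation sketch: exact-key delete plus pattern-based delete across both tiers.
async function cacheDel(key: string, plane: CachePlane = 'read'): Promise<void> {
  const fullKey = planeKey(key, plane)
  l1.delete(fullKey)
  if (writeClient) await writeClient.del(fullKey)
}

async function cacheDelPattern(pattern: string, plane: CachePlane = 'read'): Promise<void> {
  const fullPattern = planeKey(pattern, plane)

  // L1: match the glob against in-memory keys.
  const regex = new RegExp(
    '^' + fullPattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*') + '$'
  )
  for (const key of Array.from(l1.keys())) {
    if (regex.test(key)) l1.delete(key)
  }

  // L2: incremental SCAN so large keyspaces are not blocked by KEYS.
  if (!writeClient) return
  let cursor = '0'
  do {
    const [next, keys] = await writeClient.scan(cursor, 'MATCH', fullPattern, 'COUNT', 100)
    if (keys.length > 0) await writeClient.del(...keys)
    cursor = next
  } while (cursor !== '0')
}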

Cache Key Patterns

project:{projectId}
flags:service:{serviceId}
flag:{serviceId}:{flagKey}:{environment}
user:{userId}
session:{sessionId}
csrf:{token}
rate_limit:{identifier}
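
These patterns are typically produced by small builder helpers. CacheKeys.serviceFlags is referenced in the invalidation section below; the other builders here follow the same shape but are a sketch.

// Key builders matching the patterns above. Only CacheKeys.serviceFlags is
// referenced elsewhere on this page; the remaining helpers are illustrative.
const CacheKeys = {
  project: (projectId: string) => `project:${projectId}`,
  serviceFlags: (serviceId: string) => `flags:service:${serviceId}`,
  flag: (serviceId: string, flagKey: string, environment: string) =>
    `flag:${serviceId}:${flagKey}:${environment}`,
  user: (userId: string) => `user:${userId}`,
  session: (sessionId: string) => `session:${sessionId}`,
  csrf: (token: string) => `csrf:${token}`,
  rateLimit: (identifier: string) => `rate_limit:${identifier}`,
} as const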

Circuit Breaker

The L2 cache is protected by a circuit breaker to prevent cascading failures when Redis/KV is unavailable.

States

CLOSED → (failures exceed threshold) → OPEN → (timeout expires) → HALF-OPEN
   ↑                                                                    │
   └────────── (success in half-open) ──────────────────────────────────┘

  • Closed: Normal operation. All L2 calls proceed.
  • Open: L2 calls are skipped entirely. Falls back to stale L1 entries.
  • Half-open: A single test request is sent to L2. If it succeeds, the circuit closes. If it fails, the circuit reopens.

Circuit Breaker Configuration

Parameter                Default                      Description
Failure threshold        5                            Consecutive failures before opening
Open timeout             30s                          Time before transitioning to half-open
Half-open max attempts   1                            Test requests before closing
Monitored exceptions     Timeout, ConnectionRefused   Errors that count as failures

When the circuit breaker is open, evaluations continue using L1 cache entries (potentially stale) rather than failing. This ensures flag evaluations remain fast even during cache infrastructure outages. The flaggr_cache_circuit_breaker_state metric tracks the current state.
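
A minimal circuit breaker sketch wired to the defaults in the table above; the real implementation may track monitored exception types and metrics differently.

// Minimal circuit breaker sketch around L2 calls: 5 consecutive failures open the
// circuit, a 30s timeout moves it to half-open, and a single successful probe closes it.
type BreakerState = 'closed' | 'open' | 'half-open'

class CircuitBreaker {
  private state: BreakerState = 'closed'
  private failures = 0
  private openedAt = 0

  constructor(private failureThreshold = 5, private openTimeoutMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T | null> {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt < this.openTimeoutMs) return null // skip L2 entirely
      this.state = 'half-open' // timeout expired: allow one probe request
    }
    try {
      const result = await fn()
      this.failures = 0
      this.state = 'closed' // success (including the half-open probe) closes the circuit
      return result
    } catch {
      this.failures += 1
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open'
        this.openedAt = Date.now()
      }
      return null // caller falls back to stale L1 entries or a direct L3 fetch
    }
  }
}

A caller might wrap each L2 operation as breaker.call(() => l2Get(key)) and treat a null result as an L2 miss.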

Invalidation Strategies

On Flag Mutation (Write-Through)

When a flag is created, updated, deleted, or toggled:

  1. Immediate (local): cacheDel(CacheKeys.serviceFlags(serviceId)) on the handling instance
  2. Broadcast (cross-instance): Pub/Sub message triggers invalidation on all other instances
  3. TTL (safety net): Natural expiration ensures staleness is bounded even if broadcast fails

Flag toggle (instance A)
  ├─ Delete from L1 (instance A)        [0ms]
  ├─ Delete from L2 (Redis)             [1-3ms]
  └─ Publish invalidation event         [2-5ms]
       ├─ Instance B receives event
       │   └─ Delete from L1 (instance B) [0ms]
       └─ Instance C receives event
           └─ Delete from L1 (instance C) [0ms]
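
A sketch of the broadcast step, assuming Redis pub/sub as the transport; the channel name and payload shape are illustrative, not the documented wire format.

// Cross-instance invalidation sketch over Redis pub/sub.
import Redis from 'ioredis'

const INVALIDATION_CHANNEL = 'flaggr:cache-invalidation' // hypothetical channel name

// Publisher side: the instance that handled the flag mutation.
export async function broadcastInvalidation(publisher: Redis, key: string): Promise<void> {
  await publisher.publish(INVALIDATION_CHANNEL, JSON.stringify({ key }))
}

// Subscriber side: every instance registers this at startup.
export async function subscribeToInvalidations(
  subscriber: Redis,
  l1Delete: (key: string) => void
): Promise<void> {
  await subscriber.subscribe(INVALIDATION_CHANNEL)
  subscriber.on('message', (channel, message) => {
    if (channel !== INVALIDATION_CHANNEL) return
    const { key } = JSON.parse(message) as { key: string }
    l1Delete(key) // drop the local L1 entry; L2 was already deleted by the publishing instance
  })
}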

On-Demand

// Clear all flags for a service
await cacheDelPattern('flags:service:web-app*')
 
// Clear everything for a plane
await cacheClear('read')

Stampede Prevention

When many instances simultaneously experience an L1 miss for the same key, they could all hit L2 or L3 at once (a cache stampede). Flaggr prevents this with:

  • L2 read coalescing: Concurrent requests for the same key within a 10ms window are coalesced into a single L2 read
  • Jittered TTLs: L1 TTLs are jittered by +/- 10% to prevent synchronized expiration across instances
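
Both mechanisms are small. A sketch of each, reusing the hypothetical l2Get helper from earlier; the in-flight map and the exact cleanup timing are illustrative details.

// Stampede-prevention sketch: in-flight coalescing for L2 reads plus TTL jitter.
const inFlight = new Map<string, Promise<string | null>>()

async function coalescedL2Get(key: string): Promise<string | null> {
  const pending = inFlight.get(key)
  if (pending) return pending // another caller is already fetching this key from L2

  const request = l2Get(key).finally(() => {
    // Keep the promise briefly so near-simultaneous callers within ~10ms still coalesce.
    setTimeout(() => inFlight.delete(key), 10)
  })
  inFlight.set(key, request)
  return request
}

// Jitter an L1 TTL by +/- 10% so instances do not all expire the same key at once.
function jitteredTtl(ttlSeconds: number): number {
  const factor = 0.9 + Math.random() * 0.2 // uniform in [0.9, 1.1)
  return Math.round(ttlSeconds * factor)
}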

Health Check

Monitor cache health per plane:

const health = await cacheHealthCheck('read')
// { working: true, testTime: 2 }

Configuration

Redis

REDIS_URL=redis://your-redis:6379

For read replicas:

REDIS_READ_URL=redis://read-replica:6379
REDIS_WRITE_URL=redis://primary:6379

Vercel KV

KV_REST_API_URL=https://your-kv.vercel-storage.com
KV_REST_API_TOKEN=your-token

In-Memory (local dev only)

No configuration needed. Falls back automatically when no Redis or KV URL is configured.

Tuning L1 Capacity

CACHE_L1_MAX_SIZE=1000  # Default: 500

Increase L1 capacity if your service evaluates more than ~500 unique flag+environment combinations. Monitor the flaggr_cache_operations_total{tier="l1",result="eviction"} metric to see if evictions are frequent.

Performance Characteristics

Scenario                       Latency           Cache Path
Hot flag (L1 hit)              < 0.1ms           L1 only
Warm flag (L2 hit)             1-5ms             L1 miss → L2 hit → L1 populate
Cold flag (cache miss)         10-50ms           L1 miss → L2 miss → L3 fetch → L1+L2 populate
L2 circuit open                < 0.1ms (stale)   L1 stale entry
Complete cache miss, L2 down   10-50ms           Direct L3 fetch

Evaluation Endpoint Caching

The /api/flags/evaluate endpoint uses server-side deduplication:

  • If a request for the same flag arrives within the cache TTL, it returns the cached result
  • The _debug.cacheHit field in the response indicates whether the result was served from cache
  • _debug.timings.cacheGet shows cache lookup time in milliseconds
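
For example, a client-side check of those fields; the request body shape here is an assumption, so consult the evaluation API reference for the exact schema.

// Illustrative call to the evaluation endpoint, inspecting the documented _debug fields.
const response = await fetch('https://your-flaggr-host/api/flags/evaluate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ flagKey: 'new-checkout', environment: 'production' }), // assumed shape
})
const result = await response.json()

console.log(result._debug?.cacheHit) // true when the result was served from cache
console.log(result._debug?.timings?.cacheGet) // cache lookup time in milliseconds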

Best Practices

  • Use Redis in production for shared cache across serverless instances
  • Monitor cache hit rates via the /api/metrics Prometheus endpoint (flaggr_cache_operations_total{operation,result})
  • Don't over-cache -- the default TTLs are tuned for flag evaluation workloads
  • Prefer SSE or gRPC streaming over aggressive cache TTLs for real-time flag updates
  • Check the circuit breaker -- if you see stale evaluations, check Redis connectivity
  • Size L1 appropriately -- too small means frequent evictions and L2 pressure; too large wastes memory in serverless environments