
What Are Feature Flags? A Complete Guide for Engineering Teams

Everything you need to know about feature flags — what they are, how they work, types of flags, real-world use cases, best practices, and how to implement them in your engineering workflow.

Ben Ebsworth · March 3, 2026 · 16 min read
feature-flags · engineering · guide · getting-started


Feature flags are one of those ideas that sound simple but fundamentally change how software teams build, ship, and operate their products. If you've ever wished you could turn a feature on or off in production without deploying new code, feature flags are the answer.

This guide covers everything engineering teams need to know about feature flags: the core concept, how they work under the hood, the different types, real-world use cases, lifecycle management, implementation approaches, and best practices. Whether you're evaluating feature flags for the first time or looking to deepen your team's practices, this is the reference.

What Is a Feature Flag?

A feature flag (also called a feature toggle, feature switch, or feature gate) is a mechanism that lets you enable or disable functionality in your application at runtime, without deploying new code. At its simplest, a feature flag is a conditional that wraps a piece of functionality:

if (featureFlags.isEnabled("new-checkout-flow")) {
  renderNewCheckout();
} else {
  renderLegacyCheckout();
}

Think of it like a light switch for your code. The wiring (code) is already in place, but you control whether the light (feature) is on or off from a central control panel rather than by rewiring the house.

What makes feature flags powerful is who controls the switch and how. Instead of a simple boolean hard-coded in a config file, modern feature flag systems let you:

  • Target specific users or segments — enable a feature for beta testers, internal employees, or users in a specific region
  • Roll out gradually — start at 1% of traffic, monitor metrics, then increase to 5%, 25%, 50%, and finally 100%
  • Change values instantly — toggle a flag in your dashboard and the change propagates in seconds, no deployment required
  • Decouple deployment from release — merge code to main behind a flag, deploy it to production disabled, and release it when you're ready

This decoupling is the key insight. Traditional software delivery treats deployment and release as the same event. Feature flags separate them into two independent actions. You can deploy code any time and release features on your own schedule.

How Feature Flags Work

Under the hood, a feature flag system has four core components:

1. Flag Configuration

The flag definition lives in a central store — a database, configuration service, or management platform. Each flag has at minimum a key (unique identifier), a type (boolean, string, number, or JSON), and a default value. Most flags also include targeting rules, rollout percentages, and metadata.

Flag: "release-new-search"
Type: boolean
Default: false
Rules:
  - If user.email ends with "@yourcompany.com" → true
  - If user.country == "NZ" → true (5% rollout)
  - Otherwise → false
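A definition like the one above can be modeled as a small data type. The shape below is illustrative only, not any specific platform's schema:

```typescript
// Illustrative flag-definition shape (not a specific platform's schema).
type TargetingRule = {
  attribute: string;        // context attribute to inspect, e.g. "email"
  operator: string;         // e.g. "ends_with", "equals"
  match: unknown;           // value the attribute is compared against
  value: unknown;           // flag value returned when the rule matches
  rolloutPercent?: number;  // optional percentage gate, e.g. 5
};

type FlagDefinition = {
  key: string;                                    // unique identifier
  type: "boolean" | "string" | "number" | "json";
  defaultValue: unknown;                          // returned when no rule matches
  rules: TargetingRule[];                         // evaluated top-to-bottom
};

// The "release-new-search" example expressed in this shape:
const releaseNewSearch: FlagDefinition = {
  key: "release-new-search",
  type: "boolean",
  defaultValue: false,
  rules: [
    { attribute: "email", operator: "ends_with", match: "@yourcompany.com", value: true },
    { attribute: "country", operator: "equals", match: "NZ", value: true, rolloutPercent: 5 },
  ],
};
```

Keeping rules as ordered data (rather than code) is what lets a dashboard edit them at runtime without a deployment.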

2. Evaluation Context

When your application needs to evaluate a flag, it provides context — information about the current user, request, or environment. This context is what targeting rules match against.

const context = {
  targetingKey: user.id,       // Required: unique identifier for consistent assignment
  email: user.email,           // For targeting internal users
  country: user.country,       // For geographic rollouts
  plan: user.subscription,     // For entitlement checks
  environment: "production",   // For environment-specific behavior
};

3. Evaluation Engine

The evaluation engine takes the flag configuration and the evaluation context, runs them through the targeting rules, and returns a value. The evaluation flow looks like this:

Request comes in
  → Build evaluation context (user ID, attributes, environment)
  → Send context to flag evaluation engine
  → Engine checks rules top-to-bottom:
      Rule 1: Does user match targeting condition? → Yes → Return rule's value
      Rule 2: Is user in percentage rollout?       → Check hash(userId + flagKey) % 100
      No rules match                               → Return default value
  → Application receives flag value
  → Code branches based on value

A critical property of well-designed evaluation engines is determinism. Given the same user and flag configuration, the engine always returns the same value. For percentage rollouts, this is achieved by hashing the user's targeting key with the flag key, producing a stable number between 0 and 99. A user assigned to bucket 23 will consistently get the same variant, even across multiple evaluations and different application instances.
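A minimal sketch of that bucketing, using FNV-1a as the hash function for illustration (real engines often use MurmurHash or similar):

```typescript
// Deterministic bucketing: hash the targeting key together with the flag key
// so each user gets a stable bucket per flag. FNV-1a is used here purely for
// illustration; production engines often use MurmurHash or similar.
function bucketFor(targetingKey: string, flagKey: string): number {
  const input = `${targetingKey}:${flagKey}`;
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied in 32-bit space
  }
  return (hash >>> 0) % 100; // stable bucket in [0, 100)
}

// A user is in an N% rollout when their bucket falls below N.
function inRollout(targetingKey: string, flagKey: string, percent: number): boolean {
  return bucketFor(targetingKey, flagKey) < percent;
}
```

Note that the flag key is part of the hash input: a user's bucket for one flag is independent of their bucket for another, so the same cohort of users isn't always first into every rollout.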

4. SDK Integration

Your application uses an SDK (or calls an API directly) to evaluate flags. The SDK handles communication with the flag service, local caching for performance, and real-time updates when flag configurations change.

import { OpenFeature } from "@openfeature/server-sdk";
 
const client = OpenFeature.getClient();
 
const showNewSearch = await client.getBooleanValue(
  "release-new-search",  // Flag key
  false,                 // Default fallback
  { targetingKey: user.id, email: user.email }  // Context
);
 
if (showNewSearch) {
  return <NewSearchExperience />;
}
return <ClassicSearch />;

Types of Feature Flags

Not all flags serve the same purpose. Understanding the different types helps you choose the right approach and set appropriate lifecycle expectations.

| Type | Purpose | Typical Lifetime | Example |
| --- | --- | --- | --- |
| Release flag | Gate a feature during rollout | Days to weeks | `release-checkout-v2` |
| Experiment flag | A/B test or multivariate experiment | Weeks to months | `experiment-pricing-layout` |
| Ops flag | Operational control / kill switch | Long-lived | `ops-disable-recommendations` |
| Permission flag | Entitlement or access control | Permanent | `permission-enterprise-export` |

Release Flags

Release flags are the most common type. They gate new functionality during the rollout period. A release flag starts as false for everyone, gets gradually rolled out to increasing percentages of users, and eventually reaches 100%. At that point, the flag and its conditional code should be removed — the feature is now the default behavior.

Release flags are temporary by design. If a release flag has been at 100% for more than two sprints without being cleaned up, it's becoming tech debt.

Experiment Flags

Experiment flags power A/B tests and multivariate experiments. Unlike release flags (which are boolean), experiment flags often return string values representing variants:

const variant = await client.getStringValue(
  "experiment-onboarding-flow",
  "control",
  context
);
 
switch (variant) {
  case "control":
    return <CurrentOnboarding />;
  case "streamlined":
    return <StreamlinedOnboarding />;
  case "guided":
    return <GuidedOnboarding />;
}

Experiment flags require careful statistical analysis before drawing conclusions. They live longer than release flags because experiments need sufficient sample sizes, but they should still be cleaned up once the winning variant is chosen and fully deployed.

Ops Flags (Kill Switches)

Ops flags give your on-call engineers the ability to shed load or disable non-critical features during incidents. They are intentionally long-lived and serve as circuit breakers.

Common ops flags include:

  • ops-disable-recommendations — turn off the recommendation engine if it's causing high latency
  • ops-maintenance-mode — show a maintenance page during planned outages
  • ops-fallback-search — switch to a simpler search backend if the primary is degraded

Ops flags should default to the "safe" state (feature enabled, not in maintenance mode) and only be toggled during incidents.
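A sketch of that safe-default pattern. The `client` here is a minimal stand-in so the example is self-contained; a real service would use its flag SDK, and `getRecommendations` is an illustrative placeholder:

```typescript
type Context = { targetingKey: string };

// Minimal stand-in for a flag client (a real app would use an SDK).
const client = {
  flags: { "ops-disable-recommendations": false } as Record<string, boolean>,
  async getBooleanValue(key: string, fallback: boolean, _ctx: Context): Promise<boolean> {
    return this.flags[key] ?? fallback;
  },
};

// Illustrative recommendation source.
async function getRecommendations(userId: string): Promise<string[]> {
  return [`rec-for-${userId}`];
}

// The flag names a *disable* action, so the safe default is `false`:
// if the flag service is unreachable, recommendations stay on.
async function recommendationsFor(userId: string): Promise<string[]> {
  const disabled = await client.getBooleanValue(
    "ops-disable-recommendations",
    false, // safe default: feature enabled
    { targetingKey: userId }
  );
  return disabled ? [] : getRecommendations(userId); // degrade to empty, not an error
}
```

When the switch is flipped during an incident, callers get an empty list rather than an exception, which is usually the graceful degradation you want.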

Permission Flags

Permission flags control access to features based on user entitlements — subscription tier, organization plan, or account-level settings. They are essentially feature entitlements implemented through the flag system.

const canExport = await client.getBooleanValue(
  "permission-csv-export",
  false,
  { targetingKey: user.id, plan: user.subscription }
);

Permission flags are typically permanent. They don't follow the create-rollout-cleanup lifecycle of release flags because they represent ongoing business rules.

Real-World Use Cases

Progressive Rollouts

Instead of releasing a new feature to all users simultaneously, you roll it out in stages:

  1. Internal dogfooding (0% public) — enable for employees via email domain targeting
  2. Canary (1%) — expose to a small percentage of real users, monitor error rates
  3. Early adopters (10%) — expand if metrics are healthy
  4. Broad rollout (50%) — half of traffic sees the new experience
  5. General availability (100%) — full rollout, schedule flag cleanup

At each stage, if error rates spike or latency degrades, you instantly roll back by setting the percentage to 0%. No code revert, no redeployment. For a deep dive, see our progressive rollouts guide.
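The stage progression above can be captured in a tiny helper: advance one step when metrics are healthy, drop to 0% the moment they aren't. This is an illustrative sketch, not any platform's API:

```typescript
const stages = [0, 1, 10, 50, 100]; // rollout percentages, in order

// Advance one stage when metrics look healthy; roll back to 0% instantly
// when they don't. Rollback is a flag change, not a code revert.
function nextPercentage(current: number, healthy: boolean): number {
  if (!healthy) return 0; // instant rollback
  const i = stages.indexOf(current);
  if (i === -1 || i === stages.length - 1) return current; // hold position
  return stages[i + 1];
}
```

Encoding the ladder as data makes the rollout plan reviewable up front, instead of being a series of ad-hoc dashboard edits.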

Trunk-Based Development

Feature flags are what make trunk-based development practical. Instead of maintaining long-lived feature branches that diverge from main and create painful merges, developers commit directly to the main branch with new features hidden behind flags:

Day 1: Merge new search UI code behind `release-new-search` (flag: false)
Day 3: Merge search API integration behind same flag
Day 5: Enable flag for internal users → QA in production
Day 8: Roll out to 5% of users
Day 12: 100% rollout → remove flag

The code is in production from day 1. It's just not visible to users until the flag is enabled. This eliminates merge conflicts, enables continuous integration, and lets you ship incomplete features safely.

Canary Deployments

Canary deployments and feature flags complement each other. While canary deployments route a percentage of traffic to new infrastructure (new container version, new server), feature flags route a percentage of users to new behavior within the same infrastructure.

You can combine both: deploy a new service version to 5% of instances (canary) with the new feature disabled (flag off). Once the canary is healthy, enable the flag for 1% of users on the canary instances. This gives you two layers of blast radius control.

A/B Testing

Feature flags provide the randomization infrastructure for A/B tests. The flag evaluation engine assigns users to variants deterministically (same user always sees the same variant), and you measure business metrics for each variant:

const pricingVariant = await client.getStringValue(
  "experiment-pricing-display",
  "control",
  { targetingKey: user.id }
);
 
// Track which variant the user sees
analytics.track("pricing_page_viewed", { variant: pricingVariant });

Emergency Kill Switches

When an incident occurs and a specific feature is contributing to the problem, kill switches let you disable it in seconds:

3:02 AM — Alert: API latency > 2s for /api/recommendations
3:03 AM — On-call engineer opens flag dashboard
3:04 AM — Enables `ops-disable-recommendations`
3:04 AM — Latency drops to normal, recommendations endpoint returns empty results
3:05 AM — Incident contained, team investigates root cause during business hours

The alternative — reverting a deployment at 3 AM — is slower, riskier, and requires waking up additional engineers who understand the deployment pipeline.

Beta Programs and Early Access

Feature flags make it straightforward to run beta programs. Target users who have opted into beta via a user attribute:

const rules = [
  {
    conditions: [{ attribute: "beta_opted_in", operator: "equals", value: true }],
    value: true,
  },
];

This is cleaner than maintaining a separate beta environment. Beta users get the same production infrastructure with additional features enabled.

Feature Entitlements

SaaS products often gate features by subscription tier. Feature flags provide a centralized way to manage these entitlements:

const canUseAdvancedAnalytics = await client.getBooleanValue(
  "permission-advanced-analytics",
  false,
  { targetingKey: org.id, plan: org.subscriptionTier }
);

When a customer upgrades their plan, you update the targeting rules. No code change, no deployment.

Feature Flag Lifecycle

Every feature flag should follow a defined lifecycle. Flags that linger without a clear state become a source of confusion and technical debt.

1. Planning

Before creating a flag, define:

  • Purpose — What is this flag for? Release, experiment, ops, or permission?
  • Owner — Who is responsible for this flag's lifecycle?
  • Expected lifetime — When should this flag be reviewed for cleanup?
  • Success criteria — What metrics determine whether to roll forward or roll back?
  • Rollback plan — What happens if you need to disable this flag?

2. Creation

Create the flag with a clear, descriptive name and appropriate metadata. Use a naming convention that signals intent (see best practices for naming patterns).

3. Testing

Test both code paths — the flag-on behavior and the flag-off behavior. This includes unit tests that explicitly set flag values, integration tests that validate end-to-end flows for each variant, and ideally, automated tests that cycle through all flag combinations relevant to a feature.
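In a unit test that usually means pinning the flag value directly rather than calling the flag service. A minimal sketch, with `renderSearch` as an illustrative unit under test:

```typescript
// Illustrative unit under test: branches on an injected flag value
// instead of evaluating the flag itself, so each path is testable.
function renderSearch(newSearchEnabled: boolean): string {
  return newSearchEnabled ? "new-search" : "classic-search";
}

// Each test pins the flag to one value so both branches are exercised.
console.assert(renderSearch(true) === "new-search");
console.assert(renderSearch(false) === "classic-search");
```

Injecting the evaluated value (rather than the flag client) keeps these tests fast and independent of flag-service state.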

4. Rollout

Follow your rollout plan. For release flags, this typically means progressive rollout. For experiment flags, ensure you have sufficient traffic to reach statistical significance. Monitor your key metrics at each stage.

5. Cleanup

This is the stage that teams most often skip, and it's the most important one for long-term codebase health.

Once a release flag reaches 100% and has been stable for a defined period:

  1. Remove the flag evaluation from your code
  2. Remove the unused code path (the old behavior)
  3. Delete or archive the flag from your management platform
  4. Remove any associated tests that only tested the old path
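Steps 1 and 2 in code terms, using the checkout example from the start of this guide (illustrative string returns stand in for the real render calls):

```typescript
// Before cleanup: the flag has been at 100% for weeks, yet both branches remain.
function renderCheckoutBefore(flagOn: boolean): string {
  return flagOn ? "new-checkout" : "legacy-checkout";
}

// After cleanup: the flag evaluation and the dead "legacy" branch are gone;
// the new behavior is simply the behavior.
function renderCheckoutAfter(): string {
  return "new-checkout";
}
```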

The cost of skipping cleanup is real. Every unretired flag:

  • Adds a conditional branch that developers must understand when reading the code
  • Increases the combinatorial space of possible states your application can be in
  • Makes refactoring harder because you can't be sure if the "off" path is still needed
  • Confuses new team members who don't know if the flag is still relevant

Teams that are disciplined about flag cleanup treat it like they treat deleting dead code — as basic hygiene, not optional extra work.

Implementing Feature Flags

There are two broad approaches to implementing feature flags: building your own system or using a platform.

Approach 1: Build Your Own

For simple use cases, a basic feature flag implementation is straightforward:

// Simple in-memory flag store
const flags: Record<string, boolean> = {
  "release-new-search": false,
  "ops-maintenance-mode": false,
};
 
function isEnabled(flagKey: string): boolean {
  return flags[flagKey] ?? false;
}

This works for early-stage projects but quickly runs into limitations:

  • No targeting (every user sees the same value)
  • No gradual rollouts
  • Requires a deployment to change flag values
  • No audit trail of who changed what and when
  • No SDK for frontend and backend consistency

Approach 2: Use a Feature Flag Platform

A feature flag platform provides the targeting engine, management UI, SDKs, audit logging, and real-time propagation that production systems need. The key architectural decision is which evaluation protocol and SDK standard to adopt.

The OpenFeature standard provides a vendor-neutral SDK that decouples your application code from any specific feature flag provider. This means you can switch providers or run multiple providers without changing your application code.

Here's a complete example using the OpenFeature SDK pattern:

import { OpenFeature } from "@openfeature/server-sdk";
import { FlaggrProvider } from "@flaggr/openfeature-provider";
 
// Initialize the provider once at startup
OpenFeature.setProvider(new FlaggrProvider({
  endpoint: "https://your-instance.flaggr.dev",
  serviceId: "your-service-id",
}));
 
const client = OpenFeature.getClient();
 
// Evaluate a boolean flag with targeting context
async function shouldShowNewDashboard(user: User): Promise<boolean> {
  return client.getBooleanValue(
    "release-new-dashboard",
    false,  // Default value if evaluation fails
    {
      targetingKey: user.id,
      email: user.email,
      plan: user.subscription,
      country: user.country,
    }
  );
}
 
// Evaluate a string flag for A/B testing
async function getCheckoutVariant(user: User): Promise<string> {
  return client.getStringValue(
    "experiment-checkout-flow",
    "control",
    { targetingKey: user.id }
  );
}
 
// Evaluate a number flag for configuration
async function getRateLimitPerMinute(service: Service): Promise<number> {
  return client.getNumberValue(
    "ops-rate-limit-rpm",
    100,  // Default: 100 requests per minute
    { targetingKey: service.id, tier: service.tier }
  );
}

The evaluation context object is where targeting power comes from. By providing user attributes, organization details, and environment information, the evaluation engine can apply sophisticated targeting rules without your application code knowing the details. Your code just asks "what value should this flag have for this context?" and the engine handles the rest.

Targeting Rules

Targeting rules determine which value a user receives. They support a range of operators for flexible targeting:

| Operator | Description | Example |
| --- | --- | --- |
| `equals` | Exact match | `country equals "US"` |
| `not_equals` | Negation | `plan not_equals "free"` |
| `in` | Set membership | `email in ["alice@co.com", "bob@co.com"]` |
| `contains` | Substring match | `email contains "@yourcompany.com"` |
| `starts_with` | Prefix match | `user_id starts_with "beta_"` |
| `greater_than` | Numeric comparison | `account_age greater_than 30` |
| `regex` | Pattern match | `email regex ".*@(alpha\|beta)\.yourcompany\.com"` |

Rules are evaluated top-to-bottom. The first matching rule determines the value. If no rules match, the default value is returned.
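That first-match-wins loop is small enough to sketch directly. Only two operators are shown; a real engine supports the full table:

```typescript
type Rule = {
  attribute: string;
  operator: "equals" | "contains"; // subset of the operator table, for brevity
  match: string;
  value: boolean; // flag value returned when this rule matches
};
type Ctx = Record<string, string>;

// First-match-wins: rules are checked in order, and the first rule whose
// condition holds determines the value; otherwise the default is returned.
function evaluate(rules: Rule[], ctx: Ctx, defaultValue: boolean): boolean {
  for (const rule of rules) {
    const attr = ctx[rule.attribute] ?? "";
    const matched =
      rule.operator === "equals" ? attr === rule.match : attr.includes(rule.match);
    if (matched) return rule.value; // first matching rule short-circuits
  }
  return defaultValue; // no rule matched
}
```

For example, a single rule `{ attribute: "email", operator: "contains", match: "@yourcompany.com", value: true }` turns a flag on for internal users and leaves everyone else on the default.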

Feature Flag Best Practices

A few core practices separate teams that use feature flags effectively from teams that drown in flag-related complexity.

Naming Conventions

Use a consistent naming scheme that encodes the flag's type and purpose:

  • release-new-checkout — a release flag for the new checkout
  • experiment-hero-banner-cta — an experiment on the hero banner's call to action
  • ops-disable-email-notifications — an operational kill switch for email notifications
  • permission-team-analytics — an entitlement flag for the team analytics feature

Prefixes make flags scannable in a dashboard and communicate intent without reading documentation.

Flag Hygiene

  • Set review dates when creating flags. A release flag created today should have a review date 2-4 weeks out.
  • Track flag age in your dashboard. Flags older than their expected lifetime should be surfaced for cleanup.
  • Limit active flags per service. If a single service has 50 active release flags, the combinatorial complexity is unmanageable. Aim for fewer than 10-15 active release flags per service.
  • Use health scoring to identify stale flags. A flag that has been returning the same value for 100% of evaluations for two weeks is a candidate for cleanup.
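One way to implement that last heuristic, assuming you log each evaluation with its returned value and a timestamp (an assumption about your telemetry, not a built-in platform feature):

```typescript
type Evaluation = { flagKey: string; value: boolean; at: Date };

// Stale-flag heuristic: a flag that returned a single value for every
// evaluation in the window (or wasn't evaluated at all) is a cleanup candidate.
function isStale(evals: Evaluation[], windowDays = 14): boolean {
  const cutoff = Date.now() - windowDays * 24 * 60 * 60 * 1000;
  const recent = evals.filter((e) => e.at.getTime() >= cutoff);
  if (recent.length === 0) return true; // never evaluated recently: also stale
  return recent.every((e) => e.value === recent[0].value); // one value only
}
```

Running a check like this on a schedule and surfacing the results in your dashboard turns flag cleanup from a memory exercise into a routine review.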

Testing Strategies

  • Unit tests: Explicitly set flag values in your test setup to test each code path independently
  • Integration tests: Test the full evaluation flow — context in, value out, behavior observed
  • Combinatorial testing: For flags that interact, test the key combinations (you don't need to test all 2^N combinations, just the realistic ones)
  • Default value testing: Verify your application behaves correctly when the flag service is unavailable and defaults are used

Documentation

Document each flag's purpose, owner, and expected behavior in your flag management platform. When a new team member encounters release-catalog-v3 in the codebase, they should be able to find its documentation quickly without asking Slack.

For a comprehensive deep dive on these practices, see our Feature Flag Best Practices for 2026 guide.

OpenFeature: The Standard for Feature Flags

OpenFeature is a CNCF project that defines a vendor-neutral, community-driven API for feature flag evaluation. It's the equivalent of OpenTelemetry for observability — a standard interface that lets you swap implementations without changing application code.

The core idea: your application code depends on the OpenFeature SDK, not on a specific vendor's SDK. The vendor-specific logic lives in a provider that plugs into the OpenFeature API.

Your Application Code
    ↓ uses
OpenFeature SDK (vendor-neutral interface)
    ↓ delegates to
Provider (vendor-specific implementation)
    ↓ communicates with
Flag Management Backend

This matters because it eliminates vendor lock-in. If you start with one feature flag platform and later need to migrate, you change the provider initialization — one line — not every flag evaluation call across your codebase.

OpenFeature supports multiple languages (JavaScript/TypeScript, Go, Java, Python, .NET, PHP, and more) and defines standard evaluation types: boolean, string, number, and object. It also supports hooks for cross-cutting concerns like logging, metrics, and validation.

For a detailed walkthrough of OpenFeature concepts and usage, see our OpenFeature guide.

Choosing a Feature Flag Platform

When evaluating feature flag platforms, consider these dimensions:

Open Source vs. Proprietary

Open-source platforms give you full control over your data, the ability to self-host, and transparency into the evaluation logic. Proprietary platforms typically offer managed infrastructure, support SLAs, and more polished UIs. Some platforms, like Flaggr, are open source with an optional managed offering — giving you both options.

SDK and Language Support

Your platform needs SDKs for every language in your stack. Check for first-class support (not community-maintained wrappers) in your primary languages. OpenFeature-compatible platforms have an advantage here because the standard SDKs cover most languages, and you can write a thin provider for any custom backend.

Protocol Support

Modern platforms should support multiple evaluation protocols:

  • REST API — Universal, works everywhere
  • gRPC / gRPC-Web — High performance, streaming support for real-time updates
  • OpenFeature Remote Evaluation Protocol (OFREP) — Emerging standard for remote evaluation
  • Server-Sent Events (SSE) — Real-time flag updates without polling

Targeting Capabilities

The depth of targeting rules determines how sophisticated your rollout strategies can be. At minimum, you need:

  • User-level targeting (specific users or user attributes)
  • Percentage-based rollouts with deterministic bucketing
  • Environment-specific configurations
  • Segment-based targeting (cohorts, groups)

Advanced capabilities include mutual exclusion groups (ensuring users in one experiment aren't in another), scheduled rollouts, and multi-variate experiments.

Observability

Feature flags without observability are flags in the dark. Look for:

  • Evaluation metrics — how often each flag is evaluated, what values are returned
  • Health scoring — automated detection of stale, misconfigured, or underperforming flags
  • Impact analysis — before toggling a flag, see how many users will be affected
  • Audit logging — who changed what flag, when, and why
  • Alerting — notifications when evaluation patterns change unexpectedly

Integration Ecosystem

Consider how the platform integrates with your existing tools:

  • CI/CD pipelines — flag changes as part of deployment workflows
  • Infrastructure as code — Terraform providers for managing flags declaratively
  • Observability platforms — correlating flag changes with metrics dashboards
  • AI/LLM tools — MCP servers or API access for AI-native workflows

Getting Started

Getting feature flags into your workflow is a three-step process:

Step 1: Define Your First Flag

Start with a low-risk release flag. Pick a feature your team is about to ship and gate it behind a boolean flag. Don't start with a complex experiment or a critical kill switch — start simple.

Step 2: Instrument Your Code

Add the OpenFeature SDK to your application, configure a provider, and wrap your feature in a flag evaluation. The code change is minimal — an if/else around the new behavior.

Step 3: Roll Out and Observe

Enable the flag for your team first (internal dogfooding), then roll out to a small percentage of users. Watch your metrics. When you're confident, go to 100% and schedule the flag for cleanup.

For a hands-on walkthrough with code examples, see the Quick Start guide. For a deeper understanding of the data model and architecture, start with Core Concepts.


Feature flags are not a new idea, but the tooling around them has matured dramatically. Modern platforms provide the evaluation engines, targeting rules, real-time propagation, and observability that make feature flags practical at scale. The teams that adopt them don't just ship faster — they ship with more confidence, because every release comes with an instant rollback switch.

The best time to start using feature flags was when you first deployed to production. The second-best time is now.

Ben Ebsworth, Creator of Flaggr

Software engineer building developer tools and infrastructure. Creator of Flaggr, an open-source feature flag platform. Passionate about developer experience, observability, and shipping software safely.