
Choice Architecture in Practice: What Dozens of A/B Tests Reveal About How People Actually Choose Complex Products

Nudging failed five times. Default bias won. Framing produced opposite results by audience. Here's what dozens of A/B tests reveal about choice architecture in practice.

Atticus Li
Applied Experimentation Lead at NRG Energy (Fortune 150) · Creator of the PRISM Method
11 min read

Editorial disclosure

This article lives on the canonical GrowthLayer blog path for indexing consistency. Review rules, sourcing rules, and update rules are documented in our editorial policy and methodology.

Fortune 150 experimentation lead · 100+ experiments/year · Creator of the PRISM Method
A/B Testing · Experimentation Strategy · Statistical Methods · CRO Methodology · Experimentation at Scale

Richard Thaler and Cass Sunstein's "nudge" framework transformed how product designers and policymakers think about choice. The core insight — that the architecture of a choice environment shapes decisions as powerfully as the options themselves — is one of the most practically useful ideas in behavioral economics.

But the framework is widely misapplied. "Add a recommended badge" is not the same as understanding why defaults work. "Reduce the number of options" is not the same as understanding when choice restriction helps versus when it signals disrespect for the user's autonomy. The dozens of enterprise A/B tests I ran within a multi-brand energy program are a case study in how choice architecture principles actually behave in the wild — with results that are more nuanced, more surprising, and more instructive than the textbook version.

Here is what the data showed.

When Nudging Fails: The Limits of Curation in High-Consideration Decisions

The "recommended plan" concept — presenting users with a curated subset of options accompanied by a "Best for You" or "Most Popular" label — is a classic nudge implementation. It draws on a well-documented behavioral phenomenon: when people face too many options, decision quality degrades and the probability of choosing at all declines. Barry Schwartz documented this as the "Paradox of Choice." The policy prescription, widely adopted in product design, is to curate: reduce the apparent choice set, highlight a recommended option, and let the nudge do its work.

In our enterprise program, this intervention was tested five times across multiple brands. It failed every time.

The aggregate signal was not borderline. The fifth test produced cleaner negative evidence than the first. Users were actively resisting the nudge — not just ignoring it, but showing behavioral patterns consistent with increased hesitation, higher comparison-page revisit rates, and lower completion on the nudged path.
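The claim that the fifth replication produced cleaner evidence than the first can be made concrete. A minimal sketch, using entirely hypothetical counts (none of these numbers come from the article), of combining per-test z-statistics across five replications with Stouffer's method:

```python
import math

def two_prop_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-statistic for the variant (b) vs the control (a).
    Negative z means the variant converted worse than control."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical results from five replications of the same "recommended
# plan" treatment: (control conversions, control n, variant conversions,
# variant n). Each test alone is borderline; together they are not.
tests = [
    (510, 10000, 468, 10000),
    (402,  8000, 361,  8000),
    (615, 12000, 566, 12000),
    (298,  6000, 262,  6000),
    (520, 10000, 458, 10000),
]

z_scores = [two_prop_z(*t) for t in tests]

# Stouffer's method: the combined z of k independent replications.
z_combined = sum(z_scores) / math.sqrt(len(z_scores))

# One-sided p-value for the combined (negative) effect: Phi(z_combined).
p_one_sided = 0.5 * math.erfc(-z_combined / math.sqrt(2))
```

Each individual z here sits between roughly -1.4 and -2.0, none decisive on its own, but the combined z lands near -3.6: the "not borderline" aggregate signal the paragraph describes.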

The behavioral explanation draws on a distinction the Thaler-Sunstein framework does not emphasize enough: the difference between low-consideration and high-consideration decisions. Nudging toward a recommended option works well when the decision is low-stakes, when users lack the information or expertise to evaluate options independently, and when the cost of a suboptimal choice is low. Energy plan selection — a multi-year financial commitment with material price differences between options — is the opposite: high stakes, users who believe (often correctly) that they can evaluate the options if given access to them, and a high cost of choosing wrong.

Key Takeaway: In high-consideration decisions, curation is not perceived as helpful simplification. It is perceived as restriction. Users who are capable of and motivated to compare their options do not experience a "Recommended" label as guidance — they experience it as the platform deciding for them. The nudge fails because it conflicts with the user's self-conception as an informed decision-maker.

Daniel Kahneman's System 1 / System 2 framework illuminates the mechanism further. Nudges are designed to work with System 1 — the fast, automatic, heuristic-based decision process that operates below the level of deliberate reasoning. A "Most Popular" badge activates social proof (System 1). A pre-selected default activates status quo bias (System 1). But high-consideration purchases activate System 2 — deliberate, analytic, comparative reasoning. When System 2 is engaged, System 1 nudges are not just ineffective; they are actively suspect. The user's deliberative mind recognizes that a nudge is an attempt to shortcut the analysis they have committed to doing, and that recognition triggers resistance rather than compliance.

The practical implication: before designing a nudge, identify which cognitive system your user is operating in at the moment of the nudge. In high-consideration, high-stakes decisions, design for System 2 — provide better information, clearer comparison tools, and more transparent criteria — rather than for System 1.

Default Bias Done Right: When the Architecture Works With the Decision

The "Pay Later" test produced one of the clearest default bias wins in the dataset.

The original design presented a payment timing choice with both options equally weighted visually — an interface that treated the choice as architecturally neutral. The variant made the "better for the user" payment option visually dominant: larger button, primary color, positioned as the default action, while the less favorable option was visually de-emphasized without being removed.

The result was an 8% shift in users toward the visually dominant option, with no increase in downstream complaints or cancellations — indicating that users who shifted were making a decision they were genuinely comfortable with, not being pushed into something they regretted.
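A readout like this pairs the primary metric with a guardrail. The sketch below uses hypothetical counts and assumes the 8% shift is in percentage points, with downstream complaint rate as the guardrail; both assumptions are mine, not the article's:

```python
import math

def prop_ztest(x1, n1, x2, n2):
    """z-statistic for the rate in group 2 minus the rate in group 1."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# Hypothetical counts: control vs variant ("Pay Later" made visually dominant).
chose_ctrl, n_ctrl = 4200, 10000   # 42% chose it when equally weighted
chose_var,  n_var  = 5000, 10000   # 50% chose it when dominant (+8 points)

# Guardrail: complaints/cancellations among users who chose that option.
complaints_ctrl = 84               # 2% of control choosers
complaints_var  = 100              # 2% of variant choosers

z_primary   = prop_ztest(chose_ctrl, n_ctrl, chose_var, n_var)
z_guardrail = prop_ztest(complaints_ctrl, chose_ctrl, complaints_var, chose_var)

# Ship only if the primary lift is clearly positive AND the guardrail
# shows no significant increase in regret signals.
ship = z_primary > 1.96 and z_guardrail < 1.96
```

The guardrail is what licenses the "genuinely comfortable, not pushed" interpretation: a lift on the primary metric with a flat complaint rate reads very differently from the same lift with complaints rising.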

This is the Thaler-Sunstein framework working as intended. The "libertarian paternalism" design principle holds that choice architecture can be designed to promote outcomes that are genuinely better for users while preserving full freedom to choose otherwise. The key is that the nudge aligns with what a well-informed user would choose if they took the time to fully evaluate both options. When that alignment exists, visual defaults work.

What distinguished the "Pay Later" test from the failed "Recommended Plans" tests? Two things.

First, the default pointed to an objectively better option for most users — not a subjective judgment made by the platform, but a measurable financial outcome that users themselves would endorse if they evaluated it. The "Recommended Plan" nudge was pointing users toward a plan that was "popular" — a social proof heuristic — not necessarily the plan that was best for their specific situation.

Second, the default did not restrict information. Users who wanted to understand why one option was visually emphasized could look at both options fully. The architecture made one path easier; it did not make the alternative path invisible.

Key Takeaway: Default bias works when the default is genuinely aligned with user interests and when the architecture makes the better option easier without hiding the alternatives. It fails when the default substitutes the platform's preferences for the user's own evaluation — particularly when users have both the motivation and the capability to evaluate independently.

The Framing Effect: Why "FREE" Produced Opposite Results by Audience

Perhaps the most behaviorally interesting finding in the entire dataset involved "FREE" messaging tested in two different audience contexts.

For new users — acquisition audiences with no prior relationship with the brand — "FREE" or "no upfront cost" messaging produced a strong positive result, with conversion rates increasing by approximately 60% in the test variant. For existing customers — retention audiences with established product relationships — the same messaging category produced a significant negative result, with the primary conversion metric dropping by approximately 35% and customer service inquiries increasing.

Same word. Same product. Opposite results.

Kahneman's prospect theory explains part of this. The subjective value of a gain ("it is free!") is evaluated differently depending on reference point. For a new user whose reference point is the market price of similar products, "FREE" signals an unexpectedly good deal — a gain relative to expectation. For an existing customer whose reference point is their current relationship with the provider, "FREE" signals something anomalous — a departure from the established exchange that triggers the question "what is actually changing?"

But the deeper mechanism is what behavioral economists call "learned priors." Existing customers have direct experience with the product and the billing relationship. Their interpretation of "FREE" is not naive — it is informed by what they know about how the company's pricing actually works. In their experience, nothing from this provider has actually been free. The word triggers a credibility discount, not a positive valuation.

This is a framing effect in the precise technical sense: the same objective information — an offer that costs nothing at the point of transaction — produces opposite behavioral responses depending on the interpretive frame the audience brings to it. Kahneman and Tversky's original framing research showed that choices between outcomes with equivalent expected values could be reversed simply by presenting those outcomes as gains versus losses. Here, the reversal is driven not by the presentation of the offer but by the different frames the two audiences use to interpret the same presentation.

Dan Ariely's work on "predictably irrational" pricing behavior is also relevant. Ariely demonstrated that "free" is categorically distinct from "very cheap" in how it is processed — it eliminates perceived risk at the psychological level. That risk elimination is valuable for an audience with no prior experience to draw on. For an audience whose experience has taught them that this specific provider's "free" offers come with conditions, the risk elimination does not operate — the word "free" no longer eliminates perceived risk because the audience's model of the provider overrides the categorical effect.

Key Takeaway: The framing effect in conversion optimization is not just about how you present information. It is about the interaction between how you present information and what prior beliefs your audience uses to interpret that presentation. Acquisition and retention audiences have fundamentally different prior beliefs about the same brand. Messaging that works with acquisition priors will often work against retention priors. Test every acquisition-facing concept against your retention audience before assuming transferability.
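The segment-level analysis the takeaway calls for can be illustrated with hypothetical counts (chosen only to echo the approximate +60% and -35% figures above) showing how a pooled readout masks opposite segment effects:

```python
# Hypothetical counts illustrating why acquisition and retention segments
# must be analyzed separately: opposite effects partially cancel when pooled.
segments = {
    # segment: (control conversions, control n, variant conversions, variant n)
    "new_users":      (250, 10000, 400, 10000),   # "FREE" framing: +60% relative
    "existing_users": (600, 10000, 390, 10000),   # same framing: -35% relative
}

def lift(c_conv, c_n, v_conv, v_n):
    """Relative conversion lift of the variant over the control."""
    return (v_conv / v_n) / (c_conv / c_n) - 1

per_segment = {name: lift(*counts) for name, counts in segments.items()}

pooled = lift(
    sum(c for c, _, _, _ in segments.values()),
    sum(n for _, n, _, _ in segments.values()),
    sum(v for _, _, v, _ in segments.values()),
    sum(n for _, _, _, n in segments.values()),
)
# per_segment: new_users +0.60, existing_users -0.35
# pooled: 790/850 - 1, roughly -0.07 -- a weak negative that hides a
# strong win in one segment and a strong loss in the other.
```

A pooled test on these numbers would read as a mild loser and get discarded, when the correct action is to ship the framing to acquisition traffic only.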

Information Provision vs. Information Overload: The Credit Check Paradox

Two tests in the dataset create an instructive contrast on the role of information provision in choice architecture.

The first involved credit check language on an enrollment form. Users were shown explicit, clear language about what type of credit assessment would occur and why. The result was a significant positive outcome — conversion increases ranging from approximately 5.9% to 7.4% across variants.

The second involved a comprehensive product comparison chart showing all available price tiers, terms, and feature comparisons. The hypothesis was that more information would produce better decisions and higher conversion rates. The result was flat — no improvement across multiple variants.

Why did one information provision test win and the other not?

The credit check language resolved a specific, salient anxiety. Users encountering an enrollment form that asked for personal information had an active, unresolved question: "What happens to my credit? Will this affect my score?" The language answered that question directly at the moment when the question was most salient. It reduced a genuine information asymmetry — the user did not know what kind of check would occur, and that uncertainty was creating hesitation.

The comparison chart added information, but it did not resolve a salient question. Users browsing plan options were not experiencing uncertainty about the list of available plans — they were experiencing uncertainty about which plan was right for them. A chart showing all the plans more comprehensively did not answer "which plan is right for me?" It added more data to a comparison that users were already struggling to process.

Stiglitz and Greenwald's work on information asymmetry in markets distinguishes between information that reduces genuine uncertainty and information that merely increases the volume of available data. More information is only valuable when it resolves the specific uncertainty that is preventing a decision. When the information does not map to the decision-stage uncertainty, it creates what information economics calls "noise" — data that competes with the signal the user needs.

Barry Schwartz's "Paradox of Choice" extends this: in the presence of many options, adding more information about each option does not help users choose more confidently. It increases the subjective cost of comparison. Users who feel they must process all available information before deciding will defer the decision rather than make an uninformed one. The comprehensive chart gave users more to process without giving them the framework to interpret it — a textbook paradox of choice situation.

Key Takeaway: Information provision works when it resolves the specific uncertainty preventing the decision. It fails when it adds data volume without addressing the decision-stage question. Before designing an information provision test, identify the precise uncertainty your audience has at this specific step. The test should answer that question, not just add more information.
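One way to report variant-level results like the credit-check-language range above is a confidence interval on relative lift. The sketch below applies the delta method to the log of the rate ratio; all counts are hypothetical, picked only to land near the reported 5.9% and 7.4% lifts, and the variant names are invented:

```python
import math

def relative_lift_ci(c_conv, c_n, v_conv, v_n, z=1.96):
    """Approximate 95% CI for relative lift, via the delta method on
    log(p_variant / p_control): Var(log p_hat) ~ (1 - p) / (n * p)."""
    p_c, p_v = c_conv / c_n, v_conv / v_n
    log_ratio = math.log(p_v / p_c)
    se = math.sqrt((1 - p_c) / (c_n * p_c) + (1 - p_v) / (v_n * p_v))
    lo = math.exp(log_ratio - z * se) - 1
    hi = math.exp(log_ratio + z * se) - 1
    return p_v / p_c - 1, lo, hi

# Hypothetical counts: one control arm, two credit-check-language variants.
control = (5000, 100000)                        # 5.0% baseline conversion
variants = {"plain_language":    (5295, 100000),   # ~+5.9% relative
            "detailed_language": (5370, 100000)}   # ~+7.4% relative

for name, (conv, n) in variants.items():
    rel_lift, lo, hi = relative_lift_ci(*control, conv, n)
    # A variant "wins" only if the whole interval sits above zero.
    print(f"{name}: lift={rel_lift:+.1%}, 95% CI [{lo:+.1%}, {hi:+.1%}]")
```

Reporting the interval rather than the point lift is what separates "5.9% to 7.4% across variants" as a robust finding from two noisy point estimates.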

Choice Set Restriction: When Less Is Not More

One of the most widely cited applications of behavioral economics in product design is choice restriction: reducing the number of options available to users on the grounds that more options create decision paralysis. The Iyengar and Lepper jam study — where customers bought more jam from a display of six than from a display of twenty-four — is the canonical reference.

In our dataset, a test that restricted the visible address search results to four or five options — reasoning that a shorter list would be easier to process and would drive faster selection — produced a negative result. Users who received fewer address options showed lower completion rates and more abandonment.

The mechanism: address search is not a preference choice. Users are not selecting from a range of options that reflect their values or priorities. They are looking for a specific, objectively correct answer — their home address. A search result that does not contain their address is simply wrong, regardless of how many options are presented. Restricting results to four or five entries increased the probability that users would not see their address in the results, turning a successful search into a failed search.

This is an important limitation of the choice restriction principle. It applies to preference decisions — situations where multiple options could satisfy the user's needs and where the cognitive cost of comparison is a real barrier. It does not apply to lookup tasks — situations where the user has a specific, objective target and needs to find it.

The behavioral economics literature is sometimes applied too broadly: "fewer choices are better" is not a universal principle. It is a principle that applies to preference decisions where multiple options are genuinely substitutable. Misapplied to lookup tasks, navigation decisions, or any context where the user has an objectively correct target, choice restriction simply creates failure.
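The lookup-task failure mode has a simple arithmetic core: truncating the result list caps the share of searches that can succeed at all. A sketch with a purely hypothetical rank distribution (the probabilities are illustrative, not measured):

```python
# Hypothetical distribution of where the user's true address appears in
# the full result list (rank 1 = top suggestion).
rank_prob = {1: 0.55, 2: 0.15, 3: 0.08, 4: 0.05, 5: 0.04,
             6: 0.04, 7: 0.03, 8: 0.03, 9: 0.02, 10: 0.01}

def hit_rate(k):
    """Probability the correct address is even visible when only the
    top-k results are shown. Any search with the true address ranked
    below k is a guaranteed failure, however clean the shorter list looks."""
    return sum(p for rank, p in rank_prob.items() if rank <= k)

for k in (4, 5, 10):
    print(f"top-{k}: {hit_rate(k):.0%} of searches can succeed at all")
```

Under this distribution a top-4 cutoff makes 17% of searches unwinnable before the user reads a single entry, which is the mechanism behind the lower completion and higher abandonment the test observed.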

A Choice Architecture Framework for Complex Product Decisions

Drawing on the enterprise dataset, here is a framework for applying choice architecture principles that accounts for the nuances the textbook version often misses.

Identify the cognitive system your user is operating in. High-consideration purchases engage System 2 thinking. Nudges designed for System 1 will be resisted by System 2 users. Design for the system that is actually active at each decision point.

Distinguish preference decisions from lookup tasks. Choice restriction and curation apply to preference decisions with substitutable options. They do not apply to lookup tasks, navigation decisions, or any context where the user has an objective target.

Audit your default for genuine alignment. A default nudge works when it points to what a well-informed user would genuinely prefer. It fails when it substitutes the platform's commercial preference for the user's interest. Before implementing a default, ask: if the user evaluated this fully, would they choose this option? If the honest answer is "some would, some wouldn't," the default is on shaky ground.

Test framing effects across audience segments, not just across treatments. A framing effect that works for your acquisition audience may actively harm your retention audience. Segment your tests by prior relationship status and analyze results separately.

Identify the specific uncertainty to resolve before designing information provision. Information that resolves the decision-stage uncertainty drives conversion. Information that adds volume without addressing the specific question creates noise.

Conclusion

Thaler and Sunstein's choice architecture framework is genuinely powerful — but it requires precision in application that the popular summary ("nudge people toward better choices") does not convey. The enterprise dataset is a record of what happens when these principles are tested at scale against real user behavior in a high-consideration product category.

The results are clarifying. Nudging fails in high-consideration decisions because it conflicts with System 2 engagement. Default bias works when the default genuinely serves the user's interests. Framing effects are audience-specific, not universal. Information provision works when it resolves specific uncertainty, not when it adds volume. And choice restriction is a principle about preference decisions, not a general rule about fewer options being better.

Applied with precision, these principles produced meaningful wins in our program. Applied as generic heuristics, they produced the failures. The difference is the difference between understanding behavioral economics and applying its vocabulary without understanding its mechanics.

If you are running A/B tests on complex product funnels and want to track which behavioral mechanisms are driving results across your program, GrowthLayer is designed to make those patterns visible over time — so that what your tests are teaching you about choice architecture compounds into a real competitive advantage.

About the author

Atticus Li

Applied Experimentation Lead at NRG Energy (Fortune 150) · Creator of the PRISM Method

Atticus Li leads applied experimentation at NRG Energy (Fortune 150), where he and his team run more than 100 controlled experiments per year on customer-facing surfaces. He is the creator of the PRISM Method, a framework for high-velocity experimentation programs at large enterprises. He writes regularly about the statistical and operational details of A/B testing — the parts most CRO content skips.
