How to Build a Testing Roadmap That Actually Produces Revenue Impact

The highest-ICE test in our program produced zero provable ROI. The lowest-ICE test tripled the primary metric. Here's the quarterly roadmap framework that explains why.

_By Atticus Li — CRO Strategist & Founder of GrowthLayer_

The test I was most excited about produced nothing.

We had scored it highly on every dimension we tracked. The page had enormous traffic. The conversion gap between our rate and the industry benchmark was significant. The business value of closing that gap was obvious to everyone in the room. We shipped the test, ran it to significance, and declared a winner. Then we tried to find the revenue impact in the quarterly numbers. We could not.

Six weeks later, a test that almost did not make the roadmap — one that a senior stakeholder had nearly killed because it seemed "too small" — tripled the completion rate on a secondary step that turned out to be the actual bottleneck in the funnel. That secondary step had been invisible in our prioritization model because we had been measuring the wrong output.

That experience reshaped how I think about testing roadmaps. The question is not "which tests should we run?" The question is "which tests will produce results we can trace back to revenue?" Those are different questions, and most prioritization frameworks confuse them.

Why ICE Scores Fail in Practice

ICE scoring — Impact, Confidence, Ease — became the default prioritization framework for most CRO programs because it is simple to apply and simple to explain to stakeholders. You score each dimension on a scale, multiply the numbers, rank the list, and work from the top.

The problem is not the framework's simplicity. The problem is what it measures.

"Impact" in ICE scoring is almost always estimated based on traffic volume and assumed conversion improvement. High-traffic pages with large conversion gaps score highly. But conversion gap analysis only tells you where users are failing to complete an action. It does not tell you whether completing that action drives downstream revenue. A page with a 30% abandonment rate sounds like an obvious target — unless the users who abandon at that step are disproportionately tire-kickers who would not have converted further down the funnel regardless.

I have watched teams spend two quarters testing the highest-traffic pages in their funnel, produce statistically significant wins, and then struggle to explain why the revenue line did not move. The tests worked. The metric moved. The revenue did not follow. The reason, in almost every case, was that the metric they had optimized was not actually predictive of the outcome they cared about.

The fix is not to abandon prioritization frameworks. The fix is to start with a different question.

Start With Funnel Analysis, Not Page Analysis

Before you build a quarterly roadmap, you need a funnel model — not a page-by-page conversion report, but an end-to-end view of the path from first contact to revenue event, with each step annotated by both its conversion rate and its predictive relationship to downstream outcomes.

This sounds more complicated than it is. In practice, it means answering three questions for each step in your funnel:

What percentage of users who reach this step complete it? This is your standard conversion rate. Most teams already track this.

Of the users who complete this step, what percentage eventually generate revenue? This is your step-level downstream conversion rate. Most teams do not track this, which is why they optimize the wrong steps.

What is the marginal revenue value of a one-point improvement at this step? This requires combining the first two numbers with your average order value or LTV metric. It is the number that actually belongs in your impact score.

When you build your funnel model this way, you frequently find that the highest-traffic pages are not the highest-leverage pages. The steps closest to the money — late-funnel steps, checkout steps, confirmation steps — often have lower absolute traffic but dramatically higher downstream conversion rates. A two-point improvement at a late-funnel step can be worth more than a ten-point improvement at a top-of-funnel step.
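To make that arithmetic concrete, here is a minimal sketch of the marginal-value calculation. Every step name, traffic figure, and rate in it is invented for illustration; substitute your own funnel data and order value or LTV figure.

```python
# Illustrative sketch: marginal revenue value of a one-point improvement
# at each funnel step. All step names and numbers are hypothetical.

AOV = 120.0  # assumed average order value in dollars

funnel_steps = [
    # (step name, monthly users reaching step, step conversion rate,
    #  share of completers who eventually generate revenue)
    ("Landing page",      100_000, 0.35, 0.02),
    ("Plan selection",     20_000, 0.55, 0.08),
    ("Checkout details",    8_000, 0.70, 0.60),
    ("Payment",             5_000, 0.85, 0.95),
]

def marginal_value(users, downstream_rate, aov=AOV):
    """Revenue gained if this step's completion rate improves by one point."""
    extra_completers = users * 0.01  # one additional percentage point
    return extra_completers * downstream_rate * aov

# Rank steps by marginal value: this ranked list is the leverage map.
leverage_map = sorted(
    ((name, marginal_value(users, downstream))
     for name, users, _, downstream in funnel_steps),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, value in leverage_map:
    print(f"{name:18s} ${value:,.0f} per +1pt")
```

With these invented numbers, a one-point gain at the checkout step is worth roughly $5,800 per month, more than twice the value of a one-point gain on the landing page, even though the landing page carries over ten times the traffic. That is exactly the pattern the funnel model exists to surface.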

This is what happened in the case I described at the top of this article. The test that tripled the primary metric was on a step that had modest traffic in absolute terms. But it sat immediately before the payment step, which meant that nearly every user who completed it converted to revenue. Our funnel model had not mapped this relationship. When we finally did the analysis, the leverage was obvious.

The 70/30 Allocation Rule

Once you have a funnel model that maps leverage accurately, you face a second prioritization problem: the difference between iteration and exploration.

Iteration means taking a concept that has already produced a signal — a winning test, a consistent directional pattern, a mechanism that held up under analysis — and running the next logical experiment to push that concept further. Exploration means testing a new hypothesis you have not previously validated.

Most teams get this balance wrong in one of two directions. Teams that over-iterate run diminishing-return tests on the same proven patterns, extracting smaller and smaller gains from territory that is already well-mapped. Teams that over-explore run too many first-time hypotheses, generate a lot of inconclusive results, and never build the iteration chains that produce compounding gains.

The right balance, based on what I have observed across multiple programs, is approximately 70% iteration and 30% exploration per quarter. The iteration budget goes toward extending and refining your best-performing hypotheses. The exploration budget goes toward testing genuinely new mechanisms that could open new optimization territory.

The 70% is not about playing it safe. It is about compounding. A concept that produced a 4% lift in its first test will frequently produce another 2-3% in a well-designed follow-up, and another 1-2% in a third iteration. The cumulative gain from three iterations on a proven mechanism typically exceeds the gain from three independent exploratory tests, because exploratory tests have a much higher base rate of inconclusive results.

The 30% exploration budget matters because iteration chains eventually reach diminishing returns, and you need new territory to open. But the exploration budget should be deliberately small, because exploratory tests are more expensive per expected unit of ROI. You are paying for learning, not confirming.
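A rough expected-value comparison makes the compounding argument concrete. The win rates and effect sizes in this sketch are assumptions chosen for illustration, not benchmarks from any particular program.

```python
# Illustrative sketch: expected incremental lift from three iteration
# tests on a proven concept versus three exploratory tests. All win
# rates and effect sizes below are assumptions, not benchmarks.

def expected_lift(tests):
    """tests: list of (probability of a win, relative lift if it wins)."""
    return sum(p * lift for p, lift in tests)

# Iteration: follow-ups on a validated mechanism win more often,
# with gradually smaller effect sizes.
iteration = expected_lift([(0.60, 0.04), (0.55, 0.025), (0.50, 0.015)])

# Exploration: first-time hypotheses have a much higher base rate of
# inconclusive or losing results.
exploration = expected_lift([(0.20, 0.04)] * 3)

print(f"Iteration chain, expected lift:   {iteration:.1%}")   # ~4.5%
print(f"Exploratory slate, expected lift: {exploration:.1%}")  # ~2.4%
```

The exact numbers matter less than the shape: as long as follow-ups on proven mechanisms win meaningfully more often than first-time hypotheses, the iteration budget out-earns the exploration budget per test slot.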

GrowthLayer's pipeline board makes this explicit — you can tag tests as iteration or exploration at the brief stage, which lets you track the balance over time and catch drift in either direction before it becomes a problem.

Building the Actual Roadmap

With a funnel model and an allocation framework, you can build a roadmap that is specific enough to be useful without being brittle.

A quarterly testing roadmap has four components.

The leverage map. A ranked list of the top 8-10 funnel steps by marginal revenue value per percentage point of improvement. This is the universe of pages and steps that are eligible for the roadmap. You build it once per quarter and update it if something material changes in the business.

The iteration pipeline. The 70% of your testing capacity committed to extending proven concepts. For each active iteration chain, document the original hypothesis, the results to date, and the specific next test. An iteration chain without a documented "next test" is stalled, not progressing.

The exploration slate. The 30% of your capacity committed to new hypotheses. Each exploration test needs a specific mechanism statement — not "test a new headline," but "test whether reducing cognitive load at the plan selection step increases progression, because we hypothesize that the current three-column comparison layout creates choice overload." The mechanism statement is what allows you to learn from inconclusive results.

The ROI model. For each test on the roadmap, a pre-calculated estimate of the revenue impact if the test produces a result at your historical win rate and average effect size. This is not a forecast. It is a shared language that lets stakeholders understand why each test is on the roadmap and lets your team understand what success actually looks like before the test runs.

The ROI model also serves a function that most teams overlook: it catches prioritization errors before they become wasted test cycles. If you calculate the expected ROI for a test and the number is trivially small even under optimistic assumptions, that test does not belong on the roadmap regardless of its ICE score.
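One way to sketch that pre-calculation, with hypothetical inputs throughout:

```python
# Illustrative sketch: pre-calculated expected ROI for one roadmap
# candidate. All inputs are hypothetical placeholders.

def expected_test_roi(
    monthly_users,       # users reaching the step under test each month
    baseline_cr,         # current step completion rate
    downstream_rate,     # share of completers who eventually generate revenue
    aov,                 # average order value or LTV proxy
    win_rate,            # historical program win rate
    avg_effect,          # average relative lift when a test does win
    months_of_gain=6,    # assumed duration of the gain
    test_cost=8_000,     # assumed design, build, and analysis cost
):
    extra_completers = monthly_users * baseline_cr * avg_effect
    monthly_gain = extra_completers * downstream_rate * aov
    return win_rate * monthly_gain * months_of_gain - test_cost

roi = expected_test_roi(
    monthly_users=8_000, baseline_cr=0.70, downstream_rate=0.60,
    aov=120.0, win_rate=0.30, avg_effect=0.05,
)
print(f"Expected ROI: ${roi:,.0f}")  # ~$28,000 with these assumptions
```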

Measuring Program ROI, Not Just Test-Level Wins

The most important shift in mindset for any team trying to build a serious testing roadmap is the move from test-level thinking to program-level thinking.

Test-level thinking asks: "Did this test win?" Program-level thinking asks: "Did this program deliver revenue impact this quarter?"

These questions have different answers more often than most practitioners acknowledge. A program can produce a high test win rate while delivering mediocre revenue impact, if the wins are concentrated in low-leverage areas of the funnel. A program can produce a modest test win rate while delivering strong revenue impact, if the wins are concentrated in high-leverage areas. The win rate is a vanity metric. The revenue contribution is the actual measure of program health.

Measuring program ROI requires two things that most teams do not have in place. First, you need a methodology for attributing revenue impact to test results — not just the immediate conversion lift, but the downstream revenue value calculated using your funnel model. Second, you need a comparison baseline: what would the revenue line have looked like without the testing program? This is genuinely hard to measure, which is why most teams do not do it. But an imperfect measurement of program ROI is more valuable than a precise measurement of test win rate.

I track program ROI in GrowthLayer by maintaining a running tally of the downstream revenue estimates from each winning test, adjusted for the percentage of traffic affected and the expected duration of the gain. The number is not exact. But it is directionally accurate enough to answer the question that matters most to the business: "Is the testing program paying for itself?"
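The arithmetic behind that tally is simple. The sketch below shows the shape of it with hypothetical field names and figures; it is not GrowthLayer's implementation.

```python
# Illustrative sketch: program-level ROI as a running tally of downstream
# revenue estimates from winning tests. Field names and figures are
# hypothetical.

from dataclasses import dataclass

@dataclass
class WinningTest:
    name: str
    monthly_revenue_gain: float  # downstream estimate from the funnel model
    traffic_share: float         # share of traffic the winning variant reaches
    months_of_gain: int          # assumed duration before the gain decays

def program_contribution(wins):
    return sum(
        w.monthly_revenue_gain * w.traffic_share * w.months_of_gain
        for w in wins
    )

wins = [
    WinningTest("Checkout reassurance copy", 12_000, 1.00, 6),
    WinningTest("Plan selector simplification", 5_500, 0.80, 6),
]
quarterly_program_cost = 60_000  # tooling, analyst time, build capacity
net = program_contribution(wins) - quarterly_program_cost
print(f"Estimated quarterly program ROI: ${net:,.0f}")
```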

In most programs I have worked with, the answer to that question is much more uncertain than it should be. The highest-ICE tests are generating statistically significant results. The dashboard looks productive. But nobody has actually traced those results to the revenue line.

The testing roadmap is not a document you build once per quarter and then execute against. It is a framework for making the right tradeoffs between speed and rigor, between iteration and exploration, between high-traffic pages and high-leverage pages. The teams that build revenue-producing programs are the ones that stay honest about the difference.

The Roadmap Review Process

A quarterly testing roadmap without a review process is a plan, not a program.

The review process I recommend runs at three cadences.

Weekly: Test status updates. Which tests are running, which have concluded, which are blocked. No decisions made at this level — just a shared understanding of current state.

Biweekly: Result analysis. For each concluded test, a brief review that covers the result, the mechanism explanation, and the implication for the iteration pipeline. This is where "what did we learn?" gets answered, not just "did it win?"

Quarterly: Roadmap rebuild. A full refresh of the leverage map, a retrospective on the prior quarter's program ROI, and a new roadmap built using the process above. The retrospective should include honest accounting of which tests were on the roadmap for good reasons and which were there because someone had a pet hypothesis they wanted to test.

The quarterly retrospective is where the discipline lives. It is easy to build a good roadmap. It is harder to maintain the intellectual honesty required to keep it good when stakeholder pressure, organizational politics, and confirmation bias are all pushing in the direction of testing the things that feel important rather than the things that are important.

If you are managing a testing program and the quarterly retrospective is not a meeting that produces uncomfortable conversations, the retrospective is probably not being done rigorously enough.

Building a testing roadmap that produces revenue impact is not a technical problem. The analytics infrastructure required is straightforward. The prioritization math is not complicated. The challenge is the discipline to use a funnel model instead of page-level traffic data, to maintain the 70/30 allocation instead of chasing every interesting idea, and to measure program ROI instead of defaulting to test win rate as the success metric.

If you want to build that discipline into your program infrastructure — test briefs, pipeline boards, iteration chain tracking, and meta-analysis across quarters — GrowthLayer is the platform I built specifically for that purpose. Start with the free tier to map your funnel and see where the leverage actually lives.
