
Meta-Analysis Calculator

Combine results from multiple A/B tests to estimate the true effect size with greater precision. Uses inverse-variance weighted fixed-effects meta-analysis.

Results

Pooled Effect

4.37%

weighted average lift

95% CI

[4.32, 4.43]

confidence interval

p-value

< 0.0001

Significant

Heterogeneity

I² Statistic

99.8%

High heterogeneity

Cochran's Q

875.97

Significant heterogeneity (p < 0.10)

Study Breakdown

Study     Effect (%)   Weight
Test 1    5.20%        23.5%
Test 2    3.80%        65.9%
Test 3    6.10%        10.7%
Pooled    4.37%        100%

High Heterogeneity Detected

The I² value of 99.8% indicates substantial variation between studies. The pooled estimate may not accurately represent any individual context. Consider investigating the sources of variation or using a random-effects model for more conservative estimates.

Methodology

This calculator uses a fixed-effects meta-analysis with inverse-variance weighting, the standard approach for combining A/B test results. The core formulas are:

Weight for study i: w_i = 1 / SE_i²
Pooled effect: θ = Σ(w_i × θ_i) / Σ(w_i)
Pooled standard error: SE(θ) = √(1 / Σ(w_i))

Where:
- θ_i is the observed effect (lift %) for study i
- SE_i is the standard error for study i
- w_i is the inverse-variance weight

The 95% confidence interval is θ ± 1.96 × SE(θ).

The standard error can be provided directly or approximated from the lift and sample size as SE ≈ |lift| / √(sampleSize). This approximation assumes a roughly normal distribution of the effect estimate and works reasonably well for conversion-rate tests.

Heterogeneity is assessed with Cochran's Q statistic:

Q = Σ w_i × (θ_i - θ)²

Under the null hypothesis of homogeneity, Q follows a chi-squared distribution with k - 1 degrees of freedom, where k is the number of studies. The I² statistic quantifies the proportion of total variation due to true heterogeneity rather than sampling error:

I² = max(0, (Q - df) / Q × 100)

I² interpretation: 0–25% low, 25–75% moderate, 75%+ high heterogeneity.
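The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation; the standard errors below are hypothetical values chosen so the weights roughly match the example table, since the page does not report the real SEs.

```python
import math

def fixed_effects_meta(effects, ses, z=1.96):
    """Inverse-variance fixed-effects pooling with heterogeneity stats."""
    weights = [1.0 / se**2 for se in ses]          # w_i = 1 / SE_i²
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)           # SE(θ) = √(1 / Σ w_i)
    ci = (pooled - z * pooled_se, pooled + z * pooled_se)
    # Cochran's Q and I² for heterogeneity
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q * 100) if q > 0 else 0.0
    return pooled, pooled_se, ci, q, i2

effects = [5.20, 3.80, 6.10]   # observed lifts (%) from the example tests
ses = [0.10, 0.06, 0.15]       # hypothetical standard errors (not from the page)
pooled, se, ci, q, i2 = fixed_effects_meta(effects, ses)
print(f"pooled={pooled:.2f}%  CI=[{ci[0]:.2f}, {ci[1]:.2f}]  Q={q:.1f}  I²={i2:.1f}%")
```

With these assumed inputs the pooled lift comes out near 4.37% with I² above 99%, mirroring the example results; exact CI bounds depend on the true standard errors.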

Frequently Asked Questions

What is meta-analysis in A/B testing?
Meta-analysis is a statistical technique for combining results from multiple independent experiments that test similar hypotheses. In A/B testing, it lets you pool results from several tests — for example, running the same CTA change across different pages or markets — to get a more precise estimate of the true effect size. This is especially useful when individual tests are underpowered.
When should I combine A/B test results?
Combine results when you have multiple tests that address the same underlying question — such as testing a similar design change across different pages, testing the same hypothesis in different markets, or re-running a previous test. Avoid combining tests that measure fundamentally different things, as this can produce misleading pooled estimates.
What is heterogeneity and why does it matter?
Heterogeneity measures how much the effect sizes vary across your studies beyond what random chance would explain. High heterogeneity (I² > 75%) suggests the true effect differs meaningfully between tests, which means a single pooled estimate may not accurately represent any individual context. In such cases, a random-effects model or investigating the sources of variation may be more appropriate.
What is the difference between fixed and random effects models?
A fixed-effects model assumes all studies share one true effect size, and differences are due to sampling error only. A random-effects model assumes the true effect varies between studies and accounts for this extra variability. This calculator uses a fixed-effects model, which is appropriate when your studies are similar in design and context. If heterogeneity is high, consider a random-effects approach.
How many studies do I need for a meta-analysis?
While technically you can combine as few as two studies, meta-analysis becomes more reliable with more studies. With only 2–3 studies, the heterogeneity tests have low statistical power and the pooled estimate is heavily influenced by each individual study. For robust conclusions, 5 or more studies is generally recommended, though even combining 2–3 well-designed tests provides a better estimate than looking at each in isolation.


Updated for 2026. Built by GrowthLayer.