CUPED Variance Reduction

Calculate A/B test variance reduction with CUPED, with step-by-step formulas.

December 2025

Worked Examples

Example 1: E-commerce Revenue Experiment

Problem: An e-commerce site runs a checkout flow A/B test. Revenue per user has high variance (σ² = 2500). The correlation between pre-experiment revenue and in-experiment revenue is 0.65. Current sample: 50,000 users per variant.

Solution:
CUPED Analysis:
Correlation (r) = 0.65
Baseline variance = 2500
Sample size = 50,000 per variant

Variance Reduction:
r² = 0.65² = 0.4225
Reduction = 42.25%

New Variance:
2500 × (1 - 0.4225) ≈ 1444

Standard Error Impact:
Baseline SE = √(2500/50000) = 0.224
CUPED SE = √(1444/50000) = 0.170
SE reduction = 24%

MDE Impact (80% power, 5% two-sided significance level):
Baseline MDE = 2.8 × √2 × 0.224 = $0.89
CUPED MDE = 2.8 × √2 × 0.170 = $0.67
Can now detect effects about 24% smaller (the same factor as the SE reduction)

Runtime Equivalent:
50,000 with CUPED ≈ 50,000 / (1 - 0.4225) ≈ 87,000 without
Saves the time needed to collect about 37,000 additional users per variant

Result: 42% variance reduction | 24% smaller MDE | Equivalent to 87,000 users/variant
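The arithmetic in Example 1 can be reproduced with a short script. This is a minimal sketch, not a reference implementation: the function name `cuped_summary` and its parameters are illustrative, and the multiplier 2.8 is the rounded sum 1.96 + 0.84 used in the example (two-sided 5% significance, 80% power).

```python
import math


def cuped_summary(variance, r, n_per_variant, z_total=2.8):
    """Summarise the effect of CUPED on a two-variant comparison of means.

    z_total approximates z(1 - alpha/2) + z(power) = 1.96 + 0.84.
    All names here are hypothetical helpers for illustration.
    """
    reduction = r ** 2                           # share of variance removed
    new_variance = variance * (1 - reduction)    # variance after adjustment
    se_base = math.sqrt(variance / n_per_variant)
    se_cuped = math.sqrt(new_variance / n_per_variant)
    # MDE for a difference between two equal-sized groups: z * sqrt(2) * SE
    mde_base = z_total * math.sqrt(2) * se_base
    mde_cuped = z_total * math.sqrt(2) * se_cuped
    # Users per variant needed WITHOUT CUPED to match the adjusted precision
    equivalent_n = n_per_variant / (1 - reduction)
    return {
        "reduction": reduction,
        "new_variance": new_variance,
        "se_base": se_base,
        "se_cuped": se_cuped,
        "mde_base": mde_base,
        "mde_cuped": mde_cuped,
        "equivalent_n": equivalent_n,
    }


summary = cuped_summary(variance=2500, r=0.65, n_per_variant=50_000)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because the MDE is proportional to the standard error, the relative MDE improvement always equals the relative SE improvement, 1 - √(1 - r²).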

Example 2: Engagement Metric Optimization

Problem: A social app tests a new feed algorithm. Sessions per user is the metric. The correlation with pre-experiment sessions is only 0.4. Is CUPED worth implementing?

Solution:
Correlation Analysis:
r = 0.4
r² = 0.16
Variance reduction = 16%

Runtime Impact (same power and MDE, constant daily traffic):
Required sample size scales with variance, so runtime reduction ≈ r² = 16%
At a fixed runtime instead: MDE reduction = 1 - √(1 - 0.16) = 1 - 0.917 = 8.3%

A/B test running 4 weeks:
4 weeks × 16% = 0.64 weeks ≈ 4.5 days saved

Implementation Trade-off:
- Engineering effort: ~1-2 weeks
- Runtime savings: ~4.5 days per 4-week test
- Break-even: ~2-3 experiments

Improving Correlation:
- Use a 2-week pre-period instead of a 1-week pre-period
- Combine multiple pre-period metrics
- Segment by user tenure (new vs. returning)

If improved to r = 0.55:
r² = 0.30
Runtime reduction ≈ 30%
Saves ~8.5 days on a 4-week test
Much better ROI

Result: 16% variance reduction (r = 0.4) | ~16% shorter runtime, or 8% smaller MDE at the same runtime | Consider improving covariate first
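A minimal sketch of the runtime trade-off in Example 2 follows. It rests on two assumptions that the example does not state explicitly: users arrive at a constant daily rate, and the required sample size is proportional to metric variance. Under those assumptions the runtime saving at fixed power and MDE is r², while 1 - √(1 - r²) is the MDE improvement at a fixed runtime. The function name `runtime_savings` is illustrative.

```python
import math


def runtime_savings(r, baseline_days):
    """Return (variance_reduction, days_saved, mde_reduction) for correlation r.

    Assumes constant daily traffic and sample size proportional to variance.
    """
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    variance_reduction = r ** 2
    # Same power and MDE: fewer users needed, so fewer days needed
    days_saved = baseline_days * variance_reduction
    # Same runtime: the detectable effect shrinks with the standard error
    mde_reduction = 1 - math.sqrt(1 - variance_reduction)
    return variance_reduction, days_saved, mde_reduction


for r in (0.4, 0.55):
    var_red, days, mde_red = runtime_savings(r, baseline_days=28)
    print(f"r={r}: variance -{var_red:.1%}, "
          f"{days:.1f} days saved, MDE -{mde_red:.1%}")
```

Comparing the two correlations side by side shows why improving the covariate pays off: the saving grows with the square of r.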

Example 3: Netflix-Style CUPED Implementation

Problem: A streaming service runs 100+ A/B tests annually on watch time. The correlation with pre-experiment watch time is 0.75. Calculate the platform-wide value of implementing CUPED.

Solution:
Single Experiment Analysis:
r = 0.75
r² = 0.5625
Variance reduction = 56%
Runtime reduction at the same power and MDE ≈ 56% (sample size scales with variance)
MDE reduction at the same runtime = 1 - √(1 - 0.5625) = 34%

Typical experiment: 2 weeks
With CUPED: 2 × (1 - 0.5625) ≈ 0.88 weeks
Savings per test: ~7.9 days

Platform-Wide Impact:
100 experiments/year × 7.9 days ≈ 788 days
≈ 26 experiment-months saved

Alternative view - throughput increase:
Previously: 100 tests in 52 weeks
With CUPED: 100 / (1 - 0.5625) ≈ 229 tests in the same time
About 129% more experiments

Power Improvement:
If experiments were 80% powered:
Same sample now ≈ 99% power
Or: detect 34% smaller effects

Business Value:
Faster learning → faster shipping
~129% more tests → faster iteration
Higher power → fewer false negatives
Competitive advantage: significant

Result: 56% variance reduction | ~129% more tests annually | ~788 experiment-days saved/year
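The power improvement in Example 3 can be checked with the standard library. This sketch assumes a two-sided z-test on a difference of means and ignores the negligible rejection probability in the opposite tail; `power_with_cuped` is a hypothetical helper name.

```python
from statistics import NormalDist


def power_with_cuped(r, baseline_power=0.80, alpha=0.05):
    """Power after CUPED for a test sized to baseline_power without it.

    Sample size and true effect are held fixed; only variance changes.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # critical value
    z_beta = std_normal.inv_cdf(baseline_power)   # quantile for target power
    # Effect measured in standard errors that the baseline design detects
    effect_in_se = z_alpha + z_beta
    # CUPED shrinks the standard error by a factor sqrt(1 - r^2)
    boosted = effect_in_se / (1 - r ** 2) ** 0.5
    return std_normal.cdf(boosted - z_alpha)


print(f"Power with r=0.75: {power_with_cuped(0.75):.3f}")
```

With r = 0 the function returns the baseline power unchanged, which is a quick sanity check on the formula.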

Frequently Asked Questions

What is CUPED?

CUPED (Controlled-experiment Using Pre-Experiment Data) is a variance reduction technique that uses pre-experiment user behavior to reduce noise in A/B test metrics. By controlling for pre-existing differences, CUPED increases statistical power without needing more users.

How does CUPED reduce variance?

CUPED adjusts each user's metric by their pre-experiment behavior. If a user historically has high engagement, we expect high engagement during the experiment. Subtracting this expected value removes predictable variation, leaving only the experiment's true effect plus random noise.
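The adjustment described above is usually written as Y_cuped = Y - θ(X - mean(X)), where X is the pre-experiment covariate and θ = Cov(X, Y) / Var(X). The sketch below applies it to simulated data using only the standard library. The data-generating process (the 0.8 slope and the noise levels) is invented purely for illustration, and in a real experiment θ is typically estimated on the pooled data from all variants.

```python
import random
from statistics import fmean, pvariance


def cuped_adjust(y, x):
    """Return CUPED-adjusted values y_i - theta * (x_i - mean(x))."""
    x_bar, y_bar = fmean(x), fmean(y)
    # Population covariance between covariate and metric
    cov_xy = fmean([(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)])
    theta = cov_xy / pvariance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is repeatable
# Simulated pre-period metric and a correlated in-experiment metric
pre = [rng.gauss(100, 20) for _ in range(10_000)]
post = [0.8 * xi + rng.gauss(0, 15) for xi in pre]

adjusted = cuped_adjust(post, pre)
print(f"mean before: {fmean(post):.3f}  after: {fmean(adjusted):.3f}")
print(f"variance ratio after/before: "
      f"{pvariance(adjusted) / pvariance(post):.3f}")
```

The mean is unchanged because the subtracted term averages to zero, while the variance drops by roughly the squared correlation between `pre` and `post`.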

What correlation do I need for CUPED to work?

Variance reduction equals correlation squared (r²). A 0.5 correlation gives 25% variance reduction; 0.7 gives 49%; 0.8 gives 64%. Correlations below 0.3 provide minimal benefit (<9%). Many web metrics reach correlations of roughly 0.4-0.7 with pre-period data.
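The r² rule follows from a short derivation. The notation below is an assumption for illustration: Y is the experiment metric, X the pre-experiment covariate, ρ their correlation, and θ the adjustment coefficient.

```latex
\begin{aligned}
Y_{\text{cuped}} &= Y - \theta\,(X - \mathbb{E}[X]) \\
\operatorname{Var}(Y_{\text{cuped}})
  &= \operatorname{Var}(Y) - 2\theta\,\operatorname{Cov}(X, Y)
     + \theta^{2}\operatorname{Var}(X) \\
\theta^{*} &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
  \quad\text{(minimises the variance)} \\
\operatorname{Var}(Y_{\text{cuped}})\Big|_{\theta = \theta^{*}}
  &= \operatorname{Var}(Y) - \frac{\operatorname{Cov}(X, Y)^{2}}{\operatorname{Var}(X)}
   = \operatorname{Var}(Y)\,(1 - \rho^{2})
\end{aligned}
```

The fraction of variance removed is therefore ρ², which is why r = 0.5 yields 25% and r = 0.8 yields 64%.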

Does CUPED change my experiment results?

CUPED doesn't bias results: it reduces variance without changing the expected treatment effect, provided the covariate is measured before the experiment starts so the treatment cannot influence it. The adjusted metric has the same expected mean difference between variants but less noise, so the confidence interval around the observed effect is narrower.
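The reason can be sketched in one line of algebra. The symbols are assumptions for illustration: subscripts T and C denote treatment and control, bars denote sample means, and θ is treated as a fixed constant (with an estimated θ the result holds approximately in large samples).

```latex
\begin{aligned}
\hat{\Delta}_{\text{cuped}}
  &= (\bar{Y}_T - \bar{Y}_C) - \theta\,(\bar{X}_T - \bar{X}_C) \\
\mathbb{E}\big[\hat{\Delta}_{\text{cuped}}\big]
  &= \mathbb{E}\big[\bar{Y}_T - \bar{Y}_C\big]
     - \theta\,\underbrace{\mathbb{E}\big[\bar{X}_T - \bar{X}_C\big]}_{=\,0\ \text{under randomisation}}
   = \mathbb{E}\big[\bar{Y}_T - \bar{Y}_C\big]
\end{aligned}
```

Random assignment makes the pre-experiment covariate equal in expectation across variants, so the correction term vanishes on average and only the noise is reduced.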