A/B Test Significance
Calculate statistical significance and required sample size for A/B tests. Enter values for instant results with step-by-step formulas.
Formula
z = (p₂ - p₁) / √[p(1-p)(1/n₁ + 1/n₂)]
Here z is the test statistic, p₁ and p₂ are the conversion rates for control and variation, p is the pooled proportion (total conversions divided by total visitors across both variants), and n₁ and n₂ are the sample sizes. The p-value is computed from the standard normal distribution. For significance at the 95% confidence level, |z| must exceed 1.96 (two-tailed).
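The formula can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name and argument names are assumptions, and the two-tailed p-value is obtained from the complementary error function in the standard library.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    p1 = conv_a / n_a                      # control conversion rate
    p2 = conv_b / n_b                      # variation conversion rate
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p2 - p1) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Figures from Example 1 on this page
z, p = two_proportion_z_test(1750, 50_000, 1925, 50_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.94, p = 0.0033
```

A result counts as significant at the 95% level when the returned p-value is below 0.05, which is equivalent to |z| > 1.96.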
Worked Examples
Example 1: E-commerce Checkout Button Color Test
Problem: An e-commerce site tests green vs. orange checkout buttons. Control (green): 50,000 visitors, 1,750 conversions. Variation (orange): 50,000 visitors, 1,925 conversions. Is the orange button significantly better at 95% confidence?
Solution:
Step 1: Calculate conversion rates
Control rate = 1,750/50,000 = 3.50%
Variation rate = 1,925/50,000 = 3.85%
Relative uplift = (3.85 - 3.50)/3.50 = 10.0%

Step 2: Calculate pooled proportion and standard error
Pooled p = (1,750 + 1,925)/(50,000 + 50,000) = 3.675%
SE = √(0.03675 × 0.96325 × (1/50,000 + 1/50,000)) = 0.001190

Step 3: Calculate z-score
z = (0.0385 - 0.0350) / 0.001190 = 2.94

Step 4: Calculate p-value (two-tailed)
p-value = 2 × (1 - Φ(2.94)) = 0.0033

Step 5: Determine significance
p-value (0.0033) < α (0.05) ✓

Step 6: Calculate 95% confidence interval for the difference
CI = 0.35% ± 1.96 × 0.119%
CI = [0.12%, 0.58%] (percentage points)

Conclusion: The orange button shows a statistically significant 10% relative improvement (0.35 percentage points absolute).
Result: Significant: YES (p=0.003) | Uplift: +10.0% | 95% CI: [0.12%, 0.58%] | ~175 extra conversions
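The confidence interval from Step 6 can be reproduced with a short Python sketch. The function name is hypothetical, and the sketch assumes the unpooled standard error, which is the usual choice for an interval on the difference. The page does not state which standard error the calculator uses for the interval; for these figures the pooled and unpooled values agree to the precision shown.

```python
import math

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Normal-approximation CI for the difference in conversion rates (B - A)."""
    p1 = conv_a / n_a
    p2 = conv_b / n_b
    # Unpooled standard error: each arm contributes its own variance
    se = math.sqrt(p1 * (1 - p1) / n_a + p2 * (1 - p2) / n_b)
    diff = p2 - p1
    return diff - z_crit * se, diff + z_crit * se

low, high = diff_confidence_interval(1750, 50_000, 1925, 50_000)
print(f"95% CI: [{low:.2%}, {high:.2%}]")  # 95% CI: [0.12%, 0.58%]
```

An interval that excludes zero agrees with the significant z-test result; an interval that includes zero indicates the opposite.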
Example 2: Sample Size Calculation for Pricing Page Test
Problem: A SaaS company wants to test a new pricing page. Current conversion rate is 2.5%. They want to detect a 15% relative improvement with 80% power at 95% significance. How many visitors per variant are needed?
Solution:
Step 1: Define parameters
Baseline rate (p₁) = 2.5% = 0.025
Minimum Detectable Effect = 15% relative
New rate (p₂) = 0.025 × 1.15 = 2.875% = 0.02875
Absolute effect = 0.02875 - 0.025 = 0.00375

Step 2: Get z-scores
α = 0.05, two-tailed: z_α/2 = 1.96
Power = 80%: z_β = 0.8416 (often rounded to 0.84)

Step 3: Apply sample size formula used by this calculator
n = 2 × (z_α/2 + z_β)² × [p₁(1-p₁) + p₂(1-p₂)] / (p₂-p₁)²

n = 2 × (1.96 + 0.8416)² × [0.025×0.975 + 0.02875×0.97125] / (0.00375)²
n ≈ 58,380 per variant

Note: the leading factor of 2 makes this estimate conservative. The common textbook form of the two-proportion formula omits it and gives about 29,190 per variant for the same inputs.

Step 4: Calculate test duration
At 2,000 visitors/day split 50/50, each variant receives 1,000 visitors/day:
Days needed = 58,380 / 1,000 ≈ 58.4, rounded up to 59 days (about 2 months)
Result: Required: 58,380 visitors per variant | 116,760 total | ~59 days at 2K visitors/day
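The calculation in Example 2 can be sketched in Python as follows. The function name, its parameters, and the `leading_factor` argument are illustrative assumptions rather than the calculator's actual code. `leading_factor=2` mirrors the formula shown in Step 3, while `leading_factor=1` gives the common textbook form, which yields roughly half the sample size.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80,
                            leading_factor=2.0):
    """Visitors needed per variant for a two-tailed two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = leading_factor * (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)

n = sample_size_per_variant(0.025, 0.15)
days = math.ceil(n / 1000)  # 2,000 visitors/day split 50/50 is 1,000 per variant
print(f"{n:,} visitors per variant, about {days} days")
```

With the Example 2 inputs this returns approximately 58,380 visitors per variant and 59 days. Small differences from hand calculations come from rounding the z-scores.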
Example 3: Marginal Result Interpretation
Problem: A landing page test shows: Control: 8,000 visitors, 240 conversions. Variation: 8,000 visitors, 272 conversions. The uplift looks promising. How should this be interpreted?
Solution:
Step 1: Calculate metrics
Control rate = 240/8,000 = 3.00%
Variation rate = 272/8,000 = 3.40%
Relative uplift = (3.40 - 3.00)/3.00 = 13.3%

Step 2: Assess statistical significance
Pooled p = (240 + 272)/(8,000 + 8,000) = 3.20%
SE = √(0.032 × 0.968 × (1/8,000 + 1/8,000)) = 0.002783
z = (0.0340 - 0.0300) / 0.002783 = 1.437
p-value = 2 × (1 - Φ(1.437)) ≈ 0.151 > α = 0.05
Result is NOT statistically significant at 95% confidence
It is also not significant at 90% confidence (0.151 > 0.10)

Step 3: Calculate confidence interval
95% CI for difference: [-0.15%, 0.95%]
The CI includes zero, confirming non-significance

Step 4: Interpret the situation
- Promising directional lift but still inconclusive
- Could be a real effect or random variation
- Sample size is too small to detect a lift of this size reliably

Step 5: Recommendations
Option A: Continue the test until the planned sample size is reached
Option B: Re-run with a larger traffic allocation
Option C: Treat this as exploratory evidence, not a
Result: Not significant at 95% (p=0.151) | Suggestive but inconclusive | Recommend: keep running or gather more traffic
Frequently Asked Questions
What is statistical significance in A/B testing?
Statistical significance indicates that an observed difference between variants is unlikely to be explained by random chance alone. When a result is 'statistically significant at 95% confidence,' the p-value is below 0.05: if the variants truly performed the same, a difference at least as large as the one observed would occur less than 5% of the time. This is not the same as a 95% probability that the variation is better. Statistical significance also does not mean the result is practically important: a tiny difference can be statistically significant with large enough sample sizes.
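The last point is easy to see numerically. The Python sketch below applies the pooled z-test from the Formula section to the same pair of conversion rates (3.00% vs. 3.05%) at two traffic levels. The function name and all visitor and conversion counts are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p(conv_a, n_a, conv_b, n_b):
    """Two-tailed p-value from the pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Identical rates (3.00% vs. 3.05%), very different sample sizes
small = two_tailed_p(300, 10_000, 305, 10_000)
large = two_tailed_p(150_000, 5_000_000, 152_500, 5_000_000)
print(f"10,000 per variant:    p = {small:.3f}")
print(f"5,000,000 per variant: p = {large:.2e}")
```

The 0.05 percentage point difference is far from significant at 10,000 visitors per variant and highly significant at 5,000,000, even though its practical value is the same in both cases.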
How do I calculate the sample size needed for an A/B test?
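Sample size depends on: 1) Baseline conversion rate, 2) Minimum Detectable Effect (MDE), the smallest improvement worth detecting, 3) Statistical power (typically 80%), and 4) Significance level (typically α = 0.05, i.e. 95% confidence). The formula combines the z-scores for your chosen power and significance level with the variance of each conversion rate, divided by the squared absolute effect. Smaller effects and lower baseline rates require larger samples. For example, detecting a 10% relative lift from a 3% baseline at 80% power and α = 0.05 requires roughly 53,000 visitors per variant by the standard two-proportion formula, and about twice that with the more conservative formula shown in Example 2.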
How long should I run an A/B test?
Run your test until you reach the required sample size, which should be calculated before the test starts. Never stop early just because you see significance: repeatedly checking and stopping at the first significant result inflates the false positive rate (the peeking problem). Also: 1) run for at least one full week to capture day-of-week effects, 2) account for seasonality, and 3) make sure the test spans complete business cycles. Methods such as sequential testing or always-valid p-values do allow valid early stopping, but they must be planned for and use their own significance thresholds.
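The peeking problem can be illustrated with a small simulation. The Python sketch below runs A/A tests, in which both arms share the same true conversion rate, so every 'significant' result is a false positive. It compares stopping at the first significant look against testing once at the end. All names and parameter values (1,000 tests, 5 looks, 200 visitors per arm per look, a 10% rate, the seed) are arbitrary assumptions for illustration.

```python
import math
import random

def is_significant(conv_a, conv_b, n, z_crit=1.96):
    """Pooled two-proportion z-test for two arms of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return False  # no variation in the data, nothing to test
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return abs(conv_b - conv_a) / n / se > z_crit

def simulate_aa_tests(n_tests=1000, looks=5, per_look=200, rate=0.10, seed=1):
    """Return (false positive rate with peeking, rate at the final look only)."""
    rng = random.Random(seed)
    any_look_hits = 0
    final_look_hits = 0
    for _test in range(n_tests):
        conv_a = conv_b = n = 0
        hit_any = False
        sig = False
        for _look in range(looks):
            # Add one batch of visitors to each arm, then "peek"
            conv_a += sum(rng.random() < rate for _ in range(per_look))
            conv_b += sum(rng.random() < rate for _ in range(per_look))
            n += per_look
            sig = is_significant(conv_a, conv_b, n)
            hit_any = hit_any or sig
        any_look_hits += hit_any
        final_look_hits += sig  # sig now holds the result of the last look
    return any_look_hits / n_tests, final_look_hits / n_tests

peeking_rate, final_rate = simulate_aa_tests()
print(f"False positives when stopping at first significant look: {peeking_rate:.1%}")
print(f"False positives when testing once at the end: {final_rate:.1%}")
```

Because the final look is one of the looks, the peeking rate can never fall below the final-look rate. With five looks at a nominal 5% level it typically lands well above 5%, while the single final test stays close to 5%.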
What inputs do I need to use A/B Test Significance accurately?
To test significance, you need the number of visitors and the number of conversions for each variant (control and variation). For sample size planning, you need the baseline conversion rate, the minimum detectable effect, the desired statistical power, and the significance level. Enter visitor and conversion counts as whole numbers, taken from the same time period for both variants. The formula section on this page lists every variable and explains what each represents.
How accurate are the results from A/B Test Significance?
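All calculations use established statistical formulas and are performed with high-precision arithmetic. Results are accurate to the precision shown, provided the test's assumptions hold: the z-test relies on a normal approximation, which is reliable when each variant has a reasonable number of both conversions and non-conversions (a common rule of thumb is at least 10 of each). For high-stakes decisions, verify the results with a qualified statistician.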
Is my data stored or sent to a server?
No. All calculations run entirely in your browser using JavaScript. No data you enter is ever transmitted to any server or stored anywhere. Your inputs remain completely private.