Experiment Power & Sample Size Checker
Calculate A/B test sample size, statistical power, and test duration. Enter values for instant results with step-by-step formulas.
Worked Examples
Example 1: E-commerce Checkout Optimization
Problem: An e-commerce site has 5% checkout conversion, 50,000 daily visitors, and wants to detect a 10% relative improvement. Calculate sample size and test duration at 95% significance and 80% power.
Solution: Parameters:
- Baseline: 5% conversion
- MDE: 10% relative (5% → 5.5% absolute)
- α = 0.05, Power = 80%
- Daily traffic: 50,000

Calculation:
- Absolute effect: 5% × 10% = 0.5 percentage points
- z_α/2 = 1.96 (two-tailed, 95%)
- z_β = 0.84 (80% power)
- Pooled p̄ = (0.05 + 0.055) / 2 = 0.0525

Sample size per variant:
n = [z_α/2·√(2p̄(1−p̄)) + z_β·√(p1(1−p1) + p2(1−p2))]² / (p2 − p1)²
n ≈ 31,000 per variant

Total sample: 62,000
Days needed: 62,000 / 50,000 = 1.24 days

⚠️ This seems too short. Let's verify:
- All 50K daily visitors reaching the checkout page is unlikely
- More realistic: 50K site visitors, ~5% reach checkout = 2,500/day
- Revised days: 62,000 / 2,500 ≈ 25 days

Feasibility: Good (25 days)

Sanity Check:
- 10% MDE on 5% baseline is reasonable
- 25 days covers multi
Result: 31K per variant | 62K total | 25 days (checkout traffic) | Good feasibility
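The Example 1 arithmetic can be reproduced with a short Python sketch. It uses `statistics.NormalDist` from the standard library (Python 3.8+) for exact z-scores instead of the rounded 1.96 and 0.84, so the result lands slightly above the hand calculation. The function name `sample_size_per_variant` is illustrative, and the 5% share of visitors reaching checkout is the example's own assumption.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant n for a two-sided two-proportion test (formula from Example 1)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                          # pooled rate under "no difference"
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


baseline = 0.05
treatment = baseline * 1.10          # 10% relative lift: 5% -> 5.5%
n = sample_size_per_variant(baseline, treatment)
total = 2 * n
checkout_per_day = 50_000 * 0.05     # assumption from the example: ~5% of visitors reach checkout
days = math.ceil(total / checkout_per_day)
print(f"{n:,} per variant, {total:,} total, {days} days")
```

It should report roughly 31,200 per variant and about 25 days, in line with the rounded figures above.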
Example 2: Low-Traffic SaaS Pricing Test
Problem: A B2B SaaS has 500 daily trial signups, 2% trial-to-paid conversion, and wants to test a pricing change. What MDE is realistic? They want the test done in 4 weeks.
Solution: Constraint-Based Planning:
- Available sample in 28 days: 500 × 28 = 14,000
- Per variant (2 variants): 7,000
- Baseline: 2% conversion
- Target duration: 4 weeks

Reverse-Calculate MDE:
Given n = 7,000 per variant, what MDE is detectable?

Using the power formula rearranged:
MDE = f(n, baseline, α, power)

With a 2% baseline and 7,000 samples per variant:
- At 80% power, 95% significance
- Detectable absolute effect ≈ 0.7 percentage points
- Relative MDE ≈ 35% (2% → 2.7%)

Interpretation:
You can only reliably detect a relative improvement of about 35% or more.

Is this acceptable?
- If the pricing change is expected to have a 35%+ impact: Yes
- If you need to detect a 10% improvement: No (that needs ≈80,600 per variant, ≈161,000 total, ≈46 weeks at 500 signups/day)

Recommendations:
1. Accept 35% MDE if pricing change is substantial
2. Extend test
Result: 7K per variant in 4 weeks | Only detects 35%+ MDE | 8 weeks (14K per variant) detects ~25% MDE; 20% MDE needs ~12 weeks
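The "power formula rearranged" step has no closed form once the variance depends on p2, so one practical approach is to search for the smallest lift whose required sample fits the budget. The sketch below does this by bisection; the helper names `required_n` and `detectable_lift` are hypothetical, and it assumes the same two-sided formula as Example 1 with exact z-scores.

```python
import math
from statistics import NormalDist


def required_n(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Unrounded per-variant n for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return numerator / (p2 - p1) ** 2


def detectable_lift(p1: float, n_per_variant: float, alpha: float = 0.05,
                    power: float = 0.80, tol: float = 1e-7) -> float:
    """Smallest absolute lift p2 - p1 detectable with n_per_variant users, by bisection."""
    lo, hi = 1e-9, 1 - p1 - 1e-9  # required_n falls as the lift grows
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if required_n(p1, p1 + mid, alpha, power) > n_per_variant:
            lo = mid  # lift too small: would need more users than available
        else:
            hi = mid  # lift detectable: try a smaller one
    return hi


baseline = 0.02
n = 500 * 28 // 2                    # 7,000 per variant in 4 weeks
lift = detectable_lift(baseline, n)
print(f"absolute MDE {lift * 100:.2f} pp, relative MDE {lift / baseline:.0%}")
```

It finds an absolute MDE of roughly 0.72 percentage points (about 36% relative), consistent with the ≈0.7 points and ≈35% quoted above.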
Example 3: Mobile App Feature Launch
Problem: A mobile app tests a new feature. DAU: 200,000. Baseline engagement: 15% use feature X. The team wants to detect a 5% relative lift at 95% significance and 80% power. Traffic split: 50/50.
Solution: High-Traffic Scenario:

- Baseline: 15% feature usage
- MDE: 5% relative (15% → 15.75%)
- Absolute effect: 0.75 percentage points
- DAU: 200,000
- Split: 50/50 (100K per variant daily)

Sample Size Calculation:
With a 15% baseline, per-user variance is higher than at low baselines:
- p(1−p) = 0.15 × 0.85 = 0.1275

n = [1.96√(2×0.15375×0.84625) + 0.84√(0.15×0.85 + 0.1575×0.8425)]² / (0.0075)²
n ≈ 36,300 per variant

Total: ≈72,600
Days needed: 72,600 / 200,000 ≈ 0.36 days

⚠️ Sub-day result suggests we should:
1. Run for a minimum of 1-2 weeks anyway (weekly patterns, novelty)
2. Consider a smaller MDE since we have traffic headroom

Optimized Plan:
- Run for 14 days minimum (industry best practice)
- Available sample: 200K × 14 = 2.8M
- Per variant: 1.4M
- Detectable MDE: ~1% relative (extr
Result: ~36K per variant needed | <1 day for sample | Run 2 weeks minimum for validity
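Example 3's point is that the sample-size answer and the planned duration are separate numbers. A minimal sketch, assuming the page's 14-day minimum and a 50/50 split; `planned_days` is a hypothetical helper, and the sample-size function repeats the Example 1 formula with exact z-scores from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant n for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def planned_days(n_per_variant: int, daily_traffic: int, variants: int = 2, min_days: int = 14):
    """Return (days needed for the sample alone, days to actually run)."""
    raw_days = n_per_variant * variants / daily_traffic
    return raw_days, max(math.ceil(raw_days), min_days)  # never shorter than min_days


n = sample_size_per_variant(0.15, 0.15 * 1.05)    # 15% -> 15.75%
raw, run_for = planned_days(n, daily_traffic=200_000)
print(f"{n:,} per variant; sample reached in {raw:.2f} days; run for {run_for} days")
```

This should give roughly 36,300 per variant, reached in well under a day, while the plan still runs for 14 days.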
Frequently Asked Questions
What is statistical power in A/B testing?
Statistical power (1−β) is the probability of detecting a real effect when it exists. 80% power means that if the true effect is as large as your MDE, you have an 80% chance of getting a statistically significant result. Low power means a high false-negative risk: you might miss real improvements.
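Power can also be computed directly for a fixed sample size. The sketch below inverts the Example 1 formula under the usual normal approximation; `power_two_proportions` is an illustrative name, and it ignores the negligible chance of a significant result in the wrong direction.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_variant: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_variant)         # SE if there is no effect
    se_alt = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)  # SE under the true effect
    z_beta = (abs(p2 - p1) - z_alpha * se_null) / se_alt
    return nd.cdf(z_beta)


# Example 1 numbers: 5% -> 5.5%
for n in (10_000, 31_000, 60_000):
    print(f"n = {n:>6,} per variant -> power {power_two_proportions(0.05, 0.055, n):.0%}")
```

About 31,000 per variant gives close to 80% power, as in Example 1, while a third of that gives only around 35%.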
How do I calculate sample size for A/B tests?
Sample size depends on the baseline conversion rate, the MDE, the significance level (α), and power (1−β). The formula combines z-scores for α and β with the variance of each group's conversion rate. For a fixed relative MDE, higher baseline rates and larger MDEs require smaller samples; a lower α or higher power requires larger samples.
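To see these dependencies numerically, this sketch evaluates the Example 1 formula (exact z-scores via `statistics.NormalDist`) across a few baselines and settings. `n_per_variant` is an illustrative helper, and the MDE is taken as relative.

```python
import math
from statistics import NormalDist


def n_per_variant(baseline: float, relative_mde: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant n to detect a relative lift with a two-sided two-proportion test."""
    p1, p2 = baseline, baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Higher baseline or larger MDE -> fewer users
for baseline in (0.02, 0.05, 0.15):
    for mde in (0.05, 0.10, 0.20):
        print(f"baseline {baseline:.0%}, MDE {mde:.0%}: {n_per_variant(baseline, mde):,}")

# Stricter alpha or higher power -> more users (5% baseline, 10% MDE)
for alpha, power in ((0.05, 0.80), (0.01, 0.80), (0.05, 0.90)):
    print(f"alpha {alpha}, power {power:.0%}: {n_per_variant(0.05, 0.10, alpha, power):,}")
```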
Is my data stored or sent to a server?
No. All calculations run entirely in your browser using JavaScript. No data you enter is ever transmitted to any server or stored anywhere. Your inputs remain completely private.
Can I use the results for professional or academic purposes?
You may use the results for reference and educational purposes. For professional reports, academic papers, or critical decisions, we recommend verifying outputs against peer-reviewed sources or consulting a qualified expert in the relevant field.
How do I interpret the result?
Results are displayed with a label and unit to help you understand the output, and may include a short explanation or classification below the result (for example, a feasibility rating like those in the worked examples). Refer to the worked examples on this page for real-world context.
Does Experiment Power & Sample Size Checker work offline?
Once the page is loaded, the calculation logic runs entirely in your browser. If you have already opened the page, most calculators will continue to work even if your internet connection is lost, since no server requests are needed for computation.