Q: What is statistical power and why does it matter?

Statistical power is the probability that your test will correctly detect a real difference between variants when one actually exists. A test with 80% power has an 80% chance of detecting a true effect of the size you planned for, and a 20% chance of missing it (a Type II error, or false negative). Power depends on four factors: sample size, effect size (how big the real difference is), significance level (alpha), and, for conversion metrics, the baseline conversion rate, which determines how noisy the metric is. Running underpowered tests is a common mistake: a non-significant result from an underpowered test is weak evidence, yet teams often read it as proof that a variant has no effect when it actually does. Before starting an A/B test, calculate the sample size required to achieve at least 80% power for the minimum detectable effect (MDE) that would be practically meaningful to your business.
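To make the sample-size step concrete, here is a minimal Python sketch. It uses the standard normal-approximation formula for a two-sided, two-proportion z-test. It assumes a binary conversion metric, equal traffic to both variants, and an MDE expressed as an absolute change in conversion rate. The helpers `sample_size_per_group` and `power_two_proportions` are hypothetical, not the analyzer's own implementation, and dedicated calculators may use slightly different formulas.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def sample_size_per_group(p_baseline: float, mde_abs: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant (two-sided two-proportion z-test).

    p_baseline: control conversion rate, e.g. 0.05 for 5%
    mde_abs:    minimum detectable effect as an absolute change, e.g. 0.01
    """
    p1, p2 = p_baseline, p_baseline + mde_abs
    p_bar = (p1 + p2) / 2
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = _STD_NORMAL.inv_cdf(power)           # quantile for target power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / mde_abs ** 2)


def power_two_proportions(p_baseline: float, mde_abs: float,
                          n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power for a given per-variant sample size (same formula, inverted)."""
    p1, p2 = p_baseline, p_baseline + mde_abs
    p_bar = (p1 + p2) / 2
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)         # SE if no effect
    se_alt = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)  # SE if effect is real
    # Ignores the negligible chance of a significant result in the wrong direction.
    return _STD_NORMAL.cdf((abs(mde_abs) - z_alpha * se_null) / se_alt)


n = sample_size_per_group(0.05, 0.01)           # 5% baseline, detect 5% -> 6%
print(n)                                        # 8158
print(power_two_proportions(0.05, 0.01, 2000))  # roughly 0.28: badly underpowered
```

With a 5% baseline and a 1-point MDE, the sketch asks for about 8,158 visitors per variant. Running the same test with only 2,000 per variant gives roughly 28% power, so a real 1-point lift would be missed most of the time.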

Q: How long should I run an A/B test?

You should run an A/B test until you reach the pre-calculated sample size needed for adequate statistical power, typically 80% or higher. Repeatedly checking results and stopping as soon as the test looks significant (called peeking) inflates your false positive rate dramatically, because every extra look is another chance for random noise to cross the significance threshold. As a guideline, most tests should run for at least one full business cycle (usually one to two weeks) to account for day-of-week effects and traffic patterns. At the other extreme, never run a test indefinitely hoping for significance, as this is a form of p-hacking. If your traffic is low, you may need to test larger changes that produce bigger effect sizes, or accept that you need several weeks or months of data. Methods designed for interim looks, such as sequential testing, or Bayesian methods can allow valid early stopping when they are set up before the test begins.
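Two parts of this advice can be sketched in Python. The first function converts a planned sample size into a run time rounded up to whole weeks, so every day of the week is covered. The second simulates A/A tests, where there is no real difference, to show how peeking inflates the false positive rate. Checking after every batch and stopping at the first p < 0.05 is compared with a single look at the end. Everything here is illustrative. The helper names, the daily-traffic figure, the batch sizes, and the 5% conversion rate are assumptions. The simulation also uses a plain two-proportion z-test, not a proper sequential method.

```python
import math
import random
from statistics import NormalDist


def days_needed(n_per_group: int, daily_visitors: int,
                n_groups: int = 2, min_weeks: int = 1) -> int:
    """Days to reach the planned sample, rounded up to complete weeks."""
    raw_days = math.ceil(n_per_group * n_groups / daily_visitors)
    weeks = max(min_weeks, math.ceil(raw_days / 7))
    return weeks * 7


def p_value(conv_a: int, conv_b: int, n: int) -> float:
    """Two-sided pooled z-test p-value for two groups of equal size n."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(p_pool * (1 - p_pool) * 2 / n)
    if se == 0:
        return 1.0
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(peek: bool, n_sims: int = 400, looks: int = 10,
                        batch: int = 200, rate: float = 0.05,
                        alpha: float = 0.05, seed: int = 7) -> float:
    """Share of A/A tests (identical true rates) declared 'significant'."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        for look in range(1, looks + 1):
            # Both arms draw from the same true rate: any "win" is pure noise.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            is_final = look == looks
            # peek=True tests at every look; peek=False only at the planned end.
            if (peek or is_final) and p_value(conv_a, conv_b, n) < alpha:
                false_positives += 1
                break
    return false_positives / n_sims


print(days_needed(8000, daily_visitors=1000))  # 21 (16 days, rounded up to 3 weeks)
print(false_positive_rate(peek=False))  # close to alpha = 0.05
print(false_positive_rate(peek=True))   # well above 0.05
```

Rounding up to whole weeks is one simple way to respect the business-cycle guideline. In the simulation, the single planned look keeps false positives near the nominal 5%. Stopping at the first significant peek out of ten pushes the rate far higher, even though neither variant is better.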

Q: What is statistical significance in A/B testing?

Statistical significance in A/B testing tells you whether the observed difference between your control (A) and variant (B) is larger than you would plausibly expect from random variation alone. It is quantified by the p-value: the probability of observing a difference as large as (or larger than) the one you measured, assuming there is actually no real difference between the two versions. A commonly used threshold is p < 0.05. This means that if the variants truly performed the same, a difference at least this large would appear less than 5% of the time. It does not mean there is a 95% chance that B is really better. Statistical significance alone also does not tell you the practical importance of the difference or whether the observed lift is meaningful for your business, so always consider effect size and confidence intervals alongside significance.
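Here is a minimal Python sketch of how the p-value, effect size, and confidence interval fit together for a conversion-rate test. It assumes a binary metric and samples large enough for the normal approximation. It uses a pooled two-proportion z-test for the p-value and an unpooled (Wald) interval for the difference. `two_proportion_test` and the conversion counts below are hypothetical, and other tools may use different tests or interval methods.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_a: int, n_a: int, conv_b: int, n_b: int,
                        alpha: float = 0.05):
    """Return (absolute lift B - A, two-sided p-value, (1 - alpha) CI for the lift)."""
    std_normal = NormalDist()
    p_a, p_b = conv_a / n_a, conv_b / n_b
    lift = p_b - p_a  # effect size on the absolute scale

    # p-value: pooled standard error, computed as if H0 (no difference) were true
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = lift / se_pooled
    p_value = 2 * (1 - std_normal.cdf(abs(z)))

    # Confidence interval: unpooled (Wald) standard error around the observed lift
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (lift - z_crit * se, lift + z_crit * se)
    return lift, p_value, ci


# Hypothetical counts: A converts 500 of 10,000 visitors, B converts 580 of 10,000
lift, p, (lo, hi) = two_proportion_test(500, 10_000, 580, 10_000)
print(f"lift = {lift:.4f}, p = {p:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

For these counts, the result is significant, with p of roughly 0.012. However, the interval runs from roughly 0.2 to 1.4 percentage points. The true lift could therefore be too small to matter for the business, which is exactly the information the p-value alone hides.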

A/B Test Significance Power Analyzer

Formula

Worked Examples

Example 1: E-commerce Button Color Test

Example 2: Landing Page Headline Test
