Question 1

What is statistical power and why does it matter?

Accepted Answer

Statistical power is the probability that an experiment detects a real effect when one exists. Mathematically, it equals 1 minus the Type II error rate (beta). A power of 0.80 means that, if the true effect is at least as large as the one assumed in the calculation, the experiment has an 80% chance of producing a statistically significant result. The conventional target is 80% power, though clinical trials often require 90%. Low power leads to inconclusive experiments, wasted resources, and the risk of wrongly concluding that an intervention does not work when it actually does. Underpowered studies are a widely cited contributor to the replication crisis. Running a power analysis before data collection guards against this.
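The relationship between power, beta, and sample size can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own method: it assumes a two-sided, two-group comparison with equal group sizes and uses the normal approximation rather than the exact t distribution. The function name approx_power and the inputs d = 0.5 and 64 per group are chosen only for illustration.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the z-statistic when the true standardized effect is d.
    shift = d * (n_per_group / 2) ** 0.5
    # Probability that the statistic falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


power = approx_power(d=0.5, n_per_group=64)
beta = 1 - power  # Type II error rate
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

With these inputs the approximation lands close to the conventional 0.80 target. Setting d to 0 returns alpha, which is the false positive rate when there is no effect.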

Question 2

What is effect size and how do I choose one?

Accepted Answer

Effect size measures the magnitude of a difference. For two groups the usual measure is Cohen's d, the difference between the group means expressed in standard deviation units. Cohen's conventional benchmarks are d = 0.2 (small), 0.5 (medium), and 0.8 (large). To choose an appropriate effect size: (1) Review prior literature for similar interventions to see what effects others have found. (2) Determine the minimum clinically or practically meaningful difference, that is, the smallest change worth detecting. (3) Conduct a pilot study to estimate the likely effect. As a rough guide, most real-world effects in social science are small (d = 0.2-0.4), medical interventions are small to medium (d = 0.3-0.6), and educational interventions can range from small to large depending on the context.
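As a small sketch of how d is computed from pilot data, the Python below divides the difference in means by the pooled standard deviation and maps the result onto Cohen's benchmarks. The two samples are made-up numbers for illustration only, and the helper names cohens_d and cohen_label are hypothetical.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


def cohen_label(d):
    """Map |d| onto Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Made-up pilot measurements, for illustration only.
treated = [12, 14, 15, 17, 18]
control = [10, 12, 13, 15, 16]

d = cohens_d(treated, control)
print(f"d = {d:.2f} ({cohen_label(d)})")
```

An estimate from a pilot this small is very noisy, so it is usually safer to treat it as one input alongside prior literature and the minimum meaningful difference.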

Question 3

How does the number of groups affect sample size?

Accepted Answer

Adding groups increases the total sample size needed. For a two-group comparison, you need N participants per group, or 2N in total. With three groups (e.g., placebo, low dose, high dose), you need N per group times 3, and if you plan to test several pairwise comparisons with a multiplicity correction, the per-group N also rises somewhat because each comparison is run at a stricter significance level. For an omnibus ANOVA comparing k groups, there is no simple multiplier of the two-sample figure: the per-group sample size comes from the noncentral F distribution and depends on k and on the effect size (Cohen's f), so it is best computed with software. Factorial designs (2x2, 2x3) are more efficient for main effects because they test multiple factors simultaneously, requiring fewer total participants than running separate experiments for each factor. Detecting interactions, however, usually requires more participants than detecting main effects.
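To see how multiple comparisons push up the per-group N, here is a rough Python sketch. It assumes all pairwise comparisons among k groups are tested with a Bonferroni correction and uses the normal approximation for each two-group comparison. This is one common planning approach, not the omnibus F-test calculation, and the function name n_per_group_pairwise is hypothetical.

```python
from math import ceil, comb
from statistics import NormalDist


def n_per_group_pairwise(d, k, alpha=0.05, power=0.80):
    """Per-group n so each pairwise test keeps its power after Bonferroni."""
    z = NormalDist()
    n_comparisons = comb(k, 2)           # all pairs among k groups
    alpha_adj = alpha / n_comparisons    # Bonferroni-adjusted significance level
    z_alpha = z.inv_cdf(1 - alpha_adj / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


for k in (2, 3, 4):
    n = n_per_group_pairwise(d=0.5, k=k)
    print(f"k = {k}: {n} per group, {n * k} total")
```

Both the per-group and the total figures grow with k under this assumption. An exact t-based or F-based calculation will give slightly different numbers.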

Question 4

How do I conduct a power analysis for a randomized controlled trial?

Accepted Answer

A power analysis for an RCT links five quantities: sample size, effect size, significance level, power, and the number of groups. Fix four of them and solve for the fifth. Typically you specify the desired power (usually 0.80 or 0.90), significance level (usually 0.05), number of groups, and estimated effect size, then solve for the required sample size per group. The effect size should come from pilot data, prior literature, or the minimum clinically important difference. Account for expected dropout by inflating the calculated sample size, typically by 10 to 20 percent; more precisely, divide the calculated sample size by 1 minus the expected dropout rate. If you use stratified randomization or repeated measures, the sample size calculation needs further adjustment. Software tools like G*Power, R, or Experiment Design Assistant: Sample Size & Power can perform these calculations accurately.
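The solve-for-sample-size step and the dropout adjustment can be sketched as follows. This is a minimal Python example under stated assumptions: two equal-sized arms, a two-sided test, and the normal approximation n = 2(z_{1-alpha/2} + z_{power})^2 / d^2, which tends to come out a participant or so below exact t-based tools. The function names and the 15 percent dropout rate are illustrative, not taken from any particular tool.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-arm trial (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def inflate_for_dropout(n, dropout_rate):
    """Number to enroll so that about n participants remain after dropout."""
    return ceil(n / (1 - dropout_rate))


for target_power in (0.80, 0.90):
    n = n_per_group(d=0.5, power=target_power)
    enroll = inflate_for_dropout(n, dropout_rate=0.15)
    print(f"power {target_power:.2f}: analyze {n} per arm, enroll {enroll} per arm")
```

Dividing by 1 minus the dropout rate, rather than multiplying by 1 plus the rate, is what keeps the expected number of completers at the calculated n.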
