Hypergeometric Distribution Calculator
Our free probability calculator solves hypergeometric distribution problems. Get worked examples, visual aids, and downloadable results.
Last reviewed: December 2025
Background & Theory
The Hypergeometric Distribution Calculator applies the following established principles and formulas. Statistics and probability provide the mathematical framework for drawing conclusions from data under uncertainty.
Measures of central tendency describe where data cluster. The mean is the arithmetic average, computed as the sum of all values divided by the count. The median is the middle value of an ordered dataset and is robust to extreme outliers. The mode is the most frequent value. Spread is quantified by the variance, the average squared deviation from the mean, and by its square root, the standard deviation. For a sample, the variance uses n minus one in the denominator to correct for bias in estimation.
The normal distribution, defined by its mean and standard deviation, is the cornerstone of parametric statistics. Its bell-shaped probability density follows the formula f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-0.5 * ((x - mu) / sigma)^2). The empirical rule states that approximately 68 percent of observations fall within one standard deviation of the mean, 95 percent within two, and 99.7 percent within three. A z-score standardizes a data point by subtracting the mean and dividing by the standard deviation, expressing how many standard deviations the observation lies from the mean.
In hypothesis testing, the p-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. A confidence interval is a range computed from the sample by a procedure that captures the true population parameter in a specified fraction of repeated samples, typically 95 percent. Correlation measures linear association between two variables, with Pearson's r ranging from negative one to positive one. Correlation does not imply causation. Linear regression fits a line of the form y = a + bx by minimizing the sum of squared residuals. Bayes' theorem relates conditional probabilities: P(A|B) = P(B|A) * P(A) / P(B), allowing prior beliefs to be updated on new evidence.
The law of large numbers guarantees that the sample mean of independent, identically distributed observations converges to the population mean as the sample size grows. The central limit theorem states that the distribution of sample means approaches normality regardless of the shape of the population distribution, provided the population variance is finite and the sample size is sufficiently large. A sample size of 30 or more is a common rule of thumb.
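The z-score and the empirical rule described above can be checked numerically. The following is a minimal sketch using only Python's standard library; the function names z_score and prob_within are illustrative assumptions and are not part of the calculator.

```python
from math import erf, sqrt


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


def prob_within(k: float) -> float:
    """P(|Z| <= k) for a standard normal variable Z, via the error function."""
    return erf(k / sqrt(2))


# Empirical rule: share of a normal population within 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))

# A value of 130 with mean 100 and standard deviation 15 lies 2 standard deviations above the mean.
print(z_score(130, 100, 15))
```

The printed shares round to the 68, 95 and 99.7 percent figures quoted in the text.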
History
The statistical ideas behind the Hypergeometric Distribution Calculator developed over several centuries. The mathematical study of probability emerged in the 17th century from the 1654 correspondence between Blaise Pascal and Pierre de Fermat. Their exchange, prompted by a gambling problem posed by the Chevalier de Méré, established the foundations of probability theory by calculating expected outcomes through systematic enumeration of cases.
Jacob Bernoulli formalized the law of large numbers in his posthumously published Ars Conjectandi of 1713, proving rigorously that empirical frequencies converge to theoretical probabilities as the number of observations increases. His work laid the groundwork for inferential statistics by connecting mathematical probability to observed data. Carl Friedrich Gauss developed the method of least squares around 1795 while adjusting astronomical observations, and he recognized the bell-shaped error distribution that now bears his name. Pierre-Simon Laplace independently worked on the normal distribution and proved an early version of the central limit theorem around 1810, demonstrating why errors in measurement tend toward normality.
The late 19th century saw statistics emerge as a distinct scientific discipline. Francis Galton introduced regression and correlation in the 1880s while studying heredity. Karl Pearson formalized these concepts, developed the chi-squared test, and founded the journal Biometrika in 1901, establishing statistics as a rigorous academic field.
Ronald Fisher transformed statistical practice in the early 20th century. His 1925 book Statistical Methods for Research Workers popularized significance testing and analysis of variance and promoted the 0.05 significance level as a conventional threshold for p-values, establishing a framework still used in scientific research. Fisher and Jerzy Neyman engaged in a prolonged methodological dispute over the interpretation of hypothesis tests.
The Bayesian approach, rooted in the 18th-century work of Thomas Bayes and Laplace, was largely eclipsed by frequentist methods through much of the 20th century. It experienced a revival after World War II that accelerated with computational advances. The late 20th and early 21st centuries brought statistics into every domain through big data, machine learning, and the routine availability of software capable of processing millions of observations.
Formula
P(X=k) = C(K,k) * C(N-K, n-k) / C(N,n)
Where N is the population size, K is the number of success states in the population, n is the number of draws (sample size), and k is the desired number of observed successes. C(a,b) is the binomial coefficient 'a choose b'.
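The formula above translates almost directly into code. The following is a minimal sketch in Python, assuming Python 3.8 or later for math.comb; the function name hypergeom_pmf and its argument order are illustrative choices, not the calculator's own implementation.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): exactly k successes in n draws without replacement
    from a population of N items that contains K successes."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # Values of k outside the support have probability zero.
    if k < max(0, n + K - N) or k > min(K, n):
        return 0.0
    # Exact integer binomial coefficients; only the final division is floating point.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Two hearts in a 5-card hand from a standard 52-card deck.
print(round(hypergeom_pmf(2, 52, 13, 5), 4))  # 0.2743
```

Working with exact integers until the final division avoids the rounding problems that come from multiplying and dividing large factorials as floats.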
Worked Examples
Example 1: Drawing Hearts from a Deck
Problem: What is the probability of drawing exactly 2 hearts in a 5-card hand from a standard 52-card deck (13 hearts)?
Solution: N = 52 (population), K = 13 (hearts), n = 5 (draw), k = 2 (desired hearts)
P(X=2) = C(13,2) * C(39,3) / C(52,5)
C(13,2) = 78
C(39,3) = 9,139
C(52,5) = 2,598,960
P(X=2) = 78 * 9,139 / 2,598,960 = 712,842 / 2,598,960
Result: P(X=2) = 0.2743 or 27.43%
Example 2: Quality Control Inspection
Problem: A lot of 200 items contains 15 defective items. If 10 items are inspected, what is the probability of finding exactly 1 defective?
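Solution: N = 200 (lot size), K = 15 (defective items), n = 10 (inspected), k = 1 (desired defectives)
P(X=1) = C(15,1) * C(185,9) / C(200,10)
C(15,1) = 15
C(185,9) / C(200,10) = 10 * (185 * 184 * ... * 177) / (200 * 199 * ... * 191)
P(X=1) = 0.75 * (185 * 184 * ... * 177) / (199 * 198 * ... * 191) ≈ 0.75 * 0.5114
For comparison, the expected number of defectives in the sample is Mean = 10 * 15/200 = 0.75
Result: P(X=1) = 0.3835 or 38.35%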
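The binomial coefficients in Example 2 are too large to evaluate comfortably by hand, so a short script is a useful check. The sketch below assumes Python 3.8 or later (for math.comb and math.prod) and evaluates the probability two ways: directly from the formula, and through the equivalent product of draw-by-draw ratios that remains after the factorials cancel. The variable names are illustrative.

```python
from math import comb, prod

N, K, n, k = 200, 15, 10, 1

# Direct evaluation with exact integer binomial coefficients.
p_exact = comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Product form, valid for k = 1: after cancelling factorials,
# P(X = 1) = (n * K / N) * (185 * 184 * ... * 177) / (199 * 198 * ... * 191).
ratio = prod((N - K - i) / (N - 1 - i) for i in range(n - 1))
p_product = n * K / N * ratio

print(round(p_exact, 4), round(p_product, 4))
```

Both routes evaluate to about 0.3835, which gives an independent check on the arithmetic.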
Frequently Asked Questions
What is the hypergeometric distribution?
The hypergeometric distribution models the probability of drawing a specific number of successes from a finite population without replacement. Unlike the binomial distribution, which assumes sampling with replacement (or an infinite population), the hypergeometric distribution accounts for the changing probability as items are drawn. A classic example is drawing cards from a deck: what is the probability of getting exactly 2 hearts in a 5-card hand from a standard 52-card deck? The distribution is defined by three parameters: the population size N, the number of success states K in the population, and the number of draws n. Each draw changes the composition of the remaining population.
How is the hypergeometric distribution different from the binomial distribution?
The key difference is sampling with versus without replacement. The binomial distribution assumes each trial is independent with a constant probability of success, which applies when sampling with replacement or from an effectively infinite population. The hypergeometric distribution accounts for the fact that each draw changes the composition of the remaining population. For example, after drawing a heart from a deck, the probability of the next card being a heart changes from 13/52 to 12/51. When the population is very large relative to the sample size, the hypergeometric distribution is closely approximated by the binomial distribution with p = K/N, because removing one item barely changes the probabilities.
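The convergence toward the binomial distribution can be seen numerically. The following sketch keeps the success fraction fixed at 25% and the sample size at 5 while the population grows; the helper names and the chosen population sizes are illustrative assumptions.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Sampling without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """Sampling with replacement (constant success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, k, p = 5, 2, 0.25
gaps = []
for N in (52, 520, 5200, 52000):
    K = N // 4  # keep the success fraction at 25%
    h = hypergeom_pmf(k, N, K, n)
    b = binom_pmf(k, n, p)
    gaps.append(abs(h - b))
    print(f"N={N:>6}  hypergeometric={h:.5f}  binomial={b:.5f}  gap={abs(h - b):.5f}")
```

The gap between the two probabilities shrinks as N grows, which is why the binomial is a common shortcut when the sample is a small fraction of the population.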
What is the formula for the hypergeometric probability?
The probability mass function is P(X = k) = C(K,k) * C(N-K, n-k) / C(N,n), where C(a,b) is the binomial coefficient (a choose b). Here N is the total population, K is the number of success items, n is the sample size, and k is the desired number of successes. The numerator counts the favorable outcomes: C(K,k) ways to choose k successes from K success items, times C(N-K, n-k) ways to choose the remaining n-k items from the N-K non-success items. The denominator C(N,n) counts all possible ways to draw n items from N. This ratio gives the exact probability.
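The counting argument can be verified by brute force on a small population. The sketch below enumerates every possible sample with itertools.combinations and compares the counts with the binomial coefficients in the formula; the population of 10 items with 4 successes is a made-up illustration, chosen only to keep the enumeration small.

```python
from itertools import combinations
from math import comb

N, K, n, k = 10, 4, 3, 2

# Label the first K items as successes (1) and the rest as non-successes (0).
population = [1] * K + [0] * (N - K)

# Every unordered sample of n distinct items, identified by index.
samples = list(combinations(range(N), n))

# Count samples that contain exactly k successes.
favorable = sum(1 for s in samples if sum(population[i] for i in s) == k)

print(favorable, len(samples), favorable / len(samples))
print(comb(K, k) * comb(N - K, n - k), comb(N, n))
```

The enumerated counts match the numerator C(K,k) * C(N-K, n-k) and the denominator C(N,n).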
What are common applications of the hypergeometric distribution?
The hypergeometric distribution appears in quality control when inspecting a batch of products without replacement, such as testing 10 items from a batch of 100 to check for defects. It is used in ecology for capture-recapture methods to estimate animal populations. In card games, it calculates the probability of specific hands. In genetics, it models the likelihood of observing a certain number of genes of interest in a random sample. Statistical tests like Fisher's exact test use the hypergeometric distribution for analyzing contingency tables, especially with small sample sizes where chi-squared approximations are unreliable.
What are the mean and variance of the hypergeometric distribution?
The mean (expected value) of the hypergeometric distribution is E(X) = nK/N, which is intuitive: if 25% of the population are successes and you draw 10 items, you expect 2.5 successes on average. The variance is Var(X) = n*K*(N-K)*(N-n) / (N^2*(N-1)). The factor (N-n)/(N-1) is called the finite population correction factor, and it makes the variance smaller than the corresponding binomial variance whenever more than one item is drawn. As the population grows much larger than the sample, this correction factor approaches 1 and the variance approaches the binomial variance of n*p*(1-p) where p = K/N.
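These closed forms can be cross-checked against the probability mass function. The sketch below uses the card example (N = 52, K = 13, n = 5) and compares the formulas with the mean and variance obtained by summing over the distribution; the function name hypergeom_mean_var is an illustrative assumption.

```python
from math import comb


def hypergeom_mean_var(N: int, K: int, n: int):
    """Closed-form mean and variance of the hypergeometric distribution."""
    mean = n * K / N
    var = n * K * (N - K) * (N - n) / (N**2 * (N - 1))
    return mean, var


N, K, n = 52, 13, 5
mean, var = hypergeom_mean_var(N, K, n)

# Cross-check by summing over the full support of the distribution.
support = range(max(0, n + K - N), min(K, n) + 1)
pmf = {k: comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in support}
mean_sum = sum(k * p for k, p in pmf.items())
var_sum = sum((k - mean_sum) ** 2 * p for k, p in pmf.items())

# Binomial variance with p = K/N, before the finite population correction.
binom_var = n * (K / N) * (1 - K / N)

print(mean, mean_sum)
print(var, var_sum, binom_var * (N - n) / (N - 1))
```

Multiplying the binomial variance by (N-n)/(N-1) reproduces the hypergeometric variance, which is the finite population correction in action.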
How do you calculate cumulative hypergeometric probabilities?
Cumulative probabilities are computed by summing individual probabilities. P(X <= k) sums P(X = x) for all valid x from the minimum possible successes to k. The minimum is max(0, n+K-N), which accounts for cases where you must draw some successes because there are not enough non-successes to fill the sample. The maximum possible successes is min(K, n). For P(X >= k), compute 1 - P(X <= k-1). These cumulative values answer practical questions like: what is the probability of getting at least 3 defective items when inspecting 10 items from a lot where 20 out of 200 are defective?
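The summation described above can be sketched as follows. The function names hypergeom_cdf and prob_at_least are illustrative assumptions; the example call uses the inspection question from the text (at least 3 defective items among 10 inspected, from a lot of 200 with 20 defective).

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_cdf(k, N, K, n):
    """P(X <= k): sum the PMF from the smallest possible count up to k."""
    lo = max(0, n + K - N)  # successes that must be drawn
    hi = min(K, n)          # successes that can be drawn at most
    return sum(hypergeom_pmf(x, N, K, n) for x in range(lo, min(k, hi) + 1))


def prob_at_least(k, N, K, n):
    """P(X >= k) = 1 - P(X <= k - 1)."""
    return 1.0 - hypergeom_cdf(k - 1, N, K, n)


p = prob_at_least(3, N=200, K=20, n=10)
print(f"P(X >= 3) = {p:.4f}")
```

Clamping the upper limit of the sum to min(K, n) keeps every term inside the support, so the function also returns 0 below the minimum and 1 at or above the maximum.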
Reviewed by Manoj Kumar, Mathematics Educator