Correlation and Covariance Calculator
Our free statistics calculator solves correlation and covariance problems. Get worked examples, visual aids, and downloadable results.
Last reviewed: December 2025
Background & Theory
The Correlation and Covariance Calculator applies the following established principles and formulas. Statistics and probability provide the mathematical framework for drawing conclusions from data under uncertainty. The measures of central tendency describe where data cluster. The mean is the arithmetic average, computed as the sum of all values divided by the count. The median is the middle value of an ordered dataset and is robust to extreme outliers. The mode is the most frequent value. Spread is quantified by variance, the average squared deviation from the mean, and by its square root, the standard deviation. For a sample, variance uses n minus one in the denominator to correct for bias in estimation.
The normal distribution, defined by its mean and standard deviation, is the cornerstone of parametric statistics. Its bell-shaped probability density follows the formula f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-0.5 * ((x - mu) / sigma)^2). The empirical rule states that approximately 68 percent of observations fall within one standard deviation of the mean, 95 percent within two, and 99.7 percent within three. A z-score standardizes a data point by subtracting the mean and dividing by the standard deviation, expressing how many standard deviations an observation lies from the mean.
In hypothesis testing, the p-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. A confidence interval gives a range of plausible values for the true population parameter. At the typical 95 percent level, the procedure that builds the interval captures the true parameter in about 95 percent of repeated samples.
Correlation measures linear association between two variables, with Pearson's r ranging from negative one to positive one. Correlation does not imply causation. Linear regression fits a line of the form y = a + bx to minimize the sum of squared residuals. Bayes' theorem relates conditional probabilities: P(A|B) = P(B|A) * P(A) / P(B), allowing prior beliefs to be updated on new evidence.
The law of large numbers guarantees that the sample mean converges to the population mean as sample size grows. The central limit theorem states that the distribution of sample means approaches normality regardless of the shape of the population distribution, provided the population has finite variance and the sample size is sufficiently large. A sample size of 30 or more is a common rule of thumb.
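As a rough check on the empirical rule quoted above, the following Python sketch evaluates the normal density formula and the probability of landing within k standard deviations of the mean. The function names normal_pdf and prob_within are illustrative choices, not part of the calculator, and the probability uses the standard identity P(|Z| < k) = erf(k / sqrt(2)).

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-0.5 * ((x - mu) / sigma)^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

The printed values round to the 68, 95, and 99.7 percent figures of the empirical rule.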
History
The history behind the Correlation and Covariance Calculator traces back through the following developments. The mathematical study of probability emerged in the 17th century from correspondence between Blaise Pascal and Pierre de Fermat in 1654. Their exchange, prompted by a gambling problem posed by the Chevalier de Mere, established the foundations of probability theory by calculating expected outcomes through systematic enumeration of cases. Jacob Bernoulli formalized the law of large numbers in his posthumously published Ars Conjectandi of 1713, proving rigorously that empirical frequencies converge to theoretical probabilities with increasing observations. His work laid the groundwork for inferential statistics by connecting mathematical probability to observed data.
Carl Friedrich Gauss developed the method of least squares around 1795 while adjusting astronomical observations, and he recognized the bell-shaped error distribution that now bears his name. Pierre-Simon Laplace independently worked on the normal distribution and proved an early version of the central limit theorem around 1810, demonstrating why errors in measurement tend toward normality.
The late 19th century saw statistics emerge as a distinct scientific discipline. Francis Galton introduced regression and correlation in the 1880s while studying heredity. Karl Pearson formalized these concepts, developed the chi-squared test, and founded the journal Biometrika in 1901, establishing statistics as a rigorous academic field.
Ronald Fisher transformed statistical practice in the early 20th century. His 1925 book Statistical Methods for Research Workers popularized significance testing, analysis of variance, and the use of the p-value against a conventional threshold, establishing the framework still used in scientific research. Fisher and Jerzy Neyman engaged in a prolonged methodological dispute over the interpretation of hypothesis tests.
The Bayesian approach, rooted in the 18th century work of Thomas Bayes and Laplace, was largely eclipsed by frequentist methods through much of the 20th century, but it experienced a revival after World War II that accelerated with computational advances. The late 20th and early 21st centuries brought statistics into every domain through big data, machine learning, and the routine availability of software capable of processing millions of observations.
Formula
r = Sum[(xi - x_mean)(yi - y_mean)] / sqrt[Sum[(xi - x_mean)^2] * Sum[(yi - y_mean)^2]]
The Pearson correlation coefficient is calculated by dividing the sum of the products of deviations from the means by the geometric mean of the sums of squared deviations. Covariance uses the same numerator but divides by N (population) or N-1 (sample).
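The formula can be sketched in a few lines of plain Python. This is a minimal illustration, not the calculator's actual implementation. The function names pearson_r and sample_covariance are hypothetical, and the data are the heights and weights from Example 1 on this page.

```python
import math

def _deviation_sums(xs, ys):
    """Return (sum of dx*dy, sum of dx^2, sum of dy^2) around the two means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy

def pearson_r(xs, ys):
    """Sum of products of deviations divided by sqrt(Sxx * Syy)."""
    sxy, sxx, syy = _deviation_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

def sample_covariance(xs, ys):
    """Same numerator as r, divided by N - 1."""
    sxy, _, _ = _deviation_sums(xs, ys)
    return sxy / (len(xs) - 1)

heights = [160, 165, 170, 175, 180]  # cm, from Example 1
weights = [55, 62, 68, 72, 80]       # kg, from Example 1
print(round(pearson_r(heights, weights), 4))          # 0.9956
print(round(sample_covariance(heights, weights), 4))  # 75.0
```

Raising an error for a constant variable is a design choice in this sketch. In that case the denominator is zero and r has no defined value.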
Worked Examples
Example 1: Height and Weight Correlation
Problem: Given heights (cm): 160, 165, 170, 175, 180 and weights (kg): 55, 62, 68, 72, 80, calculate the correlation and covariance.
Solution: Mean X = 170, Mean Y = 67.4
Deviations (dx, dy): (-10, -12.4), (-5, -5.4), (0, 0.6), (5, 4.6), (10, 12.6)
Sum of dx*dy = 124 + 27 + 0 + 23 + 126 = 300
Sum dx^2 = 100 + 25 + 0 + 25 + 100 = 250
Sum dy^2 = 153.76 + 29.16 + 0.36 + 21.16 + 158.76 = 363.2
Sample Cov = 300 / 4 = 75
Pearson r = 300 / sqrt(250 * 363.2) = 300 / 301.33 = 0.9956
Result: r = 0.9956 (Very Strong Positive) | Sample Cov = 75 | R-squared = 99.12%
Example 2: Study Hours vs Exam Score
Problem: Study hours: 2, 4, 6, 8, 10 and exam scores: 50, 55, 70, 80, 90. Find correlation.
Solution: Mean X = 6, Mean Y = 69
Sum dx*dy = (-4)(-19) + (-2)(-14) + (0)(1) + (2)(11) + (4)(21) = 76 + 28 + 0 + 22 + 84 = 210
Sum dx^2 = 16 + 4 + 0 + 4 + 16 = 40
Sum dy^2 = 361 + 196 + 1 + 121 + 441 = 1120
Pearson r = 210 / sqrt(40 * 1120) = 210 / 211.66 = 0.9922
Result: r = 0.9922 (Very Strong Positive) | More study hours strongly predict higher scores
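The sums in Example 2 also give the least-squares line y = a + bx described in the Background section, because the slope is b = Sum(dx*dy) / Sum(dx^2) and the intercept is a = Mean Y - b * Mean X. The sketch below is an illustrative addition with a hypothetical function name, fit_line. The calculator itself is not known to report a regression line.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

hours = [2, 4, 6, 8, 10]        # from Example 2
scores = [50, 55, 70, 80, 90]   # from Example 2
a, b = fit_line(hours, scores)
print(a, b)  # 37.5 5.25
```

On these five points, each additional study hour goes with about 5.25 more exam points. That describes this small dataset and is not a causal claim.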
Frequently Asked Questions
What is the difference between correlation and covariance?
Correlation and covariance both measure the relationship between two variables, but they differ in scale and interpretation. Covariance measures the directional relationship between two variables and can take any value from negative infinity to positive infinity. Its magnitude depends on the units of measurement, making it difficult to compare across different datasets. Correlation, specifically Pearson correlation, is a standardized version of covariance that always falls between -1 and +1. It is calculated by dividing the covariance by the product of the two standard deviations. This normalization makes correlation unitless and directly comparable across any pair of variables regardless of their scales. A correlation of +1 means perfect positive linear relationship, -1 means perfect negative, and 0 means no linear relationship.
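To make the unit dependence concrete, the sketch below recomputes Example 1 with heights in meters instead of centimeters. The helper cov_and_r is a hypothetical name used only for this illustration. The covariance shrinks by a factor of 100 while the correlation stays the same.

```python
import math

def cov_and_r(xs, ys):
    """Return (sample covariance, Pearson r) for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (n - 1), sxy / math.sqrt(sxx * syy)

heights_cm = [160, 165, 170, 175, 180]
heights_m = [h / 100 for h in heights_cm]  # same data, different unit
weights_kg = [55, 62, 68, 72, 80]

cov_cm, r_cm = cov_and_r(heights_cm, weights_kg)
cov_m, r_m = cov_and_r(heights_m, weights_kg)
print(round(cov_cm, 4), round(r_cm, 4))  # 75.0 0.9956
print(round(cov_m, 4), round(r_m, 4))    # 0.75 0.9956
```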
How do I interpret the Pearson correlation coefficient?
The Pearson correlation coefficient r ranges from -1 to +1 and measures the strength and direction of a linear relationship between two variables. Values close to +1 indicate a strong positive relationship where both variables increase together. Values close to -1 indicate a strong negative relationship where one variable increases as the other decreases. Values near 0 suggest no linear relationship. Common interpretation thresholds, applied to the absolute value of r, are: 0.9 to 1.0 is very strong, 0.7 to 0.9 is strong, 0.5 to 0.7 is moderate, 0.3 to 0.5 is weak, and below 0.3 is very weak or negligible. However, context matters greatly. In physics, correlations below 0.95 might be considered poor, while in social sciences, correlations above 0.5 are often considered strong. Remember that correlation does not imply causation.
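The thresholds listed above can be turned into a small lookup. The sketch below is one possible reading, not the calculator's own logic. It assigns a boundary value such as 0.7 to the stronger band, which is an assumption, and the function name describe_r is hypothetical.

```python
def describe_r(r):
    """Map r to a strength label using the thresholds listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size >= 0.9:
        strength = "Very Strong"
    elif size >= 0.7:
        strength = "Strong"
    elif size >= 0.5:
        strength = "Moderate"
    elif size >= 0.3:
        strength = "Weak"
    else:
        strength = "Very Weak"
    # Direction comes from the sign; r == 0 gets no direction word.
    direction = "Positive" if r > 0 else "Negative" if r < 0 else ""
    return f"{strength} {direction}".strip()

print(describe_r(0.9956))  # Very Strong Positive
print(describe_r(-0.62))   # Moderate Negative
print(describe_r(0.0))     # Very Weak
```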
What is the difference between population and sample covariance?
The difference between population and sample covariance lies in the denominator used for calculation. Population covariance divides the sum of products of deviations by N (the total number of data points), assuming you have measured every member of the population. Sample covariance divides by N-1 instead, applying what is known as Bessel's correction. This correction compensates for the fact that dividing by N systematically shrinks the estimate toward zero, because deviations are measured from the sample mean, which sits closer to the sample data points than the true population mean would. When working with data from experiments, surveys, or any subset of a larger group, you should use the sample covariance (N-1). Population covariance is only appropriate when you have data for the entire population. For large datasets, the difference becomes negligible.
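A short sketch of the two denominators, using the Example 1 data. The ddof parameter name (delta degrees of freedom) is borrowed from common numerical libraries as a convention. The function is an illustration, not the calculator's code.

```python
def covariance(xs, ys, ddof=1):
    """Covariance with denominator N - ddof (ddof=1: sample, ddof=0: population)."""
    n = len(xs)
    if n != len(ys) or n - ddof <= 0:
        raise ValueError("need equal-length lists with more than ddof points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (n - ddof)

heights = [160, 165, 170, 175, 180]
weights = [55, 62, 68, 72, 80]

sample_cov = covariance(heights, weights, ddof=1)      # 300 / 4
population_cov = covariance(heights, weights, ddof=0)  # 300 / 5
print(round(sample_cov, 4), round(population_cov, 4))  # 75.0 60.0
```

The population value is always (N-1)/N times the sample value. The ratio is 0.8 for N = 5 and 0.999 for N = 1000, which is why the choice matters little for large datasets.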
When should I use Spearman rank correlation instead of Pearson correlation?
Spearman rank correlation should be used instead of Pearson correlation in several situations. First, when the relationship between variables is monotonic but not necessarily linear, Spearman captures this better because it measures rank-order association. Second, when your data contain significant outliers, Spearman is more robust because converting values to ranks reduces the influence of extreme values. Third, when variables are measured on ordinal scales (like satisfaction ratings from 1 to 5), Spearman is more appropriate since it does not assume interval-level measurement. Fourth, when the data violate the normality assumption behind the usual significance tests for Pearson's r, Spearman provides a non-parametric alternative. Pearson is preferred when the relationship is truly linear and the data are normally distributed with no major outliers, as it uses more information from the data and is statistically more powerful in those conditions.
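Spearman's coefficient can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below uses made-up data (y = x cubed) to show a monotonic but non-linear relationship. The function names are hypothetical and the code is an illustration only.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Pearson correlation applied to the ranks of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # illustrative data: monotonic, not linear
print(spearman_rho(x, y))  # 1.0, the rank order agrees perfectly
print(pearson_r(x, y))     # below 1, because the points do not lie on a line
```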
How do I interpret the result?
Results are displayed with a label to help you understand the output. The correlation result includes a short classification of strength and direction (for example, Very Strong Positive). Refer to the worked examples section on this page for real-world context.
How do I get the most accurate result?
Enter values as precisely as possible using the correct units for each field. Check that you have selected the right unit (e.g. kilograms vs pounds, meters vs feet) before calculating. Rounding inputs early can reduce output precision.
Reviewed by Manoj Kumar, Mathematics Educator · Editorial policy