Pearson Correlation Calculator

Compute the Pearson correlation using validated scientific equations. See step-by-step derivations and reference values.

Calculate the Pearson correlation coefficient (r), R-squared, t-statistic, and p-value for paired data. Test the significance of linear relationships in biological and statistical research.

Last updated: December 2025

Calculator

Pearson Correlation (r): 1.0000 (Positive Very Strong Correlation)
R-squared: 1.0000 (100.0% variance)
t-statistic: Infinity (df = 3)
p-value: <0.001 (Significant)
Regression Line: y = 1.000x + 1.000
Data Points: 5 pairs

Data Pairs

Pair 1: X = 2, Y = 3
Pair 2: X = 4, Y = 5
Pair 3: X = 6, Y = 7
Pair 4: X = 8, Y = 9
Pair 5: X = 10, Y = 11
Understand the Math

Formula

r = [n(Sum XY) - (Sum X)(Sum Y)] / sqrt{[n(Sum X^2) - (Sum X)^2][n(Sum Y^2) - (Sum Y)^2]}

Where r is the Pearson correlation coefficient, n is the number of data pairs, Sum XY is the sum of products of paired values, Sum X and Sum Y are the sums of X and Y values respectively, and Sum X^2 and Sum Y^2 are sums of squared values. The t-statistic for significance testing is t = r * sqrt((n-2)/(1-r^2)) with n-2 degrees of freedom.
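
The formula above translates directly into code. The following is a minimal Python sketch, not the calculator's own implementation; the function names `pearson_from_sums` and `t_statistic` are illustrative assumptions. It uses the five sample pairs shown in the calculator.

```python
import math

def pearson_from_sums(xs, ys):
    """Pearson r via the raw-sum formula (hypothetical helper name)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("the t-test needs at least 3 pairs")
    if abs(r) >= 1:
        # Perfect correlation: 1 - r^2 is zero, so t diverges.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))

xs = [2, 4, 6, 8, 10]
ys = [3, 5, 7, 9, 11]
r = pearson_from_sums(xs, ys)
print(r, t_statistic(r, len(xs)))  # 1.0 inf
```

The infinite t-statistic matches the "Infinity" the calculator reports for perfectly collinear data. The raw-sum form can lose precision when values are large relative to their spread; the mean-centered form used in the worked examples is usually more stable numerically.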

Last reviewed: December 2025

Worked Examples

Example 1: Gene Expression Correlation

A researcher measures expression levels of Gene A (X: 2.1, 3.5, 4.2, 5.8, 6.1) and Gene B (Y: 1.8, 3.2, 4.5, 5.1, 6.3) in 5 tissue samples. Calculate the Pearson correlation.
Solution:
n = 5, Mean X = 4.34, Mean Y = 4.18
Sum of (Xi-MeanX)(Yi-MeanY) = 11.184
Sum of (Xi-MeanX)^2 = 10.972
Sum of (Yi-MeanY)^2 = 12.068
r = 11.184 / sqrt(10.972 * 12.068) = 11.184 / 11.507 = 0.9719
R-squared = 0.945, t = 7.16, df = 3, p < 0.01
Result: r = 0.972 (Very Strong Positive), R-squared = 0.945 (94.5% of variance explained)

Example 2: Drug Dosage vs Response Time

Test whether drug dosage (mg: 10, 20, 30, 40, 50, 60) correlates with response time (min: 45, 38, 32, 25, 20, 15) in 6 patients.
Solution:
n = 6, Mean X = 35, Mean Y = 29.17
Sum of (Xi-MeanX)(Yi-MeanY) = -1055
Sum of (Xi-MeanX)^2 = 1750
Sum of (Yi-MeanY)^2 = 638.83
r = -1055 / sqrt(1750 * 638.83) = -1055 / 1057.3 = -0.9978
R-squared = 0.996, t = -30.04, df = 4, p < 0.001
Result: r = -0.998 (Very Strong Negative), higher dosage strongly associated with lower response time
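
The intermediate sums in both worked examples can be recomputed with a short script. This is a sketch using the mean-centered form of the formula; `pearson_summary` is a hypothetical helper name, and it assumes the data are not perfectly collinear (so that 1 - r^2 is nonzero).

```python
import math

def pearson_summary(xs, ys):
    """Return the deviation sums, r, R-squared, t and df for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r))  # assumes |r| < 1
    return {"sxy": sxy, "sxx": sxx, "syy": syy,
            "r": r, "r2": r * r, "t": t, "df": n - 2}

examples = {
    "gene expression": ([2.1, 3.5, 4.2, 5.8, 6.1], [1.8, 3.2, 4.5, 5.1, 6.3]),
    "dosage vs response": ([10, 20, 30, 40, 50, 60], [45, 38, 32, 25, 20, 15]),
}

for name, (xs, ys) in examples.items():
    s = pearson_summary(xs, ys)
    print(name)
    for key, value in s.items():
        print(f"  {key} = {value:.4f}" if isinstance(value, float) else f"  {key} = {value}")
```

Printing every intermediate sum makes it easy to check a hand calculation line by line rather than only comparing the final r.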
Expert Insights

Background & Theory

Statistics and probability provide the mathematical framework for drawing conclusions from data under uncertainty. Measures of central tendency describe where data cluster. The mean is the arithmetic average, computed as the sum of all values divided by the count. The median is the middle value of an ordered dataset and is robust to extreme outliers. The mode is the most frequent value. Spread is quantified by the variance, the average squared deviation from the mean, and by its square root, the standard deviation. For a sample, the variance uses n minus one in the denominator to correct for bias in estimation.

The normal distribution, defined by its mean and standard deviation, is the cornerstone of parametric statistics. Its bell-shaped probability density follows the formula f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-0.5 * ((x - mu) / sigma)^2). The empirical rule states that approximately 68 percent of observations fall within one standard deviation of the mean, 95 percent within two, and 99.7 percent within three. A z-score standardizes a data point by subtracting the mean and dividing by the standard deviation, expressing how many standard deviations an observation lies from the mean.

In hypothesis testing, the p-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. A confidence interval is a range computed from the sample by a procedure that captures the true population parameter in a specified proportion of repeated samples, typically 95 percent.

Correlation measures linear association between two variables, with Pearson's r ranging from negative one to positive one. Correlation does not imply causation. Linear regression fits a line of the form y = a + bx by minimizing the sum of squared residuals. Bayes' theorem relates conditional probabilities: P(A|B) = P(B|A) * P(A) / P(B), allowing prior beliefs to be updated on new evidence.

The law of large numbers guarantees that the sample mean of independent, identically distributed observations converges to the population mean as sample size grows. The central limit theorem states that the distribution of sample means approaches normality regardless of the shape of the population distribution, provided the population variance is finite and the sample size is sufficiently large; 30 or more is a common rule of thumb.
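
As an illustration of the least-squares line y = a + bx mentioned above, here is a minimal Python sketch. The name `fit_line` is an assumption for illustration; the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the intercept follows from the means. The data are the five sample pairs shown in the calculator.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("slope is undefined when all X values are equal")
    slope = sxy / sxx                    # b
    intercept = mean_y - slope * mean_x  # a
    return intercept, slope

a, b = fit_line([2, 4, 6, 8, 10], [3, 5, 7, 9, 11])
print(f"y = {b:.3f}x + {a:.3f}")  # y = 1.000x + 1.000
```

The slope and r share the same numerator, so they always have the same sign; r additionally rescales by the spread of Y.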

History

The mathematical study of probability emerged in the 17th century from correspondence between Blaise Pascal and Pierre de Fermat in 1654. Their exchange, prompted by a gambling problem posed by the Chevalier de Mere, established the foundations of probability theory by calculating expected outcomes through systematic enumeration of cases. Jacob Bernoulli formalized the law of large numbers in his posthumously published Ars Conjectandi of 1713, proving rigorously that empirical frequencies converge to theoretical probabilities with increasing observations. His work laid the groundwork for inferential statistics by connecting mathematical probability to observed data.

Carl Friedrich Gauss developed the method of least squares around 1795 while adjusting astronomical observations, and he recognized the bell-shaped error distribution that now bears his name. Pierre-Simon Laplace independently worked on the normal distribution and proved an early version of the central limit theorem around 1810, demonstrating why errors in measurement tend toward normality.

The late 19th century saw statistics emerge as a distinct scientific discipline. Francis Galton introduced regression and correlation in the 1880s while studying heredity. Karl Pearson formalized these concepts, developed the chi-squared test, and co-founded the journal Biometrika in 1901, establishing statistics as a rigorous academic field. Ronald Fisher transformed statistical practice in the early 20th century. His 1925 book Statistical Methods for Research Workers popularized significance testing, analysis of variance, and the use of p = 0.05 as a conventional threshold, establishing the framework still used in scientific research. Fisher and Jerzy Neyman engaged in a prolonged methodological dispute over the interpretation of hypothesis tests.

The Bayesian approach, rooted in the 18th century work of Thomas Bayes and Laplace, was largely eclipsed by frequentist methods through much of the 20th century, but it revived after World War II and gained momentum as computing power grew. The late 20th and early 21st centuries brought statistics into every domain through big data, machine learning, and the routine availability of software capable of processing millions of observations.

You may use the results for reference and educational purposes. For professional reports, academic papers, or critical decisions, we recommend verifying outputs against peer-reviewed sources or consulting a qualified expert in the relevant field.
Educational Note: This calculator is provided for educational and informational purposes. Results are based on the formulas and inputs provided. Always verify important calculations independently. NovaCalculator processes calculator inputs client-side; optional analytics follow visitor consent settings. © 2024–2026 NovaCalculator.

Frequently Asked Questions

What is Pearson correlation and when should I use it?

Pearson correlation coefficient (r) measures the linear relationship between two continuous variables. It ranges from -1 to +1, where +1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 means no linear relationship. You should use Pearson correlation when both variables are continuous, approximately normally distributed, and you expect a linear (not curved) relationship. It is the most commonly used correlation measure in biological and biostatistical research for measuring associations between variables like height and weight, dosage and response, or gene expression levels.

How do I interpret the p-value for Pearson correlation?

The p-value tests the null hypothesis that the true population correlation is zero (no linear relationship). If p < 0.05 (the conventional significance threshold), you reject the null hypothesis and conclude the correlation is statistically significant. However, statistical significance does not imply practical importance. With very large sample sizes, even tiny correlations (r = 0.05) can be significant. Always consider the magnitude of r alongside the p-value. In biological research, it is common to report both r and p together, and to use scatterplots to visually confirm the relationship.
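
To see how sample size drives significance, the p-value can be computed from r and n. The sketch below uses only the Python standard library and integrates the Student's t density numerically with Simpson's rule; this is an illustrative approach, not necessarily how the calculator computes it, and the function names are assumptions. Because it subtracts an integral from 1, it loses precision for extremely small p-values.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value: 1 - 2 * integral of the density over [0, |t|]."""
    upper = abs(t)
    if math.isinf(upper):
        return 0.0
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def p_from_r(r, n):
    """p-value for H0: population correlation is zero, given r and n pairs."""
    if abs(r) >= 1:
        return 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return two_sided_p(t, n - 2)

# Same tiny correlation, two different sample sizes.
for n in (100, 2000):
    p = p_from_r(0.05, n)
    print(n, round(p, 4), "significant" if p < 0.05 else "not significant")
```

With r fixed at 0.05, the result crosses the 0.05 threshold purely because n grows, which is why the magnitude of r should be reported alongside p.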

What sample size do I need for meaningful Pearson correlation?

A minimum of 3 data pairs is required mathematically, but meaningful results typically require at least 20-30 pairs. For detecting moderate correlations (r around 0.5) with 80% power at alpha = 0.05, you need approximately 30 pairs. For weaker correlations (r around 0.3), you need roughly 85 pairs. In biological studies, sample sizes under 10 should be interpreted very cautiously as the correlation estimate can be highly unstable. Power analysis can help determine the exact sample size needed for your expected effect size.
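
The sample sizes quoted above can be approximated with the Fisher z-transformation, n = ((z_alpha/2 + z_beta) / atanh(r))^2 + 3. The sketch below assumes a two-sided test and is an approximation, so dedicated power-analysis software may give slightly different numbers; `required_pairs` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def required_pairs(r, alpha=0.05, power=0.80):
    """Approximate number of pairs needed to detect a correlation of size r."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be strictly between -1 and 1 and nonzero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_beta) / math.atanh(abs(r))) ** 2 + 3
    return math.ceil(n)

print(required_pairs(0.5), required_pairs(0.3))  # 30 85
```

Because the required n grows roughly with 1 / atanh(r)^2, halving the expected correlation roughly quadruples the number of pairs needed.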

What are the assumptions of Pearson correlation?

Pearson correlation requires several assumptions: (1) Both variables must be continuous and measured on interval or ratio scales. (2) The relationship between the variables should be approximately linear. (3) Both variables should be roughly normally distributed, especially for small samples. (4) Observations should be independent of each other. (5) There should be no significant outliers, as Pearson r is sensitive to extreme values. If these assumptions are violated, consider using Spearman rank correlation instead, which is more robust to non-normality and outliers.
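
Spearman rank correlation can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The following sketch illustrates the idea on monotonic but non-linear data; the function names are assumptions for illustration.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson r in mean-centered form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman(xs, ys):
    """Spearman rho: Pearson r applied to the ranks of each variable."""
    return pearson(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]  # perfectly monotonic, but curved
print(round(pearson(xs, ys), 4), spearman(xs, ys))
```

Pearson r falls below 1 here because the relationship is curved, while Spearman reports a perfect monotonic association.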

What is the difference between correlation and causation?

Correlation measures the strength and direction of a linear relationship between two variables (r ranges from -1 to +1). Causation means one variable directly influences the other. Correlation alone cannot prove causation because confounding variables, reverse causality, or coincidence may explain the association.

Is my data stored or sent to a server?

No. All calculations run entirely in your browser using JavaScript. No data you enter is ever transmitted to any server or stored anywhere. Your inputs remain completely private.

Reviewed by Daniel Agrici, Founder & Lead Developer