Pearson Correlation Calculator

Compute the Pearson correlation coefficient using validated scientific equations. See step-by-step derivations, unit analysis, and reference values.

Formula

r = [n(Sum XY) - (Sum X)(Sum Y)] / sqrt{[n(Sum X^2) - (Sum X)^2][n(Sum Y^2) - (Sum Y)^2]}

Where r is the Pearson correlation coefficient, n is the number of data pairs, Sum XY is the sum of products of paired values, Sum X and Sum Y are the sums of X and Y values respectively, and Sum X^2 and Sum Y^2 are sums of squared values. The t-statistic for significance testing is t = r * sqrt((n-2)/(1-r^2)) with n-2 degrees of freedom.
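As a minimal sketch, the raw-sum formula and the t-statistic above can be written in Python as follows. The function names pearson_r and t_statistic and the sample data are illustrative assumptions, not part of the calculator itself.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from raw sums, following the formula above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("t is undefined for n < 3 or |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))

# Illustrative data (hypothetical, not taken from the worked examples)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
t = t_statistic(r, len(xs))
print(f"r = {r:.4f}, t = {t:.4f}, df = {len(xs) - 2}")
# prints: r = 0.7746, t = 2.1213, df = 3
```

The raw-sum form is convenient for hand calculation, but it subtracts large, nearly equal numbers when the values are far from zero. The deviation-from-mean form used in the worked examples is usually the safer choice in floating point.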

Worked Examples

Example 1: Gene Expression Correlation

Problem: A researcher measures expression levels of Gene A (X: 2.1, 3.5, 4.2, 5.8, 6.1) and Gene B (Y: 1.8, 3.2, 4.5, 5.1, 6.3) in 5 tissue samples. Calculate the Pearson correlation.

Solution: n=5, Mean X=4.34, Mean Y=4.18
Sum of (Xi-MeanX)(Yi-MeanY) = 11.184
Sum of (Xi-MeanX)^2 = 10.972
Sum of (Yi-MeanY)^2 = 12.068
r = 11.184 / sqrt(10.972 * 12.068) = 11.184 / 11.507 = 0.9719
R-squared = 0.945, t = 7.16, df = 3

Result: r = 0.972 (Very Strong Positive), R-squared = 0.945 (94.5% variance explained)

Example 2: Drug Dosage vs Response Time

Problem: Test whether drug dosage (mg: 10, 20, 30, 40, 50, 60) correlates with response time (min: 45, 38, 32, 25, 20, 15) in 6 patients.

Solution: n=6, Mean X=35, Mean Y=29.17
Sum of (Xi-MeanX)(Yi-MeanY) = -1055
Sum of (Xi-MeanX)^2 = 1750
Sum of (Yi-MeanY)^2 = 638.83
r = -1055 / sqrt(1750 * 638.83) = -1055 / 1057.3 = -0.9978
R-squared = 0.996, t = -30.04, df = 4, p < 0.001

Result: r = -0.998 (Very Strong Negative), higher dosage strongly associated with lower response time
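The deviation-sum route used in the two worked examples can be re-run as a quick check. The Python sketch below takes the data from the problem statements. The helper names deviation_sums and summarize are hypothetical, chosen only for illustration.

```python
import math

def deviation_sums(xs, ys):
    """Return the three sums of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy

def summarize(xs, ys):
    """Compute r, R-squared, t, and df from the deviation sums."""
    sxy, sxx, syy = deviation_sums(xs, ys)
    n = len(xs)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return {"sxy": sxy, "sxx": sxx, "syy": syy,
            "r": r, "r_squared": r * r, "t": t, "df": n - 2}

# Example 1: Gene A vs Gene B expression
gene = summarize([2.1, 3.5, 4.2, 5.8, 6.1], [1.8, 3.2, 4.5, 5.1, 6.3])
# Example 2: drug dosage (mg) vs response time (min)
drug = summarize([10, 20, 30, 40, 50, 60], [45, 38, 32, 25, 20, 15])

for name, s in (("Example 1", gene), ("Example 2", drug)):
    print(f"{name}: Sxy={s['sxy']:.3f} Sxx={s['sxx']:.3f} Syy={s['syy']:.3f} "
          f"r={s['r']:.4f} R2={s['r_squared']:.3f} t={s['t']:.2f} df={s['df']}")
```

Printing the intermediate sums alongside r makes it easy to compare each step of a hand calculation against the script.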

Frequently Asked Questions

What is Pearson correlation and when should I use it?

Pearson correlation coefficient (r) measures the linear relationship between two continuous variables. It ranges from -1 to +1, where +1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 means no linear relationship. You should use Pearson correlation when both variables are continuous, approximately normally distributed, and you expect a linear (not curved) relationship. It is the most commonly used correlation measure in biological and biostatistical research for measuring associations between variables like height and weight, dosage and response, or gene expression levels.

How do I interpret the p-value for Pearson correlation?

The p-value tests the null hypothesis that the true population correlation is zero (no linear relationship). If p < 0.05 (the conventional significance threshold), you reject the null hypothesis and conclude the correlation is statistically significant. However, statistical significance does not imply practical importance. With very large sample sizes, even tiny correlations (r = 0.05) can be significant. Always consider the magnitude of r alongside the p-value. In biological research, it is common to report both r and p together, and to use scatterplots to visually confirm the relationship.
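To see how sample size alone can push a tiny correlation past the significance threshold, the sketch below holds r at 0.05 and varies n. It assumes a large-sample shortcut: the t-statistic is treated as standard normal, which is only reasonable when n is large. The exact test uses a t distribution with n-2 degrees of freedom. The function names are illustrative.

```python
import math
from statistics import NormalDist

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2))."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def approx_two_sided_p(r, n):
    """Approximate two-sided p-value, treating t as standard normal.

    This is a large-n approximation (an assumption of this sketch);
    for small samples use the t distribution with n - 2 df instead.
    """
    t = t_statistic(r, n)
    return 2 * (1 - NormalDist().cdf(abs(t)))

r = 0.05  # a very weak correlation
for n in (100, 500, 2000, 10000):
    p = approx_two_sided_p(r, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n={n:>6}  t={t_statistic(r, n):.3f}  p~{p:.4f}  {verdict}")
```

Under this approximation the same r = 0.05 is not significant at n = 100 but is at n = 2000, even though it explains only 0.25% of the variance in both cases.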

What sample size do I need for meaningful Pearson correlation?

A minimum of 3 data pairs is required for the significance test, because it has n-2 degrees of freedom (with only 2 pairs, r is always exactly +1 or -1). Meaningful results typically require at least 20-30 pairs. For detecting moderate correlations (r around 0.5) with 80% power at a two-sided alpha = 0.05, you need approximately 30 pairs. For weaker correlations (r around 0.3), you need roughly 85 pairs. In biological studies, sample sizes under 10 should be interpreted very cautiously as the correlation estimate can be highly unstable. Power analysis can help determine the exact sample size needed for your expected effect size.
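One common way to get figures like these is the Fisher z approximation, n = ((z_alpha/2 + z_beta) / atanh(r))^2 + 3. The page does not state which method it uses, so treat the Python sketch below as one plausible approach rather than the calculator's own. The function name required_pairs is hypothetical.

```python
import math
from statistics import NormalDist

def required_pairs(r, alpha=0.05, power=0.80):
    """Approximate number of pairs needed to detect a correlation r
    with a two-sided test, using the Fisher z transformation."""
    if not 0 < abs(r) < 1:
        raise ValueError("expected correlation must satisfy 0 < |r| < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    n = ((z_alpha + z_beta) / math.atanh(abs(r))) ** 2 + 3
    return math.ceil(n)  # round up to a whole number of pairs

for expected_r in (0.5, 0.3):
    print(f"r = {expected_r}: about {required_pairs(expected_r)} pairs")
```

Under this approximation the results land at about 30 pairs for r = 0.5 and 85 pairs for r = 0.3. Dedicated power-analysis software may differ by a pair or two because it uses exact distributions.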

What are the assumptions of Pearson correlation?

Pearson correlation requires several assumptions: (1) Both variables must be continuous and measured on interval or ratio scales. (2) The relationship between the variables should be approximately linear. (3) Both variables should be roughly normally distributed, especially for small samples. (4) Observations should be independent of each other. (5) There should be no significant outliers, as Pearson r is sensitive to extreme values. If these assumptions are violated, consider using Spearman rank correlation instead, which is more robust to non-normality and outliers.
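The sensitivity to outliers in assumption (5) can be illustrated with a small, made-up dataset. The sketch below computes Spearman correlation as Pearson r applied to average ranks. The data and helper names are hypothetical, and ties are given the average of the ranks they span.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Perfectly monotonic data with one extreme value in y (hypothetical)
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 100]
print(f"Pearson r    = {pearson_r(xs, ys):.3f}")
print(f"Spearman rho = {spearman_rho(xs, ys):.3f}")
```

Here the single extreme value pulls Pearson r well below 1, while Spearman stays at 1 because the ordering of the points is unchanged.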

What is the difference between correlation and causation?

Correlation measures the strength and direction of a linear relationship between two variables (r ranges from -1 to +1). Causation means one variable directly influences the other. Correlation alone cannot prove causation because confounding variables, reverse causality, or coincidence may explain the association.

Is my data stored or sent to a server?

No. All calculations run entirely in your browser using JavaScript. No data you enter is ever transmitted to any server or stored anywhere. Your inputs remain completely private.

References