
Scatter Plot Correlation Calculator

Calculate Pearson and Spearman correlation coefficients from paired data, with a significance test.


Formula

r = SS_XY / sqrt(SS_XX * SS_YY)

Where r is the Pearson correlation coefficient, SS_XY = sum of (xi - x-mean)(yi - y-mean), SS_XX = sum of (xi - x-mean)^2, SS_YY = sum of (yi - y-mean)^2. Spearman rho uses the same formula applied to ranks instead of raw values.
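The formula translates directly into a few lines of Python. This is a minimal standard-library sketch (the function name `pearson_r` is our own, not part of the calculator):

```python
import math

def pearson_r(x, y):
    """Pearson r via the sums-of-squares formula above:
    r = SS_XY / sqrt(SS_XX * SS_YY)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    ss_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_mean) ** 2 for xi in x)
    ss_yy = sum((yi - y_mean) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)
```

Passing the ranks of each variable instead of the raw values to the same function yields Spearman rho (for tie-free data).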

Worked Examples

Example 1: Study Hours vs Exam Scores

Problem: Hours studied (x): 1, 2, 3, 4, 5, 6, 7, 8 and exam scores (y): 52, 58, 63, 70, 74, 80, 85, 91. Find Pearson and Spearman correlations.

Solution: n = 8, x-mean = 4.5, y-mean = 71.625
SS_XX = 42, SS_YY = 1277.875, SS_XY = 231.5
Pearson r = 231.5 / sqrt(42 * 1277.875) = 231.5 / 231.67 = 0.9993
R-squared = 0.9985 (99.85%)
Spearman rho = 1.0 (perfect monotonic)
t-statistic = 0.9993 * sqrt(6 / 0.001463) = 64.0, p < 0.0001

Result: Pearson r = 0.9993 | R-squared = 99.85% | Spearman rho = 1.0 | p < 0.0001
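The arithmetic in Example 1 can be reproduced with a few lines of standard-library Python:

```python
import math

hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 58, 63, 70, 74, 80, 85, 91]
n = len(hours)
x_mean = sum(hours) / n   # 4.5
y_mean = sum(scores) / n  # 71.625
ss_xx = sum((x - x_mean) ** 2 for x in hours)                            # 42.0
ss_yy = sum((y - y_mean) ** 2 for y in scores)                           # 1277.875
ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(hours, scores))  # 231.5
r = ss_xy / math.sqrt(ss_xx * ss_yy)       # ~0.9993
t = r * math.sqrt((n - 2) / (1 - r ** 2))  # ~64.0
```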

Example 2: Temperature vs Ice Cream Sales

Problem: Temperature (F): 60, 65, 70, 75, 80, 85, 90 and sales ($): 100, 125, 140, 180, 210, 260, 300. Find the correlation and regression line.

Solution: n = 7, x-mean = 75, y-mean = 187.86
SS_XX = 700, SS_YY = 32292.86, SS_XY = 4700
Pearson r = 4700 / sqrt(700 * 32292.86) = 4700 / 4754.47 = 0.9885
R-squared = 0.9772 (97.72%)
Slope = 4700 / 700 = 6.714, Intercept = 187.86 - 6.714 * 75 = -315.71
Regression: y = 6.714x - 315.71
Spearman rho = 1.0 (monotonically increasing)

Result: r = 0.9885 | y = 6.714x - 315.71 | Very strong positive correlation
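Example 2's regression line follows from the same sums of squares: slope = SS_XY / SS_XX and intercept = y-mean minus slope times x-mean. A quick check in Python:

```python
import math

temps = [60, 65, 70, 75, 80, 85, 90]
sales = [100, 125, 140, 180, 210, 260, 300]
n = len(temps)
x_mean = sum(temps) / n  # 75.0
y_mean = sum(sales) / n  # ~187.86
ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(temps, sales))  # 4700.0
ss_xx = sum((x - x_mean) ** 2 for x in temps)                           # 700.0
ss_yy = sum((y - y_mean) ** 2 for y in sales)                           # ~32292.86
slope = ss_xy / ss_xx                 # ~6.714
intercept = y_mean - slope * x_mean   # ~-315.71
r = ss_xy / math.sqrt(ss_xx * ss_yy)  # ~0.9885
```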

Frequently Asked Questions

What is correlation and how is it measured?

Correlation measures the strength and direction of the linear relationship between two variables. It ranges from -1 to +1, where +1 indicates a perfect positive linear relationship (as one variable increases, the other increases proportionally), -1 indicates a perfect negative linear relationship (as one increases, the other decreases proportionally), and 0 indicates no linear relationship. The most common measure is the Pearson correlation coefficient (r), which quantifies linear relationships. The Spearman rank correlation coefficient (rho) measures monotonic relationships and is more robust to outliers. Correlation does not imply causation; two variables can be highly correlated due to a common third variable or coincidence.

What is the difference between Pearson and Spearman correlation?

Pearson correlation measures the strength of the linear relationship between two continuous variables, assuming both are approximately normally distributed with no extreme outliers. It uses actual data values in its calculation. Spearman rank correlation converts data to ranks first, then computes Pearson correlation on the ranks. This makes Spearman robust to outliers, applicable to ordinal data, and able to detect monotonic (consistently increasing or decreasing) relationships that may not be linear. For example, an exponential relationship y = 2^x would have Pearson r less than 1 because the relationship is not linear, but Spearman rho would be exactly 1 because the relationship is perfectly monotonic. Use Pearson when data is continuous, roughly normal, and you expect linearity. Use Spearman otherwise.
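The y = 2^x claim above can be checked numerically. This sketch computes Spearman rho by ranking both variables first and then applying the Pearson formula to the ranks; the simple ranking helper assumes no tied values:

```python
import math

def pearson_r(x, y):
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    syy = sum((b - ym) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank positions 1..n; no tie handling (this example has no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2 ** xi for xi in x]                 # exponential, not linear
pearson = pearson_r(x, y)                 # below 1: relationship is not linear
spearman = pearson_r(ranks(x), ranks(y))  # exactly 1: perfectly monotonic
```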

What does statistical significance of correlation mean?

Statistical significance tests whether the observed correlation is likely to have occurred by chance if the true population correlation were zero. The test uses a t-statistic calculated as t = r * sqrt((n-2) / (1-r^2)), which follows a t-distribution with n-2 degrees of freedom. A small p-value (typically below 0.05) means the correlation is statistically significant: a correlation this large would be unlikely to arise if the population correlation were zero. However, significance depends heavily on sample size: with large samples (n > 500), even tiny correlations like r = 0.10 become significant. Conversely, meaningful correlations may fail significance tests with small samples. Always report the correlation coefficient alongside the p-value, not just whether the result is significant.
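The t-statistic itself is a one-liner; a sketch (the function name is our own):

```python
import math

def correlation_t_stat(r, n):
    """t = r * sqrt((n-2) / (1 - r^2)).
    Compare against a t-distribution with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# For example, r = 0.5 with n = 27 gives t ~ 2.89, which exceeds the
# two-sided 5% critical value of about 2.06 for 25 degrees of freedom.
t = correlation_t_stat(0.5, 27)
```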

What are the assumptions of Pearson correlation?

Pearson correlation requires several assumptions for valid inference. Both variables should be continuous and measured on interval or ratio scales. The relationship should be approximately linear; Pearson correlation can underestimate the strength of curvilinear relationships. Both variables should be approximately normally distributed, especially for hypothesis testing with small samples. Observations should be independent of each other. There should be no significant outliers, as a single extreme point can dramatically inflate or deflate the correlation coefficient. Homoscedasticity (equal variance of y across x values) is assumed. When these assumptions are violated, consider Spearman rank correlation, Kendall tau, or data transformations before computing Pearson correlation.
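The outlier sensitivity mentioned above is easy to demonstrate with made-up data: adding one extreme point to an essentially uncorrelated sample can manufacture a very strong Pearson correlation.

```python
import math

def pearson_r(x, y):
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    syy = sum((b - ym) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Ten points with essentially no linear relationship...
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5, 3, 6, 2, 7, 4, 5, 3, 6, 4]
r_clean = pearson_r(x, y)  # near zero

# ...plus a single extreme point at (50, 50).
r_outlier = pearson_r(x + [50], y + [50])  # now very strong
```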

Why does correlation not imply causation?

Correlation measures association, not causation, for several important reasons. First, the relationship may be spurious, driven by a confounding variable. Ice cream sales and drowning rates are positively correlated because both increase in summer, not because ice cream causes drowning. Second, the causal direction may be reversed; we might observe correlation between X and Y when Y actually causes X. Third, there may be no causal relationship at all; with enough variables, some will correlate by pure chance (the multiple comparisons problem). Fourth, correlation measures linear association only and can miss nonlinear causal relationships. Establishing causation requires controlled experiments, temporal precedence, elimination of confounders, and theoretical justification.

How does sample size affect correlation results?

Sample size affects correlation results in multiple ways. With very small samples (n less than 10), correlation estimates are unstable and can be misleadingly high or low just by chance. Confidence intervals around the correlation are wide, making precise estimation difficult. As sample size increases, correlation estimates become more stable and confidence intervals narrow. For significance testing, larger samples make it easier to detect small correlations; a correlation of r = 0.10 is not significant with n = 30 but is highly significant with n = 1000. The recommended minimum sample size for meaningful correlation analysis is typically n = 30 or more. For detecting small effects (r around 0.10 to 0.20), sample sizes of several hundred may be needed.
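The n = 30 versus n = 1000 comparison above follows directly from the t-statistic formula; a quick sketch:

```python
import math

def t_stat(r, n):
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# The same small correlation, r = 0.10, at two sample sizes.  The two-sided
# 5% critical value is roughly 2.0 for moderate-to-large degrees of freedom.
t_small = t_stat(0.10, 30)    # ~0.53 -> not significant
t_large = t_stat(0.10, 1000)  # ~3.18 -> highly significant
```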
