Scatter Plot Correlation Calculator
Calculate Pearson and Spearman correlation coefficients from paired data, with a significance test.
Data Points and Ranks
| X | Y | X Rank | Y Rank | Predicted Y |
|---|---|---|---|---|
| 1 | 2.3 | 1 | 1 | 1.9382 |
| 2 | 4.1 | 2 | 2 | 3.9697 |
| 3 | 5.5 | 3 | 3 | 6.0012 |
| 4 | 7.8 | 4 | 4 | 8.0327 |
| 5 | 10.2 | 5 | 5 | 10.0642 |
| 6 | 12 | 6 | 6 | 12.0958 |
| 7 | 14.5 | 7 | 7 | 14.1273 |
| 8 | 15.8 | 8 | 8 | 16.1588 |
| 9 | 18.1 | 9 | 9 | 18.1903 |
| 10 | 20.5 | 10 | 10 | 20.2218 |
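The Predicted Y column is consistent with an ordinary least-squares line fitted to the X and Y columns. The page does not state how the column is produced, so treat the following as a minimal sketch under that assumption; `fit_line` is a hypothetical helper name, not part of the calculator.

```python
# Least-squares line for the (X, Y) pairs in the table above.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.3, 4.1, 5.5, 7.8, 10.2, 12, 14.5, 15.8, 18.1, 20.5]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squares and cross-products around the means.
    ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = ss_xy / ss_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = fit_line(xs, ys)
# Predicted Y for each X, rounded to four decimals as in the table.
predicted = [round(intercept + slope * x, 4) for x in xs]
print(predicted)
```

Because X and Y are already in ascending order with no ties, the rank columns are simply 1 through 10.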
Last reviewed: December 2025
Background & Theory
The Scatter Plot Correlation Calculator applies the following established principles and formulas. Mathematics rests on a hierarchy of number systems, each extending the previous. The natural numbers (1, 2, 3, ...) support counting and ordering. The integers add negative values and zero, enabling subtraction without restriction. The rational numbers, expressible as p/q where p and q are integers and q is nonzero, close the system under division by nonzero values. The real numbers fill the gaps between the rationals with irrational numbers such as the square root of 2 or pi, forming a complete ordered field. The complex numbers, written as a + bi where i is the square root of negative one, form the algebraic closure of the reals, so every nonconstant polynomial has a root.
The Fundamental Theorem of Arithmetic states that every integer greater than one is expressible as a product of primes, uniquely up to the order of the factors. The greatest common divisor (GCD) of two integers is computed efficiently with the Euclidean algorithm: repeatedly replace the larger number with the remainder when it is divided by the smaller, until the remainder is zero. The last nonzero remainder is the GCD. The least common multiple (LCM) follows from the identity LCM(a, b) = |a * b| / GCD(a, b). Modular arithmetic defines equivalence classes of integers that share the same remainder under division by a modulus n. Fermat's Little Theorem and Euler's Theorem arise from this structure and underpin modern cryptography.
Logarithms are the inverses of exponential functions. If b raised to the power x equals y, then the logarithm base b of y equals x. The natural logarithm uses base e, approximately 2.71828.
Combinatorics counts arrangements and selections. The number of ordered arrangements (permutations) of r objects from n distinct objects is nPr = n! / (n - r)!. The number of unordered selections (combinations) is nCr = n! / (r! * (n - r)!). Pascal's triangle arranges these binomial coefficients so that each entry equals the sum of the two entries directly above it. The Fibonacci sequence, defined by F(1) = 1, F(2) = 1, and F(n) = F(n-1) + F(n-2), appears throughout nature and connects deeply to the golden ratio via Binet's formula.
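The Euclidean algorithm and the LCM identity described in this section can be sketched in a few lines. The function names are illustrative, and returning 0 for an LCM with a zero argument is a convention assumed here rather than something the text specifies.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor via the Euclidean algorithm."""
    a, b = abs(a), abs(b)
    while b != 0:
        # Replace the pair with (smaller, remainder) until the remainder is zero.
        a, b = b, a % b
    return a  # the last nonzero remainder


def lcm(a: int, b: int) -> int:
    """Least common multiple from LCM(a, b) = |a * b| / GCD(a, b)."""
    if a == 0 or b == 0:
        return 0  # assumed convention; the identity needs a nonzero GCD
    return abs(a * b) // gcd(a, b)


print(gcd(48, 18), lcm(4, 6))
```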
History
The history behind the Scatter Plot Correlation Calculator traces back through the following developments. Mathematics as a systematic discipline traces to ancient Mesopotamia. Babylonian clay tablets dating to around 1800 BCE demonstrate knowledge of quadratic equations, Pythagorean triples, and base-60 arithmetic, suggesting a practical mathematical tradition far preceding Greek formalism.
Euclid of Alexandria compiled the Elements around 300 BCE, establishing the axiomatic method that would define rigorous mathematics for over two thousand years. His work organized plane geometry, number theory, and proportion into logically chained propositions derived from a small set of postulates. The algorithm bearing his name for computing GCDs appears in Book VII and remains in use today.
In the 9th century, the Persian scholar Muhammad ibn Musa al-Khwarizmi wrote Al-Kitab al-mukhtasar fi hisab al-jabr wal-muqabala, the treatise whose title gave algebra its name. He systematized the solution of linear and quadratic equations and described procedures that operated on unknowns as objects, a conceptual leap away from purely numerical calculation.
René Descartes introduced coordinate geometry in 1637 by uniting algebra and Euclidean geometry, allowing curves to be studied through equations. This synthesis set the stage for calculus. Isaac Newton and Gottfried Wilhelm Leibniz independently developed calculus during the 1660s and 1670s, triggering a priority dispute that lasted decades and divided British and Continental mathematicians.
Carl Friedrich Gauss proved the Fundamental Theorem of Algebra in 1799, showing that every nonconstant polynomial has at least one complex root. His Disquisitiones Arithmeticae of 1801 established modern number theory.
David Hilbert's formalist program at the turn of the 20th century sought to place all of mathematics on an explicit axiomatic foundation, a project that Kurt Gödel's incompleteness theorems of 1931 showed to be fundamentally limited. Alan Turing's work in the 1930s on computability introduced the theoretical model of the stored-program computer and linked mathematical logic directly to the limits of algorithmic calculation. His proof that no algorithm can decide in general whether an arbitrary program will halt or run forever placed fundamental boundaries on what mathematics can mechanically determine, and it opened the discipline now known as theoretical computer science.
Formula
r = SS_XY / sqrt(SS_XX * SS_YY)
Where r is the Pearson correlation coefficient, SS_XY = sum of (xi - x-mean)(yi - y-mean), SS_XX = sum of (xi - x-mean)^2, SS_YY = sum of (yi - y-mean)^2. Spearman rho uses the same formula applied to ranks instead of raw values.
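The formula translates fairly directly into code. The following is a minimal sketch using only the standard library; the function names are illustrative, and giving tied values the average of their ranks is an assumption, since the page does not say how the calculator handles ties.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = SS_XY / sqrt(SS_XX * SS_YY)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_mean) ** 2 for x in xs)
    ss_yy = sum((y - y_mean) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


def ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman rho: the Pearson formula applied to ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Data from the Data Points and Ranks table.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.3, 4.1, 5.5, 7.8, 10.2, 12, 14.5, 15.8, 18.1, 20.5]
r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
```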
Worked Examples
Example 1: Study Hours vs Exam Scores
Problem: Hours studied (x): 1, 2, 3, 4, 5, 6, 7, 8 and exam scores (y): 52, 58, 63, 70, 74, 80, 85, 91. Find Pearson and Spearman correlations.
Solution: n = 8, x-mean = 4.5, y-mean = 71.625
SS_XX = 42, SS_YY = 1277.875, SS_XY = 231.5
Pearson r = 231.5 / sqrt(42 * 1277.875) = 231.5 / 231.67 = 0.9993
R-squared = 0.9985 (99.85%)
Spearman rho = 1.0 (perfect monotonic)
t-statistic = 0.9993 * sqrt(6 / 0.001463) = 64.0, p < 0.0001
Result: Pearson r = 0.9993 | R-squared = 99.85% | Spearman rho = 1.0 | p < 0.0001
Example 2: Temperature vs Ice Cream Sales
Problem: Temperature (F): 60, 65, 70, 75, 80, 85, 90 and sales ($): 100, 125, 140, 180, 210, 260, 300. Find the correlation and regression line.
Solution: n = 7, x-mean = 75, y-mean = 187.86
SS_XX = 700, SS_YY = 32292.86, SS_XY = 4700
Pearson r = 4700 / sqrt(700 * 32292.86) = 0.9885
R-squared = 0.9772 (97.72%)
Slope = SS_XY / SS_XX = 4700 / 700 = 6.714, Intercept = y-mean - slope * x-mean = -315.71
Regression: y = 6.714x - 315.71
Spearman rho = 1.0 (monotonically increasing)
Result: r = 0.9885 | y = 6.714x - 315.71 | Very strong positive correlation
Frequently Asked Questions
What is correlation and how is it measured?
Correlation measures the strength and direction of the linear relationship between two variables. It ranges from -1 to +1, where +1 indicates a perfect positive linear relationship (as one variable increases, the other increases proportionally), -1 indicates a perfect negative linear relationship (as one increases, the other decreases proportionally), and 0 indicates no linear relationship. The most common measure is the Pearson correlation coefficient (r), which quantifies linear relationships. The Spearman rank correlation coefficient (rho) measures monotonic relationships and is more robust to outliers. Correlation does not imply causation; two variables can be highly correlated due to a common third variable or coincidence.
What is the difference between Pearson and Spearman correlation?
Pearson correlation measures the strength of the linear relationship between two continuous variables, assuming both are approximately normally distributed with no extreme outliers. It uses actual data values in its calculation. Spearman rank correlation converts data to ranks first, then computes Pearson correlation on the ranks. This makes Spearman robust to outliers, applicable to ordinal data, and able to detect monotonic (consistently increasing or decreasing) relationships that may not be linear. For example, an exponential relationship y = 2^x would have Pearson r less than 1 because the relationship is not linear, but Spearman rho would be exactly 1 because the relationship is perfectly monotonic. Use Pearson when data is continuous, roughly normal, and you expect linearity. Use Spearman otherwise.
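The y = 2^x example can be checked numerically. This sketch picks x = 1 through 10 as an arbitrary illustrative range and uses a simple ranking helper that assumes no tied values, which holds for this data.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = SS_XY / sqrt(SS_XX * SS_YY)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_mean) ** 2 for x in xs)
    ss_yy = sum((y - y_mean) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


def simple_ranks(values):
    """Ranks 1..n for values with no ties (an assumption that holds here)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


xs = list(range(1, 11))
ys = [2 ** x for x in xs]  # perfectly monotonic, but not linear

r = pearson_r(xs, ys)
rho = pearson_r(simple_ranks(xs), simple_ranks(ys))
print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
```

Pearson r falls short of 1 because the points curve away from any straight line, while the ranks of x and y are identical, so rho is 1.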
What does statistical significance of correlation mean?
Statistical significance tests whether the observed correlation is likely to have occurred by chance if the true population correlation were zero. The test uses a t-statistic calculated as t = r * sqrt((n-2) / (1-r^2)), which follows a t-distribution with n-2 degrees of freedom under that null hypothesis. A small p-value (typically below 0.05) means the correlation is statistically significant: a correlation at least this strong would be unlikely to arise by chance if the population correlation were zero. However, significance depends heavily on sample size: with large samples (n > 500), even tiny correlations like r = 0.10 become significant. Conversely, meaningful correlations may fail significance tests with small samples. Always report the correlation coefficient alongside the p-value, not just whether the result is significant.
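As a rough illustration of the test, the sketch below computes the t-statistic and a two-sided p-value using only the standard library. The p-value is approximated by integrating the t density with Simpson's rule, a choice made here to avoid extra dependencies and not necessarily how the calculator works. The function names and the sample sizes 30 and 1000 are illustrative.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def t_pdf(x, df):
    """Density of Student's t-distribution, computed in log space for stability."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area under the density on [-|t|, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # area on [0, |t|]
    return max(0.0, 1 - 2 * half_area)


for n in (30, 1000):
    t = t_statistic(0.10, n)
    p = two_sided_p(t, n - 2)
    print(f"n = {n}: t = {t:.3f}, p = {p:.4f}")
```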
What are the assumptions of Pearson correlation?
Pearson correlation requires several assumptions for valid inference. Both variables should be continuous and measured on interval or ratio scales. The relationship should be approximately linear; Pearson correlation can underestimate the strength of curvilinear relationships. Both variables should be approximately normally distributed, especially for hypothesis testing with small samples. Observations should be independent of each other. There should be no significant outliers, as a single extreme point can dramatically inflate or deflate the correlation coefficient. Homoscedasticity (equal variance of y across x values) is assumed. When these assumptions are violated, consider Spearman rank correlation, Kendall tau, or data transformations before computing Pearson correlation.
Why does correlation not imply causation?
Correlation measures association, not causation, for several important reasons. First, the relationship may be spurious, driven by a confounding variable. Ice cream sales and drowning rates are positively correlated because both increase in summer, not because ice cream causes drowning. Second, the causal direction may be reversed; we might observe correlation between X and Y when Y actually causes X. Third, there may be no causal relationship at all; with enough variables, some will correlate by pure chance (the multiple comparisons problem). Fourth, correlation measures linear association only and can miss nonlinear causal relationships. Establishing causation requires controlled experiments, temporal precedence, elimination of confounders, and theoretical justification.
How does sample size affect correlation results?
Sample size affects correlation results in multiple ways. With very small samples (n less than 10), correlation estimates are unstable and can be misleadingly high or low just by chance. Confidence intervals around the correlation are wide, making precise estimation difficult. As sample size increases, correlation estimates become more stable and confidence intervals narrow. For significance testing, larger samples make it easier to detect small correlations; a correlation of r = 0.10 is not significant with n = 30 but is highly significant with n = 1000. A common rule of thumb is to use at least n = 30 observations for meaningful correlation analysis. For detecting small effects (r around 0.10 to 0.20), sample sizes of several hundred may be needed.
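One way to see how confidence intervals narrow with sample size is the Fisher z-transformation, a standard approximation that this page does not itself describe. The sketch below assumes roughly bivariate normal data and a 95% level (critical value 1.96). The function name and the example value r = 0.5 are illustrative.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for Pearson r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("the approximation needs n > 3")
    z = math.atanh(r)            # transform r to the z scale
    se = 1 / math.sqrt(n - 3)    # approximate standard error on the z scale
    # Build the interval on the z scale, then transform back to the r scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for n in (10, 30, 100, 1000):
    low, high = fisher_ci(0.5, n)
    print(f"n = {n}: r = 0.5, 95% CI = ({low:.3f}, {high:.3f})")
```

For the same observed r, the interval at n = 10 is wide enough to include zero, while at n = 1000 it is narrow.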
Reviewed by Manoj Kumar, Mathematics Educator