Pearson correlation coefficient
Spearman’s correlation
Kendall’s \(\tau\) correlation
The correlation coefficient and its variants measure the strength and direction of the relationship between two variables. Their values range from -1 to +1. A value of ±1 indicates a perfect association between the two variables (linear for Pearson, monotonic for Spearman and Kendall). The closer the value is to 0, the weaker the relationship.
For a detailed comparison of the different correlations, see Edwin van den Heuvel & Zhuozhao Zhan (2022).
Below are plots of four data sets: top left (a: linear relationship between time and score); top right (b: nonlinear relationship); bottom left (c: linear relationship plus one outlier); bottom right (d: nonlinear relationship, using the R data set “pressure”).
R code for calculating the different correlations:
Download the data for plot a, and load the data … . Suppose the data for (a) is loaded into the data frame “dat” in R and the data for (d) is loaded into “dat1”.
# Pearson correlation
cor(dat$time, dat$score, method = "pearson")
## [1] 0.951577
# Spearman correlation
cor(dat$time, dat$score, method = "spearman")
## [1] 0.9537421
# Kendall's tau correlation
cor(dat$time, dat$score, method = "kendall")
## [1] 0.8390531
For the data set shown in (a): its scatter plot shows a linear relationship with no outliers, so the Pearson correlation is the best choice. The sample Pearson correlation is 0.95.
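For the data set shown in (d): its scatter plot shows a nonlinear but monotonic relationship, so the Pearson correlation may not be a good choice. Use the Spearman correlation or Kendall’s \(\tau\) instead, by calling cor() on the columns of dat1 with method = "spearman" or method = "kendall". Note that the Spearman (0.95) and Kendall (0.84) values printed above were computed from dat, the data for (a).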
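As a quick check of this reasoning, the sketch below uses R's built-in pressure data set directly (columns temperature and pressure) rather than the dat1 data frame, whose column names are not shown here. It assumes the built-in data; results for dat1 may differ if that file is not identical to it. Because pressure rises strictly with temperature in the built-in data, both rank-based coefficients equal 1, while the Pearson coefficient is lower because the relationship is curved.

```r
# Built-in data set: vapor pressure of mercury as a function of temperature.
x <- pressure$temperature
y <- pressure$pressure

r_pearson  <- cor(x, y, method = "pearson")
r_spearman <- cor(x, y, method = "spearman")
r_kendall  <- cor(x, y, method = "kendall")

# The relationship is strictly increasing, so the rank-based
# coefficients are 1; Pearson is smaller because it only
# measures linear association.
round(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall), 3)
```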
The hypotheses for a correlation test take the following general form:
H0: No [monotonic, linear, …] association between the two variables. (\(H_0: \rho =0\))
H1: The two variables are related in a [monotonic, linear, …] way. (\(H_1: \rho \ne 0\))
Note: for these correlation tests, a significant result (p-value < the level of significance) does not by itself indicate the strength of the association, because the p-value also depends on the sample size. For example, a test with a p-value of 0.001 does not necessarily reflect a stronger association than a test with a p-value of 0.04.
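To make this concrete, here is a minimal sketch for the Pearson case. The helper p_from_r is hypothetical (not part of R); it applies the usual t-test formula for a sample correlation r from n observations. The two (r, n) pairs are made-up values chosen only for illustration.

```r
# Two-sided p-value of the Pearson correlation t-test, computed from
# the sample correlation r and the sample size n (hypothetical helper).
p_from_r <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# Weak association, many observations
p_weak_large_n <- p_from_r(r = 0.1, n = 10000)
# Strong association, few observations
p_strong_small_n <- p_from_r(r = 0.8, n = 6)

# The weak correlation gives the far smaller p-value.
c(weak_large_n = p_weak_large_n, strong_small_n = p_strong_small_n)
```

In this illustration the weak correlation is highly significant while the strong one is not significant at the 5% level, so the p-value should be read together with the estimated coefficient and the sample size.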
Example: Use the R cars data. Test if dist and speed are related.
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
The two quantitative variables are linearly related and there are no outliers, so the Pearson correlation is appropriate for this problem.
rout = cor.test(cars$speed, cars$dist)
rout
##
## Pearson's product-moment correlation
##
## data: cars$speed and cars$dist
## t = 9.464, df = 48, p-value = 1.49e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6816422 0.8862036
## sample estimates:
## cor
## 0.8068949
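The quantities in this output can be reproduced by hand, which may help in reading it. The sketch below assumes the standard formulas: a t statistic with n - 2 degrees of freedom for the test, and Fisher's z transformation for the confidence interval. The variable names are chosen here for illustration.

```r
r <- cor(cars$speed, cars$dist)  # sample Pearson correlation
n <- nrow(cars)                  # number of observations

# Test statistic and two-sided p-value (t distribution, n - 2 df)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via Fisher's z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t_stat, lower = ci[1], upper = ci[2]), 4)
```

The rounded values should agree with the t statistic and confidence interval reported by cor.test above.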
R code for other similar cases:
# Test if dist and speed are positively related based on Kendall's tau correlation.
cor.test(cars$speed, cars$dist, alternative = "greater",
         method = "kendall")
##
## Kendall's rank correlation tau
##
## data: cars$speed and cars$dist
## z = 6.6655, p-value = 1.319e-11
## alternative hypothesis: true tau is greater than 0
## sample estimates:
## tau
## 0.6689901
# Test if dist and speed are positively related based on the Spearman correlation.
cor.test(cars$speed, cars$dist, alternative = "greater",
         method = "spearman")
## Warning in cor.test.default(cars$speed, cars$dist, alternative = "greater", :
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: cars$speed and cars$dist
## S = 3532.8, p-value = 4.412e-14
## alternative hypothesis: true rho is greater than 0
## sample estimates:
## rho
## 0.8303568
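The warning in the Spearman output arises because the data contain tied values, so an exact p-value cannot be computed and an approximation is used instead. One possible way to acknowledge this, sketched below, is to request the approximation explicitly through the exact argument of cor.test. The estimate itself is unchanged.

```r
# speed contains repeated values, i.e. ties in the ranks
any(duplicated(cars$speed))

# Ask for the approximate p-value directly instead of the exact one
res <- cor.test(cars$speed, cars$dist, alternative = "greater",
                method = "spearman", exact = FALSE)
res$estimate
```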
Reference:
Edwin van den Heuvel & Zhuozhao Zhan (2022) Myths About Linear and Monotonic Associations: Pearson’s r, Spearman’s ρ, and Kendall’s τ, The American Statistician, 76:1, 44-52, DOI: 10.1080/00031305.2021.2004922