Sameer Mathur
In a bivariate distribution, we are often interested in whether the two variables under study are correlated, that is, whether they vary together. If a change in one variable tends to be accompanied by a change in the other, the variables are said to be correlated.
For example, the correlation between the heights and weights of a group of people is positive, whereas the correlation between the price of a commodity and the demand for it is negative.
Pearson, Spearman and Kendall
We will use the mtcars dataset available in the base R installation.
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
It has 32 observations on 11 variables. See the mtcars data description (?mtcars in R).
attach(mtcars) # attach the data
library(psych)
describe(mtcars)[, c(1:5, 7:9)] # selected columns
vars n mean sd median mad min max
mpg 1 32 20.09 6.03 19.20 5.41 10.40 33.90
cyl 2 32 6.19 1.79 6.00 2.97 4.00 8.00
disp 3 32 230.72 123.94 196.30 140.48 71.10 472.00
hp 4 32 146.69 68.56 123.00 77.10 52.00 335.00
drat 5 32 3.60 0.53 3.70 0.70 2.76 4.93
wt 6 32 3.22 0.98 3.33 0.77 1.51 5.42
qsec 7 32 17.85 1.79 17.71 1.42 14.50 22.90
vs 8 32 0.44 0.50 0.00 0.00 0.00 1.00
am 9 32 0.41 0.50 0.00 0.00 0.00 1.00
gear 10 32 3.69 0.74 4.00 1.48 3.00 5.00
carb 11 32 2.81 1.62 2.00 1.48 1.00 8.00
# variance-covariance matrix
covmt <- cov(mtcars[,c(1:7)])
round(covmt,2)
mpg cyl disp hp drat wt qsec
mpg 36.32 -9.17 -633.10 -320.73 2.20 -5.12 4.51
cyl -9.17 3.19 199.66 101.93 -0.67 1.37 -1.89
disp -633.10 199.66 15360.80 6721.16 -47.06 107.68 -96.05
hp -320.73 101.93 6721.16 4700.87 -16.45 44.19 -86.77
drat 2.20 -0.67 -47.06 -16.45 0.29 -0.37 0.09
wt -5.12 1.37 107.68 44.19 -0.37 0.96 -0.31
qsec 4.51 -1.89 -96.05 -86.77 0.09 -0.31 3.19
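Covariances depend on the units of the variables, which is why the entries above range from 0.09 to 15360.80. A minimal sketch of how a covariance is rescaled into a correlation, using only base R; the names r_manual, r_builtin and cormt7 are introduced here for illustration and are not part of the original analysis:

```r
# Pearson's r is the covariance divided by the product of the standard deviations
cov_mpg_wt <- cov(mtcars$mpg, mtcars$wt)
r_manual <- cov_mpg_wt / (sd(mtcars$mpg) * sd(mtcars$wt))

# The same quantity computed directly by cor()
r_builtin <- cor(mtcars$mpg, mtcars$wt)
all.equal(r_manual, r_builtin)

# cov2cor() rescales a whole covariance matrix in one step
covmt <- cov(mtcars[, 1:7])
cormt7 <- cov2cor(covmt)
round(cormt7, 2)
```

The rescaled matrix should agree with cor(mtcars[, 1:7]), so the two views carry the same information about the direction of each relationship; only the scale differs.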
# mpg, cyl, and disp as rows
x <- mtcars[1:3]
# hp, drat, and wt as columns
y <- mtcars[4:6]
z <- cor(x, y)
round(z,2)
hp drat wt
mpg -0.78 0.68 -0.87
cyl 0.83 -0.70 0.78
disp 0.79 -0.71 0.89
Pearson product-moment correlation coefficients
# Pearson - correlation matrix - all rows and columns
cormt <- cor(mtcars)
round(cormt,2)
mpg cyl disp hp drat wt qsec vs am gear carb
mpg 1.00 -0.85 -0.85 -0.78 0.68 -0.87 0.42 0.66 0.60 0.48 -0.55
cyl -0.85 1.00 0.90 0.83 -0.70 0.78 -0.59 -0.81 -0.52 -0.49 0.53
disp -0.85 0.90 1.00 0.79 -0.71 0.89 -0.43 -0.71 -0.59 -0.56 0.39
hp -0.78 0.83 0.79 1.00 -0.45 0.66 -0.71 -0.72 -0.24 -0.13 0.75
drat 0.68 -0.70 -0.71 -0.45 1.00 -0.71 0.09 0.44 0.71 0.70 -0.09
wt -0.87 0.78 0.89 0.66 -0.71 1.00 -0.17 -0.55 -0.69 -0.58 0.43
qsec 0.42 -0.59 -0.43 -0.71 0.09 -0.17 1.00 0.74 -0.23 -0.21 -0.66
vs 0.66 -0.81 -0.71 -0.72 0.44 -0.55 0.74 1.00 0.17 0.21 -0.57
am 0.60 -0.52 -0.59 -0.24 0.71 -0.69 -0.23 0.17 1.00 0.79 0.06
gear 0.48 -0.49 -0.56 -0.13 0.70 -0.58 -0.21 0.21 0.79 1.00 0.27
carb -0.55 0.53 0.39 0.75 -0.09 0.43 -0.66 -0.57 0.06 0.27 1.00
Spearman rank-order correlation coefficients
# Spearman - correlation matrix
s <- cor(mtcars, method="spearman")
round(s,2)
mpg cyl disp hp drat wt qsec vs am gear carb
mpg 1.00 -0.91 -0.91 -0.89 0.65 -0.89 0.47 0.71 0.56 0.54 -0.66
cyl -0.91 1.00 0.93 0.90 -0.68 0.86 -0.57 -0.81 -0.52 -0.56 0.58
disp -0.91 0.93 1.00 0.85 -0.68 0.90 -0.46 -0.72 -0.62 -0.59 0.54
hp -0.89 0.90 0.85 1.00 -0.52 0.77 -0.67 -0.75 -0.36 -0.33 0.73
drat 0.65 -0.68 -0.68 -0.52 1.00 -0.75 0.09 0.45 0.69 0.74 -0.13
wt -0.89 0.86 0.90 0.77 -0.75 1.00 -0.23 -0.59 -0.74 -0.68 0.50
qsec 0.47 -0.57 -0.46 -0.67 0.09 -0.23 1.00 0.79 -0.20 -0.15 -0.66
vs 0.71 -0.81 -0.72 -0.75 0.45 -0.59 0.79 1.00 0.17 0.28 -0.63
am 0.56 -0.52 -0.62 -0.36 0.69 -0.74 -0.20 0.17 1.00 0.81 -0.06
gear 0.54 -0.56 -0.59 -0.33 0.74 -0.68 -0.15 0.28 0.81 1.00 0.11
carb -0.66 0.58 0.54 0.73 -0.13 0.50 -0.66 -0.63 -0.06 0.11 1.00
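Two follow-ups to the Spearman matrix, offered as a sketch rather than as part of the original analysis. First, Spearman's coefficient can be reproduced as a Pearson correlation computed on ranks. Second, Kendall's tau, the third method named at the start, is available through the same method argument of cor(). The three-variable subset and the names rho_builtin, rho_ranks and k are chosen here only for illustration:

```r
# Spearman's rho is Pearson's r applied to the ranks (ties receive average ranks)
rho_builtin <- cor(mtcars$mpg, mtcars$wt, method = "spearman")
rho_ranks <- cor(rank(mtcars$mpg), rank(mtcars$wt))
all.equal(rho_builtin, rho_ranks)

# Kendall's tau for a small subset of variables
k <- cor(mtcars[, c("mpg", "wt", "hp")], method = "kendall")
round(k, 2)
```

Because both rank-based coefficients depend only on the ordering of the observations, they are less sensitive to outliers and to monotone but non-linear relationships than Pearson's r.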
# four variables in x (these become the rows of the result)
x <- mtcars[,c("mpg", "disp", "hp", "drat")]
# two variables in y (these become the columns of the result)
y <- mtcars[,c("vs", "carb")]
z <- cor(x,y)
round(z,2)
vs carb
mpg 0.66 -0.55
disp -0.71 0.39
hp -0.72 0.75
drat 0.44 -0.09
This is useful when we are interested in the relationship between one set of variables and another.
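As a quick check on what cor(x, y) returns, the rectangular result should be the corresponding block of the full correlation matrix. A small sketch in base R; the names full and block are introduced here for illustration:

```r
# Two sets of variables, as in the example above
x <- mtcars[, c("mpg", "disp", "hp", "drat")]
y <- mtcars[, c("vs", "carb")]
z <- cor(x, y)

# The same values, extracted from the full 11 x 11 correlation matrix
full <- cor(mtcars)
block <- full[colnames(x), colnames(y)]
all.equal(z, block)
```

Calling cor(x, y) directly avoids computing correlations that are not needed, which matters more as the number of variables grows.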
# correlation test between mpg and wt
cor.test(mpg, wt)
Pearson's product-moment correlation
data: mpg and wt
t = -9.559, df = 30, p-value = 1.294e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.9338264 -0.7440872
sample estimates:
cor
-0.8676594
# correlation test between disp (3rd column) and drat (5th column)
cor.test(mtcars[,3], mtcars[,5])
Pearson's product-moment correlation
data: mtcars[, 3] and mtcars[, 5]
t = -5.5257, df = 30, p-value = 5.282e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.8487237 -0.4805193
sample estimates:
cor
-0.7102139
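cor.test() tests the null hypothesis that the Pearson correlation between disp and drat is 0. With a p-value of 5.282e-06, the null hypothesis is rejected at any conventional significance level.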
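The numbers reported by cor.test() can be reproduced by hand, which may help in reading the output. The sketch below assumes the default Pearson method, a two-sided alternative and a 95 percent confidence level; the names t_manual, p_manual and ci_manual are illustrative only:

```r
res <- cor.test(mtcars$disp, mtcars$drat)
r <- unname(res$estimate)   # sample correlation
n <- nrow(mtcars)           # number of complete pairs

# t statistic with n - 2 degrees of freedom
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

# Confidence interval via Fisher's z transformation
z <- atanh(r)
se <- 1 / sqrt(n - 3)
ci_manual <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

c(t = t_manual, p = p_manual)
ci_manual
```

The components res$statistic, res$p.value and res$conf.int hold the same quantities, so they can be extracted directly when only the numbers are needed.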
cor.test() tests only one correlation at a time, but the corr.test() function in the psych package tests several at once.
library(psych)
corr.test(mtcars[,c(1,6:9)], use="complete")
Call:corr.test(x = mtcars[, c(1, 6:9)], use = "complete")
Correlation matrix
mpg wt qsec vs am
mpg 1.00 -0.87 0.42 0.66 0.60
wt -0.87 1.00 -0.17 -0.55 -0.69
qsec 0.42 -0.17 1.00 0.74 -0.23
vs 0.66 -0.55 0.74 1.00 0.17
am 0.60 -0.69 -0.23 0.17 1.00
Sample Size
[1] 32
Probability values (Entries above the diagonal are adjusted for multiple tests.)
mpg wt qsec vs am
mpg 0.00 0.00 0.07 0.00 0.00
wt 0.00 0.00 0.68 0.00 0.00
qsec 0.02 0.34 0.00 0.00 0.62
vs 0.00 0.00 0.00 0.00 0.68
am 0.00 0.00 0.21 0.36 0.00
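The adjusted values above the diagonal are consistent with a Holm correction, which is the documented default of the adjust argument of corr.test(). A rough base R sketch of the same adjustment, assuming the ten pairwise tests among these five variables form the family being corrected; var_pairs, raw_p and adj_p are names introduced here:

```r
vars <- c("mpg", "wt", "qsec", "vs", "am")

# All 10 unordered pairs of variables, one pair per column
var_pairs <- combn(vars, 2)

# Unadjusted p-value for each pair, from cor.test()
raw_p <- apply(var_pairs, 2, function(v) {
  cor.test(mtcars[[v[1]]], mtcars[[v[2]]])$p.value
})
names(raw_p) <- apply(var_pairs, 2, paste, collapse = "-")

# Holm adjustment across the whole family of tests
adj_p <- p.adjust(raw_p, method = "holm")
round(cbind(raw = raw_p, holm = adj_p), 2)
```

Here the raw column corresponds to the entries below the diagonal in the output above and the holm column to the entries above it. For instance, the mpg-qsec correlation has a raw p-value of about 0.02, but it is no longer significant at the 5 percent level once the multiple tests are taken into account.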
To see confidence intervals of the correlations, print the result with the short=FALSE option, for example print(corr.test(mtcars[,c(1,6:9)]), short=FALSE).
The rcorr() function in the Hmisc package produces correlations and significance levels for Pearson and Spearman correlations.
library(Hmisc)
rcorr(as.matrix(mtcars[,c(1,6:9)]), type="pearson") # type can be pearson or spearman
mpg wt qsec vs am
mpg 1.00 -0.87 0.42 0.66 0.60
wt -0.87 1.00 -0.17 -0.55 -0.69
qsec 0.42 -0.17 1.00 0.74 -0.23
vs 0.66 -0.55 0.74 1.00 0.17
am 0.60 -0.69 -0.23 0.17 1.00
n= 32
P
     mpg    wt     qsec   vs     am
mpg         0.0000 0.0171 0.0000 0.0003
wt   0.0000        0.3389 0.0010 0.0000
qsec 0.0171 0.3389        0.0000 0.2057
vs   0.0000 0.0010 0.0000        0.3570
am   0.0003 0.0000 0.2057 0.3570
library(corrgram)
corrgram(mtcars[,c(1,6:9)], order=FALSE, lower.panel=panel.conf,
upper.panel=panel.pie, text.panel=panel.txt,
main="Corrgram - mtcars")