Sameer Mathur
In a bivariate distribution, we may want to find out whether there is any correlation, or covariation, between the two variables under study. If a change in one variable is accompanied by a change in the other, the variables are said to be correlated.
For example, the correlation between the heights and weights of a group of persons is positive, and the correlation between the price of a commodity and the demand for it is negative.
Pearson, Spearman and Kendall
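We will use the state.x77 dataset available in the base R installation. It provides data on the following for the 50 US states in 1977: population, income, illiteracy, life expectancy, murder rate and high school graduation rate, among others.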
For a description of each column, see Data Description.
states<- state.x77[,1:6] # all rows and 1-6 columns
library(psych)
describe(states)[, c(1:5, 7:9)] # selected columns
           vars  n    mean      sd  median     mad     min     max
Population    1 50 4246.42 4464.49 2838.50 2890.33  365.00 21198.0
Income        2 50 4435.80  614.47 4519.00  581.18 3098.00  6315.0
Illiteracy    3 50    1.17    0.61    0.95    0.52    0.50     2.8
Life Exp      4 50   70.88    1.34   70.67    1.54   67.96    73.6
Murder        5 50    7.38    3.69    6.85    5.19    1.40    15.1
HS Grad       6 50   53.11    8.08   53.25    8.60   37.80    67.3
Variances and Covariances
# variance-covariance matrix stored in 'matrix1'
matrix1 <- cov(states)
# round to 2 decimal places
round(matrix1, 2)
            Population    Income Illiteracy Life Exp  Murder  HS Grad
Population 19931683.76 571229.78     292.87  -407.84 5663.52 -3551.51
Income       571229.78 377573.31    -163.70   280.66 -521.89  3076.77
Illiteracy      292.87   -163.70       0.37    -0.48    1.58    -3.24
Life Exp       -407.84    280.66      -0.48     1.80   -3.87     6.31
Murder         5663.52   -521.89       1.58    -3.87   13.63   -14.55
HS Grad       -3551.51   3076.77      -3.24     6.31  -14.55    65.24
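Two properties of this matrix can be checked directly: its diagonal holds each variable's variance, and its entries depend on the units of measurement. The following is a minimal base R sketch; the names cov_mat and states_rescaled are illustrative and do not appear in the original analysis.

```r
states <- state.x77[, 1:6]
cov_mat <- cov(states)

# The diagonal of a covariance matrix is the vector of variances
diag_matches_var <- isTRUE(all.equal(diag(cov_mat), apply(states, 2, var)))

# Rescaling a variable rescales its covariances by the same factor ...
states_rescaled <- states
states_rescaled[, "Income"] <- states[, "Income"] / 1000  # express income in thousands
cov_ratio <- cov(states_rescaled)["Income", "Murder"] / cov_mat["Income", "Murder"]

# ... but leaves every correlation unchanged
cor_unchanged <- isTRUE(all.equal(cor(states_rescaled), cor(states)))

diag_matches_var
cov_ratio
cor_unchanged
```

Because covariances change with the units, they are hard to compare across pairs of variables, which is the usual reason for moving on to correlations.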
Pearson product-moment correlation coefficients
# Pearson product-moment correlation coefficients stored in 'matrix2'
matrix2 <- cor(states)
# round to 2 decimal places
round(matrix2, 2)
           Population Income Illiteracy Life Exp Murder HS Grad
Population       1.00   0.21       0.11    -0.07   0.34   -0.10
Income           0.21   1.00      -0.44     0.34  -0.23    0.62
Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00
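Pearson's coefficient is the covariance of two variables divided by the product of their standard deviations. As a sketch of that definition (the variable names below are illustrative, not from the original), the Income and HS Grad entry can be recomputed by hand, and base R's cov2cor() applies the same rescaling to a whole covariance matrix.

```r
states <- state.x77[, 1:6]
income <- states[, "Income"]
hs_grad <- states[, "HS Grad"]

# Pearson's r from its definition: cov(x, y) / (sd(x) * sd(y))
r_manual <- cov(income, hs_grad) / (sd(income) * sd(hs_grad))
r_builtin <- cor(income, hs_grad)

# The same rescaling applied to the full covariance matrix
cor_from_cov <- cov2cor(cov(states))

round(c(manual = r_manual, builtin = r_builtin), 2)
isTRUE(all.equal(cor_from_cov, cor(states)))
```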
Spearman rank-order correlation coefficients.
# Spearman rank-order correlation coefficients stored in 'matrix3'
matrix3 <- cor(states, method="spearman")
# round to 2 decimal places
round(matrix3, 2)
           Population Income Illiteracy Life Exp Murder HS Grad
Population       1.00   0.12       0.31    -0.10   0.35   -0.38
Income           0.12   1.00      -0.31     0.32  -0.22    0.51
Illiteracy       0.31  -0.31       1.00    -0.56   0.67   -0.65
Life Exp        -0.10   0.32      -0.56     1.00  -0.78    0.52
Murder           0.35  -0.22       0.67    -0.78   1.00   -0.44
HS Grad         -0.38   0.51      -0.65     0.52  -0.44    1.00
We can see that a fairly strong positive correlation exists between income and high school graduation rate (0.62 Pearson, 0.51 Spearman) and that a fairly strong negative correlation exists between illiteracy rate and life expectancy (-0.59 Pearson, -0.56 Spearman).
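Spearman's coefficient is the Pearson coefficient computed on the ranks of the data, and Kendall's tau, the third coefficient named at the top, is available through the same method argument of cor(). A short sketch of both points follows; the object names are illustrative.

```r
states <- state.x77[, 1:6]

# Rank each column (ties receive average ranks), then take Pearson on the ranks
rank_states <- apply(states, 2, rank)
spearman_via_ranks <- cor(rank_states)
isTRUE(all.equal(spearman_via_ranks, cor(states, method = "spearman")))

# Kendall rank correlation coefficients
kendall <- cor(states, method = "kendall")
round(kendall, 2)
```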
# Population, Income, Illiteracy and HS Grad as rows
x <- states[,c("Population", "Income", "Illiteracy", "HS Grad")]
# Life Exp and Murder as columns
y <- states[,c("Life Exp", "Murder")]
# non-square correlation matrix stored in 'matrix4'
matrix4 <- cor(x,y)
# round to 2 decimal places
round(matrix4, 2)
           Life Exp Murder
Population    -0.07   0.34
Income         0.34  -0.23
Illiteracy    -0.59   0.70
HS Grad        0.58  -0.49
This is useful when we are interested in the relationship between one set of variables and another.
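A non-square result of this kind is simply a block of the full correlation matrix, so it can also be obtained by indexing. The sketch below checks that equivalence; x_vars, y_vars and the other names are illustrative.

```r
states <- state.x77[, 1:6]
x_vars <- c("Population", "Income", "Illiteracy", "HS Grad")
y_vars <- c("Life Exp", "Murder")

# Correlations of one set of variables with another
block <- cor(states[, x_vars], states[, y_vars])

# The same numbers, picked out of the full 6 x 6 matrix by name
full <- cor(states)
isTRUE(all.equal(block, full[x_vars, y_vars]))

round(block, 2)
```

Calling cor(x, y) avoids computing correlations that are not of interest, which matters more as the number of variables grows.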
# correlation test between Illiteracy (3rd column) and Murder (5th column)
cor.test(states[,3], states[,5])
Pearson's product-moment correlation
data: states[, 3] and states[, 5]
t = 6.8479, df = 48, p-value = 1.258e-08
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5279280 0.8207295
sample estimates:
cor
0.7029752
It tests the null hypothesis that the Pearson correlation between illiteracy and the murder rate is 0. The p-value of 1.258e-08 leads us to reject that hypothesis.
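The object returned by cor.test() can be stored and queried, and the method argument switches to the rank-based tests. The sketch below selects columns by name (columns 3 and 5 are Illiteracy and Murder) and sets exact = FALSE for the rank-based tests, an assumption made here because the data contain ties and an exact p-value cannot be computed.

```r
states <- state.x77[, 1:6]
illiteracy <- states[, "Illiteracy"]
murder <- states[, "Murder"]

# Pearson test (the default); keep the result instead of only printing it
res <- cor.test(illiteracy, murder)
res$estimate   # sample correlation
res$p.value    # two-sided p-value
res$conf.int   # 95 percent confidence interval

# Rank-based alternatives; exact = FALSE requests the asymptotic p-value
res_spearman <- cor.test(illiteracy, murder, method = "spearman", exact = FALSE)
res_kendall <- cor.test(illiteracy, murder, method = "kendall", exact = FALSE)
c(res_spearman$estimate, res_kendall$estimate)
```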
library(psych)
corr.test(states, use="complete")
We can test only one correlation at a time using cor.test(), but the corr.test() function provided in the psych package tests a whole matrix of correlations at once.
The corr.test() function produces correlations and significance levels for matrices of Pearson, Spearman and Kendall correlations.
Call:corr.test(x = states, use = "complete")
Correlation matrix
           Population Income Illiteracy Life Exp Murder HS Grad
Population       1.00   0.21       0.11    -0.07   0.34   -0.10
Income           0.21   1.00      -0.44     0.34  -0.23    0.62
Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00
Sample Size
[1] 50
Probability values (Entries above the diagonal are adjusted for multiple tests.)
           Population Income Illiteracy Life Exp Murder HS Grad
Population       0.00   0.59       1.00      1.0   0.10       1
Income           0.15   0.00       0.01      0.1   0.54       0
Illiteracy       0.46   0.00       0.00      0.0   0.00       0
Life Exp         0.64   0.02       0.00      0.0   0.00       0
Murder           0.01   0.11       0.00      0.0   0.00       0
HS Grad          0.50   0.00       0.00      0.0   0.00       0
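The adjusted values above the diagonal are consistent with a Holm correction over the 15 distinct pairs, which is the default adjustment in corr.test(). As a rough base R sketch of that idea, not a reimplementation of the psych function, the raw p-values can be collected with cor.test() and adjusted with p.adjust(); the names pair_names and p_table are illustrative.

```r
states <- state.x77[, 1:6]

# All 15 unordered pairs of variables, one pair per row
pair_names <- t(combn(colnames(states), 2))

# Raw p-value of the Pearson test for each pair
p_raw <- apply(pair_names, 1, function(p) {
  cor.test(states[, p[1]], states[, p[2]])$p.value
})

# Holm adjustment for multiple testing
p_holm <- p.adjust(p_raw, method = "holm")

p_table <- data.frame(
  var1 = pair_names[, 1],
  var2 = pair_names[, 2],
  p_raw = round(p_raw, 4),
  p_holm = round(p_holm, 4)
)
p_table
```

For Population and Income, for example, the raw p-value of about 0.15 becomes about 0.59 after adjustment, matching the two entries for that pair in the table above.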
To see the confidence intervals of the correlations, print the result with the short=FALSE option.
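For a single pair, base R's cor.test() builds its Pearson confidence interval from Fisher's z transformation. The sketch below reproduces that calculation by hand for Illiteracy and Murder; it illustrates how such an interval can be obtained and is not a description of the internals of corr.test().

```r
states <- state.x77[, 1:6]
x <- states[, "Illiteracy"]
y <- states[, "Murder"]

r <- cor(x, y)
n <- length(x)

# Fisher's z: atanh(r) is approximately normal with standard error 1 / sqrt(n - 3)
z <- atanh(r)
se <- 1 / sqrt(n - 3)

# 95 percent interval on the z scale, transformed back to the correlation scale
ci_manual <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
ci_cor_test <- as.numeric(cor.test(x, y)$conf.int)

round(rbind(manual = ci_manual, cor.test = ci_cor_test), 4)
```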
The rcorr() function in the Hmisc package produces correlations and significance levels for Pearson and Spearman correlations.
library(Hmisc)
rcorr(states, type="pearson") # type can be pearson or spearman
           Population Income Illiteracy Life Exp Murder HS Grad
Population       1.00   0.21       0.11    -0.07   0.34   -0.10
Income           0.21   1.00      -0.44     0.34  -0.23    0.62
Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00
n= 50
P
           Population Income Illiteracy Life Exp Murder HS Grad
Population            0.1467 0.4569     0.6387   0.0146 0.4962
Income     0.1467            0.0015     0.0156   0.1080 0.0000
Illiteracy 0.4569     0.0015            0.0000   0.0000 0.0000
Life Exp   0.6387     0.0156 0.0000              0.0000 0.0000
Murder     0.0146     0.1080 0.0000     0.0000          0.0003
HS Grad    0.4962     0.0000 0.0000     0.0000   0.0003
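Correlation matrices are symmetric, so every pair appears twice and the strongest relationships are not easy to spot. One possible way to summarise them, sketched here in base R with illustrative names, is to keep only the upper triangle and sort the pairs by absolute correlation.

```r
states <- state.x77[, 1:6]
cor_mat <- cor(states)

# Row and column positions of the entries above the diagonal
upper <- which(upper.tri(cor_mat), arr.ind = TRUE)

# One row per distinct pair of variables
cor_pairs <- data.frame(
  var1 = rownames(cor_mat)[upper[, "row"]],
  var2 = colnames(cor_mat)[upper[, "col"]],
  r = cor_mat[upper]
)

# Strongest relationships first, regardless of sign
cor_pairs <- cor_pairs[order(-abs(cor_pairs$r)), ]
head(cor_pairs, 3)
```

The first row is the Life Exp and Murder pair, whose correlation of -0.78 is the largest in absolute value in the matrices above.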