In a bivariate distribution, we are often interested in whether the two variables under study vary together. If a change in one variable is accompanied by a change in the other, the variables are said to be correlated.
For example, the correlation between the heights and weights of a group of people is positive, and the correlation between the price of a commodity and the demand for it is negative.
Pearson, Spearman and Kendall
We will use the state.x77 dataset available in the base R installation. It provides data on the following for the 50 US states in 1977:

* population
* income
* illiteracy rate
* life expectancy
* murder rate
* high school graduation rate
states<- state.x77[,1:6]
library(psych)
describe(states)[, c(1:5, 7:9)] # selected columns
##            vars  n    mean      sd  median     mad     min     max
## Population    1 50 4246.42 4464.49 2838.50 2890.33  365.00 21198.0
## Income        2 50 4435.80  614.47 4519.00  581.18 3098.00  6315.0
## Illiteracy    3 50    1.17    0.61    0.95    0.52    0.50     2.8
## Life Exp      4 50   70.88    1.34   70.67    1.54   67.96    73.6
## Murder        5 50    7.38    3.69    6.85    5.19    1.40    15.1
## HS Grad       6 50   53.11    8.08   53.25    8.60   37.80    67.3
Variances and Covariances
cov(states)
##               Population      Income   Illiteracy     Life Exp      Murder
## Population 19931683.7588 571229.7796  292.8679592 -407.8424612 5663.523714
## Income       571229.7796 377573.3061 -163.7020408  280.6631837 -521.894286
## Illiteracy      292.8680   -163.7020    0.3715306   -0.4815122    1.581776
## Life Exp       -407.8425    280.6632   -0.4815122    1.8020204   -3.869480
## Murder         5663.5237   -521.8943    1.5817755   -3.8694804   13.627465
## HS Grad       -3551.5096   3076.7690   -3.2354694    6.3126849  -14.549616
##                 HS Grad
## Population -3551.509551
## Income      3076.768980
## Illiteracy    -3.235469
## Life Exp       6.312685
## Murder       -14.549616
## HS Grad       65.237894
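As a reading aid for this matrix, the following minimal sketch checks two of its properties: the diagonal holds the variance of each column, and the matrix is symmetric. The names cov_mat and variances are chosen here for illustration only.

```r
# Rebuild the data so the block runs on its own
states <- state.x77[, 1:6]

cov_mat   <- cov(states)
variances <- apply(states, 2, var)   # variance of each column

# The diagonal of the covariance matrix holds the column variances
diag_matches_var <- isTRUE(all.equal(unname(diag(cov_mat)), unname(variances)))

# cov(x, y) equals cov(y, x), so the matrix is symmetric
is_sym <- isSymmetric(cov_mat)

diag_matches_var
is_sym
```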
Pearson product-moment correlation coefficients
cor(states)
##             Population     Income Illiteracy    Life Exp     Murder
## Population  1.00000000  0.2082276  0.1076224 -0.06805195  0.3436428
## Income      0.20822756  1.0000000 -0.4370752  0.34025534 -0.2300776
## Illiteracy  0.10762237 -0.4370752  1.0000000 -0.58847793  0.7029752
## Life Exp   -0.06805195  0.3402553 -0.5884779  1.00000000 -0.7808458
## Murder      0.34364275 -0.2300776  0.7029752 -0.78084575  1.0000000
## HS Grad    -0.09848975  0.6199323 -0.6571886  0.58221620 -0.4879710
##                HS Grad
## Population -0.09848975
## Income      0.61993232
## Illiteracy -0.65718861
## Life Exp    0.58221620
## Murder     -0.48797102
## HS Grad     1.00000000
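The Pearson coefficient is the covariance of two variables divided by the product of their standard deviations. The sketch below, with illustrative variable names, rescales the covariance matrix by hand and compares the result with cor() and with the base R helper cov2cor().

```r
states <- state.x77[, 1:6]

cov_mat <- cov(states)
sds     <- sqrt(diag(cov_mat))          # standard deviation of each column

# r_xy = cov(x, y) / (sd_x * sd_y), applied to every pair at once
manual_cor <- cov_mat / outer(sds, sds)

manual_ok  <- isTRUE(all.equal(manual_cor, cor(states)))
cov2cor_ok <- isTRUE(all.equal(cov2cor(cov_mat), cor(states)))

manual_ok
cov2cor_ok
```

Because of this rescaling, correlations are unit-free and lie between -1 and 1, which makes them easier to compare than the raw covariances.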
Spearman rank-order correlation coefficients.
cor(states, method="spearman")
##            Population     Income Illiteracy   Life Exp     Murder
## Population  1.0000000  0.1246098  0.3130496 -0.1040171  0.3457401
## Income      0.1246098  1.0000000 -0.3145948  0.3241050 -0.2174623
## Illiteracy  0.3130496 -0.3145948  1.0000000 -0.5553735  0.6723592
## Life Exp   -0.1040171  0.3241050 -0.5553735  1.0000000 -0.7802406
## Murder      0.3457401 -0.2174623  0.6723592 -0.7802406  1.0000000
## HS Grad    -0.3833649  0.5104809 -0.6545396  0.5239410 -0.4367330
##               HS Grad
## Population -0.3833649
## Income      0.5104809
## Illiteracy -0.6545396
## Life Exp    0.5239410
## Murder     -0.4367330
## HS Grad     1.0000000
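The Spearman coefficient is the Pearson coefficient computed on the ranks of the data. The sketch below illustrates this by ranking each column first. It also computes the Kendall coefficients, the third method named earlier, through the same cor() interface. The variable names are illustrative.

```r
states <- state.x77[, 1:6]

# Rank each column; ties receive average ranks, the default of rank()
ranked <- apply(states, 2, rank)

spearman_direct   <- cor(states, method = "spearman")
spearman_by_ranks <- cor(ranked)                      # Pearson on the ranks

same_result <- isTRUE(all.equal(spearman_by_ranks, spearman_direct))
same_result

# Kendall's tau uses the same call with a different method argument
kendall <- cor(states, method = "kendall")
round(kendall, 2)
```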
We can see that a strong positive correlation exists between income and high school graduation rate (Pearson r = 0.62), and that a strong negative correlation exists between illiteracy rate and life expectancy (Pearson r = -0.59).
x <- states[,c("Population", "Income", "Illiteracy", "HS Grad")]
y <- states[,c("Life Exp", "Murder")]
cor(x,y)
##               Life Exp     Murder
## Population -0.06805195  0.3436428
## Income      0.34025534 -0.2300776
## Illiteracy -0.58847793  0.7029752
## HS Grad     0.58221620 -0.4879710
Passing two matrices to cor() returns the correlation of each column of x with each column of y. This is useful when we are interested in the relationship between one set of variables and another.
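The rectangular result is simply a block of the full correlation matrix. As a small sketch (x_vars, y_vars and cross_cor are names chosen here), the same values can be obtained by indexing cor(states) by row and column names:

```r
states <- state.x77[, 1:6]

x_vars <- c("Population", "Income", "Illiteracy", "HS Grad")
y_vars <- c("Life Exp", "Murder")

# Correlations between the two sets of variables
cross_cor <- cor(states[, x_vars], states[, y_vars])

# The same block taken out of the full 6 x 6 matrix
block <- cor(states)[x_vars, y_vars]

same_block <- isTRUE(all.equal(cross_cor, block))
same_block
```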
cor.test(states[,3], states[,5])
##
## Pearson's product-moment correlation
##
## data: states[, 3] and states[, 5]
## t = 6.8479, df = 48, p-value = 1.258e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5279280 0.8207295
## sample estimates:
## cor
## 0.7029752
It tests the null hypothesis that the Pearson correlation between the illiteracy rate and the murder rate (columns 3 and 5 of states) is 0.
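To make the test statistic less of a black box, the sketch below recomputes it by hand for columns 3 and 5 (Illiteracy and Murder), using the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. Selecting columns by name rather than by position is a choice made here for readability.

```r
states <- state.x77[, 1:6]

res <- cor.test(states[, "Illiteracy"], states[, "Murder"])

r <- unname(res$estimate)   # sample correlation
n <- nrow(states)           # number of states

# t statistic and two-sided p-value computed by hand
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(abs(t_manual), df = n - 2, lower.tail = FALSE)

t_ok <- isTRUE(all.equal(t_manual, unname(res$statistic)))
p_ok <- isTRUE(all.equal(p_manual, res$p.value))

c(t = t_manual, p = p_manual)
```

The components res$estimate, res$statistic, res$p.value and res$conf.int give programmatic access to the values shown in the printed output.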
library(psych)
corr.test(states, use="complete")
We can test only one correlation at a time using cor.test(), but the corr.test() function provided in the psych package allows us to test several at once. The corr.test() function produces correlations and significance levels for matrices of Pearson, Spearman and Kendall correlations.
## Call:corr.test(x = states, use = "complete")
## Correlation matrix
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population       1.00   0.21       0.11    -0.07   0.34   -0.10
## Income           0.21   1.00      -0.44     0.34  -0.23    0.62
## Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
## Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
## Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
## HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00
## Sample Size
## [1] 50
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population       0.00   0.59       1.00      1.0   0.10       1
## Income           0.15   0.00       0.01      0.1   0.54       0
## Illiteracy       0.46   0.00       0.00      0.0   0.00       0
## Life Exp         0.64   0.02       0.00      0.0   0.00       0
## Murder           0.01   0.11       0.00      0.0   0.00       0
## HS Grad          0.50   0.00       0.00      0.0   0.00       0
##
## To see confidence intervals of the correlations, print with the short=FALSE option
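The output notes that the entries above the diagonal are adjusted for multiple tests. The sketch below reproduces that idea in base R only, assuming the adjustment is Holm's method (documented as the default of the adjust argument in corr.test(); treat this as an assumption if your psych version differs). The names pairs, raw_p and holm_p are illustrative.

```r
states <- state.x77[, 1:6]

# All 15 unordered pairs of the 6 variables
pairs <- combn(colnames(states), 2)

# Unadjusted p-value of each pairwise Pearson test
raw_p <- apply(pairs, 2, function(v) {
  cor.test(states[, v[1]], states[, v[2]])$p.value
})
names(raw_p) <- apply(pairs, 2, paste, collapse = " ~ ")

# Holm adjustment across the 15 tests
holm_p <- p.adjust(raw_p, method = "holm")

round(cbind(raw = raw_p, holm = holm_p), 2)
```

An adjusted p-value is never smaller than the raw one, which is why some correlations that look significant below the diagonal no longer do above it.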
The rcorr() function in the Hmisc package produces correlations and significance levels for Pearson and Spearman correlations.
library(Hmisc)
rcorr(states, type="pearson") # type can be pearson or spearman
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population       1.00   0.21       0.11    -0.07   0.34   -0.10
## Income           0.21   1.00      -0.44     0.34  -0.23    0.62
## Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
## Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
## Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
## HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00
##
## n= 50
##
##
## P
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population            0.1467 0.4569     0.6387   0.0146 0.4962
## Income     0.1467            0.0015     0.0156   0.1080 0.0000
## Illiteracy 0.4569     0.0015            0.0000   0.0000 0.0000
## Life Exp   0.6387     0.0156 0.0000              0.0000 0.0000
## Murder     0.0146     0.1080 0.0000     0.0000          0.0003
## HS Grad    0.4962     0.0000 0.0000     0.0000   0.0003