Correlations

In a bivariate distribution, we are often interested in whether the two variables under study vary together. If a change in one variable tends to be accompanied by a change in the other, the variables are said to be correlated.

For example, the correlation between the heights and weights of a group of persons is positive, and the correlation between the price of a commodity and the demand for it is negative.
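As a quick illustration of a positive correlation, the sketch below uses the `women` dataset that ships with base R (heights and weights of 15 women). It is a stand-in chosen here for the height/weight example, not data from this tutorial.

```r
# `women` is a built-in data frame with columns `height` and `weight`
r_hw <- cor(women$height, women$weight)

r_hw        # close to +1: taller women in this sample tend to weigh more
sign(r_hw)  # +1 for a positive correlation, -1 for a negative one
```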

Types of Correlations

Pearson, Spearman and Kendall
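All three coefficients are available through the `method` argument of `cor()`. The sketch below compares them on a single pair of variables; the pair (illiteracy and murder rate) is picked here only for illustration.

```r
states <- state.x77[, 1:6]
illiteracy <- states[, "Illiteracy"]
murder     <- states[, "Murder"]

# Compute the same correlation with each of the three methods
methods <- c("pearson", "spearman", "kendall")
res <- sapply(methods, function(m) cor(illiteracy, murder, method = m))
round(res, 3)
```

Pearson measures linear association on the raw values, while Spearman and Kendall work on ranks and so are less sensitive to outliers and to monotone but non-linear relationships.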

State dataset

We will use the state.x77 dataset available in the base R installation. It provides data on the following for 50 US states in 1977:

* population
* income
* illiteracy rate
* life expectancy
* murder rate
* high school graduation rate

Summary of the State dataset

states <- state.x77[, 1:6]
library(psych)
describe(states)[, c(1:5, 7:9)]  # selected columns
##            vars  n    mean      sd  median     mad     min     max
## Population    1 50 4246.42 4464.49 2838.50 2890.33  365.00 21198.0
## Income        2 50 4435.80  614.47 4519.00  581.18 3098.00  6315.0
## Illiteracy    3 50    1.17    0.61    0.95    0.52    0.50     2.8
## Life Exp      4 50   70.88    1.34   70.67    1.54   67.96    73.6
## Murder        5 50    7.38    3.69    6.85    5.19    1.40    15.1
## HS Grad       6 50   53.11    8.08   53.25    8.60   37.80    67.3

Variance-Covariance Matrix

Variances and Covariances

cov(states)
##               Population      Income   Illiteracy     Life Exp      Murder
## Population 19931683.7588 571229.7796  292.8679592 -407.8424612 5663.523714
## Income       571229.7796 377573.3061 -163.7020408  280.6631837 -521.894286
## Illiteracy      292.8680   -163.7020    0.3715306   -0.4815122    1.581776
## Life Exp       -407.8425    280.6632   -0.4815122    1.8020204   -3.869480
## Murder         5663.5237   -521.8943    1.5817755   -3.8694804   13.627465
## HS Grad       -3551.5096   3076.7690   -3.2354694    6.3126849  -14.549616
##                 HS Grad
## Population -3551.509551
## Income      3076.768980
## Illiteracy    -3.235469
## Life Exp       6.312685
## Murder       -14.549616
## HS Grad       65.237894
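Covariances depend on the units of the variables, which is why the entries above differ by many orders of magnitude. Dividing each covariance by the product of the two standard deviations removes the units and gives the Pearson correlation. A minimal sketch of that relationship, with variable names chosen here for illustration:

```r
states <- state.x77[, 1:6]

V <- cov(states)          # variance-covariance matrix
s <- sqrt(diag(V))        # standard deviations (square roots of the variances)

# r_ij = cov_ij / (s_i * s_j)
R_manual <- V / outer(s, s)

all.equal(R_manual, cor(states))    # manual rescaling matches cor()
all.equal(cov2cor(V), cor(states))  # cov2cor() from stats does the same rescaling
```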

Correlation Matrix

Pearson product-moment correlation coefficients

cor(states)
##             Population     Income Illiteracy    Life Exp     Murder
## Population  1.00000000  0.2082276  0.1076224 -0.06805195  0.3436428
## Income      0.20822756  1.0000000 -0.4370752  0.34025534 -0.2300776
## Illiteracy  0.10762237 -0.4370752  1.0000000 -0.58847793  0.7029752
## Life Exp   -0.06805195  0.3402553 -0.5884779  1.00000000 -0.7808458
## Murder      0.34364275 -0.2300776  0.7029752 -0.78084575  1.0000000
## HS Grad    -0.09848975  0.6199323 -0.6571886  0.58221620 -0.4879710
##                HS Grad
## Population -0.09848975
## Income      0.61993232
## Illiteracy -0.65718861
## Life Exp    0.58221620
## Murder     -0.48797102
## HS Grad     1.00000000

Correlation Matrix

Spearman rank-order correlation coefficients.

cor(states, method="spearman")
##            Population     Income Illiteracy   Life Exp     Murder
## Population  1.0000000  0.1246098  0.3130496 -0.1040171  0.3457401
## Income      0.1246098  1.0000000 -0.3145948  0.3241050 -0.2174623
## Illiteracy  0.3130496 -0.3145948  1.0000000 -0.5553735  0.6723592
## Life Exp   -0.1040171  0.3241050 -0.5553735  1.0000000 -0.7802406
## Murder      0.3457401 -0.2174623  0.6723592 -0.7802406  1.0000000
## HS Grad    -0.3833649  0.5104809 -0.6545396  0.5239410 -0.4367330
##               HS Grad
## Population -0.3833649
## Income      0.5104809
## Illiteracy -0.6545396
## Life Exp    0.5239410
## Murder     -0.4367330
## HS Grad     1.0000000

We can see that a fairly strong positive correlation exists between income and high school graduation rate (Pearson 0.62, Spearman 0.51), and that a fairly strong negative correlation exists between illiteracy rate and life expectancy (Pearson -0.59, Spearman -0.56).
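The Spearman coefficient is the Pearson coefficient computed on the ranks of the data rather than on the raw values. The following sketch checks this on the same matrix; `ranked` is a name introduced here for illustration.

```r
states <- state.x77[, 1:6]

# Replace every column by its ranks (ties receive average ranks)
ranked <- apply(states, 2, rank)

# Pearson on the ranks reproduces the Spearman matrix
all.equal(cor(ranked), cor(states, method = "spearman"))
```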

Non-square Matrices

x <- states[,c("Population", "Income", "Illiteracy", "HS Grad")]
y <- states[,c("Life Exp", "Murder")]
cor(x,y)
##               Life Exp     Murder
## Population -0.06805195  0.3436428
## Income      0.34025534 -0.2300776
## Illiteracy -0.58847793  0.7029752
## HS Grad     0.58221620 -0.4879710

This is useful when we are interested in the relationship between one set of variables and another.
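The rectangular result is simply a block of the full correlation matrix, with the columns of the first argument as rows and the columns of the second as columns. A short sketch confirming this:

```r
states <- state.x77[, 1:6]
x <- states[, c("Population", "Income", "Illiteracy", "HS Grad")]
y <- states[, c("Life Exp", "Murder")]

full  <- cor(states)
block <- full[colnames(x), colnames(y)]  # rows from x, columns from y

dim(cor(x, y))              # 4 rows, 2 columns
all.equal(cor(x, y), block)
```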

Testing a correlation coefficient for significance

cor.test(states[,3], states[,5])
## 
##  Pearson's product-moment correlation
## 
## data:  states[, 3] and states[, 5]
## t = 6.8479, df = 48, p-value = 1.258e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5279280 0.8207295
## sample estimates:
##       cor 
## 0.7029752

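It tests the null hypothesis that the Pearson correlation between the illiteracy rate (column 3) and the murder rate (column 5) is 0. With a p-value of 1.258e-08, the null hypothesis is rejected at any conventional significance level.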
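For the Pearson case, the reported t statistic follows from the sample correlation r and the sample size n as t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below recomputes it by hand for columns 3 and 5 of `states` (Illiteracy and Murder); `t_manual` and `p_manual` are names introduced here.

```r
states <- state.x77[, 1:6]
x <- states[, "Illiteracy"]  # column 3
y <- states[, "Murder"]      # column 5

n <- length(x)
r <- cor(x, y)

# t statistic and two-sided p-value under H0: true correlation is 0
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(abs(t_manual), df = n - 2, lower.tail = FALSE)

ct <- cor.test(x, y)
all.equal(unname(ct$statistic), t_manual)
all.equal(ct$p.value, p_manual)
ct$conf.int  # the 95 percent confidence interval for the correlation
```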

Correlation matrix and tests of significance via corr.test()

library(psych)
corr.test(states, use="complete")

We can test only one correlation at a time using cor.test(), but the corr.test() function provided in the psych package tests every pair of variables in a matrix at once.
The corr.test() function produces correlations and significance levels for matrices of Pearson, Spearman and Kendall correlations.

## Call:corr.test(x = states, use = "complete")
## Correlation matrix 
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population       1.00   0.21       0.11    -0.07   0.34   -0.10
## Income           0.21   1.00      -0.44     0.34  -0.23    0.62
## Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
## Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
## Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
## HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00
## Sample Size 
## [1] 50
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population       0.00   0.59       1.00      1.0   0.10       1
## Income           0.15   0.00       0.01      0.1   0.54       0
## Illiteracy       0.46   0.00       0.00      0.0   0.00       0
## Life Exp         0.64   0.02       0.00      0.0   0.00       0
## Murder           0.01   0.11       0.00      0.0   0.00       0
## HS Grad          0.50   0.00       0.00      0.0   0.00       0
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option
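The output does not name the adjustment applied above the diagonal. The psych documentation lists Holm as the default for corr.test(), and the sketch below assumes that. It rebuilds the raw and adjusted p-values with base R only; `pairs`, `raw_p` and `holm_p` are names introduced here.

```r
states <- state.x77[, 1:6]

# All 15 unordered pairs of the 6 variables, one pair per column
pairs <- combn(colnames(states), 2)

# Raw Pearson p-value for each pair
raw_p <- apply(pairs, 2, function(v) {
  cor.test(states[, v[1]], states[, v[2]])$p.value
})
names(raw_p) <- paste(pairs[1, ], pairs[2, ], sep = " ~ ")

# Holm adjustment across the 15 tests (assumed to match corr.test's default)
holm_p <- p.adjust(raw_p, method = "holm")

round(cbind(raw = raw_p, holm = holm_p), 2)
```

For Population and Murder, for example, the raw p-value of about 0.01 becomes about 0.10 after adjustment, in line with the entries below and above the diagonal in the printed table.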

Correlations with significance levels

The rcorr() function in the Hmisc package produces correlations and significance levels for Pearson and Spearman correlations. It expects a matrix as input.

library(Hmisc)
rcorr(states, type="pearson") # type can be pearson or spearman
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population       1.00   0.21       0.11    -0.07   0.34   -0.10
## Income           0.21   1.00      -0.44     0.34  -0.23    0.62
## Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
## Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
## Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
## HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00
## 
## n= 50 
## 
## 
## P
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population            0.1467 0.4569     0.6387   0.0146 0.4962 
## Income     0.1467            0.0015     0.0156   0.1080 0.0000 
## Illiteracy 0.4569     0.0015            0.0000   0.0000 0.0000 
## Life Exp   0.6387     0.0156 0.0000              0.0000 0.0000 
## Murder     0.0146     0.1080 0.0000     0.0000          0.0003 
## HS Grad    0.4962     0.0000 0.0000     0.0000   0.0003