Correlations

Sameer Mathur

Correlations

In a bivariate distribution, we may want to find out whether there is any correlation, or covariation, between the two variables under study. If a change in one variable is accompanied by a change in the other, the variables are said to be correlated.

Correlations

  • If the two variables deviate in the same direction, i.e. an increase (or decrease) in one is accompanied by a corresponding increase (or decrease) in the other, the correlation is said to be direct or positive.
  • If they consistently deviate in opposite directions, i.e. an increase (or decrease) in one is accompanied by a corresponding decrease (or increase) in the other, the correlation is said to be inverse or negative.

Correlations Example

For example, the correlation between the heights and weights of a group of persons is positive, and the correlation between the price of a commodity and the demand for it is negative.
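The sign of a correlation can be checked on a small toy example. The sketch below uses made-up numbers (they are not heights, weights or prices from any dataset) chosen only so that one series rises with x and the other falls.

```r
# Toy data: hypothetical values, chosen only to illustrate the sign of r
x      <- 1:10
y_up   <- 2 * x + 1    # increases whenever x increases
y_down <- 20 - 3 * x   # decreases whenever x increases

r_up   <- cor(x, y_up)    # positive (direct) correlation
r_down <- cor(x, y_down)  # negative (inverse) correlation

print(c(direct = r_up, inverse = r_down))
```

Because both toy series are exact straight lines in x, the coefficients sit at the extremes of the possible range; real data fall somewhere in between.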

Types of Correlations

Pearson, Spearman and Kendall

  • The Pearson product moment correlation assesses the degree of linear relationship between two quantitative variables.
  • Spearman's rank-order correlation coefficient assesses the degree of relationship between two rank-ordered variables.
  • Kendall's tau is a nonparametric measure of rank correlation.
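The difference between the three coefficients shows up most clearly on a relationship that is monotonic but not linear. The following is a minimal sketch on hypothetical toy data (x and its cube), using the method argument of the base R cor() function.

```r
# Toy data: y always increases with x, but not along a straight line
x <- 1:10
y <- x^3

r_pearson  <- cor(x, y, method = "pearson")   # measures linear association
r_spearman <- cor(x, y, method = "spearman")  # based on ranks
r_kendall  <- cor(x, y, method = "kendall")   # based on concordant pairs

print(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall))
```

Since the ordering of y matches the ordering of x exactly, the two rank-based coefficients reach 1, while the Pearson coefficient stays below 1 because the points do not lie on a line.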

State dataset

We will use the state.x77 dataset available in the base R installation. Its first six columns provide the following data for the 50 US states in 1977:

  • population
  • income
  • illiteracy rate
  • life expectancy
  • murder rate
  • high school graduation rate

For a description of each column, see Data Description, or run ?state.x77 in R.

Summary of the State dataset

states <- state.x77[, 1:6]  # all rows, columns 1 to 6
library(psych)
describe(states)[, c(1:5, 7:9)]  # keep vars, n, mean, sd, median, mad, min, max
           vars  n    mean      sd  median     mad     min     max
Population    1 50 4246.42 4464.49 2838.50 2890.33  365.00 21198.0
Income        2 50 4435.80  614.47 4519.00  581.18 3098.00  6315.0
Illiteracy    3 50    1.17    0.61    0.95    0.52    0.50     2.8
Life Exp      4 50   70.88    1.34   70.67    1.54   67.96    73.6
Murder        5 50    7.38    3.69    6.85    5.19    1.40    15.1
HS Grad       6 50   53.11    8.08   53.25    8.60   37.80    67.3
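If the psych package is not installed, most of the summary above can be approximated with base R alone. This is a rough sketch, not a replacement for describe(); the name summary_stats is introduced here only for illustration.

```r
states <- state.x77[, 1:6]

# One row per variable, one column per statistic
summary_stats <- t(apply(states, 2, function(col) {
  c(n      = length(col),
    mean   = mean(col),
    sd     = sd(col),
    median = median(col),
    mad    = mad(col),   # median absolute deviation (scaled, base R default)
    min    = min(col),
    max    = max(col))
}))

print(round(summary_stats, 2))
```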

Variance-Covariance Matrix

Variances and Covariances

# variance-covariance matrix stored in 'matrix1'
matrix1 <- cov(states)
# round to 2 decimal places
round(matrix1, 2)   
            Population    Income Illiteracy Life Exp  Murder  HS Grad
Population 19931683.76 571229.78     292.87  -407.84 5663.52 -3551.51
Income       571229.78 377573.31    -163.70   280.66 -521.89  3076.77
Illiteracy      292.87   -163.70       0.37    -0.48    1.58    -3.24
Life Exp       -407.84    280.66      -0.48     1.80   -3.87     6.31
Murder         5663.52   -521.89       1.58    -3.87   13.63   -14.55
HS Grad       -3551.51   3076.77      -3.24     6.31  -14.55    65.24
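Covariances depend on the units of each variable, which is why the entries above differ by several orders of magnitude. Dividing each covariance by the product of the two standard deviations gives the unit-free Pearson correlation. The sketch below checks this by hand and with the base R helper cov2cor(); the names cov_mat, sds and cor_by_hand are illustrative.

```r
states  <- state.x77[, 1:6]
cov_mat <- cov(states)

# Standard deviations are the square roots of the diagonal (the variances)
sds <- sqrt(diag(cov_mat))

# r_ij = cov_ij / (sd_i * sd_j)
cor_by_hand <- cov_mat / outer(sds, sds)

# Both routes should agree with cor()
print(all.equal(cor_by_hand, cor(states)))
print(all.equal(cov2cor(cov_mat), cor(states)))
```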

Correlation Matrix

Pearson product-moment correlation coefficients

# Pearson product-moment correlation coefficients stored in 'matrix2'
matrix2 <- cor(states)
# round to 2 decimal places
round(matrix2, 2) 
           Population Income Illiteracy Life Exp Murder HS Grad
Population       1.00   0.21       0.11    -0.07   0.34   -0.10
Income           0.21   1.00      -0.44     0.34  -0.23    0.62
Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00

Correlation Matrix

Spearman rank-order correlation coefficients.

# Spearman rank-order correlation coefficients stored in 'matrix3'
matrix3 <- cor(states, method="spearman")
# round to 2 decimal places
round(matrix3, 2) 
           Population Income Illiteracy Life Exp Murder HS Grad
Population       1.00   0.12       0.31    -0.10   0.35   -0.38
Income           0.12   1.00      -0.31     0.32  -0.22    0.51
Illiteracy       0.31  -0.31       1.00    -0.56   0.67   -0.65
Life Exp        -0.10   0.32      -0.56     1.00  -0.78    0.52
Murder           0.35  -0.22       0.67    -0.78   1.00   -0.44
HS Grad         -0.38   0.51      -0.65     0.52  -0.44    1.00

We can see that a fairly strong positive correlation exists between income and high school graduation rate (Pearson 0.62, Spearman 0.51) and that a fairly strong negative correlation exists between illiteracy rate and life expectancy (Pearson -0.59, Spearman -0.56).
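Individual coefficients can be read out of either matrix by row and column name instead of scanning the printed table. The sketch below also illustrates, under the usual definition, that the Spearman coefficient is the Pearson coefficient computed on ranks; the names pearson, spearman and ranked are introduced for illustration.

```r
states   <- state.x77[, 1:6]
pearson  <- cor(states)
spearman <- cor(states, method = "spearman")

# Look up single pairs by name
print(round(pearson["Income", "HS Grad"], 2))
print(round(spearman["Illiteracy", "Life Exp"], 2))

# Spearman = Pearson applied to the ranks of each column
ranked <- apply(states, 2, rank)
print(isTRUE(all.equal(cor(ranked), spearman, check.attributes = FALSE)))
```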

Non-square Matrices

# Population, Income, Illiteracy and HS Grad as rows
x <- states[,c("Population", "Income", "Illiteracy", "HS Grad")]
# Life Exp and Murder as columns
y <- states[,c("Life Exp", "Murder")]
# non-square correlation matrix stored in 'matrix4'
matrix4 <- cor(x,y)
# round to 2 decimal places
round(matrix4, 2)
           Life Exp Murder
Population    -0.07   0.34
Income         0.34  -0.23
Illiteracy    -0.59   0.70
HS Grad        0.58  -0.49

This is useful when we are interested in the relationship between one set of variables and another.
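A non-square correlation matrix is simply a rectangular block of the full square one. The following sketch, with the illustrative names row_vars and col_vars, confirms that calling cor() with two arguments gives the same numbers as subsetting the full matrix.

```r
states   <- state.x77[, 1:6]
row_vars <- c("Population", "Income", "Illiteracy", "HS Grad")
col_vars <- c("Life Exp", "Murder")

# Two-argument form: columns of the first set against columns of the second
block <- cor(states[, row_vars], states[, col_vars])

# The same block cut out of the full 6 x 6 matrix
full <- cor(states)
print(all.equal(block, full[row_vars, col_vars]))
```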

Testing a correlation coefficient for significance

# correlation test between Illiteracy (3rd column) and Murder (5th column)
cor.test(states[,3], states[,5])

    Pearson's product-moment correlation

data:  states[, 3] and states[, 5]
t = 6.8479, df = 48, p-value = 1.258e-08
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5279280 0.8207295
sample estimates:
      cor 
0.7029752 

It tests the null hypothesis that the Pearson correlation between illiteracy and the murder rate is 0. With a p-value of 1.258e-08, the null hypothesis is rejected at any conventional significance level.
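The t statistic and p-value reported by cor.test() for a Pearson correlation can be reproduced by hand from r and n, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below does this for columns 3 and 5 (Illiteracy and Murder), selected by name to avoid counting columns; the variable names are illustrative.

```r
states <- state.x77[, 1:6]
x <- states[, "Illiteracy"]   # 3rd column
y <- states[, "Murder"]       # 5th column

r <- cor(x, y)
n <- length(x)

# Test statistic and two-sided p-value under H0: true correlation is 0
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Compare with the built-in test
res <- cor.test(x, y)
print(c(t_by_hand = t_stat, t_cor.test = unname(res$statistic)))
print(c(p_by_hand = p_value, p_cor.test = res$p.value))
```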

Correlation matrix and tests of significance via corr.test()

library(psych)
corr.test(states, use="complete")

We can test only one correlation at a time using cor.test(), but the corr.test() function provided in the psych package tests every pair of variables in a single call.
It produces correlations and significance levels for matrices of Pearson, Spearman and Kendall correlations.

Correlation matrix and tests of significance via corr.test()

Call:corr.test(x = states, use = "complete")
Correlation matrix 
           Population Income Illiteracy Life Exp Murder HS Grad
Population       1.00   0.21       0.11    -0.07   0.34   -0.10
Income           0.21   1.00      -0.44     0.34  -0.23    0.62
Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00
Sample Size 
[1] 50
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
           Population Income Illiteracy Life Exp Murder HS Grad
Population       0.00   0.59       1.00      1.0   0.10       1
Income           0.15   0.00       0.01      0.1   0.54       0
Illiteracy       0.46   0.00       0.00      0.0   0.00       0
Life Exp         0.64   0.02       0.00      0.0   0.00       0
Murder           0.01   0.11       0.00      0.0   0.00       0
HS Grad          0.50   0.00       0.00      0.0   0.00       0

 To see confidence intervals of the correlations, print with the short=FALSE option
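The probabilities above the diagonal are larger than those below it because they are adjusted for the 15 pairwise tests. The sketch below approximates this with base R only, assuming the Holm adjustment (the default of corr.test() as far as its documentation describes); raw_p and holm_p are illustrative names.

```r
states <- state.x77[, 1:6]

# All 15 unordered pairs of column names
pairs <- combn(colnames(states), 2)

# Unadjusted p-value of each pairwise Pearson test
raw_p <- apply(pairs, 2, function(v) {
  cor.test(states[, v[1]], states[, v[2]])$p.value
})
names(raw_p) <- paste(pairs[1, ], pairs[2, ], sep = " ~ ")

# Holm adjustment across the whole family of tests
holm_p <- p.adjust(raw_p, method = "holm")

print(round(cbind(raw = raw_p, holm = holm_p), 2))
```

The raw column should correspond to the entries below the diagonal and the holm column to those above it.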

Correlations with significance levels

The rcorr() function in the Hmisc package produces correlations and significance levels for Pearson and Spearman correlations.

library(Hmisc)
rcorr(states, type="pearson")    # type can be pearson or spearman

Correlations with significance levels

           Population Income Illiteracy Life Exp Murder HS Grad
Population       1.00   0.21       0.11    -0.07   0.34   -0.10
Income           0.21   1.00      -0.44     0.34  -0.23    0.62
Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00

n= 50 


P
           Population Income Illiteracy Life Exp Murder HS Grad
Population            0.1467 0.4569     0.6387   0.0146 0.4962 
Income     0.1467            0.0015     0.0156   0.1080 0.0000 
Illiteracy 0.4569     0.0015            0.0000   0.0000 0.0000 
Life Exp   0.6387     0.0156 0.0000              0.0000 0.0000 
Murder     0.0146     0.1080 0.0000     0.0000          0.0003 
HS Grad    0.4962     0.0000 0.0000     0.0000   0.0003
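The object returned by rcorr() can also be stored and inspected piece by piece rather than printed as a whole. This sketch assumes the Hmisc package is installed and that the result exposes the coefficient matrix as r and the p-values as P; it switches to type = "spearman" to show the alternative setting.

```r
states <- state.x77[, 1:6]

# Run only if Hmisc is available (assumption: it may not be installed)
if (requireNamespace("Hmisc", quietly = TRUE)) {
  res <- Hmisc::rcorr(states, type = "spearman")

  print(round(res$r, 2))  # correlation coefficients
  print(round(res$P, 4))  # p-values (diagonal is not tested)

  # A single pair, looked up by name
  print(res$P["Murder", "HS Grad"])
}
```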