Correlations

In a bivariate distribution, we are often interested in whether the two variables under study are correlated, that is, whether they vary together. If a change in one variable tends to be accompanied by a change in the other, the variables are said to be correlated.

Correlations Example

For example, the correlation between the heights and weights of a group of people is positive, whereas the correlation between the price of a commodity and the demand for it is negative.
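As a quick illustration of a positive correlation, the sketch below uses the `women` dataset that ships with base R (heights and weights of American women). This dataset is not part of the original example; it is assumed here only because it is readily available, and the name `r_hw` is illustrative.

```r
# 'women' is a built-in dataset with the columns 'height' and 'weight'
data(women)

# Pearson correlation between height and weight
r_hw <- cor(women$height, women$weight)

# A value close to +1 indicates a strong positive linear relationship
r_hw
```

A negative correlation would show up the same way, with a coefficient between -1 and 0.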

Types of Correlations

Pearson, Spearman and Kendall
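All three coefficients can be requested through the `method` argument of `cor()`. The sketch below compares them for two columns of `state.x77`; the choice of columns and the variable names are illustrative assumptions.

```r
# Two columns of the built-in state.x77 matrix
illiteracy <- state.x77[, "Illiteracy"]
murder     <- state.x77[, "Murder"]

# Same pair of variables, three different coefficients
r_pearson  <- cor(illiteracy, murder, method = "pearson")   # linear association
r_spearman <- cor(illiteracy, murder, method = "spearman")  # rank-based
r_kendall  <- cor(illiteracy, murder, method = "kendall")   # concordant vs. discordant pairs

# Spearman's rho is Pearson's r computed on the ranks of the data
all.equal(r_spearman, cor(rank(illiteracy), rank(murder)))

round(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall), 2)
```

Pearson measures linear association, while Spearman and Kendall depend only on the ordering of the values and are therefore less sensitive to outliers and skewed distributions.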

State dataset

We will use the state.x77 dataset available in the base R installation. It provides data on the following for the 50 US states in 1977:

* population
* income
* illiteracy rate
* life expectancy
* murder rate
* high school graduation rate

For a description of each column, see Data Description (the same information is available in R via ?state.x77).

Summary of the State dataset

states <- state.x77[, 1:6]  # all rows, columns 1 to 6
library(psych)
describe(states)[, c(1:5, 7:9)]  # selected columns
##            vars  n    mean      sd  median     mad     min     max
## Population    1 50 4246.42 4464.49 2838.50 2890.33  365.00 21198.0
## Income        2 50 4435.80  614.47 4519.00  581.18 3098.00  6315.0
## Illiteracy    3 50    1.17    0.61    0.95    0.52    0.50     2.8
## Life Exp      4 50   70.88    1.34   70.67    1.54   67.96    73.6
## Murder        5 50    7.38    3.69    6.85    5.19    1.40    15.1
## HS Grad       6 50   53.11    8.08   53.25    8.60   37.80    67.3

Variance-Covariance Matrix

Variances and Covariances

# variance-covariance matrix stored in 'matrix1'
matrix1 <- cov(states)
# round to 2 decimal places
round(matrix1, 2)
##             Population    Income Illiteracy Life Exp  Murder  HS Grad
## Population 19931683.76 571229.78     292.87  -407.84 5663.52 -3551.51
## Income       571229.78 377573.31    -163.70   280.66 -521.89  3076.77
## Illiteracy      292.87   -163.70       0.37    -0.48    1.58    -3.24
## Life Exp       -407.84    280.66      -0.48     1.80   -3.87     6.31
## Murder         5663.52   -521.89       1.58    -3.87   13.63   -14.55
## HS Grad       -3551.51   3076.77      -3.24     6.31  -14.55    65.24
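The covariance matrix is hard to read because each entry depends on the units of the variables involved. A minimal sketch of how it relates to the correlation matrix, using `cov2cor()` from base R (the name `cov_mat` is illustrative):

```r
states  <- state.x77[, 1:6]
cov_mat <- cov(states)

# The diagonal of the covariance matrix holds the variance of each variable
all.equal(diag(cov_mat), apply(states, 2, var))

# Dividing each covariance by the two standard deviations involved
# yields the unit-free correlation matrix
all.equal(cov2cor(cov_mat), cor(states))
```

Both comparisons should return `TRUE`, which is why the correlation matrix is usually the more convenient summary.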

Correlation Matrix

Pearson product-moment correlation coefficients

# Pearson product-moment correlation coefficients stored in 'matrix2'
matrix2 <- cor(states)
# round to 2 decimal places
round(matrix2, 2) 
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population       1.00   0.21       0.11    -0.07   0.34   -0.10
## Income           0.21   1.00      -0.44     0.34  -0.23    0.62
## Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
## Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
## Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
## HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00

Correlation Matrix

Spearman rank-order correlation coefficients.

# Spearman rank-order correlation coefficients stored in 'matrix3'
matrix3 <- cor(states, method="spearman")
# round to 2 decimal places
round(matrix3, 2) 
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population       1.00   0.12       0.31    -0.10   0.35   -0.38
## Income           0.12   1.00      -0.31     0.32  -0.22    0.51
## Illiteracy       0.31  -0.31       1.00    -0.56   0.67   -0.65
## Life Exp        -0.10   0.32      -0.56     1.00  -0.78    0.52
## Murder           0.35  -0.22       0.67    -0.78   1.00   -0.44
## HS Grad         -0.38   0.51      -0.65     0.52  -0.44    1.00

We can see that a fairly strong positive correlation exists between income and high school graduation rate (0.62 Pearson, 0.51 Spearman), and that a fairly strong negative correlation exists between illiteracy rate and life expectancy (-0.59 Pearson, -0.56 Spearman).

Non-square Matrices

# Population, Income, Illiteracy and HS Grad as rows
x <- states[,c("Population", "Income", "Illiteracy", "HS Grad")]
# Life Exp and Murder as columns
y <- states[,c("Life Exp", "Murder")]
# non-square correlation matrix stored in 'matrix4'
matrix4 <- cor(x,y)
# round to 2 decimal places
round(matrix4, 2)
##            Life Exp Murder
## Population    -0.07   0.34
## Income         0.34  -0.23
## Illiteracy    -0.59   0.70
## HS Grad        0.58  -0.49

This is useful when we are interested in the relationship between one set of variables and another.
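The rectangular result is simply a block of the full square correlation matrix. The sketch below checks this; the names `rows`, `cols` and `rect` are illustrative.

```r
states <- state.x77[, 1:6]
rows <- c("Population", "Income", "Illiteracy", "HS Grad")
cols <- c("Life Exp", "Murder")

# cor(x, y) correlates every column of x with every column of y
rect <- cor(states[, rows], states[, cols])

# The same values appear as a sub-block of the full square matrix
all.equal(rect, cor(states)[rows, cols])

dim(rect)  # 4 rows (columns of x) by 2 columns (columns of y)
```

Calling `cor(x, y)` directly avoids computing the correlations within each set when only the cross-correlations are needed.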

Testing a correlation coefficient for significance

# correlation test between Illiteracy (3rd column) and Murder (5th column)
cor.test(states[,3], states[,5])
## 
##  Pearson's product-moment correlation
## 
## data:  states[, 3] and states[, 5]
## t = 6.8479, df = 48, p-value = 1.258e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5279280 0.8207295
## sample estimates:
##       cor 
## 0.7029752

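It tests the null hypothesis that the Pearson correlation between the illiteracy rate and the murder rate is 0. With a p-value of 1.258e-08, the null hypothesis is rejected at any conventional significance level.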
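The t statistic and p-value in the output above (columns 3 and 5 of `states`, that is, Illiteracy and Murder) can be reproduced by hand. The sketch below assumes the standard test of a zero Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom; the variable names are illustrative.

```r
states <- state.x77[, 1:6]
res <- cor.test(states[, "Illiteracy"], states[, "Murder"])

r <- unname(res$estimate)  # sample correlation
n <- nrow(states)          # number of observations (50 states)

# t statistic for H0: true correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# two-sided p-value from the t distribution with n - 2 degrees of freedom
p_val <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

all.equal(t_stat, unname(res$statistic))
all.equal(p_val, res$p.value)

res$conf.int  # 95 percent confidence interval for the correlation
```

The object returned by `cor.test()` is a list, so `estimate`, `statistic`, `p.value` and `conf.int` can be extracted individually for further use.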

Correlation matrix and tests of significance via corr.test()

library(psych)
corr.test(states, use="complete")

cor.test() tests only one correlation at a time, but the corr.test() function provided in the psych package tests a whole matrix of correlations at once.
The corr.test() function produces correlations and significance levels for matrices of Pearson, Spearman and Kendall correlations.

## Call:corr.test(x = states, use = "complete")
## Correlation matrix 
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population       1.00   0.21       0.11    -0.07   0.34   -0.10
## Income           0.21   1.00      -0.44     0.34  -0.23    0.62
## Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
## Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
## Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
## HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00
## Sample Size 
## [1] 50
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population       0.00   0.59       1.00      1.0   0.10       1
## Income           0.15   0.00       0.01      0.1   0.54       0
## Illiteracy       0.46   0.00       0.00      0.0   0.00       0
## Life Exp         0.64   0.02       0.00      0.0   0.00       0
## Murder           0.01   0.11       0.00      0.0   0.00       0
## HS Grad          0.50   0.00       0.00      0.0   0.00       0
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option
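The note "adjusted for multiple tests" in the output above refers to a p-value correction; the default for `corr.test()` is Holm's method. A base R sketch of the same idea, using `cor.test()` for the raw p-values and `p.adjust()` for the correction (the names `pairs`, `p_raw` and `p_holm` are illustrative):

```r
states <- state.x77[, 1:6]

# All unordered pairs of columns (15 pairs for 6 variables)
pairs <- combn(colnames(states), 2)

# Raw p-value of the Pearson correlation test for each pair
p_raw <- apply(pairs, 2, function(p) {
  cor.test(states[, p[1]], states[, p[2]])$p.value
})
names(p_raw) <- apply(pairs, 2, paste, collapse = " ~ ")

# Holm's step-down adjustment controls the family-wise error rate
p_holm <- p.adjust(p_raw, method = "holm")

round(cbind(raw = p_raw, holm = p_holm), 2)
```

The raw values correspond to the entries below the diagonal of the probability table, and the Holm-adjusted values to the entries above it.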

Correlations with significance levels

The rcorr() function in the Hmisc package produces correlations and significance levels for Pearson and Spearman correlations.

library(Hmisc)
rcorr(states, type="pearson")    # type can be pearson or spearman

##            Population Income Illiteracy Life Exp Murder HS Grad
## Population       1.00   0.21       0.11    -0.07   0.34   -0.10
## Income           0.21   1.00      -0.44     0.34  -0.23    0.62
## Illiteracy       0.11  -0.44       1.00    -0.59   0.70   -0.66
## Life Exp        -0.07   0.34      -0.59     1.00  -0.78    0.58
## Murder           0.34  -0.23       0.70    -0.78   1.00   -0.49
## HS Grad         -0.10   0.62      -0.66     0.58  -0.49    1.00
## 
## n= 50 
## 
## 
## P
##            Population Income Illiteracy Life Exp Murder HS Grad
## Population            0.1467 0.4569     0.6387   0.0146 0.4962 
## Income     0.1467            0.0015     0.0156   0.1080 0.0000 
## Illiteracy 0.4569     0.0015            0.0000   0.0000 0.0000 
## Life Exp   0.6387     0.0156 0.0000              0.0000 0.0000 
## Murder     0.0146     0.1080 0.0000     0.0000          0.0003 
## HS Grad    0.4962     0.0000 0.0000     0.0000   0.0003
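The object returned by `rcorr()` is a list whose elements `r`, `n` and `P` hold the correlations, the pairwise sample sizes and the p-values. The sketch below extracts single entries; the `requireNamespace()` guard and the use of `Hmisc::` instead of `library(Hmisc)` are additions made here so that the example runs without attaching the package (attaching it masks `describe()` from psych).

```r
states <- state.x77[, 1:6]

# Run only if the Hmisc package is installed
if (requireNamespace("Hmisc", quietly = TRUE)) {
  # Hmisc:: calls the function without attaching the package
  res <- Hmisc::rcorr(states, type = "spearman")

  print(res$r["Illiteracy", "Murder"])  # correlation coefficient
  print(res$P["Illiteracy", "Murder"])  # p-value
  print(res$n["Illiteracy", "Murder"])  # observations used for this pair
}
```

Note that `rcorr()` expects a numeric matrix, which `states` already is; a data frame would need `as.matrix()` first.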