In a bivariate distribution, we are often interested in whether there is any correlation or covariation between the two variables under study. If a change in one variable is accompanied by a change in the other, the variables are said to be correlated.
For example, the correlation between the heights and weights of a group of persons is positive, and the correlation between the price of a commodity and the demand for it is negative.
Pearson, Spearman and Kendall
We will use the state.x77 dataset available in the base R installation. It provides data on the following for 50 US states in 1977:

* population
* income
* illiteracy rate
* life expectancy
* murder rate
* high school graduation rate
For a description of each column, please see the Data Description.
states <- state.x77[, 1:6] # all rows, columns 1 to 6
library(psych)
describe(states)[, c(1:5, 7:9)] # selected columns
## vars n mean sd median mad min max
## Population 1 50 4246.42 4464.49 2838.50 2890.33 365.00 21198.0
## Income 2 50 4435.80 614.47 4519.00 581.18 3098.00 6315.0
## Illiteracy 3 50 1.17 0.61 0.95 0.52 0.50 2.8
## Life Exp 4 50 70.88 1.34 70.67 1.54 67.96 73.6
## Murder 5 50 7.38 3.69 6.85 5.19 1.40 15.1
## HS Grad 6 50 53.11 8.08 53.25 8.60 37.80 67.3
Variances and Covariances
# variance-covariance matrix stored in 'matrix1'
matrix1 <- cov(states)
# round to 2 decimal places
round(matrix1, 2)
## Population Income Illiteracy Life Exp Murder HS Grad
## Population 19931683.76 571229.78 292.87 -407.84 5663.52 -3551.51
## Income 571229.78 377573.31 -163.70 280.66 -521.89 3076.77
## Illiteracy 292.87 -163.70 0.37 -0.48 1.58 -3.24
## Life Exp -407.84 280.66 -0.48 1.80 -3.87 6.31
## Murder 5663.52 -521.89 1.58 -3.87 13.63 -14.55
## HS Grad -3551.51 3076.77 -3.24 6.31 -14.55 65.24
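The matrix above can be read more easily with two facts in mind: its diagonal holds each variable's variance, and covariances depend on the units of measurement. The following is a minimal sketch of both points. The rescaling factor of 1000 is an assumption made for illustration (it treats Population as recorded in thousands), and the names `cov_matrix`, `states_rescaled` and `scale_ratio` are hypothetical.

```r
states <- state.x77[, 1:6]
cov_matrix <- cov(states)

# The diagonal of a variance-covariance matrix holds the variances
variances <- diag(cov_matrix)
all.equal(unname(variances), unname(apply(states, 2, var)))

# Rescale Population (assumed here to be in thousands) to single persons
states_rescaled <- states
states_rescaled[, "Population"] <- states[, "Population"] * 1000

# The covariance with Income is multiplied by the same factor ...
scale_ratio <- cov(states_rescaled)["Population", "Income"] /
  cov_matrix["Population", "Income"]
scale_ratio

# ... while the correlation matrix is unaffected by the change of units
all.equal(cor(states_rescaled), cor(states))
```

This unit dependence is why covariances are hard to compare across pairs of variables, and why the correlation coefficients in the next section are usually preferred for that purpose.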
Pearson product-moment correlation coefficients
# Pearson product-moment correlation coefficients stored in 'matrix2'
matrix2 <- cor(states)
# round to 2 decimal places
round(matrix2, 2)
## Population Income Illiteracy Life Exp Murder HS Grad
## Population 1.00 0.21 0.11 -0.07 0.34 -0.10
## Income 0.21 1.00 -0.44 0.34 -0.23 0.62
## Illiteracy 0.11 -0.44 1.00 -0.59 0.70 -0.66
## Life Exp -0.07 0.34 -0.59 1.00 -0.78 0.58
## Murder 0.34 -0.23 0.70 -0.78 1.00 -0.49
## HS Grad -0.10 0.62 -0.66 0.58 -0.49 1.00
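The Pearson coefficient is the covariance of two variables divided by the product of their standard deviations. As a rough check of that relationship, the sketch below rebuilds the correlation matrix from the covariance matrix, once by hand and once with base R's `cov2cor()`. The names `sds` and `cor_by_hand` are introduced here for illustration only.

```r
states <- state.x77[, 1:6]
cov_matrix <- cov(states)

# Standard deviations are the square roots of the diagonal variances
sds <- sqrt(diag(cov_matrix))

# Divide every covariance by the product of the two standard deviations
cor_by_hand <- cov_matrix / outer(sds, sds)

# Both routes should agree with cor() up to floating-point error
all.equal(cor_by_hand, cor(states))
all.equal(cov2cor(cov_matrix), cor(states))
```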
Spearman rank-order correlation coefficients
# Spearman rank-order correlation coefficients stored in 'matrix3'
matrix3 <- cor(states, method="spearman")
# round to 2 decimal places
round(matrix3, 2)
## Population Income Illiteracy Life Exp Murder HS Grad
## Population 1.00 0.12 0.31 -0.10 0.35 -0.38
## Income 0.12 1.00 -0.31 0.32 -0.22 0.51
## Illiteracy 0.31 -0.31 1.00 -0.56 0.67 -0.65
## Life Exp -0.10 0.32 -0.56 1.00 -0.78 0.52
## Murder 0.35 -0.22 0.67 -0.78 1.00 -0.44
## HS Grad -0.38 0.51 -0.65 0.52 -0.44 1.00
We can see that a fairly strong positive correlation exists between income and high school graduation rate (Pearson r = 0.62, Spearman 0.51) and that a fairly strong negative correlation exists between illiteracy rate and life expectancy (Pearson r = -0.59, Spearman -0.56).
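Spearman's coefficient can be understood as Pearson's coefficient applied to the ranks of the data. The sketch below checks this on the same columns and also computes Kendall's tau, the third coefficient that `cor()` supports through its `method` argument. The names `ranked`, `spearman_by_hand` and `kendall` are hypothetical.

```r
states <- state.x77[, 1:6]

# Replace each column by its ranks (ties receive the average rank)
ranked <- apply(states, 2, rank)

# Pearson correlation of the ranks reproduces Spearman's coefficient
spearman_by_hand <- cor(ranked)
all.equal(spearman_by_hand, cor(states, method = "spearman"))

# Kendall's tau is requested the same way
kendall <- cor(states, method = "kendall")
round(kendall, 2)
```

Kendall's tau is typically smaller in absolute value than Spearman's coefficient for the same data, so values from the two methods should not be compared directly.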
# Population, Income, Illiteracy and HS Grad as rows
x <- states[,c("Population", "Income", "Illiteracy", "HS Grad")]
# Life Exp and Murder as columns
y <- states[,c("Life Exp", "Murder")]
# non-square correlation matrix stored in 'matrix4'
matrix4 <- cor(x,y)
# round to 2 decimal places
round(matrix4, 2)
## Life Exp Murder
## Population -0.07 0.34
## Income 0.34 -0.23
## Illiteracy -0.59 0.70
## HS Grad 0.58 -0.49
This is useful when we are interested in the relationship between one set of variables and another.
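A non-square matrix of this kind is simply a block of the full correlation matrix. As a small sketch, the same result can be obtained by selecting rows and columns by name; `row_vars`, `col_vars` and `cross_cor` are names chosen here for illustration.

```r
states <- state.x77[, 1:6]

row_vars <- c("Population", "Income", "Illiteracy", "HS Grad")
col_vars <- c("Life Exp", "Murder")

# Correlate one set of variables with the other
cross_cor <- cor(states[, row_vars], states[, col_vars])

# The same numbers appear as a sub-block of the full matrix
all.equal(cross_cor, cor(states)[row_vars, col_vars])
```

Passing two arguments to `cor()` avoids computing the pairs that are not of interest, which matters mainly when the number of variables is large.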
# correlation test between Illiteracy (3rd column) and Murder (5th column)
cor.test(states[,3], states[,5])
##
## Pearson's product-moment correlation
##
## data: states[, 3] and states[, 5]
## t = 6.8479, df = 48, p-value = 1.258e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5279280 0.8207295
## sample estimates:
## cor
## 0.7029752
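This tests the null hypothesis that the Pearson correlation between the illiteracy rate and the murder rate is 0. With a p-value of 1.258e-08, the null hypothesis is rejected at any conventional significance level.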
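The t statistic and p-value reported by `cor.test()` follow from the sample correlation and the sample size. The sketch below selects the columns by name rather than by position, which avoids mixing up column indices, and recomputes both quantities by hand. The names `test_result`, `t_by_hand` and `p_by_hand` are hypothetical.

```r
states <- state.x77[, 1:6]

# Select columns by name so the variables being tested are explicit
test_result <- cor.test(states[, "Illiteracy"], states[, "Murder"])

r <- unname(test_result$estimate)  # sample correlation
n <- nrow(states)                  # number of states

# t statistic with n - 2 degrees of freedom
t_by_hand <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_by_hand <- 2 * pt(-abs(t_by_hand), df = n - 2)

c(t = t_by_hand, p = p_by_hand)

# The confidence interval is stored in the result as well
test_result$conf.int
```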
cor.test() tests only one correlation at a time, but the corr.test() function provided in the psych package tests many at once. It produces correlations and significance levels for matrices of Pearson, Spearman and Kendall correlations.
library(psych)
corr.test(states, use="complete")
## Call:corr.test(x = states, use = "complete")
## Correlation matrix
## Population Income Illiteracy Life Exp Murder HS Grad
## Population 1.00 0.21 0.11 -0.07 0.34 -0.10
## Income 0.21 1.00 -0.44 0.34 -0.23 0.62
## Illiteracy 0.11 -0.44 1.00 -0.59 0.70 -0.66
## Life Exp -0.07 0.34 -0.59 1.00 -0.78 0.58
## Murder 0.34 -0.23 0.70 -0.78 1.00 -0.49
## HS Grad -0.10 0.62 -0.66 0.58 -0.49 1.00
## Sample Size
## [1] 50
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## Population Income Illiteracy Life Exp Murder HS Grad
## Population 0.00 0.59 1.00 1.0 0.10 1
## Income 0.15 0.00 0.01 0.1 0.54 0
## Illiteracy 0.46 0.00 0.00 0.0 0.00 0
## Life Exp 0.64 0.02 0.00 0.0 0.00 0
## Murder 0.01 0.11 0.00 0.0 0.00 0
## HS Grad 0.50 0.00 0.00 0.0 0.00 0
##
## To see confidence intervals of the correlations, print with the short=FALSE option
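The output notes that entries above the diagonal are adjusted for multiple tests. As a rough base R sketch of that idea, the code below tests every pair with `cor.test()` and adjusts the p-values with `p.adjust()`. It assumes the Holm method, which is the documented default of the `adjust` argument of `corr.test()`; the names `pairs`, `raw_p` and `adjusted_p` are hypothetical.

```r
states <- state.x77[, 1:6]
vars <- colnames(states)

# All 15 unique pairs of the 6 variables, one pair per row
pairs <- t(combn(vars, 2))

# Unadjusted p-value for each pair
raw_p <- apply(pairs, 1, function(pair) {
  cor.test(states[, pair[1]], states[, pair[2]])$p.value
})

# Holm adjustment (assumed to match corr.test's default)
adjusted_p <- p.adjust(raw_p, method = "holm")

data.frame(
  var1   = pairs[, 1],
  var2   = pairs[, 2],
  raw_p  = round(raw_p, 4),
  holm_p = round(adjusted_p, 4)
)
```

Adjusted p-values are never smaller than the raw ones, which is why some pairs that look significant below the diagonal are no longer significant above it.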
The rcorr() function in the Hmisc package produces correlations and significance levels for Pearson and Spearman correlations.
library(Hmisc)
rcorr(states, type="pearson") # type can be pearson or spearman
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
##
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
##
## describe
## The following objects are masked from 'package:base':
##
## format.pval, units
## Population Income Illiteracy Life Exp Murder HS Grad
## Population 1.00 0.21 0.11 -0.07 0.34 -0.10
## Income 0.21 1.00 -0.44 0.34 -0.23 0.62
## Illiteracy 0.11 -0.44 1.00 -0.59 0.70 -0.66
## Life Exp -0.07 0.34 -0.59 1.00 -0.78 0.58
## Murder 0.34 -0.23 0.70 -0.78 1.00 -0.49
## HS Grad -0.10 0.62 -0.66 0.58 -0.49 1.00
##
## n= 50
##
##
## P
## Population Income Illiteracy Life Exp Murder HS Grad
## Population 0.1467 0.4569 0.6387 0.0146 0.4962
## Income 0.1467 0.0015 0.0156 0.1080 0.0000
## Illiteracy 0.4569 0.0015 0.0000 0.0000 0.0000
## Life Exp 0.6387 0.0156 0.0000 0.0000 0.0000
## Murder 0.0146 0.1080 0.0000 0.0000 0.0003
## HS Grad 0.4962 0.0000 0.0000 0.0000 0.0003
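The loading messages above show that attaching Hmisc masks `describe()` from psych. One way to avoid ambiguity is to call functions with an explicit namespace, as in the sketch below. It assumes both packages are installed and skips a step when one is missing; the name `result` is hypothetical.

```r
states <- state.x77[, 1:6]

# psych::describe() is reached explicitly, even when Hmisc is attached
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(states)[, c(1:5, 7:9)])
}

# rcorr() returns a list with the correlations (r), sample sizes (n)
# and p-values (P), so each part can be used separately
if (requireNamespace("Hmisc", quietly = TRUE)) {
  result <- Hmisc::rcorr(states, type = "spearman")
  print(round(result$r, 2))
  print(round(result$P, 4))
}
```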