R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks.

State dataset

We will use the state.x77 dataset available in the base R installation. It provides data on the following for the 50 US states in 1977:

* population
* income
* illiteracy rate
* life expectancy
* murder rate
* high school graduation rate

For a description of each column, please see Data Description.
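Before subsetting, it can help to look at the raw object. The following is a minimal sketch using only base R; it assumes nothing beyond the `state.x77` matrix that ships with the `datasets` package.

```r
# state.x77 is a numeric matrix: one row per state, one column per variable
dims <- dim(state.x77)
dims

# Names of all available columns; the analysis here uses the first six
all_cols <- colnames(state.x77)
all_cols[1:6]

# First few rows of the six variables of interest
head(state.x77[, 1:6], 3)
```

Note that some column names contain spaces (for example "Life Exp"), so they need quoting when used for indexing.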

Summary State dataset

states <- state.x77[, 1:6]       # keep the first six variables
library(psych)
describe(states)[, c(1:5, 7:9)]  # selected columns: vars, n, mean, sd, median, mad, min, max
##            vars  n    mean      sd  median     mad     min     max
## Population    1 50 4246.42 4464.49 2838.50 2890.33  365.00 21198.0
## Income        2 50 4435.80  614.47 4519.00  581.18 3098.00  6315.0
## Illiteracy    3 50    1.17    0.61    0.95    0.52    0.50     2.8
## Life Exp      4 50   70.88    1.34   70.67    1.54   67.96    73.6
## Murder        5 50    7.38    3.69    6.85    5.19    1.40    15.1
## HS Grad       6 50   53.11    8.08   53.25    8.60   37.80    67.3

Corrgrams

Consider the correlations among the variables in the states data frame.
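A corrgram is a picture of the correlation matrix, so it is worth seeing the numbers it is built from. This is a small sketch with base R's `cor()`; the object name `cor_mat` is ours and does not appear elsewhere in the document.

```r
states <- state.x77[, 1:6]

# Pearson correlations (the default method) between every pair of variables
cor_mat <- cor(states)

# Two decimals are enough for reading the table by eye
round(cor_mat, 2)
```

Each cell of the corrgrams that follow encodes one entry of this matrix.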

library(corrgram)
corrgram(states, order=TRUE, lower.panel=panel.conf,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of states intercorrelations")

Corrgrams

Corrgram of the correlations among the variables in the states data frame. Rows and columns have been reordered using principal components analysis.
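To give a feel for what PCA-based reordering means, here is a rough sketch of one common approach: order the variables by their angle in the plane of the first two eigenvectors of the correlation matrix. We assume, but have not verified here, that `order=TRUE` in corrgram does something equivalent; the names `angle` and `pca_order` are ours, and the sign of an eigenvector is arbitrary, so the resulting order may differ between systems.

```r
states  <- state.x77[, 1:6]
cor_mat <- cor(states)

# Loadings of each variable on the first two principal components
ev <- eigen(cor_mat)$vectors[, 1:2]
e1 <- ev[, 1]
e2 <- ev[, 2]

# Angle of each variable in the PC1-PC2 plane; adding pi when e1 <= 0
# spreads the angles over (-pi/2, 3*pi/2) instead of folding them together
angle <- ifelse(e1 > 0, atan(e2 / e1), atan(e2 / e1) + pi)

# Variables sorted by angle, so similar correlation patterns end up adjacent
pca_order <- order(angle)
colnames(states)[pca_order]
```

The point of the reordering is that variables with similar correlation patterns sit next to each other, which makes blocks of positive and negative correlation easier to see.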

Interpretation of this Corrgram

Start with the lower triangle of the cells:

Corrgrams

corrgram(states, order=TRUE, lower.panel=panel.ellipse,
         upper.panel=panel.pts, text.panel=panel.txt,
         diag.panel=panel.minmax,
         main="Corrgram of states data using scatter plots
and ellipses")

Here we are using smoothed fit lines and confidence ellipses in the lower triangle and the scatter plots in the upper triangle.

Corrgrams

Corrgram of the correlations among the variables in the states data frame. The lower triangle contains smoothed best fit lines and confidence ellipses, and the upper triangle contains scatter plots. The diagonal panel contains minimum and maximum values. Rows and columns have been reordered using principal components analysis.

Corrgrams

cols <- colorRampPalette(c("darkgoldenrod4", "burlywood1",
                           "darkkhaki", "darkgreen"))
corrgram(states, order=TRUE, col.regions=cols,
         lower.panel=panel.shade,
         upper.panel=panel.conf, text.panel=panel.txt,
         main="A Corrgram (or Horse) of a Different Color")

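Here we are using shading in the lower triangle and correlation coefficients with confidence intervals in the upper triangle, drawn with a custom color palette. Because order=TRUE, the rows and columns are again reordered using principal components analysis.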
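The `cols` object passed as `col.regions` is a function, not a vector of colors: `colorRampPalette()` returns a function that takes a count and returns that many interpolated colors. A minimal sketch of this (the count of 7 and the name `pal` are arbitrary choices of ours):

```r
# Build a palette function from four named anchor colors
cols <- colorRampPalette(c("darkgoldenrod4", "burlywood1",
                           "darkkhaki", "darkgreen"))

# Calling the function yields hex color strings interpolated between the anchors
pal <- cols(7)
pal
```

Passing the function rather than a fixed vector lets the plotting code request as many shades as it needs.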

Corrgrams

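Corrgram of the correlations among the variables in the states data frame, drawn with a custom color palette. The lower triangle is shaded to represent the magnitude and direction of the correlations, and the upper triangle shows the correlation coefficients with their confidence intervals. Rows and columns have been reordered using principal components analysis.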

Corrgrams

library("PerformanceAnalytics")
chart.Correlation(states, histogram = TRUE, pch=19)
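A correlation chart summarizes many pairwise relationships at once. To examine a single pair numerically, base R's `cor.test()` can be used; the sketch below is our own illustration, and the choice of Murder and Illiteracy is arbitrary.

```r
states <- state.x77[, 1:6]

# Pearson correlation test (the default method) for one pair of variables
res <- cor.test(states[, "Murder"], states[, "Illiteracy"])

res$estimate   # sample correlation coefficient
res$conf.int   # 95% confidence interval for the correlation
res$p.value    # p-value for the null hypothesis of zero correlation
```

The same call works for any other pair of columns in `states`.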

Corrgrams

## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend