Corrgrams

Sameer Mathur

State dataset

We will use the state.x77 dataset available in the base R installation. It provides the following data for the 50 US states, collected during the 1970s:

  • population
  • income
  • illiteracy rate
  • life expectancy
  • murder rate and
  • high school graduation rate.

For a description of each column, please see the Data Description page.
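As a quick check before summarizing, the raw object can be inspected with base R alone. This is a minimal sketch: state.x77 is stored as a numeric matrix and has more columns than the six listed above, which is why later code keeps only columns 1 to 6.

```r
# state.x77 ships with base R (datasets package); nothing to install
dim(state.x77)             # number of states (rows) and variables (columns)
colnames(state.x77)        # all column names, including ones not used here
head(state.x77[, 1:6], 3)  # first three states, first six columns
```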

Summary of the State dataset

states <- state.x77[, 1:6]       # keep the first six columns
library(psych)                   # provides describe()
describe(states)[, c(1:5, 7:9)]  # selected columns (drops the trimmed mean)
           vars  n    mean      sd  median     mad     min     max
Population    1 50 4246.42 4464.49 2838.50 2890.33  365.00 21198.0
Income        2 50 4435.80  614.47 4519.00  581.18 3098.00  6315.0
Illiteracy    3 50    1.17    0.61    0.95    0.52    0.50     2.8
Life Exp      4 50   70.88    1.34   70.67    1.54   67.96    73.6
Murder        5 50    7.38    3.69    6.85    5.19    1.40    15.1
HS Grad       6 50   53.11    8.08   53.25    8.60   37.80    67.3

Corrgrams

Consider the correlations among the variables in the states data frame.
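A corrgram is a picture of a correlation matrix, so it can help to look at the numbers first. The sketch below uses only base R. The variable name r is chosen here for illustration and does not come from the original slides.

```r
states <- state.x77[, 1:6]

# Pearson correlations between every pair of columns, rounded for readability
r <- round(cor(states), 2)
r
```

Each off-diagonal entry of r corresponds to one cell in each triangle of the corrgram.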

library(corrgram)
corrgram(states, order=TRUE, lower.panel=panel.conf,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of states intercorrelations")

Corrgram of the correlations among the variables in the states data frame. Rows and columns have been reordered using principal components analysis.
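To give a feel for what a PCA-based reordering can look like, here is a minimal base R sketch of an angle-based ordering on the first two eigenvectors of the correlation matrix. Whether this reproduces the exact steps behind order=TRUE in the corrgram package is an assumption; treat it as an illustration of the idea, not as the package's implementation.

```r
states <- state.x77[, 1:6]
r <- cor(states)

# Loadings of each variable on the first two principal components
ev <- eigen(r)$vectors[, 1:2]
e1 <- ev[, 1]
e2 <- ev[, 2]

# Angle of each variable in the plane of the first two components
alpha <- ifelse(e1 > 0, atan(e2 / e1), atan(e2 / e1) + pi)

# Sorting by angle places variables with similar correlation patterns side by side
ord <- order(alpha)
colnames(states)[ord]
```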

Interpretation of this Corrgram

Start with the lower triangle of the cells:

  • Blue color and hashing that runs from lower left to upper right represent a positive correlation between the two variables that meet at that cell.
  • Red color and hashing that runs from upper left to lower right represent a negative correlation.
  • The darker and more saturated the color, the greater the magnitude of the correlation.
  • Weak correlations, near zero, appear washed out.
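The reading rules above can be cross-checked against the numbers. The following base R sketch lists every variable pair with its correlation, labels the direction using the color convention described in the list, and sorts by magnitude so that the pairs expected to look most saturated come first. The names cells and pairs_idx are illustrative.

```r
states <- state.x77[, 1:6]
r <- cor(states)

# Row and column positions of the cells below the diagonal
pairs_idx <- which(lower.tri(r), arr.ind = TRUE)

cells <- data.frame(
  var1 = rownames(r)[pairs_idx[, 1]],
  var2 = colnames(r)[pairs_idx[, 2]],
  r    = r[lower.tri(r)]
)

# Direction follows the color convention described in the list above
cells$direction <- ifelse(cells$r > 0, "positive (blue)", "negative (red)")

# Largest magnitudes first: these cells should look darkest
cells <- cells[order(-abs(cells$r)), ]
cells
```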

Corrgrams

corrgram(states, order=TRUE, lower.panel=panel.ellipse,
         upper.panel=panel.pts, text.panel=panel.txt,
         diag.panel=panel.minmax,
         main="Corrgram of states data using scatter plots
and ellipses")

Here we are using smoothed fit lines and confidence ellipses in the lower triangle and the scatter plots in the upper triangle.
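To see what a single cell of such a display contains, one variable pair can be drawn with base R graphics. This is only a sketch: lowess() is used as a stand-in smoother, and the choice of Illiteracy and Murder is arbitrary. The smoother and ellipse drawn by the corrgram panel functions may differ.

```r
states <- state.x77[, 1:6]
x <- states[, "Illiteracy"]
y <- states[, "Murder"]

# Scatter plot of one variable pair, similar to one upper-triangle cell
plot(x, y, xlab = "Illiteracy", ylab = "Murder")

# Base R smoother used as a stand-in for the fit line in the lower triangle
fit <- lowess(x, y)
lines(fit, col = "red")
```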

Corrgram of the correlations among the variables in the states data frame. The lower triangle contains smoothed best-fit lines and confidence ellipses, and the upper triangle contains scatter plots. The diagonal panel contains minimum and maximum values. Rows and columns have been reordered using principal components analysis.

Corrgrams

cols <- colorRampPalette(c("darkgoldenrod4", "burlywood1",
                           "darkkhaki", "darkgreen"))
corrgram(states, order=TRUE, col.regions=cols,
         lower.panel=panel.shade,
         upper.panel=panel.conf, text.panel=panel.txt,
         main="A Corrgram (or Horse) of a Different Color")

Here we are using shading in the lower triangle with a custom color palette. Because order=TRUE is set, the variables are reordered rather than kept in their original order.
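The cols object passed to col.regions is a function, not a vector of colors: colorRampPalette() returns a function that interpolates between the named colors and produces as many shades as requested. A minimal base R sketch, where the count of 5 is arbitrary:

```r
cols <- colorRampPalette(c("darkgoldenrod4", "burlywood1",
                           "darkkhaki", "darkgreen"))

# Calling the function with n returns n interpolated hex color strings
shades <- cols(5)
shades
```

The first and last shades correspond to the two ends of the palette, so negative correlations take the goldenrod end and positive ones the green end.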

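Corrgram of the correlations among the variables in the states data frame, drawn with a custom color palette. The lower triangle is shaded to represent the magnitude and direction of the correlations, and the upper triangle shows the correlation coefficients with their confidence intervals. Because order=TRUE is set, rows and columns have been reordered using principal components analysis.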