STAT 451, Day 25

Chapter 6: Visualizing Relationships

Example of Simpson's Paradox

Summary: Simpson's Paradox occurs when the direction of a relationship between two variables reverses once another variable is taken into account.

The relationship between Price and Number of pages in a book changes with the introduction of the variable Type of Book (Hardcover, Paperback).

See the R Markdown document Simpsons Paradox.
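The reversal can be sketched with a small simulation. The data below are simulated and purely hypothetical (they are not the book data used in the R Markdown document), and the names books, overall, and by_type are made up for this illustration. Within each type of book the price rises with the number of pages, but because the simulated hardcovers are shorter and more expensive, the overall slope comes out negative.

```r
# Simulated, hypothetical data: NOT the data from the course document.
set.seed(451)
n <- 50
pages_hard  <- runif(n, 100, 400)   # hardcovers: fewer pages
pages_paper <- runif(n, 400, 900)   # paperbacks: more pages

books <- data.frame(
  pages = c(pages_hard, pages_paper),
  price = c(25 + 0.02 * pages_hard  + rnorm(n, sd = 1),   # expensive
             0 + 0.02 * pages_paper + rnorm(n, sd = 1)),  # cheap
  type  = rep(c("Hardcover", "Paperback"), each = n)
)

# Slope of price on pages, ignoring type
overall <- unname(coef(lm(price ~ pages, data = books))["pages"])

# Slope of price on pages, separately for each type
by_type <- sapply(split(books, books$type), function(d) {
  unname(coef(lm(price ~ pages, data = d))["pages"])
})

print(overall)   # negative in this simulation
print(by_type)   # positive for both types in this simulation

# Plot: dashed line is the overall fit, solid lines are the per-type fits
cols <- c(Hardcover = "blue", Paperback = "red")
plot(books$pages, books$price, col = cols[books$type],
     xlab = "Number of pages", ylab = "Price",
     main = "Simulated example of Simpson's Paradox")
abline(lm(price ~ pages, data = books), lty = 2)
for (t in names(cols)) {
  abline(lm(price ~ pages, data = books[books$type == t, ]), col = cols[t])
}
```

Coloring the points by type is what makes the reversal visible: the dashed line slopes down while both solid lines slope up.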

Very Useful Plot

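The scatterplot matrix is a very useful plot for seeing the pairwise relationships, and therefore the correlations, between the variables in a dataset.

It is not so useful with more than about 10 variables, because the individual panels become too small to read.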
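As a minimal sketch, base R's pairs() draws a scatterplot matrix. The choice of the built-in mtcars data and of these four columns is an assumption made for illustration, as is the name vars.

```r
# Four numeric columns from the built-in mtcars data (illustrative choice)
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# One scatterplot for every pair of variables
pairs(vars, main = "Scatterplot matrix of four mtcars variables")

# The matching matrix of pairwise correlations
cor_mat <- cor(vars)
print(round(cor_mat, 2))
```

Reading the plot next to the correlation matrix shows how each number corresponds to the shape of one panel.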

What to do with more variables?

The correlogram is very useful in that case.

The examples here are from the Quick-R website.

Correlogram

library(corrgram)

R code

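# order=TRUE reorders the variables using principal components
# Lower triangle: shaded boxes; upper triangle: pie charts
corrgram(mtcars, order=TRUE, 
  lower.panel=panel.shade,
  upper.panel=panel.pie, 
  text.panel=panel.txt,
  main="Car Mileage Data in PC2/PC1 Order")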


R code

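# Lower triangle: confidence ellipses; upper triangle: scatterplots
# Diagonal: minimum and maximum of each variable
corrgram(mtcars, order=TRUE, 
  lower.panel=panel.ellipse,
  upper.panel=panel.pts, 
  text.panel=panel.txt,
  diag.panel=panel.minmax, 
  main="Car Mileage Data in PC2/PC1 Order")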


Bubbles

Be sure to study the discussion of the use of area in the section about Bubbles on pages 193 and 194.

Look at Figures 6-12, 6-13, 6-14, 6-16, and 6-17. Study Figure 6-17, which uses correctly sized circles.
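The point about area can be sketched with base R's symbols(). To make the area of each circle proportional to a value, the radius has to be proportional to the square root of that value. The use of the built-in mtcars data, with horsepower as the bubble size, is an assumption made for illustration and is not the data used in the book's figures.

```r
# Size variable: horsepower from the built-in mtcars data (illustrative)
value <- mtcars$hp

# Area = pi * r^2, so r = sqrt(value / pi) gives area equal to value
radius <- sqrt(value / pi)

# symbols() rescales the radii so the largest circle is 0.25 inches,
# which keeps the areas in the same proportions
symbols(mtcars$wt, mtcars$mpg, circles = radius, inches = 0.25,
        fg = "white", bg = "steelblue",
        xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
        main = "Bubble area proportional to horsepower")
```

Passing the raw values as radii instead would exaggerate the large values, because the areas would then grow with the square of the value.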

Birth Year Comparison

Use birth-rate-yearly.csv and the relationship.R code to make the plots in the Comparison section of Chapter 6. Here is an RPub that uses tidyr to reshape the data: birth-rate
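As a rough sketch of what reshaping with tidyr looks like, the example below turns a wide table (one column per year) into a long table (one row per country and year). The data frame wide, its column names, and its values are invented placeholders, not the contents of the course CSV file, and the RPub may use different tidyr functions.

```r
library(tidyr)

# Hypothetical placeholder data in wide form: one column per year
wide <- data.frame(
  Country = c("A", "B"),
  `1960`  = c(30.1, 20.2),
  `1961`  = c(29.5, 19.8),
  check.names = FALSE   # keep the year column names as they are
)

# Long form: one row per Country and year
long <- pivot_longer(wide, cols = -Country,
                     names_to = "year", values_to = "rate")
long$year <- as.integer(long$year)

print(long)
```

The long form is usually easier to plot, because year and rate are ordinary columns that can be mapped to the axes.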