
Summary: Simpson's Paradox is a reversal in the direction of a relationship when another variable is introduced.
The relationship between Price and Number of pages in a book changes with the introduction of the variable Type of Book (Hardcover, Paperback).
See the R Markdown document Simpsons Paradox.
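As a minimal sketch of the reversal described above, the following base R code uses a small made-up data set (the numbers are invented for illustration and are not the data from the R Markdown document). The names books, pooled_slope, and within_slopes are assumptions chosen for this example.

```r
# Made-up book data for illustration only
books <- data.frame(
  pages = c(150, 200, 250, 300, 400, 450, 500, 550),
  price = c(30, 32, 34, 36, 10, 12, 14, 16),
  type  = rep(c("Hardcover", "Paperback"), each = 4)
)

# Slope of price on pages, ignoring the type of book
pooled_slope <- coef(lm(price ~ pages, data = books))[["pages"]]

# Slope of price on pages, fitted separately within each type of book
within_slopes <- sapply(split(books, books$type), function(d) {
  coef(lm(price ~ pages, data = d))[["pages"]]
})

print(pooled_slope)   # negative in this made-up example
print(within_slopes)  # positive for both types in this made-up example

# Plot the points coloured by type, with the pooled fit as a dashed line
plot(price ~ pages, data = books,
     col = ifelse(books$type == "Hardcover", "blue", "red"), pch = 19)
abline(lm(price ~ pages, data = books), lty = 2)
```

In this invented data the pooled line slopes downward while each group slopes upward, which is the pattern the summary describes.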
The scatterplot matrix is a very useful plot for seeing the correlations between variables in a dataset.
It is less useful with more than about 10 variables, because the panels become too small and too numerous to read.
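A scatterplot matrix can be sketched with base R's pairs() function. The choice of mtcars columns below is an assumption made only to keep the example small.

```r
# A handful of mtcars variables, chosen arbitrarily for illustration
vars <- c("mpg", "disp", "hp", "wt")

# Scatterplot matrix: one panel for every pair of variables
pairs(mtcars[, vars], main = "Scatterplot matrix of selected mtcars variables")

# The matching correlation matrix
cor_mat <- cor(mtcars[, vars])
print(round(cor_mat, 2))

# Number of distinct variable pairs for 4 and for 10 variables
pairs_4  <- choose(4, 2)
pairs_10 <- choose(10, 2)
```

The number of distinct pairs grows quickly: 4 variables give 6 pairs, while 10 variables already give 45.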
What to do with more variables? One option is a correlogram, drawn here with corrgram() from the corrgram package:
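library(corrgram)  # provides corrgram() and the panel.* functions

# Shaded cells below the diagonal, pie glyphs above, variable names on the diagonal
corrgram(mtcars, order=TRUE,
         lower.panel=panel.shade,
         upper.panel=panel.pie,
         text.panel=panel.txt,
         main="Car Mileage Data in PC2/PC1 Order")

# Ellipses below the diagonal, scatterplots above, min/max values on the diagonal
corrgram(mtcars, order=TRUE,
         lower.panel=panel.ellipse,
         upper.panel=panel.pts,
         text.panel=panel.txt,
         diag.panel=panel.minmax,
         main="Car Mileage Data in PC2/PC1 Order")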
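The "PC2/PC1 Order" in the plot titles refers to ordering the variables by their position on the first two principal components, so that similar variables sit next to each other. The base R code below is a sketch of that idea, assuming the ordering is by the angle of each variable in the PC1/PC2 plane; it is not guaranteed to match what order=TRUE does internally, so check the corrgram documentation for the exact rule. The names cmat, alpha, and ordered_names are chosen for this example.

```r
# Correlation matrix of the same data used in the corrgram() calls
cmat <- cor(mtcars)

# Loadings of each variable on the first two principal components
ev <- eigen(cmat)$vectors[, 1:2]
e1 <- ev[, 1]
e2 <- ev[, 2]

# Angle of each variable in the PC1/PC2 plane (assumed ordering rule)
alpha <- ifelse(e1 > 0, atan(e2 / e1), atan(e2 / e1) + pi)

# Variables sorted by that angle
ordered_names <- colnames(cmat)[order(alpha)]
print(ordered_names)
```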
Be sure to study the discussion of the use of area in the section about Bubbles on pages 193 and 194.
Look at Figures 6-12, 6-13, 6-14, 6-16, and 6-17. Study Figure 6-17, which uses correctly sized circles.
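Sizing circles correctly means making the area, not the radius, proportional to the value. A minimal sketch with base R's symbols() function follows; the x, y, and value vectors are made up for illustration and are not taken from the book's data.

```r
# Made-up values for illustration only
x     <- c(1, 2, 3)
y     <- c(2, 1, 3)
value <- c(10, 40, 90)

# Take the square root so that circle AREA is proportional to value
radius <- sqrt(value / pi)

# symbols() rescales the circles to fit the given size in inches,
# keeping the ratios between the radii unchanged
symbols(x, y, circles = radius, inches = 0.35,
        fg = "white", bg = "steelblue",
        xlab = "x", ylab = "y", main = "Bubbles sized by area")

# Check: the areas implied by these radii equal the original values
area <- pi * radius^2
print(area)
```

Here the largest value is 9 times the smallest, so its radius is only 3 times larger. Passing value directly as the radius would make its area 81 times larger and exaggerate the difference.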
Use the birth-rate-yearly.csv data and the relationship.R code to make the plots in the Comparison section of Chapter 6. Here is an RPub that uses tidyr to reshape the data: birth-rate
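Reshaping with tidyr might look like the sketch below. The layout of the real birth-rate files and the direction of reshaping used in the RPub are not stated here, so this assumes a wide table with one column per year and converts it to long form. The data frame birth_wide, its column names, and its numbers are hypothetical.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per year (made-up numbers)
birth_wide <- data.frame(
  Country = c("A", "B"),
  `1960`  = c(30.1, 18.2),
  `1961`  = c(29.8, 18.0),
  check.names = FALSE
)

# Wide to long: one row per country and year
birth_long <- pivot_longer(birth_wide,
                           cols = -Country,
                           names_to = "year",
                           values_to = "rate")

# Year names come back as text, so convert them to integers
birth_long$year <- as.integer(birth_long$year)
print(birth_long)
```

The long form, with one rate per row, is the shape that plotting functions with a grouping or conditioning variable usually expect.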