Numerical quantities focus on expected values, graphical summaries on unexpected values. – John Tukey
R is the lingua franca of statistical computing. One reason for its popularity is its ability to generate high-quality graphics, ranging from simple bar and pie charts to scatter plots, time series plots, choropleth maps, and many more. In this document I demonstrate the use of some of these graphs for data exploration and analysis using Hadley Wickham's ggplot2 package.
As a first example, using simple bar and pie charts, consider R's built-in mtcars data set (only the first six rows are shown).
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
The following two charts show the number of cars by cylinder count (pie chart) and the breakdown of vehicles across the three gear counts (stacked bar chart). As is evident from the graphs, the largest category in this data set is 8-cylinder cars, most of which have 3 gears.
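The original plotting code is not shown, so the following is only a minimal sketch of how two such charts might be drawn with ggplot2. It assumes the data shown is the built-in `mtcars` data frame; the object names `mt`, `pie`, `bars`, `cyl_counts` and `cyl_by_gear` are illustrative choices, not the author's.

```r
library(ggplot2)

# cyl and gear are stored as numbers in mtcars; treat them as categories.
mt <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))

# Pie chart: a single stacked bar of counts, drawn in polar coordinates.
pie <- ggplot(mt, aes(x = "", fill = cyl)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y") +
  labs(x = NULL, y = NULL, fill = "Cylinders")

# Stacked bar chart: one bar per cylinder count, split by number of gears.
bars <- ggplot(mt, aes(x = cyl, fill = gear)) +
  geom_bar() +
  labs(x = "Cylinders", y = "Number of cars", fill = "Gears")

# The counts behind the two charts, for checking what the plots show.
cyl_counts <- table(mt$cyl)
cyl_by_gear <- table(mt$cyl, mt$gear)

print(pie)
print(bars)
```

ggplot2 has no dedicated pie geom, which is why the sketch builds the pie from `geom_bar()` plus `coord_polar()`. Printing `cyl_counts` and `cyl_by_gear` gives the tabulated counts that the charts visualize.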
The scatterplot is a workhorse of statistical graphics, showing the relationship between two (or more) variables. Using the same data set, the following graph shows that heavier vehicles, not surprisingly, need more fuel to travel the same distance. We can also see that the strength of the relationship may differ across 8-, 6-, and 4-cylinder cars.
For a more interesting example, consider the results of a sleep study relating the number of days of sleep deprivation to the response times of 18 subjects.
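A scatterplot of this kind could be sketched as follows, again assuming the built-in `mtcars` data. Colouring by cylinder count and adding one least-squares line per group is my assumption about how the group differences were displayed; the names `mt`, `scatter` and `slopes` are illustrative.

```r
library(ggplot2)

# Treat the cylinder count as a category so each group gets its own colour.
mt <- transform(mtcars, cyl = factor(cyl))

# Weight against fuel economy, with a separate linear fit per cylinder group.
scatter <- ggplot(mt, aes(x = wt, y = mpg, colour = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Slope of mpg on weight within each cylinder group.
slopes <- sapply(
  split(mt, mt$cyl),
  function(d) coef(lm(mpg ~ wt, data = d))[["wt"]]
)

print(scatter)
```

Comparing the entries of `slopes` is a quick numerical check on what the fitted lines suggest visually: a more negative slope means fuel economy drops faster with weight in that group.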
##   Reaction Days Subject
## 1    249.6    0     308
## 2    258.7    1     308
## 3    250.8    2     308
## 4    321.4    3     308
## 5    356.9    4     308
## 6    414.7    5     308
Here we estimate a slope for each subject and notice that response times get worse over time for every subject except subject 335. This “outlier” may in fact be the most interesting of the bunch.
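One way to reproduce this kind of per-subject view is sketched below. It assumes the data is the `sleepstudy` data set from the lme4 package, which matches the rows shown, and it fits a separate ordinary least-squares line per subject. The author's actual estimation method is not stated, and the names `slopes` and `panels` are illustrative.

```r
library(ggplot2)

# Assumption: the data shown above is lme4's sleepstudy data set.
data("sleepstudy", package = "lme4")

# Ordinary least-squares slope of Reaction on Days, one per subject.
slopes <- sapply(
  split(sleepstudy, sleepstudy$Subject),
  function(d) coef(lm(Reaction ~ Days, data = d))[["Days"]]
)

# One panel per subject, each with its own fitted line.
panels <- ggplot(sleepstudy, aes(x = Days, y = Reaction)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ Subject) +
  labs(x = "Days of sleep deprivation", y = "Reaction time (ms)")

# Subjects whose fitted slope is negative, i.e. who did not get slower.
improving <- names(slopes)[slopes < 0]

print(panels)
```

Faceting by subject keeps each trajectory readable. Inspecting `improving` identifies any subject whose fitted line trends downward.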