Why do we use graphs in data analysis?

Characteristics of exploratory graphs

Air Pollution in the United States

Data

Annual average PM2.5 averaged over the period 2008 through 2010

# Read the county-level data; FIPS codes are read as character so leading zeros are kept
pollution <- read.csv("data/avgpm25.csv",
                      colClasses = c("numeric", "character", "factor", "numeric", "numeric"))
head(pollution)
##     pm25  fips region longitude latitude
## 1  9.771 01003   east    -87.75    30.59
## 2  9.994 01027   east    -85.84    33.27
## 3 10.689 01033   east    -87.73    34.73
## 4 11.337 01049   east    -85.80    34.46
## 5 12.120 01055   east    -86.03    34.02
## 6 10.828 01069   east    -85.35    31.19

Do any counties exceed the standard of 12 μg/m³?
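The question above can be checked with a logical comparison. This is a minimal sketch that assumes only the six rows printed by head(pollution); with the full file loaded, pollution$pm25 would replace the hand-typed vector. The names pm25, standard, over, n_over, and share_over are illustrative.

```r
# PM2.5 values for the six counties printed by head(pollution);
# with the full data loaded, use pollution$pm25 instead.
pm25 <- c(9.771, 9.994, 10.689, 11.337, 12.120, 10.828)
standard <- 12  # annual standard, in micrograms per cubic meter

over <- pm25 > standard    # TRUE where a county exceeds the standard
n_over <- sum(over)        # number of counties above the standard
share_over <- mean(over)   # proportion of counties above the standard
print(c(n_over = n_over, share_over = share_over))
```

On these six rows only one county (12.120) is above the standard; the answer for the full data set will differ.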

Simple Summaries of Data

One dimension: five-number summary, boxplots, histograms, density plot, barplot

Five-Number Summary

summary(pollution$pm25)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.38    8.55   10.00    9.84   11.40   18.40
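Note that summary() prints six values, because it adds the mean to the five-number summary. As a small sketch, fivenum() returns Tukey's five numbers on their own; the vector below holds only the six rows printed by head(pollution) and is an assumption standing in for pollution$pm25.

```r
# Six PM2.5 values from head(pollution); swap in pollution$pm25 for the full data.
pm25 <- c(9.771, 9.994, 10.689, 11.337, 12.120, 10.828)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(pm25)
names(five) <- c("min", "lower hinge", "median", "upper hinge", "max")
print(five)

# Quartiles as computed by quantile(); these can differ slightly from the hinges
quartiles <- quantile(pm25, probs = c(0.25, 0.5, 0.75))
print(quartiles)
```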

Boxplot

boxplot(pollution$pm25, col = "blue")
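To see the numbers a boxplot is drawn from, boxplot.stats() can be called directly. A minimal sketch, again assuming only the six rows printed by head(pollution) in place of pollution$pm25:

```r
# Six PM2.5 values from head(pollution); swap in pollution$pm25 for the full data.
pm25 <- c(9.771, 9.994, 10.689, 11.337, 12.120, 10.828)

bs <- boxplot.stats(pm25)
print(bs$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$out)    # observations beyond the whiskers, drawn as individual points
```

With the default settings, points more than 1.5 times the box length beyond the hinges are reported in out.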

Histogram
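A basic histogram of the PM2.5 values can be sketched as follows. The color, title, and axis label are illustrative choices, and the vector holds only the six rows printed by head(pollution) as a stand-in for pollution$pm25.

```r
# Six PM2.5 values from head(pollution); swap in pollution$pm25 for the full data.
pm25 <- c(9.771, 9.994, 10.689, 11.337, 12.120, 10.828)

# hist() draws the plot and also returns the bin breaks and counts
h <- hist(pm25, col = "green", main = "PM2.5", xlab = "PM2.5")
print(h$breaks)
print(h$counts)
```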

Histogram
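One common refinement is to add a rug under the histogram so that each observation is visible along the axis. This sketch assumes that is the kind of variation intended here; it uses the six rows printed by head(pollution) in place of pollution$pm25.

```r
# Six PM2.5 values from head(pollution); swap in pollution$pm25 for the full data.
pm25 <- c(9.771, 9.994, 10.689, 11.337, 12.120, 10.828)

h <- hist(pm25, col = "green", main = "PM2.5 with rug", xlab = "PM2.5")
rug(pm25)  # one tick mark per observation along the x-axis
```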

Histogram
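The number of bins changes how much detail a histogram shows. The sketch below compares a coarse and a fine setting of the breaks argument, which hist() treats as a suggestion rather than an exact count; the two values 3 and 20 are arbitrary, and the vector again holds only the six rows printed by head(pollution).

```r
# Six PM2.5 values from head(pollution); swap in pollution$pm25 for the full data.
pm25 <- c(9.771, 9.994, 10.689, 11.337, 12.120, 10.828)

par(mfrow = c(1, 2))
h_coarse <- hist(pm25, breaks = 3, col = "green", main = "Few bins", xlab = "PM2.5")
h_fine <- hist(pm25, breaks = 20, col = "green", main = "Many bins", xlab = "PM2.5")

# Number of bins actually used in each plot
print(c(coarse = length(h_coarse$counts), fine = length(h_fine$counts)))
```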

Overlaying Features

boxplot(pollution$pm25)
abline(h = 12, col = "blue")

Overlaying Features
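Reference lines can be overlaid on a histogram in the same way, using vertical lines instead of horizontal ones. This is a sketch under the assumption that the standard (12) and the median are the features of interest; the colors and line widths are illustrative, and the data are the six rows printed by head(pollution).

```r
# Six PM2.5 values from head(pollution); swap in pollution$pm25 for the full data.
pm25 <- c(9.771, 9.994, 10.689, 11.337, 12.120, 10.828)
standard <- 12
med <- median(pm25)

hist(pm25, col = "green", main = "PM2.5", xlab = "PM2.5")
abline(v = standard, lwd = 2)            # the standard
abline(v = med, col = "magenta", lwd = 4)  # the median of the data
```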

Barplot

barplot(table(pollution$region), col = "wheat", main = "Number of Counties in Each Region")

Simple Summaries of Data

Two dimensions

Multiple Boxplots

boxplot(pm25 ~ region, data = pollution, col = c("magenta","green"))
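The numbers behind side-by-side boxplots can be obtained per group with tapply(). The data frame toy below is simulated purely for illustration (it is not the course data, and its values say nothing about real PM2.5 levels); with the real file loaded, pollution would take its place.

```r
# Simulated stand-in for the pollution data frame (illustrative values only)
set.seed(1)
toy <- data.frame(
  pm25 = c(rnorm(30, mean = 10, sd = 2), rnorm(20, mean = 8, sd = 3)),
  region = factor(rep(c("east", "west"), times = c(30, 20)))
)

# Median PM2.5 and number of counties within each region
medians <- tapply(toy$pm25, toy$region, median)
counts <- table(toy$region)
print(medians)
print(counts)

# The same grouping drives the formula interface of boxplot()
boxplot(pm25 ~ region, data = toy, col = c("magenta", "green"))
```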

Multiple Histograms

par(mfrow = c(2, 1), mar = c(4, 4, 2, 1)) 
hist(subset(pollution, region == "east")$pm25, col = "magenta") 
hist(subset(pollution, region == "west")$pm25, col = "green")

Scatterplot

with(pollution, plot(latitude, pm25))
abline(h = 12, lwd = 2, lty = 2)

Scatterplot - Using color

with(pollution, plot(latitude, pm25, col = region))
abline(h = 12, lwd = 2, lty = 2)
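When a factor is passed as col, its integer codes pick the colors, so a legend helps readers tell the regions apart. A minimal sketch with a simulated data frame toy (illustrative values only, not the course data); the legend position is an arbitrary choice.

```r
# Simulated stand-in for the pollution data frame (illustrative values only)
set.seed(2)
toy <- data.frame(
  latitude = runif(40, min = 25, max = 49),
  pm25 = rnorm(40, mean = 10, sd = 2),
  region = factor(rep(c("east", "west"), each = 20))
)

# Factor levels map to integer codes 1, 2, ..., which index the color palette
codes <- as.integer(toy$region)

with(toy, plot(latitude, pm25, col = region))
abline(h = 12, lwd = 2, lty = 2)
legend("topright", legend = levels(toy$region),
       col = seq_along(levels(toy$region)), pch = 1)
```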

Multiple Scatterplots

par(mfrow = c(1, 2), mar = c(5, 4, 2, 1))
with(subset(pollution, region == "west"), plot(latitude, pm25, main = "West", col = region))
with(subset(pollution, region == "east"), plot(latitude, pm25, main = "East", col = region))
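Side-by-side panels are easier to compare when they share axis limits. The sketch below computes common limits once and passes them to both plots; toy is a simulated data frame (illustrative values only, not the course data), and the names xlim and ylim are ordinary plot() arguments.

```r
# Simulated stand-in for the pollution data frame (illustrative values only)
set.seed(3)
toy <- data.frame(
  latitude = runif(40, min = 25, max = 49),
  pm25 = rnorm(40, mean = 10, sd = 2),
  region = factor(rep(c("east", "west"), each = 20))
)

# Limits computed from all rows so both panels use the same scales
xlim <- range(toy$latitude)
ylim <- range(toy$pm25)

par(mfrow = c(1, 2), mar = c(5, 4, 2, 1))
with(subset(toy, region == "west"),
     plot(latitude, pm25, main = "West", xlim = xlim, ylim = ylim))
with(subset(toy, region == "east"),
     plot(latitude, pm25, main = "East", xlim = xlim, ylim = ylim))
```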

Summary