This lab provides the code to replicate the following NYT Graphic using the core R graphics package.
These baseball data come from the Lahman package, which contains Teams - seasonal stats for each team.
library(Lahman)
data(Teams) We can view the first few rows of Teams with function head().
head(Teams)Summary statistics required for this graphic are calculated as follows.
Note: The years in the dataset exceed the years shown in the NYT graphic and require filtering or plotting limits.
ave.so <- Teams$SO / Teams$G
year <- Teams$yearID
ave.so.min <- min(ave.so, na.rm = TRUE)
ave.so.max <- max(ave.so, na.rm = TRUE)
league.ave <- tapply(X = ave.so,
INDEX = as.factor(year),
FUN = "mean",
na.rm = TRUE)
league.year <- as.numeric(names(league.ave))
The following recreates the NYT graphic as closely as possible.
Note: You may replicate the graphic step-by-step or in a single code chunk.
The following establishes plot dimensions and specifies x- and y-axis variables.
plot.new()
plot.window(xlim = c(1900, 2012),
ylim = c(ave.so.min,
ave.so.max))
points(x = year,
y = ave.so)
The following additional code builds on the previous plot to specify point color, size, and shape.
plot.new()
plot.window(xlim = c(1900, 2012),
ylim = c(ave.so.min,
ave.so.max))
points(x = year,
y = ave.so,
col = "gray85",
pch = 16,
cex = 0.75)
The following code replicates the NYT graphic in its entirety.
ave.so <- Teams$SO / Teams$G
year <- Teams$yearID
ave.so.min <- min(ave.so, na.rm = TRUE)
ave.so.max <- max(ave.so, na.rm = TRUE)
league.ave <- tapply(X = ave.so,
INDEX = as.factor(year),
FUN = "mean",
na.rm = TRUE)
league.year <- as.numeric(names(league.ave))
plot.new()
plot.window(xlim = c(1895, 2015),
ylim = c(ave.so.min,
ave.so.max), xaxs="i")
points(x=year,y=ave.so,
col = "gray85",
pch = 16,
cex = 0.75)
points(x=league.year,
y=league.ave,
type="b",
pch=20,
lty=1,
lwd=1,
col="steel blue",
bg="steel blue",
)
lines(x=league.year,
y=league.ave,
lty=1,
lwd=1,
col="steel blue",
)
axis(side=1,
lwd=2,
hadj=-0.1,
padj=-1,
at=seq(from=1900,
to=2012,
by=10),cex.axis=.7)
axis(side=4,
lwd=0,
las=1,
col.axis="gray85",
at=seq(from=0,
to=9,
by=1),
cex.axis=.7)
title(xlab="Year", ylab="Strikeouts")
title(main="Strikeouts on the Rise", adj=0)
#I have seen how to do this in ggplot
title(main="There were more strikeouts in 2012 than at any other time in major league history", cex=0.4, line=10.7)
text(1916, 1,"U.S. enters \n World War I.", adj=0.5, col="gray85", cex=0.5)
text(1945, 1.8, "Players return \n from World War II", adj=0.5, col="gray85", cex=0.5)
text(1963, 2.8, "Strike zone enlarged \n from 1963-1968", adj=0.5, col="gray85", cex=0.5)
text(1972, 1.7, "Designated hitter \n rule took effect.", adj=0.5, col="gray85", cex=0.5)
text(2008, 3.75,"Mitchell report \n on steroids.", adj=0.5, col="gray85", cex=0.5)
text(1969, 8.9,"Pitching had become so dominant \n in the 1960s that the mound \n was lowered in 1969.", adj=0.5, col="gray85", cex=0.5)
segments(1917, 1.1, 1917, 3.47, col= "gray85")
segments(1946, 1.9, 1946, 3.88, col= "gray85")
segments(1963, 2.9, 1963, 5.80, col= "gray85")
segments(1973, 1.8, 1973, 5.20, col= "gray85")
segments(2008, 3.76, 2008, 6.80, col= "gray85")
segments(1969, 5.83, 1969, 8.2, col= "gray85")
legend("topleft",legend=c("League average", "Choose a team"),
col= c("steel blue", "darkgoldenrod2"), lty=1:1, cex=0.5,
title= "Strikeouts per game per team", box.lty=0)
abline(h=seq(1,9, 1), col="lightgray", lty=3)
points(1924,2.7,
col="black", cex=1.5, lwd=2 )
points(2012,7.5,
col="black", cex=1.5, lwd=2 )
text(1924,2.5, "2.7",
col="steel blue", pos=1,cex=1.5, adj=0.5)
text(1924,1.8, "League average \n 1924",
col="black", pos=1, cex=0.5,adj=0.5, )
text(2010,7.5, "7.5",
col="steel blue", pos=3, cex=1.5, adj=0.5)
text(2010,8.5, "2012 \n League average",
col="black", pos=3, cex=0.5, adj=0.5)