Introduction

This lab provides the code to replicate the following NYT Graphic using the core R graphics package.




Lahman Data

These baseball data come from the Lahman package, which contains Teams - seasonal stats for each team.


library(Lahman)
data(Teams)                               


Previewing Teams Data

We can view the first few rows of Teams with function head().


head(Teams)


Preprocessing

Summary statistics required for this graphic are calculated as follows.

Note: The years in the dataset exceed the years shown in the NYT graphic and require filtering or plotting limits.


ave.so <- Teams$SO / Teams$G                      
year <- Teams$yearID                              

ave.so.min <- min(ave.so, na.rm = TRUE)
ave.so.max <- max(ave.so, na.rm = TRUE)          

league.ave <- tapply(X = ave.so, 
                     INDEX = as.factor(year), 
                     FUN = "mean", 
                     na.rm = TRUE)                

league.year <- as.numeric(names(league.ave))      



Replication

The following recreates the NYT graphic as closely as possible.

Note: You may replicate the graphic step-by-step or in a single code chunk.


1. Average Strike-Outs by Year

The following establishes plot dimensions and specifies x- and y-axis variables.

plot.new()

plot.window(xlim = c(1900, 2012), 
            ylim = c(ave.so.min, 
                     ave.so.max))   

points(x = year, 
       y = ave.so)                  



2. Point Color, Size, & Shape

The following additional code builds on the previous plot to specify point color, size, and shape.

plot.new()

plot.window(xlim = c(1900, 2012),  
            ylim = c(ave.so.min, 
                     ave.so.max))   

points(x = year, 
       y = ave.so,
       col = "gray85",              
       pch = 16,                    
       cex = 0.75)                  



3. Full Replication

The following code replicates the NYT graphic in its entirety.

ave.so <- Teams$SO / Teams$G                      
year <- Teams$yearID                           

ave.so.min <- min(ave.so, na.rm = TRUE)
ave.so.max <- max(ave.so, na.rm = TRUE)           

league.ave <- tapply(X = ave.so, 
                     INDEX = as.factor(year), 
                     FUN = "mean", 
                     na.rm = TRUE)                

league.year <- as.numeric(names(league.ave))       


plot.new()

plot.window(xlim = c(1895, 2015),    
            ylim = c(ave.so.min, 
                     ave.so.max), xaxs="i")

points(x=year,y=ave.so,
       col = "gray85",              
       pch = 16,                    
       cex = 0.75)        
points(x=league.year, 
     y=league.ave,
     type="b",
     pch=20,
     lty=1,
lwd=1,
col="steel blue",
bg="steel blue",
)

lines(x=league.year, 
     y=league.ave,
lty=1,
lwd=1,
col="steel blue",
)
axis(side=1,
     lwd=2,
     hadj=-0.1,
     padj=-1,
     at=seq(from=1900,
            to=2012,
            by=10),cex.axis=.7)

axis(side=4, 
      lwd=0,
      las=1,
      col.axis="gray85",
      at=seq(from=0,
             to=9,
             by=1),
     cex.axis=.7)
     
title(xlab="Year", ylab="Strikeouts")
title(main="Strikeouts on the Rise", adj=0)  
#I have seen how to do this in ggplot
title(main="There were more strikeouts in 2012 than at any other time in major league history", cex=0.4, line=10.7)


text(1916, 1,"U.S. enters \n World War I.", adj=0.5, col="gray85", cex=0.5)
text(1945, 1.8, "Players return \n from World War II", adj=0.5, col="gray85", cex=0.5)
text(1963, 2.8, "Strike zone enlarged \n from 1963-1968", adj=0.5, col="gray85", cex=0.5)
text(1972, 1.7, "Designated hitter \n rule took effect.", adj=0.5, col="gray85", cex=0.5)
text(2008, 3.75,"Mitchell report \n on steroids.", adj=0.5, col="gray85", cex=0.5)
text(1969, 8.9,"Pitching had become so dominant \n in the 1960s that the mound \n was lowered in 1969.", adj=0.5, col="gray85", cex=0.5)

segments(1917, 1.1, 1917, 3.47, col= "gray85")
segments(1946, 1.9, 1946, 3.88, col= "gray85")
segments(1963, 2.9, 1963, 5.80, col= "gray85")
segments(1973, 1.8, 1973, 5.20, col= "gray85")
segments(2008, 3.76, 2008, 6.80, col= "gray85")
segments(1969, 5.83, 1969, 8.2, col= "gray85")

legend("topleft",legend=c("League average", "Choose a team"),
       col= c("steel blue", "darkgoldenrod2"), lty=1:1, cex=0.5,
       title= "Strikeouts per game per team", box.lty=0)       

abline(h=seq(1,9, 1), col="lightgray", lty=3)
   
points(1924,2.7, 
        col="black", cex=1.5, lwd=2 )

points(2012,7.5,
        col="black", cex=1.5, lwd=2 )

text(1924,2.5, "2.7",  
      col="steel blue", pos=1,cex=1.5, adj=0.5)
text(1924,1.8, "League average \n 1924",  
      col="black", pos=1, cex=0.5,adj=0.5, )
text(2010,7.5, "7.5",  
      col="steel blue", pos=3, cex=1.5, adj=0.5)
text(2010,8.5, "2012 \n League average",  
      col="black", pos=3, cex=0.5, adj=0.5)