Please indicate

  • Roughly how much time you spent on this HW so far: Around three hours.
  • The URL of the published RPubs document here.
  • What gave you the most trouble: Figuring out how to calculate the SLG.
  • Any comments you have: Nope!

  • Load the Lahman R package to gain access to the data tables necessary to complete this assignment. Remember that help files are available for each data table. For example, ?Teams will pull up the help file for the Teams data table.

library(tidyverse)
library(Lahman)
Teams <- Teams

Problem 1: Adding Batting Average and Slugging Percentage

Define two new variables in the Teams data frame: batting average (BA) and slugging percentage (SLG). Batting average is the ratio of hits (H) to at-bats (AB), and slugging percentage is the total bases divided by at-bats. To compute the total bases, you get 1 for a single, 2 for a double, 3 for a triple, and 4 for a home run.

#Making a table with BA and SLG
#Singles (X1B) are the hits that are not doubles, triples, or home runs
#Total bases = singles + 2 * doubles + 3 * triples + 4 * home runs
Teams1 <- Teams %>% 
  mutate(X1B = H - X2B - X3B - HR,
         BA = H / AB,
         SLG = (X1B + 2 * X2B + 3 * X3B + 4 * HR) / AB)
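
As a quick check of the definitions in the problem statement, here is a minimal sketch using made-up season totals for a single team. The data frame `toy` and all of its numbers are assumptions for illustration, not Lahman data. It computes total bases two ways: from the per-hit weights, and from the shortcut that every hit is worth at least one base.

```r
# Hypothetical season totals for one team (illustrative numbers, not from Lahman)
toy <- data.frame(H = 1500, X2B = 300, X3B = 30, HR = 200, AB = 5500)

# Singles are the hits that are not doubles, triples, or home runs
toy$X1B <- toy$H - toy$X2B - toy$X3B - toy$HR

# Total bases: 1 per single, 2 per double, 3 per triple, 4 per home run
toy$TB <- toy$X1B + 2 * toy$X2B + 3 * toy$X3B + 4 * toy$HR

# Equivalent shortcut: one base for every hit, plus the extra bases
toy$TB_alt <- toy$H + toy$X2B + 2 * toy$X3B + 3 * toy$HR

toy$BA  <- toy$H / toy$AB
toy$SLG <- toy$TB / toy$AB

# Both ways of counting total bases should agree
stopifnot(toy$TB == toy$TB_alt)
toy
```

If the two totals ever disagree, a hit type is probably being counted twice somewhere in the formula.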

Problem 2: Graphing SLG by League and Comparing

Plot a time series of SLG since 1954 by league (lgID). Is slugging percentage typically higher in the American League (AL) or the National League (NL)?

#Making a specific table for the time series visualization: select relevant columns, filter to AL, NL for league, and yearID to be greater than 1953 
LeagueTable <- Teams1 %>% 
  select(SLG, lgID, yearID) %>% 
  filter(lgID == "AL" | lgID == "NL") %>% 
  filter(yearID > 1953)

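#Making time series visualization with geom_line()
#Averaging team SLG within each league and year first, so each league has one value per year
SLine <- LeagueTable %>%
  group_by(lgID, yearID) %>%
  summarize(SLG = mean(SLG)) %>%
  ggplot(mapping = aes(x = yearID, y = SLG)) + 
  geom_line(size = .5) +
  facet_wrap(~lgID) +
  labs(title = "SLG by League from 1954-2015", 
       subtitle = "AL represents American League, NL represents National League",
       x = "Year", 
       y = "Average team SLG")
SLine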

#Using summarize() to answer the question; "typically" is taken here to mean the average SLG across all team seasons since 1954
ALslg <- LeagueTable %>% 
  group_by(lgID) %>%
  filter(lgID == "AL") %>% 
  summarize(mean(SLG))

NLslg <- LeagueTable %>% 
  group_by(lgID) %>% 
  filter(lgID == "NL") %>% 
  summarize(mean(SLG))

ALslg #(0.428) average SLG 
## # A tibble: 1 × 2
##     lgID `mean(SLG)`
##   <fctr>       <dbl>
## 1     AL   0.4279684
NLslg #(0.418) average SLG 
## # A tibble: 1 × 2
##     lgID `mean(SLG)`
##   <fctr>       <dbl>
## 1     NL   0.4178288
#Solution: The American League had the higher average slugging percentage since 1954 (about 0.428 versus 0.418 for the National League), but the gap is small, roughly 0.010.
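
Averaging team SLG values is only one way to read "typically". A minimal sketch of the difference between that and a pooled league SLG (total bases over total at-bats) follows. The data frame `toy_league`, its `TB` column, and its numbers are hypothetical and only serve to show that the two summaries can differ when teams have different numbers of at-bats.

```r
library(dplyr)

# Hypothetical league-season with two teams (illustrative numbers only)
toy_league <- data.frame(
  lgID = c("AL", "AL"),
  AB   = c(5000, 6000),
  TB   = c(2000, 3000)   # total bases
)

league_slg <- toy_league %>%
  mutate(SLG = TB / AB) %>%
  group_by(lgID) %>%
  summarize(
    mean_team_SLG = mean(SLG),          # every team counts equally
    pooled_SLG    = sum(TB) / sum(AB)   # every at-bat counts equally
  )

league_slg
```

With real team data the two numbers are usually close, because teams in the same season have similar at-bat totals.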

Problem 3: Ranking Teams by Slugging Percentage

Display the top 15 teams ranked in terms of slugging percentage in MLB history. Repeat this using teams since 1969.

#Making a table of each team's average SLG (by team name) across all of MLB history

TotalSLG <- Teams1 %>% 
  select(teamID, name, SLG, yearID) %>%
  group_by(name) %>% 
  mutate(MeanSLG = mean(SLG, na.rm =TRUE)) %>% #MeanSLG is the average SLG for the team across MLB history
  select(name, MeanSLG) %>%
  arrange(desc(MeanSLG), name) %>% 
  unique() 

TotalSLG[1:15,1:2]
## Source: local data frame [15 x 2]
## Groups: name [15]
## 
##                             name   MeanSLG
##                            <chr>     <dbl>
## 1               Colorado Rockies 0.4751648
## 2             Cincinnati Redlegs 0.4520375
## 3                 Anaheim Angels 0.4518891
## 4              Toronto Blue Jays 0.4464372
## 5               Milwaukee Braves 0.4455139
## 6           Arizona Diamondbacks 0.4451966
## 7                  Texas Rangers 0.4423956
## 8  Los Angeles Angels of Anaheim 0.4419985
## 9               New York Yankees 0.4374726
## 10                Tampa Bay Rays 0.4357002
## 11          Tampa Bay Devil Rays 0.4327704
## 12               Florida Marlins 0.4325307
## 13              Seattle Mariners 0.4319518
## 14               Minnesota Twins 0.4238259
## 15             Oakland Athletics 0.4234512
#Making the same average SLG table, keeping only seasons from 1969 onward

TS69 <- Teams1 %>% 
  select(teamID, name, SLG, yearID) %>%
  group_by(name) %>% 
  filter(yearID>1968) %>%
  mutate(MeanSLG = mean(SLG, na.rm =TRUE)) %>% 
  select(name, MeanSLG) %>%
  arrange(desc(MeanSLG), name) %>% 
  unique()

TS69[1:15, 1:2]
## Source: local data frame [15 x 2]
## Groups: name [15]
## 
##                             name   MeanSLG
##                            <chr>     <dbl>
## 1               Colorado Rockies 0.4751648
## 2                 Boston Red Sox 0.4563293
## 3                 Anaheim Angels 0.4518891
## 4               New York Yankees 0.4471042
## 5              Toronto Blue Jays 0.4464372
## 6           Arizona Diamondbacks 0.4451966
## 7                  Texas Rangers 0.4423956
## 8  Los Angeles Angels of Anaheim 0.4419985
## 9                 Detroit Tigers 0.4411975
## 10             Baltimore Orioles 0.4379894
## 11                Tampa Bay Rays 0.4357002
## 12             Chicago White Sox 0.4331833
## 13          Tampa Bay Devil Rays 0.4327704
## 14               Florida Marlins 0.4325307
## 15              Seattle Mariners 0.4319518
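
The per-name averages above can also be computed with summarize(), which returns one row per group, so unique() is not needed afterwards. This is a minimal sketch on a hypothetical data frame (`toy_teams` and its values are made up for illustration), not a rerun of the tables above.

```r
library(dplyr)

# Hypothetical team-seasons (illustrative numbers only)
toy_teams <- data.frame(
  name = c("A", "A", "B", "B", "C"),
  SLG  = c(0.40, 0.44, 0.45, NA, 0.41)
)

top_teams <- toy_teams %>%
  group_by(name) %>%
  summarize(MeanSLG = mean(SLG, na.rm = TRUE)) %>%  # one row per team name
  arrange(desc(MeanSLG)) %>%
  head(2)                                           # keep the top 2 rows

top_teams
```

Replacing head(2) with head(15) would give a top 15 in the same way.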

Problem 4: Looking at the Angels' Most Successful Seasons

The Angels have at times been called the California Angels (CAL), the Anaheim Angels (ANA), and the Los Angeles Angels (LAA). Find the 10 most successful seasons in Angels history. Have they ever won the World Series?

#Making a data table of the win-loss differential (W - L) for the Angels 

Angels <- Teams %>% 
  select(yearID, teamID, W, L, WSWin)%>% 
  filter(teamID == "CAL" | teamID == "ANA" | teamID == "LAA") %>% 
  mutate(WinLoss = W - L) %>% 
  arrange(desc(WinLoss)) %>% 
  select(yearID, W, L, WinLoss, WSWin)

Angels[1:10, 1:5]
##    yearID   W  L WinLoss WSWin
## 1    2008 100 62      38     N
## 2    2002  99 63      36     Y
## 3    2014  98 64      34     N
## 4    2009  97 65      32     N
## 5    2005  95 67      28     N
## 6    2007  94 68      26     N
## 7    1982  93 69      24     N
## 8    1986  92 70      22     N
## 9    2004  92 70      22     N
## 10   1989  91 71      20     N
#Determining the number of World Series the Angels won 

WST <- Angels %>% 
  select(WSWin) %>% 
  filter(WSWin == "Y") %>% 
  summarize(n())

WST
##   n()
## 1   1
#Yes, the Angels have won the World Series once, in 2002.
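
"Most successful" was measured above by the win-loss differential. Winning percentage is another option that may be fairer when seasons differ in length. The sketch below uses a hypothetical data frame (`toy_seasons`, with made-up season labels and records) to show that the two measures can rank seasons differently.

```r
library(dplyr)

# Hypothetical seasons (illustrative numbers only); s2 is a shortened season
toy_seasons <- data.frame(
  season = c("s1", "s2", "s3"),
  W      = c(90, 65, 95),
  L      = c(72, 45, 67)
)

ranked <- toy_seasons %>%
  mutate(WinLoss = W - L,            # differential, as used above
         WinPct  = W / (W + L)) %>%  # share of decided games won
  arrange(desc(WinPct))

ranked
```

Here s3 has the largest differential, but s2 has the highest winning percentage because it was built on fewer games.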