Administrative

Please indicate

  • Roughly how much time you spent on this HW so far: 2 hours
  • The URL of the published RPubs document:
  • What gave you the most trouble: using group_by and summarise to make the plot.
  • Any comments you have:
library(tidyverse)
library(Lahman)

Problem 1.

Define two new variables in the Teams data frame: batting average (BA) and slugging percentage (SLG). Batting average is the ratio of hits (H) to at-bats (AB), and slugging percentage is the total bases divided by at-bats. To compute the total bases, you get 1 for a single, 2 for a double, 3 for a triple, and 4 for a home run.

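Teams <-
  Teams %>%
    # BA = hits / at-bats
    # SLG = total bases / at-bats; H already counts every hit once, so
    # each double, triple, and home run adds 1, 2, and 3 extra bases
    mutate(BA = H / AB,
           SLG = (H + X2B + 2 * X3B + 3 * HR) / AB)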
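In the Lahman `Teams` table, `H` counts every hit, including doubles, triples, and home runs, so singles are `H - X2B - X3B - HR`. The total-bases rule from the problem statement can be checked on toy counts with a small base-R sketch. `total_bases()` and `slugging()` are hypothetical helper names, and the counts are made up for illustration.

```r
# Minimal sketch (assumption: counts follow Lahman conventions, where H
# already includes doubles, triples, and home runs).
total_bases <- function(H, X2B, X3B, HR) {
  singles <- H - X2B - X3B - HR          # hits that were not extra-base hits
  singles + 2 * X2B + 3 * X3B + 4 * HR   # 1, 2, 3, 4 bases per hit type
}

slugging <- function(H, X2B, X3B, HR, AB) {
  total_bases(H, X2B, X3B, HR) / AB
}

# Toy counts, for illustration only (not real team data)
total_bases(H = 10, X2B = 2, X3B = 1, HR = 3)        # 23
slugging(H = 10, X2B = 2, X3B = 1, HR = 3, AB = 30)  # 0.7666667
```

Expanding the singles term gives the equivalent shortcut `H + X2B + 2 * X3B + 3 * HR`: here 10 + 2 + 2 + 9 = 23.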

Problem 2.

Plot a time series of SLG since 1954 by league (lgID). Is slugging percentage typically higher in the American League (AL) or the National League (NL)?

library(ggthemes)
Teams %>%
  filter(yearID >= 1954) %>%
  # league average of team SLG for each season
  group_by(yearID, lgID) %>%
  summarise(avg_slg = mean(SLG), .groups = "drop") %>%
  ggplot(aes(x = yearID, y = avg_slg * 100, color = lgID)) +
    geom_point() +
    geom_smooth(se = FALSE) +   # smoothed trend line per league
    scale_color_brewer(type = "qual", palette = "Dark2") +
    labs(x = "Year", y = "Slugging %", title = "Slugging Percentage Since 1954") +
    theme_linedraw() +
    theme(legend.title = element_blank())
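The plot answers the AL vs. NL question visually; a single number can make "typically higher" concrete. A minimal base-R sketch, assuming a data frame with one row per season and league and an `avg_slg` column (the shape produced by the `group_by()`/`summarise()` step), reports the share of seasons in which the AL average exceeds the NL average. `share_al_higher()` is a hypothetical helper and the toy values are made up.

```r
# Minimal sketch: share of seasons where the AL league-average SLG beats the NL.
share_al_higher <- function(yearly) {
  al <- yearly[yearly$lgID == "AL", c("yearID", "avg_slg")]
  nl <- yearly[yearly$lgID == "NL", c("yearID", "avg_slg")]
  # pair the two leagues season by season; suffixes mark the league
  both <- merge(al, nl, by = "yearID", suffixes = c("_AL", "_NL"))
  mean(both$avg_slg_AL > both$avg_slg_NL)
}

# Toy league averages, for illustration only
toy <- data.frame(
  yearID  = c(2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003),
  lgID    = c("AL", "NL", "AL", "NL", "AL", "NL", "AL", "NL"),
  avg_slg = c(0.44, 0.43, 0.42, 0.43, 0.43, 0.41, 0.45, 0.42)
)
share_al_higher(toy)  # 0.75: AL higher in 3 of 4 toy seasons
```

A result above 0.5 means the AL average was higher in most seasons. On the real data, the summarised table from the plot code (the pipeline up to, but not including, `ggplot()`) could be passed in directly.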

Problem 3.

Display the top 15 teams ranked in terms of slugging percentage in MLB history. Repeat this using teams since 1969.

Top 15 teams in MLB history by SLG

Teams %>%
  select(yearID, teamID, SLG) %>%
  arrange(desc(SLG)) %>%
  head(15)
##    yearID teamID       SLG
## 1    2003    BOS 0.6033975
## 2    1927    NYA 0.5922947
## 3    1997    SEA 0.5908443
## 4    1996    SEA 0.5906845
## 5    1930    NYA 0.5904919
## 6    1994    CLE 0.5900050
## 7    2001    COL 0.5880492
## 8    1936    NYA 0.5871937
## 9    2009    NYA 0.5818021
## 10   2004    BOS 0.5807692
## 11   1995    CLE 0.5799523
## 12   2000    HOU 0.5797127
## 13   1930    CHN 0.5791077
## 14   2003    ATL 0.5790123
## 15   1999    TEX 0.5783047

Top 15 teams in MLB since 1969 by SLG

Teams %>%
  filter(yearID >= 1969) %>%
  select(yearID, teamID, SLG) %>%
  arrange(desc(SLG)) %>%
  head(15)
##    yearID teamID       SLG
## 1    2003    BOS 0.6033975
## 2    1997    SEA 0.5908443
## 3    1996    SEA 0.5906845
## 4    1994    CLE 0.5900050
## 5    2001    COL 0.5880492
## 6    2009    NYA 0.5818021
## 7    2004    BOS 0.5807692
## 8    1995    CLE 0.5799523
## 9    2000    HOU 0.5797127
## 10   2003    ATL 0.5790123
## 11   1999    TEX 0.5783047
## 12   1996    CLE 0.5766590
## 13   2000    SFN 0.5760101
## 14   1997    COL 0.5755845
## 15   2001    TEX 0.5753738
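The same ranking can be sketched in base R without the tidyverse. This is a minimal sketch under the assumption that the input has `yearID`, `teamID`, and `SLG` columns; `top_by_slg()` is a hypothetical helper, and the toy rows and team codes are made up.

```r
# Minimal sketch of the filter/select/arrange/head pipeline in base R.
top_by_slg <- function(teams, n = 15, min_year = -Inf) {
  recent <- teams[teams$yearID >= min_year, c("yearID", "teamID", "SLG")]
  ranked <- recent[order(recent$SLG, decreasing = TRUE), ]  # highest SLG first
  head(ranked, n)
}

# Toy rows, for illustration only
toy <- data.frame(
  yearID = c(1927, 1969, 1990, 2003),
  teamID = c("AAA", "BBB", "CCC", "DDD"),
  SLG    = c(0.50, 0.40, 0.45, 0.48)
)
top_by_slg(toy, n = 2)                   # 1927 and 2003 rows
top_by_slg(toy, n = 2, min_year = 1969)  # 2003 and 1990 rows
```

As with `arrange()` plus `head()`, seasons tied at the cutoff are kept or dropped according to their order in the input, not by any tiebreak rule.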

Problem 4.

The Angels have at times been called the California Angels (CAL), the Anaheim Angels (ANA), and the Los Angeles Angels (LAA). Find the 10 most successful seasons in Angels history. Have they ever won the World Series?

10 most successful seasons in Angels history by win percentage

Teams %>%
  filter(teamID %in% c("CAL", "ANA", "LAA")) %>%
  mutate(wpct = W/(W+L)) %>%
  select(yearID, teamID, wpct) %>%
  arrange(desc(wpct)) %>%
  head(10)
##    yearID teamID      wpct
## 1    2008    LAA 0.6172840
## 2    2002    ANA 0.6111111
## 3    2014    LAA 0.6049383
## 4    2009    LAA 0.5987654
## 5    2005    LAA 0.5864198
## 6    2007    LAA 0.5802469
## 7    1982    CAL 0.5740741
## 8    1986    CAL 0.5679012
## 9    2004    ANA 0.5679012
## 10   1989    CAL 0.5617284
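Two seasons in the list above, 1986 CAL and 2004 ANA, have the same win percentage (0.5679012), and `head(10)` orders them only by row position. A minimal base-R sketch of a tie-aware ranking, assuming columns named `W` and `L` as in Lahman, gives tied seasons the same rank via `rank(ties.method = "min")`. `rank_wpct()` is a hypothetical helper and the records are toy values.

```r
# Minimal sketch: rank seasons by win percentage, with ties sharing a rank.
rank_wpct <- function(seasons) {
  seasons$wpct <- seasons$W / (seasons$W + seasons$L)
  # ties.method = "min" gives tied seasons the same (best) rank
  seasons$rank <- rank(-seasons$wpct, ties.method = "min")
  seasons[order(seasons$rank), ]
}

# Toy records, for illustration only: two 92-70 seasons tie
toy <- data.frame(
  yearID = c(1, 2, 3, 4),
  W      = c(100, 92, 92, 90),
  L      = c(62, 70, 70, 72)
)
rank_wpct(toy)$rank  # 1 2 2 4
```

Keeping every season with `rank <= 10` instead of taking `head(10)` would include all seasons tied at the cutoff.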

Have the Angels ever won the World Series?

Teams %>%
  filter(teamID %in% c("CAL", "ANA", "LAA") & WSWin == "Y") %>%
  select(yearID, teamID, WSWin)
##   yearID teamID WSWin
## 1   2002    ANA     Y
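The `filter()` call above drops rows where the condition is `NA`. In base R, a logical index containing `NA` inside `[` does not drop such rows; it returns rows of `NA` instead. A minimal sketch on toy data, assuming `WSWin` can hold `"Y"`, `"N"`, or `NA` for some seasons, shows the pitfall and two safe alternatives.

```r
# Minimal sketch on toy data (years and values made up for illustration).
ws <- data.frame(
  yearID = c(1, 2, 3),
  WSWin  = c(NA, "N", "Y")
)
ws[ws$WSWin == "Y", ]                # includes a spurious all-NA row
ws[which(ws$WSWin == "Y"), ]         # only the year-3 row
any(ws$WSWin == "Y", na.rm = TRUE)   # TRUE
```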