Administrative

Please indicate

  • Roughly how much time you spent on this HW so far: 3 hours
  • The URL of your published RPubs document: http://rpubs.com/sfairops/hw3
  • What gave you the most trouble: Problem 3
  • Any comments you have: No comments

Problem 1.

Define two new variables in the Teams data frame: batting average (BA) and slugging percentage (SLG). Batting average is the ratio of hits (H) to at-bats (AB), and slugging percentage is the total bases divided by at-bats. To compute the total bases, you get 1 for a single, 2 for a double, 3 for a triple, and 4 for a home run.

library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
library(Lahman)
library(ggthemes)
data(Teams)
Teams1 <- Teams %>%
  mutate(BA = H / AB,
         SLG = (H + X2B + 2 * X3B + 3 * HR) / AB)
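
As a quick check on the SLG formula, the sketch below plugs in the 2002 Anaheim Angels totals from the Lahman Teams table (the row printed in Problem 4). It confirms that starting from H and adding only the extra bases gives the same total as counting 1, 2, 3, and 4 bases per hit type. The names singles, tb_definition, and tb_shortcut are introduced here for illustration only.

```r
# Season totals for the 2002 Anaheim Angels (Lahman Teams table)
AB <- 5678; H <- 1603; X2B <- 333; X3B <- 32; HR <- 152

# H counts every hit once, so singles are what remains after
# removing doubles, triples, and home runs
singles <- H - X2B - X3B - HR

# Total bases straight from the definition: 1, 2, 3, and 4 bases per hit type
tb_definition <- 1 * singles + 2 * X2B + 3 * X3B + 4 * HR

# Shortcut used in mutate(): start from H and add only the extra bases
tb_shortcut <- H + X2B + 2 * X3B + 3 * HR

# Both routes must give the same number of total bases
stopifnot(tb_definition == tb_shortcut)

BA  <- H / AB
SLG <- tb_shortcut / AB
round(c(BA = BA, SLG = SLG), 3)
#>    BA   SLG
#> 0.282 0.433
```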

Problem 2.

Plot a time series of SLG since 1954 by league (lgID). Is slugging percentage typically higher in the American League (AL) or the National League?

Teams1 %>%
  filter(yearID >= 1954) %>%
  ggplot(aes(x = yearID, y = SLG)) +
  geom_point(size = 0.5, color = "firebrick", alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE) +
  facet_wrap(~ lgID) +
  labs(x = "Year (since 1954)", y = "Slugging percentage") +
  theme_economist()

ANSWER: Slugging percentage (SLG) is typically higher in the American League (AL).
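
One way to put a number on the league comparison, sketched here under assumptions: pool total bases and at-bats within each league and season, then compute the share of seasons in which the AL figure is higher. The helper league_slg() and the toy data frame are hypothetical, and the toy numbers are made up; only the column names follow the Lahman Teams table.

```r
# Pooled slugging percentage per season and league.
# `teams` needs Lahman-style columns: yearID, lgID, AB, H, X2B, X3B, HR.
# Returns a matrix with one row per season and one column per league.
league_slg <- function(teams) {
  tb <- teams$H + teams$X2B + 2 * teams$X3B + 3 * teams$HR
  groups <- list(year = teams$yearID, league = as.character(teams$lgID))
  tapply(tb, groups, sum) / tapply(teams$AB, groups, sum)
}

# Toy data (made up, NOT real Lahman rows): two teams per league per season
toy <- data.frame(
  yearID = rep(c(2000, 2001), each = 4),
  lgID   = rep(c("AL", "AL", "NL", "NL"), times = 2),
  AB     = 1000,
  H      = c(270, 260, 250, 260, 255, 265, 270, 250),
  X2B    = c( 50,  40,  40,  50,  45,  45,  50,  40),
  X3B    = c(  5,   5,  10,   0,   5,   5,  10,   0),
  HR     = c( 30,  20,  20,  25,  20,  25,  30,  20)
)

slg <- league_slg(toy)

# Share of seasons in which the AL out-slugged the NL
al_share <- mean(slg[, "AL"] > slg[, "NL"])

# With the real data, the same call would be (requires the Lahman package):
# slg <- league_slg(subset(Lahman::Teams, yearID >= 1954))
```

Pooling weights each team by its at-bats, so it describes the league as a whole; averaging the per-team SLG values would be a reasonable alternative.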

Problem 3.

Display the top 15 teams ranked in terms of slugging percentage in MLB history. Repeat this using teams since 1969.

Teams1 %>%
  # Rank every team-season by SLG, best first, and keep the top 15
  arrange(desc(SLG)) %>%
  head(15) %>%
  # Label bars by season and team so repeat appearances stay separate
  mutate(team_season = paste(yearID, teamID)) %>%
  ggplot(aes(x = reorder(team_season, SLG), y = SLG, fill = lgID)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = NULL, y = NULL,
       title = "Top 15 teams in MLB history",
       subtitle = "Based on slugging percentage") +
  theme_fivethirtyeight() +
  theme(legend.position = "none")

Teams1 %>%
  filter(yearID >= 1969) %>%
  # Rank the remaining team-seasons by SLG, best first, and keep the top 15
  arrange(desc(SLG)) %>%
  head(15) %>%
  # Label bars by season and team so repeat appearances stay separate
  mutate(team_season = paste(yearID, teamID)) %>%
  ggplot(aes(x = reorder(team_season, SLG), y = SLG, fill = lgID)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = NULL, y = NULL,
       title = "Top 15 MLB teams since 1969",
       subtitle = "Based on slugging percentage") +
  theme_fivethirtyeight() +
  theme(legend.position = "none")
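
A bar chart keyed on teamID alone has one axis position per franchise, so a team with several top seasons gets its bars stacked together. The sketch below shows the effect with base R's reorder() on a small hypothetical data frame (the team codes AAA and BBB and the SLG values are made up), and how a combined season-and-team label keeps each row separate.

```r
# Hypothetical ranking in which one franchise appears twice
top <- data.frame(
  yearID = c(2001, 2002, 2001),
  teamID = c("AAA", "AAA", "BBB"),
  SLG    = c(0.48, 0.47, 0.49)   # illustrative values, not from Lahman
)

# Keyed on teamID: reorder() averages SLG per team, leaving 2 categories
by_team <- reorder(top$teamID, top$SLG)

# Keyed on season + team: every row keeps its own category
by_season <- reorder(paste(top$yearID, top$teamID), top$SLG)

nlevels(by_team)
#> [1] 2
levels(by_season)
#> [1] "2002 AAA" "2001 AAA" "2001 BBB"
```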

Problem 4.

The Angels have at times been called the California Angels (CAL), the Anaheim Angels (ANA), and the Los Angeles Angels (LAA). Find the 10 most successful seasons in Angels history. Have they ever won the World Series?

Teams %>%
  mutate(Success = W / L) %>%
  filter(teamID %in% c("CAL", "ANA", "LAA")) %>%
  # Rank every Angels season by wins per loss, best first, and keep the top 10
  arrange(desc(Success)) %>%
  head(10)

ANSWER: The 10 most successful seasons are the ones with the highest ratio of wins to losses (W / L).
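
The ratio W / L and the more familiar winning percentage W / (W + L) always rank seasons the same way, because W / L = p / (1 - p) increases with the winning percentage p. The sketch below checks this on five Angels seasons whose win-loss records come from the Lahman Teams table; the data frame name angels and the columns ratio and win_pct are introduced here for illustration.

```r
# Win-loss records for five Angels seasons (Lahman Teams table)
angels <- data.frame(
  yearID = c(2002, 2007, 2008, 2009, 2014),
  W      = c(  99,   94,  100,   97,   98),
  L      = c(  63,   68,   62,   65,   64)
)

angels$ratio   <- angels$W / angels$L               # wins per loss
angels$win_pct <- angels$W / (angels$W + angels$L)  # winning percentage

# Seasons from best to worst under each measure
by_ratio <- angels$yearID[order(-angels$ratio)]
by_pct   <- angels$yearID[order(-angels$win_pct)]

# The two measures must agree on the ordering
stopifnot(identical(by_ratio, by_pct))

by_ratio
#> [1] 2008 2002 2014 2009 2007
```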

Teams %>%
  filter(WSWin == "Y" & teamID %in% c("CAL", "ANA", "LAA"))
##   yearID lgID teamID franchID divID Rank   G Ghome  W  L DivWin WCWin
## 1   2002   AL    ANA      ANA     W    2 162    81 99 63      N     Y
##   LgWin WSWin   R   AB    H X2B X3B  HR  BB  SO  SB CS HBP SF  RA  ER  ERA
## 1     Y     Y 851 5678 1603 333  32 152 462 805 117 51  74 64 644 595 3.69
##   CG SHO SV IPouts   HA HRA BBA SOA  E  DP    FP           name
## 1  7  14 54   4357 1345 169 509 999 87 151 0.986 Anaheim Angels
##                         park attendance BPF PPF teamIDBR teamIDlahman45
## 1 Edison International Field    2305547 100  99      ANA            ANA
##   teamIDretro
## 1         ANA

ANSWER: Yes, the Angels have won the World Series once, in 2002.