Administrative

Please indicate

  • Roughly how much time you spent on this HW so far: 1 hour
  • The URL of the published RPubs document here.
  • What gave you the most trouble: the graphic
  • Any comments you have: thanks for the extension!

Problem 1.

Define two new variables in the Teams data frame: batting average (BA) and slugging percentage (SLG). Batting average is the ratio of hits (H) to at-bats (AB), and slugging percentage is the total bases divided by at-bats. To compute the total bases, you get 1 for a single, 2 for a double, 3 for a triple, and 4 for a home run.

library(Lahman)
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
Teams_new <- mutate(Teams, BA = H/AB, SLG = ((H + (2*X2B) + (3*X3B) + (4*HR))/AB))
head(Teams_new)
##   yearID lgID teamID franchID divID Rank  G Ghome  W  L DivWin WCWin LgWin
## 1   1871   NA    BS1      BNA  <NA>    3 31    NA 20 10   <NA>  <NA>     N
## 2   1871   NA    CH1      CNA  <NA>    2 28    NA 19  9   <NA>  <NA>     N
## 3   1871   NA    CL1      CFC  <NA>    8 29    NA 10 19   <NA>  <NA>     N
## 4   1871   NA    FW1      KEK  <NA>    7 19    NA  7 12   <NA>  <NA>     N
## 5   1871   NA    NY2      NNA  <NA>    5 33    NA 16 17   <NA>  <NA>     N
## 6   1871   NA    PH1      PNA  <NA>    1 28    NA 21  7   <NA>  <NA>     Y
##   WSWin   R   AB   H X2B X3B HR BB SO SB CS HBP SF  RA  ER  ERA CG SHO SV
## 1  <NA> 401 1372 426  70  37  3 60 19 73 NA  NA NA 303 109 3.55 22   1  3
## 2  <NA> 302 1196 323  52  21 10 60 22 69 NA  NA NA 241  77 2.76 25   0  1
## 3  <NA> 249 1186 328  35  40  7 26 25 18 NA  NA NA 341 116 4.11 23   0  0
## 4  <NA> 137  746 178  19   8  2 33  9 16 NA  NA NA 243  97 5.17 19   1  0
## 5  <NA> 302 1404 403  43  21  1 33 15 46 NA  NA NA 313 121 3.72 32   1  0
## 6  <NA> 376 1281 410  66  27  9 46 23 56 NA  NA NA 266 137 4.95 27   0  0
##   IPouts  HA HRA BBA SOA   E DP   FP                    name
## 1    828 367   2  42  23 225 NA 0.83    Boston Red Stockings
## 2    753 308   6  28  22 218 NA 0.82 Chicago White Stockings
## 3    762 346  13  53  34 223 NA 0.81  Cleveland Forest Citys
## 4    507 261   5  21  17 163 NA 0.80    Fort Wayne Kekiongas
## 5    879 373   7  42  22 227 NA 0.83        New York Mutuals
## 6    747 329   3  53  16 194 NA 0.84  Philadelphia Athletics
##                           park attendance BPF PPF teamIDBR teamIDlahman45
## 1          South End Grounds I         NA 103  98      BOS            BS1
## 2      Union Base-Ball Grounds         NA 104 102      CHI            CH1
## 3 National Association Grounds         NA  96 100      CLE            CL1
## 4               Hamilton Field         NA 101 107      KEK            FW1
## 5     Union Grounds (Brooklyn)         NA  90  88      NYU            NY2
## 6     Jefferson Street Grounds         NA 102  98      ATH            PH1
##   teamIDretro        BA       SLG
## 1         BS1 0.3104956 0.5021866
## 2         CH1 0.2700669 0.4431438
## 3         CL1 0.2765599 0.4603710
## 4         FW1 0.2386059 0.3324397
## 5         NY2 0.2870370 0.3960114
## 6         PH1 0.3200625 0.5144418
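# This assumes that H counts singles only. In the Lahman Teams table, H is total hits
# (doubles, triples, and home runs included), so the formula above overstates SLG;
# total bases would be H + X2B + 2*X3B + 3*HR.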
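One way to double-check the total-bases weights is to work a single row by hand. The sketch below uses the 1871 Boston row printed above and assumes that `H` in the Lahman `Teams` table counts all hits, so singles are `H - X2B - X3B - HR`. The names `singles`, `tb`, and `slg` are made up for this illustration, and base R is used so the snippet runs without loading any packages.

```r
# 1871 Boston Red Stockings, values copied from the head(Teams_new) output
H <- 426; X2B <- 70; X3B <- 37; HR <- 3; AB <- 1372

# Assumption: H is total hits, so singles are whatever is left over
singles <- H - X2B - X3B - HR

# Total bases: 1 per single, 2 per double, 3 per triple, 4 per home run
tb <- singles + 2 * X2B + 3 * X3B + 4 * HR

# Algebraically the same as H + X2B + 2*X3B + 3*HR
stopifnot(tb == H + X2B + 2 * X3B + 3 * HR)

slg <- tb / AB
round(slg, 3)
```

Under that assumption the row works out to roughly 0.422, compared with the 0.502 printed for the same team above.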

Problem 2.

Plot a time series of SLG since 1954 by league (lgID). Is slugging percentage typically higher in the American League (AL) or the National League?

Teams_new %>%
  ggplot(aes(x = yearID, y = SLG, color = lgID)) +
  geom_smooth(se = FALSE) +
  xlim(1954, 2020)
## `geom_smooth()` using method = 'gam'
## Warning: Removed 1247 rows containing non-finite values (stat_smooth).

#SLG is typically higher in the American League 
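The plot above smooths team-level SLG and relies on `xlim()` to drop the earlier years. An alternative, sketched here under stated assumptions, is to filter to 1954 onward and to the two leagues first, then compute one SLG per league and year from summed total bases and at-bats. The data frame `toy` and its numbers are invented purely to show the mechanics, and the column `TB` (total bases) is a hypothetical helper column that is not in `Teams`.

```r
# Hypothetical toy rows standing in for Teams; the numbers are made up
toy <- data.frame(
  yearID = c(1953, 1954, 1954, 1954, 1955, 1955),
  lgID   = c("AL", "AL", "AL", "NL", "AL", "NL"),
  TB     = c(2000, 2100, 1900, 2050, 2200, 2150),
  AB     = c(5000, 5200, 5000, 5100, 5300, 5250)
)

# Keep 1954 onward and only the two leagues being compared
recent <- subset(toy, yearID >= 1954 & lgID %in% c("AL", "NL"))

# Sum total bases and at-bats within each league and year
league <- aggregate(cbind(TB, AB) ~ yearID + lgID, data = recent, FUN = sum)

# League-level slugging percentage: one value per league per year
league$SLG <- league$TB / league$AB
league <- league[order(league$yearID, league$lgID), ]
print(league)
```

A data frame shaped like `league` could then be passed to `ggplot()` with `geom_line()` to draw one line per league.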

Problem 3.

Display the top 15 teams ranked in terms of slugging percentage in MLB history. Repeat this using teams since 1969.

Teams_new %>%
  arrange(desc(SLG)) %>%
  head(n = 15)
##    yearID lgID teamID franchID divID Rank   G Ghome   W  L DivWin WCWin
## 1    2003   AL    BOS      BOS     E    2 162    81  95 67      N     Y
## 2    1927   AL    NYA      NYY  <NA>    1 155    77 110 44   <NA>  <NA>
## 3    1997   AL    SEA      SEA     W    1 162    81  90 72      Y     N
## 4    1996   AL    SEA      SEA     W    2 161    81  85 76      N     N
## 5    1930   AL    NYA      NYY  <NA>    3 154    76  86 68   <NA>  <NA>
## 6    1994   AL    CLE      CLE     C    2 113    51  66 47   <NA>  <NA>
## 7    2001   NL    COL      COL     W    5 162    81  73 89      N     N
## 8    1936   AL    NYA      NYY  <NA>    1 155    77 102 51   <NA>  <NA>
## 9    2009   AL    NYA      NYY     E    1 162    81 103 59      Y     N
## 10   2004   AL    BOS      BOS     E    2 162    81  98 64      N     Y
## 11   1995   AL    CLE      CLE     C    1 144    72 100 44      Y     N
## 12   2000   NL    HOU      HOU     C    4 162    81  72 90      N     N
## 13   1930   NL    CHN      CHC  <NA>    2 156    79  90 64   <NA>  <NA>
## 14   2003   NL    ATL      ATL     E    1 162    81 101 61      Y     N
## 15   1999   AL    TEX      TEX     W    1 162    81  95 67      Y     N
##    LgWin WSWin    R   AB    H X2B X3B  HR  BB   SO  SB CS HBP SF  RA  ER
## 1      N     N  961 5769 1667 371  40 238 620  943  88 35  53 64 809 729
## 2      Y     Y  975 5347 1644 291 103 158 635  605  90 64  NA NA 599 494
## 3      N     N  925 5614 1574 312  21 264 626 1110  89 40  NA NA 833 770
## 4      N     N  993 5668 1625 343  19 245 670 1052  90 39  NA NA 895 828
## 5      N     N 1062 5448 1683 298 110 152 644  569  91 60  NA NA 898 741
## 6   <NA>  <NA>  679 4022 1165 240  20 167 382  629 131 48  NA NA 562 493
## 7      N     N  923 5690 1663 324  61 213 511 1027 132 54  61 50 906 841
## 8      Y     Y 1065 5591 1676 315  83 182 700  594  77 40  NA NA 731 649
## 9      Y     Y  915 5660 1604 325  21 244 663 1014 111 28  54 39 753 687
## 10     Y     Y  949 5720 1613 373  25 222 659 1189  68 30  69 55 768 674
## 11     Y     N  840 5028 1461 279  23 207 542  766 132 53  NA NA 607 554
## 12     N     N  938 5570 1547 289  36 249 673 1129 114 52  83 61 944 865
## 13     N     N  998 5581 1722 305  72 171 588  635  70 NA  NA NA 870 748
## 14     N     N  907 5670 1608 321  31 235 545  933  68 22  49 49 740 663
## 15     N     N  945 5651 1653 304  29 230 611  937 111 54  NA NA 859 809
##     ERA CG SHO SV IPouts   HA HRA BBA  SOA   E  DP    FP              name
## 1  4.48  5   6 36   4394 1503 153 488 1141 113 130 0.982    Boston Red Sox
## 2  3.20 82  11 20   4167 1403  42 409  431 196 123 0.960  New York Yankees
## 3  4.79  9   8 38   4341 1500 192 598 1207 123 143 0.980  Seattle Mariners
## 4  5.21  4   4 34   4293 1562 216 605 1000 110 155 0.980  Seattle Mariners
## 5  4.88 65   7 15   4101 1566  93 524  572 208 132 0.960  New York Yankees
## 6  4.36 17   5 21   3054 1097  94 404  666  90 119 0.980 Cleveland Indians
## 7  5.29  8   8 26   4290 1522 239 598 1058  96 167 0.984  Colorado Rockies
## 8  4.17 77   6 21   4200 1474  84 663  624 163 148 0.970  New York Yankees
## 9  4.26  3   8 51   4350 1386 181 574 1260  86 131 0.985  New York Yankees
## 10 4.18  4  12 36   4354 1430 159 447 1132 118 129 0.981    Boston Red Sox
## 11 3.83 10  10 50   3903 1261 135 445  926 101 142 0.980 Cleveland Indians
## 12 5.42  8   2 30   4313 1596 234 598 1064 133 149 0.978    Houston Astros
## 13 4.80 67   6 12   4209 1642 111 528  601 170 167 0.970      Chicago Cubs
## 14 4.10  4   7 51   4369 1425 147 555  992 121 166 0.981    Atlanta Braves
## 15 5.07  6   9 47   4306 1626 186 509  979 119 167 0.980     Texas Rangers
##                         park attendance BPF PPF teamIDBR teamIDlahman45
## 1             Fenway Park II    2724165 105 103      BOS            BOS
## 2           Yankee Stadium I    1164015  98  94      NYY            NYA
## 3                   Kingdome    3192237  98  98      SEA            SEA
## 4                   Kingdome    2723850 100  99      SEA            SEA
## 5           Yankee Stadium I    1169230  96  93      NYY            NYA
## 6               Jacobs Field    1995174  99  97      CLE            CLE
## 7                Coors Field    3166821 122 122      COL            COL
## 8           Yankee Stadium I     976913  97  93      NYY            NYA
## 9         Yankee Stadium III    3719358 105 103      NYY            NYA
## 10            Fenway Park II    2837294 106 105      BOS            BOS
## 11              Jacobs Field    2842745 101  99      CLE            CLE
## 12               Enron Field    3056139 107 107      HOU            HOU
## 13             Wrigley Field    1463624 101  98      CHC            CHN
## 14              Turner Field    2401084 101 100      ATL            ATL
## 15 The Ballpark at Arlington    2771469 105 105      TEX            TEX
##    teamIDretro        BA       SLG
## 1          BOS 0.2889582 0.6033975
## 2          NYA 0.3074621 0.5922947
## 3          SEA 0.2803705 0.5908443
## 4          SEA 0.2866972 0.5906845
## 5          NYA 0.3089207 0.5904919
## 6          CLE 0.2896569 0.5900050
## 7          COL 0.2922671 0.5880492
## 8          NYA 0.2997675 0.5871937
## 9          NYA 0.2833922 0.5818021
## 10         BOS 0.2819930 0.5807692
## 11         CLE 0.2905728 0.5799523
## 12         HOU 0.2777379 0.5797127
## 13         CHN 0.3085469 0.5791077
## 14         ATL 0.2835979 0.5790123
## 15         TEX 0.2925146 0.5783047
Teams_new %>%
  filter(yearID >= 1969) %>%
  arrange(desc(SLG)) %>% 
  head(n = 15)
##    yearID lgID teamID franchID divID Rank   G Ghome   W  L DivWin WCWin
## 1    2003   AL    BOS      BOS     E    2 162    81  95 67      N     Y
## 2    1997   AL    SEA      SEA     W    1 162    81  90 72      Y     N
## 3    1996   AL    SEA      SEA     W    2 161    81  85 76      N     N
## 4    1994   AL    CLE      CLE     C    2 113    51  66 47   <NA>  <NA>
## 5    2001   NL    COL      COL     W    5 162    81  73 89      N     N
## 6    2009   AL    NYA      NYY     E    1 162    81 103 59      Y     N
## 7    2004   AL    BOS      BOS     E    2 162    81  98 64      N     Y
## 8    1995   AL    CLE      CLE     C    1 144    72 100 44      Y     N
## 9    2000   NL    HOU      HOU     C    4 162    81  72 90      N     N
## 10   2003   NL    ATL      ATL     E    1 162    81 101 61      Y     N
## 11   1999   AL    TEX      TEX     W    1 162    81  95 67      Y     N
## 12   1996   AL    CLE      CLE     C    1 161    80  99 62      Y     N
## 13   2000   NL    SFN      SFG     W    1 162    81  97 65      Y     N
## 14   1997   NL    COL      COL     W    3 162    81  83 79      N     N
## 15   2001   AL    TEX      TEX     W    4 162    82  73 89      N     N
##    LgWin WSWin   R   AB    H X2B X3B  HR  BB   SO  SB CS HBP SF  RA  ER
## 1      N     N 961 5769 1667 371  40 238 620  943  88 35  53 64 809 729
## 2      N     N 925 5614 1574 312  21 264 626 1110  89 40  NA NA 833 770
## 3      N     N 993 5668 1625 343  19 245 670 1052  90 39  NA NA 895 828
## 4   <NA>  <NA> 679 4022 1165 240  20 167 382  629 131 48  NA NA 562 493
## 5      N     N 923 5690 1663 324  61 213 511 1027 132 54  61 50 906 841
## 6      Y     Y 915 5660 1604 325  21 244 663 1014 111 28  54 39 753 687
## 7      Y     Y 949 5720 1613 373  25 222 659 1189  68 30  69 55 768 674
## 8      Y     N 840 5028 1461 279  23 207 542  766 132 53  NA NA 607 554
## 9      N     N 938 5570 1547 289  36 249 673 1129 114 52  83 61 944 865
## 10     N     N 907 5670 1608 321  31 235 545  933  68 22  49 49 740 663
## 11     N     N 945 5651 1653 304  29 230 611  937 111 54  NA NA 859 809
## 12     N     N 952 5681 1665 335  23 218 671  844 160 50  NA NA 769 702
## 13     N     N 925 5519 1535 304  44 226 709 1032  79 39  51 66 747 675
## 14     N     N 923 5603 1611 269  40 239 562 1060 137 65  NA NA 908 835
## 15     N     N 890 5685 1566 326  23 246 548 1093  97 32  75 55 968 913
##     ERA CG SHO SV IPouts   HA HRA BBA  SOA   E  DP    FP
## 1  4.48  5   6 36   4394 1503 153 488 1141 113 130 0.982
## 2  4.79  9   8 38   4341 1500 192 598 1207 123 143 0.980
## 3  5.21  4   4 34   4293 1562 216 605 1000 110 155 0.980
## 4  4.36 17   5 21   3054 1097  94 404  666  90 119 0.980
## 5  5.29  8   8 26   4290 1522 239 598 1058  96 167 0.984
## 6  4.26  3   8 51   4350 1386 181 574 1260  86 131 0.985
## 7  4.18  4  12 36   4354 1430 159 447 1132 118 129 0.981
## 8  3.83 10  10 50   3903 1261 135 445  926 101 142 0.980
## 9  5.42  8   2 30   4313 1596 234 598 1064 133 149 0.978
## 10 4.10  4   7 51   4369 1425 147 555  992 121 166 0.981
## 11 5.07  6   9 47   4306 1626 186 509  979 119 167 0.980
## 12 4.35 13   9 46   4356 1530 173 484 1033 124 156 0.980
## 13 4.21  9  15 47   4333 1452 151 623 1076  93 173 0.985
## 14 5.25  9   5 38   4296 1697 196 566  870 111 202 0.980
## 15 5.71  4   3 37   4315 1670 222 596  951 114 167 0.981
##                    name                      park attendance BPF PPF
## 1        Boston Red Sox            Fenway Park II    2724165 105 103
## 2      Seattle Mariners                  Kingdome    3192237  98  98
## 3      Seattle Mariners                  Kingdome    2723850 100  99
## 4     Cleveland Indians              Jacobs Field    1995174  99  97
## 5      Colorado Rockies               Coors Field    3166821 122 122
## 6      New York Yankees        Yankee Stadium III    3719358 105 103
## 7        Boston Red Sox            Fenway Park II    2837294 106 105
## 8     Cleveland Indians              Jacobs Field    2842745 101  99
## 9        Houston Astros               Enron Field    3056139 107 107
## 10       Atlanta Braves              Turner Field    2401084 101 100
## 11        Texas Rangers The Ballpark at Arlington    2771469 105 105
## 12    Cleveland Indians              Jacobs Field    3318174  99  98
## 13 San Francisco Giants              PacBell Park    3318800  93  92
## 14     Colorado Rockies               Coors Field    3888453 122 123
## 15        Texas Rangers The Ballpark at Arlington    2831021 104 105
##    teamIDBR teamIDlahman45 teamIDretro        BA       SLG
## 1       BOS            BOS         BOS 0.2889582 0.6033975
## 2       SEA            SEA         SEA 0.2803705 0.5908443
## 3       SEA            SEA         SEA 0.2866972 0.5906845
## 4       CLE            CLE         CLE 0.2896569 0.5900050
## 5       COL            COL         COL 0.2922671 0.5880492
## 6       NYY            NYA         NYA 0.2833922 0.5818021
## 7       BOS            BOS         BOS 0.2819930 0.5807692
## 8       CLE            CLE         CLE 0.2905728 0.5799523
## 9       HOU            HOU         HOU 0.2777379 0.5797127
## 10      ATL            ATL         ATL 0.2835979 0.5790123
## 11      TEX            TEX         TEX 0.2925146 0.5783047
## 12      CLE            CLE         CLE 0.2930822 0.5766590
## 13      SFG            SFN         SFN 0.2781301 0.5760101
## 14      COL            COL         COL 0.2875245 0.5755845
## 15      TEX            TEX         TEX 0.2754617 0.5753738
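Printing all 50 columns makes the rankings hard to read. As a minimal sketch, the same "sort descending, optionally filter by year, keep the top n" steps can be written with only the identifying columns kept. The helper `top_slg()` is hypothetical, and the four rows are copied from the output above, so the SLG values are the ones computed in Problem 1. In the dplyr pipeline used above, adding `select(yearID, teamID, name, SLG)` before `head()` would have a similar effect.

```r
# A few rows copied from the printed output above
top <- data.frame(
  yearID = c(1927, 2003, 1997, 1996),
  teamID = c("NYA", "BOS", "SEA", "SEA"),
  SLG    = c(0.5922947, 0.6033975, 0.5908443, 0.5906845)
)

# Hypothetical helper: the n highest-SLG rows on or after from_year,
# keeping only the columns needed to identify each team-season
top_slg <- function(df, n, from_year = -Inf) {
  keep <- df[df$yearID >= from_year, c("yearID", "teamID", "SLG")]
  head(keep[order(-keep$SLG), ], n)
}

all_time   <- top_slg(top, 2)
since_1969 <- top_slg(top, 2, from_year = 1969)
print(all_time)
print(since_1969)
```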

Problem 4.

The Angels have at times been called the California Angels (CAL), the Anaheim Angels (ANA), and the Los Angeles Angels (LAA). Find the 10 most successful seasons in Angels history. Have they ever won the World Series?

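# Caution: `teamID == c("CAL", "ANA", "LAA")` recycles the three codes down the rows
# and compares element by element, so it keeps only the rows that happen to line up
# (9 here, none of them ANA). `teamID %in% c("CAL", "ANA", "LAA")` is the intended test.
Teams_new %>%
  filter(teamID == c("CAL", "ANA", "LAA")) %>%
  arrange(desc(W)) %>%
  head(n = 10)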
##   yearID lgID teamID franchID divID Rank   G Ghome  W  L DivWin WCWin
## 1   1982   AL    CAL      ANA     W    1 162    81 93 69      Y  <NA>
## 2   1985   AL    CAL      ANA     W    2 162    79 90 72      N  <NA>
## 3   1979   AL    CAL      ANA     W    1 162    81 88 74      Y  <NA>
## 4   1991   AL    CAL      ANA     W    7 162    81 81 81      N  <NA>
## 5   1995   AL    CAL      ANA     W    2 145    72 78 67      N     N
## 6   1988   AL    CAL      ANA     W    4 162    81 75 87      N  <NA>
## 7   1961   AL    LAA      ANA  <NA>    8 162    82 70 91   <NA>  <NA>
## 8   1963   AL    LAA      ANA  <NA>    9 161    81 70 91   <NA>  <NA>
## 9   1968   AL    CAL      ANA  <NA>    8 162    81 67 95   <NA>  <NA>
##   LgWin WSWin   R   AB    H X2B X3B  HR  BB   SO  SB CS HBP SF  RA  ER
## 1     N     N 814 5532 1518 268  26 186 613  760  55 53  NA NA 670 621
## 2     N     N 732 5442 1364 215  31 153 648  902 106 51  NA NA 703 633
## 3     N     N 866 5550 1563 242  43 164 589  843 100 53  NA NA 768 692
## 4     N     N 653 5470 1396 245  29 115 448  928  94 56  NA NA 649 591
## 5     N     N 801 5019 1390 252  25 186 564  889  58 39  NA NA 697 645
## 6     N     N 714 5582 1458 258  31 124 469  819  86 52  NA NA 771 698
## 7     N     N 744 5424 1331 218  22 189 681 1068  37 28  NA NA 784 689
## 8     N     N 597 5506 1378 208  38  95 448  916  43 30  NA NA 660 569
## 9     N     N 498 5331 1209 170  33  83 447 1080  62 50  NA NA 615 548
##    ERA CG SHO SV IPouts   HA HRA BBA SOA   E  DP   FP               name
## 1 3.82 40  10 27   4392 1436 124 482 728 106 171 0.98  California Angels
## 2 3.91 22   8 41   4371 1453 171 514 767 112 202 0.98  California Angels
## 3 4.34 46   9 33   4308 1463 131 573 820 135 172 0.97  California Angels
## 4 3.69 18  10 50   4323 1351 141 543 990 102 156 0.98  California Angels
## 5 4.52  8   9 42   3852 1310 163 486 901  95 120 0.98  California Angels
## 6 4.32 26   9 33   4365 1503 135 568 817 135 175 0.97  California Angels
## 7 4.31 25   5 34   4314 1391 180 713 973 192 154 0.96 Los Angeles Angels
## 8 3.52 30  13 31   4365 1317 120 578 889 163 155 0.97 Los Angeles Angels
## 9 3.43 29  11 31   4311 1234 131 519 869 140 156 0.97  California Angels
##                 park attendance BPF PPF teamIDBR teamIDlahman45
## 1    Anaheim Stadium    2807360 100  99      CAL            CAL
## 2    Anaheim Stadium    2567427 100 100      CAL            CAL
## 3    Anaheim Stadium    2523575  96  96      CAL            CAL
## 4    Anaheim Stadium    2416236  99 100      CAL            CAL
## 5    Anaheim Stadium    1748680  99  99      CAL            CAL
## 6    Anaheim Stadium    2340925  97  97      CAL            CAL
## 7 Wrigley Field (LA)     603510 111 112      LAA            LAA
## 8     Dodger Stadium     821015  94  94      LAA            LAA
## 9    Anaheim Stadium    1025956  95  97      CAL            CAL
##   teamIDretro        BA       SLG
## 1         CAL 0.2744035 0.5198843
## 2         CAL 0.2506431 0.4592062
## 3         CAL 0.2816216 0.5102703
## 4         CAL 0.2552102 0.4447898
## 5         CAL 0.2769476 0.5405459
## 6         CAL 0.2611967 0.4591544
## 7         LAA 0.2453909 0.4773230
## 8         LAA 0.2502724 0.4155467
## 9         CAL 0.2267867 0.3714125
Teams_new %>%
  filter(teamID == c("CAL", "ANA", "LAA"), WSWin == "Y")
##  [1] yearID         lgID           teamID         franchID      
##  [5] divID          Rank           G              Ghome         
##  [9] W              L              DivWin         WCWin         
## [13] LgWin          WSWin          R              AB            
## [17] H              X2B            X3B            HR            
## [21] BB             SO             SB             CS            
## [25] HBP            SF             RA             ER            
## [29] ERA            CG             SHO            SV            
## [33] IPouts         HA             HRA            BBA           
## [37] SOA            E              DP             FP            
## [41] name           park           attendance     BPF           
## [45] PPF            teamIDBR       teamIDlahman45 teamIDretro   
## [49] BA             SLG           
## <0 rows> (or 0-length row.names)
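# The query returns no rows, but only because `==` recycles c("CAL", "ANA", "LAA")
# against teamID row by row; `teamID %in% c("CAL", "ANA", "LAA")` is the intended test.
# The Angels did win the World Series, in 2002 as the Anaheim Angels (ANA).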
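The two filters in this problem compare `teamID` with a three-element vector using `==`. A minimal sketch of how that differs from `%in%` follows, on a made-up vector of team codes (the vector and the names `by_equals` and `by_in` are assumptions for illustration, not taken from `Teams`).

```r
# Hypothetical team codes; length 6, so the 3-element vector recycles exactly twice
teamID <- c("ANA", "CAL", "LAA", "BOS", "CAL", "ANA")
angels <- c("CAL", "ANA", "LAA")

# `==` pairs element i of teamID with element i of the recycled `angels`
by_equals <- teamID == angels

# `%in%` asks, for each teamID, whether it appears anywhere in `angels`
by_in <- teamID %in% angels

sum(by_equals)  # rows kept by the element-wise comparison
sum(by_in)      # rows kept by the membership test
```

Only the third element survives the `==` comparison in this toy vector, while `%in%` keeps all five Angels codes, which is the behavior a team filter usually needs.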