library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
library(Lahman)
library(plyr)
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
library(ggplot2)
data("Teams")
head(Teams)
## yearID lgID teamID franchID divID Rank G Ghome W L DivWin WCWin LgWin
## 1 1871 NA BS1 BNA <NA> 3 31 NA 20 10 <NA> <NA> N
## 2 1871 NA CH1 CNA <NA> 2 28 NA 19 9 <NA> <NA> N
## 3 1871 NA CL1 CFC <NA> 8 29 NA 10 19 <NA> <NA> N
## 4 1871 NA FW1 KEK <NA> 7 19 NA 7 12 <NA> <NA> N
## 5 1871 NA NY2 NNA <NA> 5 33 NA 16 17 <NA> <NA> N
## 6 1871 NA PH1 PNA <NA> 1 28 NA 21 7 <NA> <NA> Y
## WSWin R AB H X2B X3B HR BB SO SB CS HBP SF RA ER ERA CG SHO SV
## 1 <NA> 401 1372 426 70 37 3 60 19 73 NA NA NA 303 109 3.55 22 1 3
## 2 <NA> 302 1196 323 52 21 10 60 22 69 NA NA NA 241 77 2.76 25 0 1
## 3 <NA> 249 1186 328 35 40 7 26 25 18 NA NA NA 341 116 4.11 23 0 0
## 4 <NA> 137 746 178 19 8 2 33 9 16 NA NA NA 243 97 5.17 19 1 0
## 5 <NA> 302 1404 403 43 21 1 33 15 46 NA NA NA 313 121 3.72 32 1 0
## 6 <NA> 376 1281 410 66 27 9 46 23 56 NA NA NA 266 137 4.95 27 0 0
## IPouts HA HRA BBA SOA E DP FP name
## 1 828 367 2 42 23 225 NA 0.83 Boston Red Stockings
## 2 753 308 6 28 22 218 NA 0.82 Chicago White Stockings
## 3 762 346 13 53 34 223 NA 0.81 Cleveland Forest Citys
## 4 507 261 5 21 17 163 NA 0.80 Fort Wayne Kekiongas
## 5 879 373 7 42 22 227 NA 0.83 New York Mutuals
## 6 747 329 3 53 16 194 NA 0.84 Philadelphia Athletics
## park attendance BPF PPF teamIDBR teamIDlahman45
## 1 South End Grounds I NA 103 98 BOS BS1
## 2 Union Base-Ball Grounds NA 104 102 CHI CH1
## 3 National Association Grounds NA 96 100 CLE CL1
## 4 Hamilton Field NA 101 107 KEK FW1
## 5 Union Grounds (Brooklyn) NA 90 88 NYU NY2
## 6 Jefferson Street Grounds NA 102 98 ATH PH1
## teamIDretro
## 1 BS1
## 2 CH1
## 3 CL1
## 4 FW1
## 5 NY2
## 6 PH1
Define two new variables in the Teams data frame: batting average (BA) and slugging percentage (SLG). Batting average is the ratio of hits (H) to at-bats (AB), and slugging percentage is the total bases divided by at-bats. To compute the total bases, you get 1 for a single, 2 for a double, 3 for a triple, and 4 for a home run.
slg_pct <-
  Teams %>%
  mutate(BA = H / AB,
         SLG = (X2B * 2 + X3B * 3 + HR * 4 + (H - X2B - X3B - HR)) / AB)
head(slg_pct)
## yearID lgID teamID franchID divID Rank G Ghome W L DivWin WCWin LgWin
## 1 1871 NA BS1 BNA <NA> 3 31 NA 20 10 <NA> <NA> N
## 2 1871 NA CH1 CNA <NA> 2 28 NA 19 9 <NA> <NA> N
## 3 1871 NA CL1 CFC <NA> 8 29 NA 10 19 <NA> <NA> N
## 4 1871 NA FW1 KEK <NA> 7 19 NA 7 12 <NA> <NA> N
## 5 1871 NA NY2 NNA <NA> 5 33 NA 16 17 <NA> <NA> N
## 6 1871 NA PH1 PNA <NA> 1 28 NA 21 7 <NA> <NA> Y
## WSWin R AB H X2B X3B HR BB SO SB CS HBP SF RA ER ERA CG SHO SV
## 1 <NA> 401 1372 426 70 37 3 60 19 73 NA NA NA 303 109 3.55 22 1 3
## 2 <NA> 302 1196 323 52 21 10 60 22 69 NA NA NA 241 77 2.76 25 0 1
## 3 <NA> 249 1186 328 35 40 7 26 25 18 NA NA NA 341 116 4.11 23 0 0
## 4 <NA> 137 746 178 19 8 2 33 9 16 NA NA NA 243 97 5.17 19 1 0
## 5 <NA> 302 1404 403 43 21 1 33 15 46 NA NA NA 313 121 3.72 32 1 0
## 6 <NA> 376 1281 410 66 27 9 46 23 56 NA NA NA 266 137 4.95 27 0 0
## IPouts HA HRA BBA SOA E DP FP name
## 1 828 367 2 42 23 225 NA 0.83 Boston Red Stockings
## 2 753 308 6 28 22 218 NA 0.82 Chicago White Stockings
## 3 762 346 13 53 34 223 NA 0.81 Cleveland Forest Citys
## 4 507 261 5 21 17 163 NA 0.80 Fort Wayne Kekiongas
## 5 879 373 7 42 22 227 NA 0.83 New York Mutuals
## 6 747 329 3 53 16 194 NA 0.84 Philadelphia Athletics
## park attendance BPF PPF teamIDBR teamIDlahman45
## 1 South End Grounds I NA 103 98 BOS BS1
## 2 Union Base-Ball Grounds NA 104 102 CHI CH1
## 3 National Association Grounds NA 96 100 CLE CL1
## 4 Hamilton Field NA 101 107 KEK FW1
## 5 Union Grounds (Brooklyn) NA 90 88 NYU NY2
## 6 Jefferson Street Grounds NA 102 98 ATH PH1
## teamIDretro BA SLG
## 1 BS1 0.3104956 0.4220117
## 2 CH1 0.2700669 0.3737458
## 3 CL1 0.2765599 0.3912310
## 4 FW1 0.2386059 0.2935657
## 5 NY2 0.2870370 0.3497151
## 6 PH1 0.3200625 0.4348165
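As a sanity check on the total-bases formula, the first row can be recomputed by hand in base R. This is only a sketch: the counts are copied from row 1 of the head(Teams) output above (the 1871 Boston Red Stockings), and the variable names singles and total_bases are introduced here for illustration.

```r
# Batting counts for the 1871 Boston Red Stockings, copied from the first
# row of the head(Teams) output.
H <- 426; AB <- 1372; X2B <- 70; X3B <- 37; HR <- 3

# Singles are the hits that were not doubles, triples, or home runs.
singles <- H - X2B - X3B - HR

# 1 base per single, 2 per double, 3 per triple, 4 per home run.
total_bases <- singles + 2 * X2B + 3 * X3B + 4 * HR

BA  <- H / AB
SLG <- total_bases / AB

print(round(c(BA = BA, SLG = SLG), 7))
```

The two printed values should agree with the BA and SLG columns shown for row 1 of head(slg_pct).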
Plot a time series of SLG since 1954 by league (lgID). Is slugging percentage typically higher in the American League (AL) or the National League?
year1954 <-
  filter(slg_pct, yearID >= 1954)
avg_slg <- ddply(year1954, .(yearID, lgID), summarize, AvgSlg = mean(SLG))
head(avg_slg)
## yearID lgID AvgSlg
## 1 1954 AL 0.3732352
## 2 1954 NL 0.4067245
## 3 1955 AL 0.3810866
## 4 1955 NL 0.4068172
## 5 1956 AL 0.3935263
## 6 1956 NL 0.4008650
ggplot(avg_slg, mapping = aes(x = yearID, y = AvgSlg, color = lgID)) +
  geom_point() +
  labs(x = "Year", y = "Average Slugging Percentage", title = "Average Slugging Percentage vs. Year")
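## Tried using dplyr functions but kept getting "Error in order(yearID): object 'yearID' not found".
## The likely cause: plyr was loaded after dplyr, so summarize() and arrange() resolve to the plyr
## versions, which ignore group_by(). Prefixing the calls with dplyr:: should avoid the masking.
## A web search turned up ddply(), which is used above instead.
##slg_pct %>%
##  filter(yearID >= 1954) %>%
##  group_by(lgID, yearID) %>%
##  dplyr::summarize(AvgSlg = mean(SLG, na.rm = TRUE)) %>%
##  dplyr::arrange(yearID)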
Until the early 1970s the National League generally has the higher slugging percentage; after that point the American League is usually higher. The designated hitter (DH) was introduced in the AL in 1973, which is probably why the AL has the higher SLG from around that year on.
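The league comparison can also be read off numerically rather than from the plot. The sketch below uses only the six rows printed by head(avg_slg) (1954 to 1956), so it illustrates the approach and does not cover the whole series; the names avg_slg_head, by_league, and al_minus_nl are introduced here for illustration.

```r
# The six rows shown by head(avg_slg), retyped as a small data frame.
avg_slg_head <- data.frame(
  yearID = c(1954, 1954, 1955, 1955, 1956, 1956),
  lgID   = c("AL", "NL", "AL", "NL", "AL", "NL"),
  AvgSlg = c(0.3732352, 0.4067245, 0.3810866, 0.4068172, 0.3935263, 0.4008650)
)

# One row per year, one column per league.
by_league <- with(avg_slg_head, tapply(AvgSlg, list(yearID, lgID), mean))

# Negative values mean the NL had the higher average SLG that year.
al_minus_nl <- by_league[, "AL"] - by_league[, "NL"]
print(round(al_minus_nl, 4))
```

Applying the same two steps to the full avg_slg data frame would give the AL-minus-NL gap for every season since 1954.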
Display the top 15 teams ranked in terms of slugging percentage in MLB history. Repeat this using teams since 1969.
##all time
slg_pct %>%
  group_by(teamIDBR, yearID) %>%
  select(SLG) %>%
  arrange(desc(SLG)) %>%
  head(n = 15)
## Adding missing grouping variables: `teamIDBR`, `yearID`
## Source: local data frame [15 x 3]
## Groups: teamIDBR, yearID [15]
##
## teamIDBR yearID SLG
## <chr> <int> <dbl>
## 1 BOS 2003 0.4908996
## 2 NYY 1927 0.4890593
## 3 NYY 1930 0.4877019
## 4 SEA 1997 0.4845030
## 5 BSN 1894 0.4843345
## 6 CLE 1994 0.4838389
## 7 SEA 1996 0.4835921
## 8 NYY 1936 0.4834556
## 9 COL 2001 0.4829525
## 10 BLN 1894 0.4828089
## 11 CHC 1930 0.4809174
## 12 CLE 1995 0.4787192
## 13 TEX 1999 0.4786763
## 14 COL 1997 0.4777798
## 15 NYY 2009 0.4775618
##since 1969
slg_pct %>%
  group_by(teamIDBR, yearID) %>%
  filter(yearID >= 1969) %>%
  select(SLG) %>%
  arrange(desc(SLG)) %>%
  head(n = 15)
## Adding missing grouping variables: `teamIDBR`, `yearID`
## Source: local data frame [15 x 3]
## Groups: teamIDBR, yearID [15]
##
## teamIDBR yearID SLG
## <chr> <int> <dbl>
## 1 BOS 2003 0.4908996
## 2 SEA 1997 0.4845030
## 3 CLE 1994 0.4838389
## 4 SEA 1996 0.4835921
## 5 COL 2001 0.4829525
## 6 CLE 1995 0.4787192
## 7 TEX 1999 0.4786763
## 8 COL 1997 0.4777798
## 9 NYY 2009 0.4775618
## 10 HOU 2000 0.4766607
## 11 ATL 2003 0.4754850
## 12 CLE 1996 0.4752684
## 13 ANA 2000 0.4724591
## 14 COL 1996 0.4724508
## 15 BOS 2004 0.4723776
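A quick cross-check on the two lists: every all-time top-15 season from 1969 onward should also lead the since-1969 list, in the same order. The sketch below retypes the all-time table printed above into a data frame (the name all_time is introduced here) and filters it in base R.

```r
# The all-time top 15 by SLG, retyped from the output above.
all_time <- data.frame(
  teamIDBR = c("BOS", "NYY", "NYY", "SEA", "BSN", "CLE", "SEA", "NYY",
               "COL", "BLN", "CHC", "CLE", "TEX", "COL", "NYY"),
  yearID   = c(2003, 1927, 1930, 1997, 1894, 1994, 1996, 1936,
               2001, 1894, 1930, 1995, 1999, 1997, 2009),
  SLG      = c(0.4908996, 0.4890593, 0.4877019, 0.4845030, 0.4843345,
               0.4838389, 0.4835921, 0.4834556, 0.4829525, 0.4828089,
               0.4809174, 0.4787192, 0.4786763, 0.4777798, 0.4775618)
)

# Keep only the seasons from 1969 on; the rows stay sorted by SLG.
modern <- all_time[all_time$yearID >= 1969, ]
print(modern)
print(nrow(modern))
```

The rows kept here match the first rows of the since-1969 table, from BOS 2003 through NYY 2009.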
The Angels have at times been called the California Angels (CAL), the Anaheim Angels (ANA), and the Los Angeles Angels (LAA). Find the 10 most successful seasons in Angels history. Have they ever won the World Series?
##10 most successful seasons
slg_pct %>%
  group_by(teamIDBR) %>%
  filter(teamIDBR %in% c("CAL", "ANA", "LAA")) %>%
  mutate(WinPct = W / (W + L)) %>%
  select(W, L, WinPct, WSWin) %>%
  arrange(desc(WinPct)) %>%
  head(n = 10)
## Adding missing grouping variables: `teamIDBR`
## Source: local data frame [10 x 5]
## Groups: teamIDBR [3]
##
## teamIDBR W L WinPct WSWin
## <chr> <int> <int> <dbl> <chr>
## 1 LAA 100 62 0.6172840 N
## 2 ANA 99 63 0.6111111 Y
## 3 LAA 98 64 0.6049383 N
## 4 LAA 97 65 0.5987654 N
## 5 LAA 95 67 0.5864198 N
## 6 LAA 94 68 0.5802469 N
## 7 CAL 93 69 0.5740741 N
## 8 CAL 92 70 0.5679012 N
## 9 ANA 92 70 0.5679012 N
## 10 CAL 91 71 0.5617284 N
##see if they have any other WS wins
slg_pct %>%
  group_by(teamIDBR) %>%
  filter(teamIDBR %in% c("CAL", "ANA", "LAA")) %>%
  mutate(WinPct = W / (W + L)) %>%
  select(yearID, W, L, WinPct, WSWin) %>%
  arrange(desc(WinPct)) %>%
  filter(WSWin == "Y")
## Adding missing grouping variables: `teamIDBR`
## Source: local data frame [1 x 6]
## Groups: teamIDBR [1]
##
## teamIDBR yearID W L WinPct WSWin
## <chr> <int> <int> <int> <dbl> <chr>
## 1 ANA 2002 99 63 0.6111111 Y
They have won the World Series only once, as the Anaheim Angels, in 2002.
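The win-percentage ranking and the World Series check can be reproduced without dplyr. This is a minimal base R sketch that retypes the ten rows printed in the first Angels table above; the names angels_top10, ranked, and champs are introduced here for illustration.

```r
# The ten Angels seasons shown above, retyped from the printed output.
angels_top10 <- data.frame(
  teamIDBR = c("LAA", "ANA", "LAA", "LAA", "LAA", "LAA", "CAL", "CAL", "ANA", "CAL"),
  W        = c(100, 99, 98, 97, 95, 94, 93, 92, 92, 91),
  L        = c(62, 63, 64, 65, 67, 68, 69, 70, 70, 71),
  WSWin    = c("N", "Y", "N", "N", "N", "N", "N", "N", "N", "N")
)

# Win percentage is wins divided by games decided.
angels_top10$WinPct <- angels_top10$W / (angels_top10$W + angels_top10$L)

# Sort from best to worst season, then keep the World Series winners.
ranked <- angels_top10[order(-angels_top10$WinPct), ]
champs <- ranked[ranked$WSWin == "Y", ]
print(champs)
```

Only the ANA season with 99 wins survives the final filter, which is consistent with the single World Series win reported above.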