library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(Lahman)
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
Define two new variables in the Teams data frame: batting average (BA) and slugging percentage (SLG). Batting average is the ratio of hits (H) to at-bats (AB), and slugging percentage is the total bases divided by at-bats. To compute the total bases, you get 1 for a single, 2 for a double, 3 for a triple, and 4 for a home run.
Teams1 <- Teams %>%
  mutate(BA = H / AB) %>%
  # singles are the hits that are not doubles, triples, or home runs
  mutate(SLG = ((H - X2B - X3B - HR) + 2 * X2B + 3 * X3B + 4 * HR) / AB)
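As a quick sanity check on the total-bases formula: singles are H - X2B - X3B - HR, so total bases simplify to H + X2B + 2 * X3B + 3 * HR. The sketch below confirms that the two forms agree on a made-up stat line. The `toy` data frame and its numbers are hypothetical and do not come from Lahman.

```r
# Hypothetical season totals for one team (illustrative values only)
toy <- data.frame(AB = 5000, H = 1300, X2B = 250, X3B = 30, HR = 150)

# Total bases as written in the solution: 1 per single, 2 per double,
# 3 per triple, 4 per home run
singles <- toy$H - toy$X2B - toy$X3B - toy$HR
tb_long <- singles + 2 * toy$X2B + 3 * toy$X3B + 4 * toy$HR

# Algebraically equivalent short form
tb_short <- toy$H + toy$X2B + 2 * toy$X3B + 3 * toy$HR

# Batting average and slugging percentage for the toy stat line
ba <- toy$H / toy$AB
slg <- tb_long / toy$AB
```

Either form can be used inside `mutate()`; the long form mirrors the wording of the exercise, while the short form is easier to read.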
Plot a time series of SLG since 1954 by league (lgID). Is slugging percentage typically higher in the American League (AL) or the National League (NL)?
Teams2 <- Teams1 %>%
  select(SLG, yearID, lgID, teamID) %>%
  # note: yearID > 1954 starts the series at 1955; use >= 1954 to include 1954 itself
  filter(!is.na(yearID) & yearID > 1954) %>%
  filter(lgID == "NL" | lgID == "AL") %>%
  group_by(lgID, yearID) %>%
  summarize(avg_SLG = mean(SLG))
head(Teams2)
## Source: local data frame [6 x 3]
## Groups: lgID [1]
##
## lgID yearID avg_SLG
## <fctr> <int> <dbl>
## 1 AL 1955 0.3810866
## 2 AL 1956 0.3935263
## 3 AL 1957 0.3823205
## 4 AL 1958 0.3829396
## 5 AL 1959 0.3839940
## 6 AL 1960 0.3874953
ggplot(Teams2, aes(x = yearID, y = avg_SLG, color = lgID)) +
  geom_point() +
  geom_line()
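The plot answers the league question visually. One way to back it up with a number is to compare the two leagues year by year, as sketched below. The `toy_slg` values are made up for illustration and only mimic the columns of `Teams2`; running the same steps on `Teams2` itself would give the real answer.

```r
library(dplyr)

# Toy league-year averages (made-up values), same columns as Teams2
toy_slg <- data.frame(
  lgID    = c("AL", "NL", "AL", "NL", "AL", "NL"),
  yearID  = c(2001, 2001, 2002, 2002, 2003, 2003),
  avg_SLG = c(0.428, 0.425, 0.424, 0.410, 0.420, 0.422)
)

# One row per year: AL average minus NL average
league_gap <- toy_slg %>%
  group_by(yearID) %>%
  summarize(gap = avg_SLG[lgID == "AL"] - avg_SLG[lgID == "NL"])

# Share of seasons in which the AL out-slugged the NL
al_share <- mean(league_gap$gap > 0)
```

A share well above 0.5 would indicate that the AL is typically higher, and a share well below 0.5 would point to the NL.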
Display the top 15 teams ranked in terms of slugging percentage in MLB history. Repeat this using teams since 1969.
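Here is a first attempt: the top 15 teams by slugging percentage within each individual season, looped over 1969 and 1970.
for (i in 1969:1970) {
  print(
    Teams1 %>%
      filter(!is.na(yearID) & yearID == i) %>%
      arrange(-SLG) %>%
      select(yearID, lgID, name, SLG) %>%
      head(15)
  )
}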
## yearID lgID name SLG
## 1 1969 NL Cincinnati Reds 0.4222577
## 2 1969 AL Boston Red Sox 0.4149982
## 3 1969 AL Baltimore Orioles 0.4135556
## 4 1969 AL Minnesota Twins 0.4084904
## 5 1969 NL Pittsburgh Pirates 0.3977959
## 6 1969 AL Detroit Tigers 0.3874288
## 7 1969 NL Chicago Cubs 0.3835443
## 8 1969 NL Atlanta Braves 0.3796703
## 9 1969 AL Washington Senators 0.3781898
## 10 1969 AL Oakland Athletics 0.3758461
## 11 1969 NL Philadelphia Phillies 0.3720414
## 12 1969 NL San Francisco Giants 0.3609792
## 13 1969 NL St. Louis Cardinals 0.3592847
## 14 1969 NL Los Angeles Dodgers 0.3588214
## 15 1969 NL Montreal Expos 0.3585532
## yearID lgID name SLG
## 1 1970 NL Cincinnati Reds 0.4357401
## 2 1970 AL Boston Red Sox 0.4276423
## 3 1970 NL Chicago Cubs 0.4146786
## 4 1970 NL San Francisco Giants 0.4091072
## 5 1970 NL Pittsburgh Pirates 0.4057123
## 6 1970 NL Atlanta Braves 0.4035341
## 7 1970 AL Minnesota Twins 0.4028816
## 8 1970 AL Baltimore Orioles 0.4010821
## 9 1970 AL Cleveland Indians 0.3935567
## 10 1970 AL Oakland Athletics 0.3919271
## 11 1970 NL San Diego Padres 0.3911540
## 12 1970 NL Houston Astros 0.3905633
## 13 1970 NL Los Angeles Dodgers 0.3822690
## 14 1970 NL St. Louis Cardinals 0.3789770
## 15 1970 AL Detroit Tigers 0.3736284
# Average SLG across all seasons played under each team name
Teams1 %>%
  filter(!is.na(yearID)) %>%
  group_by(name) %>%
  summarize(mean_SLG = mean(SLG)) %>%
  arrange(-mean_SLG) %>%
  head(15)
## # A tibble: 15 × 2
## name mean_SLG
## <chr> <dbl>
## 1 Colorado Rockies 0.4425149
## 2 Anaheim Angels 0.4223234
## 3 Cincinnati Redlegs 0.4199253
## 4 Boston Red Stockings 0.4165024
## 5 Toronto Blue Jays 0.4164657
## 6 Arizona Diamondbacks 0.4152269
## 7 Milwaukee Braves 0.4135638
## 8 Los Angeles Angels of Anaheim 0.4133163
## 9 Texas Rangers 0.4132939
## 10 New York Yankees 0.4104905
## 11 Tampa Bay Devil Rays 0.4059601
## 12 Tampa Bay Rays 0.4052192
## 13 Florida Marlins 0.4051400
## 14 Seattle Mariners 0.4037377
## 15 Minnesota Twins 0.3993335
# Top 15 single-season SLG values from 1969 onward (first method)
Teams1 %>%
  filter(!is.na(yearID)) %>%
  filter(yearID >= 1969) %>%
  group_by(yearID) %>%
  arrange(-SLG) %>%
  select(yearID, lgID, name, SLG) %>%
  head(15)
## Source: local data frame [15 x 4]
## Groups: yearID [10]
##
## yearID lgID name SLG
## <int> <fctr> <chr> <dbl>
## 1 2003 AL Boston Red Sox 0.4908996
## 2 1997 AL Seattle Mariners 0.4845030
## 3 1994 AL Cleveland Indians 0.4838389
## 4 1996 AL Seattle Mariners 0.4835921
## 5 2001 NL Colorado Rockies 0.4829525
## 6 1995 AL Cleveland Indians 0.4787192
## 7 1999 AL Texas Rangers 0.4786763
## 8 1997 NL Colorado Rockies 0.4777798
## 9 2009 AL New York Yankees 0.4775618
## 10 2000 NL Houston Astros 0.4766607
## 11 2003 NL Atlanta Braves 0.4754850
## 12 1996 AL Cleveland Indians 0.4752684
## 13 2000 AL Anaheim Angels 0.4724591
## 14 1996 NL Colorado Rockies 0.4724508
## 15 2004 AL Boston Red Sox 0.4723776
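# Second method: keep only teams whose teamID has no season before 1969
Teams_before_1969 <-
  Teams1 %>%
  select(yearID, teamID) %>%
  filter(yearID < 1969)
Teams1 %>%
  # anti_join() drops every row whose teamID appears in Teams_before_1969
  anti_join(Teams_before_1969, by = "teamID") %>%
  group_by(yearID) %>%
  arrange(-SLG) %>%
  select(yearID, lgID, teamID, name, SLG) %>%
  head(15)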
## Source: local data frame [15 x 5]
## Groups: yearID [9]
##
## yearID lgID teamID name SLG
## <int> <fctr> <fctr> <chr> <dbl>
## 1 1997 AL SEA Seattle Mariners 0.4845030
## 2 1996 AL SEA Seattle Mariners 0.4835921
## 3 2001 NL COL Colorado Rockies 0.4829525
## 4 1999 AL TEX Texas Rangers 0.4786763
## 5 1997 NL COL Colorado Rockies 0.4777798
## 6 2000 AL ANA Anaheim Angels 0.4724591
## 7 1996 NL COL Colorado Rockies 0.4724508
## 8 1999 NL COL Colorado Rockies 0.4716585
## 9 1995 NL COL Colorado Rockies 0.4707649
## 10 2001 AL TEX Texas Rangers 0.4707124
## 11 2000 AL TOR Toronto Blue Jays 0.4692619
## 12 1996 AL TEX Texas Rangers 0.4686075
## 13 2005 AL TEX Texas Rangers 0.4683345
## 14 1998 AL SEA Seattle Mariners 0.4676617
## 15 2006 AL TOR Toronto Blue Jays 0.4628306
The second method keeps only teams whose teamID has no season before 1969, so the list consists of younger teams, such as the Seattle Mariners (data available from 1977), Texas Rangers (from 1972), Colorado Rockies (from 1993), and Toronto Blue Jays (from 1977).
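The difference between the two readings of "teams since 1969" can be seen on a tiny example. In the sketch below, `toy_teams` and its team codes `OLD` and `NEW` are hypothetical: filtering by season keeps the recent seasons of an old franchise, while the `anti_join()` approach removes that franchise entirely.

```r
library(dplyr)

# Hypothetical data: one team with a pre-1969 season, one without
toy_teams <- data.frame(
  teamID = c("OLD", "OLD", "NEW", "NEW"),
  yearID = c(1960, 2000, 1977, 2000),
  SLG    = c(0.380, 0.470, 0.390, 0.460)
)

# Method 1: keep seasons from 1969 onward, whatever the team's age
by_season <- toy_teams %>%
  filter(yearID >= 1969)

# Method 2: drop every team that has any season before 1969
pre_1969 <- toy_teams %>%
  filter(yearID < 1969) %>%
  select(teamID)
by_team <- toy_teams %>%
  anti_join(pre_1969, by = "teamID")
```

Here `by_season` still contains the 2000 season of `OLD`, whereas `by_team` contains only the `NEW` rows. This is why the 2003 Boston Red Sox top the first table but are absent from the second.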
The Angels have at times been called the California Angels (CAL), the Anaheim Angels (ANA), and the Los Angeles Angels (LAA). Find the 10 most successful seasons in Angels history. Have they ever won the World Series?
Teams %>%
  filter(teamID == "CAL" | teamID == "ANA" | teamID == "LAA") %>%
  # win_rate here is the ratio of wins to losses, not the fraction of games won
  mutate(win_rate = W / L) %>%
  select(yearID, teamID, Rank, W, L, win_rate, WSWin) %>%
  arrange(Rank, -win_rate) %>%
  head(10)
## yearID teamID Rank W L win_rate WSWin
## 1 2008 LAA 1 100 62 1.612903 N
## 2 2014 LAA 1 98 64 1.531250 N
## 3 2009 LAA 1 97 65 1.492308 N
## 4 2005 LAA 1 95 67 1.417910 N
## 5 2007 LAA 1 94 68 1.382353 N
## 6 1982 CAL 1 93 69 1.347826 N
## 7 1986 CAL 1 92 70 1.314286 N
## 8 2004 ANA 1 92 70 1.314286 N
## 9 1979 CAL 1 88 74 1.189189 N
## 10 2002 ANA 2 99 63 1.571429 Y
Teams %>%
  filter(teamID == "CAL" | teamID == "ANA" | teamID == "LAA") %>%
  select(yearID, teamID, Rank, W, L, DivWin, WSWin) %>%
  arrange(desc(WSWin)) %>%
  head(10)
## yearID teamID Rank W L DivWin WSWin
## 1 2002 ANA 2 99 63 N Y
## 2 1961 LAA 8 70 91 <NA> N
## 3 1962 LAA 3 86 76 <NA> N
## 4 1963 LAA 9 70 91 <NA> N
## 5 1964 LAA 5 82 80 <NA> N
## 6 1965 CAL 7 75 87 <NA> N
## 7 1966 CAL 6 80 82 <NA> N
## 8 1967 CAL 5 84 77 <NA> N
## 9 1968 CAL 8 67 95 <NA> N
## 10 1969 CAL 3 71 91 N N
The 10 most successful seasons for the Angels are shown in the first table above. The win_rate column is the ratio of wins (W) to losses (L) in that season. Rows are sorted first by Rank and then by win_rate, so that seasons with the same Rank are ordered by how well the team played.
By this ordering, 2008 was the most successful season, with Rank 1 and a win-loss ratio of 1.6129, followed by 2014, 2009, and so on as shown in the table. In total, the Angels have finished first in their division 9 times. The 2002 season lands in tenth place only because the team finished second in its division (Rank 2), despite a 99-63 record.
The second table shows that the Angels have won the World Series only once, in 2002.
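Note that W / L is a win-loss ratio rather than the more common winning percentage W / (W + L). A minimal sketch of the difference, using two records copied from the first Angels table, is shown below. The column names `win_ratio` and `win_pct` are chosen here for illustration, and ties are ignored.

```r
# Win-loss records copied from the first Angels table above
angels <- data.frame(
  yearID = c(2008, 2002),
  W      = c(100, 99),
  L      = c(62, 63)
)

# Ratio of wins to losses (what the solution calls win_rate)
angels$win_ratio <- angels$W / angels$L

# Fraction of decided games won (ties ignored in this sketch)
angels$win_pct <- angels$W / (angels$W + angels$L)
```

Because W / (W + L) increases whenever W / L does, both measures rank seasons in the same order; only the scale differs (about 0.617 versus 1.613 for 2008).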