Please indicate
Any comments you have: Nope!
Load the Lahman R package to gain access to the data tables necessary to complete this assignment. Remember that help files are available for each data table. For example, ?Teams will pull up the help file for the Teams data table.
library(tidyverse)
library(Lahman)
Teams <- Teams
Define two new variables in the Teams data frame: batting average (BA) and slugging percentage (SLG). Batting average is the ratio of hits (H) to at-bats (AB), and slugging percentage is the total bases divided by at-bats. To compute the total bases, you get 1 for a single, 2 for a double, 3 for a triple, and 4 for a home run.
#Adding singles (X1B), BA, and SLG to the Teams table
#Total bases = singles + 2 * doubles + 3 * triples + 4 * home runs
Teams1 <- Teams %>%
mutate(X1B = H - X2B - X3B - HR) %>%
mutate(BA = H / AB) %>%
mutate(SLG = (X1B + 2 * X2B + 3 * X3B + 4 * HR) / AB)
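As a quick sanity check of the total-bases arithmetic, here is a minimal sketch on a made-up batting line. The numbers are hypothetical and not taken from Lahman: 150 hits in 500 at-bats, of which 30 are doubles, 5 are triples, and 20 are home runs.

```r
# Hypothetical batting line (illustrative values only, not from Lahman)
H   <- 150  # hits
X2B <- 30   # doubles
X3B <- 5    # triples
HR  <- 20   # home runs
AB  <- 500  # at-bats

# Singles are the hits that are not doubles, triples, or home runs
singles <- H - X2B - X3B - HR                        # 95

# Each hit type is weighted by the number of bases it is worth
total_bases <- singles + 2 * X2B + 3 * X3B + 4 * HR  # 95 + 60 + 15 + 80 = 250

BA  <- H / AB            # 0.3
SLG <- total_bases / AB  # 0.5
```

SLG is always at least as large as BA, because every hit is worth at least one base.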
Plot a time series of SLG since 1954 by league (lgID). Is slugging percentage typically higher in the American League (AL) or the National League (NL)?
#Making a specific table for the time series visualization: select relevant columns, filter to AL, NL for league, and yearID to be greater than 1953
LeagueTable <- Teams1 %>%
select(SLG, lgID, yearID) %>%
filter(lgID == "AL" | lgID == "NL") %>%
filter(yearID > 1953)
#Making time series visualization with geom_line()
SLine <- LeagueTable %>%
ggplot(mapping = aes(x=yearID, y=SLG))+
geom_line(size = .5) +
facet_wrap(~lgID)+
labs(title="SLG by League from 1954-2015",
subtitle = "AL represents American League, NL represents National League",
x = "Year",
y= "SLG")
SLine
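Because LeagueTable holds one row per team-season, each league has several SLG values in any given year. One way to draw a single line per league is to aggregate to one value per league and year first. The sketch below uses a small made-up data frame (`toy`, with hypothetical values and a hypothetical total-bases column `TB`) and assumes the league value is total bases divided by at-bats, pooled across teams. Averaging the team SLG values would be another reasonable choice.

```r
library(dplyr)

# Hypothetical team-seasons (illustrative values only, not from Lahman)
toy <- data.frame(
  yearID = c(2000, 2000, 2000, 2000, 2001, 2001),
  lgID   = c("AL", "AL", "NL", "NL", "AL", "NL"),
  TB     = c(2400, 2200, 2300, 2100, 2350, 2250),  # total bases
  AB     = c(5600, 5400, 5500, 5500, 5500, 5000)   # at-bats
)

# Pool total bases and at-bats within each league-year, then divide
league_year <- toy %>%
  group_by(lgID, yearID) %>%
  summarize(SLG = sum(TB) / sum(AB), .groups = "drop")

league_year
```

The result has one row per league and year, so it could be passed to ggplot() with aes(x = yearID, y = SLG, color = lgID) and geom_line() to overlay the two leagues on one panel.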
#Using summarize() to answer the question; "typically" is taken here to mean the average team SLG
ALslg <- LeagueTable %>%
group_by(lgID) %>%
filter(lgID == "AL") %>%
summarize(mean(SLG))
NLslg <- LeagueTable %>%
group_by(lgID) %>%
filter(lgID == "NL") %>%
summarize(mean(SLG))
ALslg #(0.428) average SLG
## # A tibble: 1 × 2
## lgID `mean(SLG)`
## <fctr> <dbl>
## 1 AL 0.4279684
NLslg #(0.418) average SLG
## # A tibble: 1 × 2
## lgID `mean(SLG)`
## <fctr> <dbl>
## 1 NL 0.4178288
#Solution: The American League had the higher average slugging percentage (about 0.428 versus 0.418 for the National League), a gap of only about 0.010.
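The two per-league pipelines can also be collapsed into a single grouped summary. A minimal sketch on a made-up stand-in for LeagueTable (`toy_league`, hypothetical values):

```r
library(dplyr)

# Hypothetical stand-in for LeagueTable (illustrative values only)
toy_league <- data.frame(
  lgID   = c("AL", "AL", "NL", "NL"),
  yearID = c(1954, 1955, 1954, 1955),
  SLG    = c(0.40, 0.42, 0.38, 0.40)
)

# One row per league, with the mean of the team-season SLG values
league_means <- toy_league %>%
  group_by(lgID) %>%
  summarize(MeanSLG = mean(SLG))

league_means
```

Grouping once returns both league means in the same table, which makes them easier to compare side by side.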
Display the top 15 teams ranked in terms of slugging percentage in MLB history. Repeat this using teams since 1969.
#Making a table based on total SLG percentage throughout MLB history
TotalSLG <- Teams1 %>%
select(teamID, name, SLG, yearID) %>%
group_by(name) %>%
mutate(MeanSLG = mean(SLG, na.rm =TRUE)) %>% #MeanSLG is the average SLG for the team across MLB history
select(name, MeanSLG) %>%
arrange(desc(MeanSLG), name) %>%
unique()
TotalSLG[1:15,1:2]
## Source: local data frame [15 x 2]
## Groups: name [15]
##
## name MeanSLG
## <chr> <dbl>
## 1 Colorado Rockies 0.4751648
## 2 Cincinnati Redlegs 0.4520375
## 3 Anaheim Angels 0.4518891
## 4 Toronto Blue Jays 0.4464372
## 5 Milwaukee Braves 0.4455139
## 6 Arizona Diamondbacks 0.4451966
## 7 Texas Rangers 0.4423956
## 8 Los Angeles Angels of Anaheim 0.4419985
## 9 New York Yankees 0.4374726
## 10 Tampa Bay Rays 0.4357002
## 11 Tampa Bay Devil Rays 0.4327704
## 12 Florida Marlins 0.4325307
## 13 Seattle Mariners 0.4319518
## 14 Minnesota Twins 0.4238259
## 15 Oakland Athletics 0.4234512
#Making the same mean SLG table, keeping only seasons from 1969 onward
TS69 <- Teams1 %>%
select(teamID, name, SLG, yearID) %>%
group_by(name) %>%
filter(yearID>1968) %>%
mutate(MeanSLG = mean(SLG, na.rm =TRUE)) %>%
select(name, MeanSLG) %>%
arrange(desc(MeanSLG), name) %>%
unique()
TS69[1:15, 1:2]
## Source: local data frame [15 x 2]
## Groups: name [15]
##
## name MeanSLG
## <chr> <dbl>
## 1 Colorado Rockies 0.4751648
## 2 Boston Red Sox 0.4563293
## 3 Anaheim Angels 0.4518891
## 4 New York Yankees 0.4471042
## 5 Toronto Blue Jays 0.4464372
## 6 Arizona Diamondbacks 0.4451966
## 7 Texas Rangers 0.4423956
## 8 Los Angeles Angels of Anaheim 0.4419985
## 9 Detroit Tigers 0.4411975
## 10 Baltimore Orioles 0.4379894
## 11 Tampa Bay Rays 0.4357002
## 12 Chicago White Sox 0.4331833
## 13 Tampa Bay Devil Rays 0.4327704
## 14 Florida Marlins 0.4325307
## 15 Seattle Mariners 0.4319518
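The tables above average SLG over every season played under a given team name. Another reading of the question ranks individual team-seasons instead. A minimal sketch of that reading on made-up data (`toy_teams`, hypothetical values), keeping the top 3 rather than the top 15 for brevity:

```r
library(dplyr)

# Hypothetical team-seasons (illustrative values only, not from Lahman)
toy_teams <- data.frame(
  yearID = c(1995, 1996, 1997, 2001, 2003),
  name   = c("Team A", "Team B", "Team A", "Team C", "Team B"),
  SLG    = c(0.455, 0.472, 0.431, 0.490, 0.448)
)

# Rank single seasons by SLG; the filter is the optional era cutoff
top_seasons <- toy_teams %>%
  filter(yearID > 1968) %>%
  arrange(desc(SLG)) %>%
  head(3)

top_seasons
```

Under this reading the same team name can appear more than once in the ranking, once for each of its high-slugging seasons.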
The Angels have at times been called the California Angels (CAL), the Anaheim Angels (ANA), and the Los Angeles Angels (LAA). Find the 10 most successful seasons in Angels history. Have they ever won the World Series?
#Making a data table of the win-loss differential (W - L) for the Angels
Angels <- Teams %>%
select(yearID, teamID, W, L, WSWin)%>%
filter(teamID == "CAL" | teamID == "ANA" | teamID == "LAA") %>%
mutate(WinLoss = W - L) %>%
arrange(desc(WinLoss)) %>%
select(yearID, W, L, WinLoss, WSWin)
Angels[1:10, 1:5]
## yearID W L WinLoss WSWin
## 1 2008 100 62 38 N
## 2 2002 99 63 36 Y
## 3 2014 98 64 34 N
## 4 2009 97 65 32 N
## 5 2005 95 67 28 N
## 6 2007 94 68 26 N
## 7 1982 93 69 24 N
## 8 1986 92 70 22 N
## 9 2004 92 70 22 N
## 10 1989 91 71 20 N
#Determining the number of World Series the Angels won
WST <- Angels %>%
select(WSWin) %>%
filter(WSWin == "Y") %>%
summarize(n())
WST
## n()
## 1 1
#Yes, the Angels have won the World Series once, in 2002.
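WinLoss here is the win-loss differential, W - L. When seasons differ in length, winning percentage, W / (W + L), is another common yardstick, and the two can rank seasons differently. A minimal sketch on made-up records (`toy_records`, hypothetical values with placeholder season labels):

```r
library(dplyr)

# Hypothetical season records (illustrative values only, not from Lahman)
toy_records <- data.frame(
  season = c("s1", "s2", "s3"),  # placeholder labels, not real years
  W      = c(90, 60, 95),
  L      = c(72, 40, 67)
)

# Compute both measures and rank by winning percentage
ranked <- toy_records %>%
  mutate(WinLoss = W - L,
         WinPct  = W / (W + L)) %>%
  arrange(desc(WinPct))

ranked
```

In this toy data the shortened season s2 has the best winning percentage, while s3 has the largest differential, so the choice of measure affects which seasons count as most successful.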