Setup and Load Packages:
knitr::opts_chunk$set(echo = TRUE)
# Load necessary packages:
library(dplyr)
library(ggplot2)
library(readr)
library(knitr)
library(retrosheet)
library(lubridate)
library(kableExtra)
library(bigrquery)
library(DBI)
Introduction
For my Google Data Analytics Capstone Project, I chose track 2. I used the six phases of data analysis for this project (ask, prepare, process, analyze, share, & act). I used several tools to complete this project, specifically spreadsheets (excel), SQL (BigQuery) and R ( R Studio).
Background
In this case study scenario, I have been hired to analyze Major League Baseball odds data for FantasyAce, a website that provides projections, software tools and advice for fantasy sports and sports betting.
Ask
Business Question: How accurate are the lines set by sports books on Major League Baseball games?
Sportsbooks employ skilled handicappers with access to huge amounts of historic data in order to create lines in advance of various games and events. Then they adjust these odds in reaction to the money wagered. For example, a book may post an opening line for runs scored in a game at 8.5. As money is wagered, the lines may begin to change. Because the books keep track of every wager placed by every player, they know who the skilled, winning players are, and they will adjust their lines in reaction to these players. If enough money is wagered on the over by winning professional bettors, the book may adjust the line up to 9. This process creates efficient lines that leverage the historical data generated by the sportsbook, as well as the experience and skill of the handicappers they employ and the professional gamblers who wager on these lines. For this reason, FantasyAce uses these numbers to identify profitable betting opportunities, create more accurate player projections, and generally advise their customers on the sports betting landscape. However, we need to better understand the strengths and weaknesses of the various odds in order to improve our understanding of their predictive value and communicate this information to the customer. Specifically, we want to answer the following questions:
For the purpose of this analysis, we will define a cover as follows:
Prepare
FantasyAce has provided me with Major League Baseball odds data from online sportsbook ActionSports.com. There are 10 separate csv files, one for each season from 2012 to 2021. The data has one row for each team involved in every regular season game. The columns consist of team, date, and the corresponding odds from the sports book. I uploaded each of these files to BigQuery. The data does not include the game results, so I needed to collect this information and attach it to the odds data. Specifically I needed the number of runs scored by each team. I also needed to calculate the total runs scored by both teams in a game, equal to runs plus opponent runs. I used an R package called “retrosheet” to obtain this game data. This package is a collection of tools used to import baseball data from the website retrosheet.org. The data was obtained free of charge from, and is copyrighted by, retrosheet.
Process
I used excel to open and work with the 10 odds data csv files. I created pivot tables and a plot for each file.
I then uploaded these files to BigQuery. I used SQL to join these tables together and then uploaded these into R Studio.
# Create a query to join odds files:
query <- "SELECT *
FROM
(
SELECT * FROM `vast-logic-393202.baseballOdds.odds2012`
UNION ALL
SELECT * FROM `vast-logic-393202.baseballOdds.odds2013`
UNION ALL
SELECT * FROM `vast-logic-393202.baseballOdds.odds2014`
UNION ALL
SELECT * FROM `vast-logic-393202.baseballOdds.odds2015`
UNION ALL
SELECT * FROM `vast-logic-393202.baseballOdds.odds2016`
UNION ALL
SELECT * FROM `vast-logic-393202.baseballOdds.odds2017`
UNION ALL
SELECT * FROM `vast-logic-393202.baseballOdds.odds2018`
UNION ALL
SELECT * FROM `vast-logic-393202.baseballOdds.odds2019`
UNION ALL
SELECT * FROM `vast-logic-393202.baseballOdds.odds2020`
UNION ALL
SELECT * FROM `vast-logic-393202.baseballOdds.odds2021`
)"
# Run query
oddsData <- dbGetQuery(con, query)
head(oddsData)
## # A tibble: 6 × 10
## date at team gameNumber line runLine runLineOdds total overOdds
## <date> <chr> <chr> <int> <int> <dbl> <int> <dbl> <int>
## 1 2021-05-27 V COL 1 115 1.5 -220 5 -120
## 2 2021-05-27 H NYM 1 -125 -1.5 195 5 -120
## 3 2021-05-29 V COL 1 124 1.5 -230 5 -120
## 4 2021-05-29 H PIT 1 -134 -1.5 200 5 -120
## 5 2021-06-18 V TB 1 -101 1.5 -215 7.5 -120
## 6 2021-06-18 H SEA 1 -109 -1.5 190 7.5 -120
## # ℹ 1 more variable: underOdds <int>
I then used the “retrosheet” package to obtain the necessary game data. I saved this as “gameLogs”.
# Define the range of years to get data for:
years <- 2012:2021
# Create an empty data frame to store the data for all years:
combined_gamelog <- data.frame()
# Loop through each year and fetch the game log data:
for (year in years) {
gamelog <- getPartialGamelog(year, glFields = c('Date', 'DblHdr', 'VisTm', 'HmTm', 'VisRuns',
'HmRuns', 'ParkID', 'VisH', 'VisD', 'VisT',
'VisHR', 'VisRBI', 'VisHBP', 'VisBB', 'VisSB',
'VisER', 'HmH', 'HmD', 'HmT', 'HmHR',
'HmRBI', 'HmHBP', 'HmBB', 'HmSB', 'HmER',
'VisStPchID', 'VisStPchNm', 'HmStPchID', 'HmStPchNm'))
# Append the current year's data to the combined data frame:
combined_gamelog <- rbind(combined_gamelog, gamelog)
}
Analyze
The game logs have data for both teams in each game in a single row. The odds data contains a separate row for each team in every game, so I had to divide the game logs into separate rows for each team. In order to merge this data with the odds data, I completed the following steps:
# Create table for visiting teams:
visitor.gl <- combined_gamelog
# Rename the remaining columns:
colnames(visitor.gl) <- c('date', 'DblHdr', 'team', 'opponent', 'runs',
'oppRuns', 'ParkID', 'hits', 'doubles', 'triples',
'HR', 'RBI', 'HBP', 'BB', 'SB',
'pitcherERs', 'oppHits', 'oppDoubles', 'oppTriples', 'oppHR',
'oppRBI', 'oppHBP', 'oppBB', 'oppSB', 'oppPitcherERs',
'pitcherID', 'pitcherName', 'oppPitcherID', 'oppPitcherName')
# Create table for home teams:
home.gl <- combined_gamelog
# Rename the remaining columns:
colnames(home.gl) <- c('date', 'DblHdr', 'opponent', 'team', 'oppRuns',
'runs', 'ParkID', 'oppHits', 'oppDoubles', 'oppTriples',
'oppHR', 'oppRBI', 'oppHBP', 'oppBB', 'oppSB',
'oppPitcherERs', 'hits', 'doubles', 'triples', 'HR',
'RBI', 'HBP', 'BB', 'SB', 'pitcherERs',
'oppPitcherID', 'oppPitcherName', 'pitcherID', 'pitcherName')
# Reorder the columns to match visitor.gl:
home.gl <- home.gl[, c('date', 'DblHdr', 'team', 'opponent', 'runs',
'oppRuns', 'ParkID', 'hits', 'doubles', 'triples',
'HR', 'RBI', 'HBP', 'BB', 'SB',
'pitcherERs', 'oppHits', 'oppDoubles', 'oppTriples', 'oppHR',
'oppRBI', 'oppHBP', 'oppBB', 'oppSB', 'oppPitcherERs',
'pitcherID', 'pitcherName', 'oppPitcherID', 'oppPitcherName')]
# Combine home and visitor to create full table of games:
gameLogs <- rbind(visitor.gl, home.gl)
# Change team abbreviations to match odds data:
Find <- c('WAS', 'TOR', 'TBA', 'SLN', 'SFN', 'SEA', 'PIT', 'PHI', 'NYA', 'MIN',
'MIL', 'MIA', 'LAN', 'KCA', 'COL', 'CHA', 'BOS', 'ATL', 'ANA', 'ARI',
'CIN', 'DET', 'CLE', 'TEX', 'OAK', 'BAL', 'SDN', 'HOU', 'NYN', 'CHN')
Replace <- c('WSH', 'TOR', 'TB', 'STL', 'SF', 'SEA', 'PIT', 'PHI', 'NYY', 'MIN',
'MIL', 'MIA', 'LAD', 'KC', 'COL', 'CWS', 'BOS', 'ATL', 'LAA', 'ARI',
'CIN', 'DET', 'CLE', 'TEX', 'OAK', 'BAL', 'SD', 'HOU', 'NYM', 'CHC')
# Change team abbreviations:
gameLogs$team <- Replace[match(gameLogs$team, Find)]
gameLogs$opponent <- Replace[match(gameLogs$opponent, Find)]
# Change date column from integer to proper date format:
gameLogs$date <- as.Date(as.character(gameLogs$date), format = "%Y%m%d")
# Create gameNumber column (needed to join with odds data) from double header information:
gameLogs <- gameLogs %>%
mutate(gameNumber = ifelse(DblHdr %in% c(0, 1), 1, 2))
# Create 'season' column by extracting year from date column:
gameLogs$season <- year(gameLogs$date)
# Select columns:
gameLogs <- gameLogs %>%
select(date, team, opponent, runs, oppRuns, gameNumber, season)
# Remove unnecessary items:
rm(home.gl, visitor.gl, gamelog, combined_gamelog)
# View gameLogs:
head(gameLogs)
## date team opponent runs oppRuns gameNumber season
## 1 2012-03-28 SEA OAK 3 1 1 2012
## 2 2012-03-29 SEA OAK 1 4 1 2012
## 3 2012-04-04 STL MIA 4 1 1 2012
## 4 2012-04-05 TOR CLE 7 4 1 2012
## 5 2012-04-05 BOS DET 2 3 1 2012
## 6 2012-04-05 WSH CHC 2 1 1 2012
I then merged this data with our odds data. I also created separate columns for:
# Merge gameLogs and oddsData:
data <- merge(gameLogs, oddsData, by = c("date", "team", "gameNumber"), all = TRUE)
# Create temporary data frame to add opponent line data:
oddsDataTemp <- data[, c("date", "team", "gameNumber", "opponent", "line", "runLine", "runLineOdds")]
# Merge with original data, matching opponent in original data to team in temporary data:
dataTemp <- merge(data, oddsDataTemp,
by.x = c("date", "opponent", "gameNumber"),
by.y = c("date", "team", "gameNumber"))
# Rename columns:
names(dataTemp)[names(dataTemp) %in% c("line.x", "runLine.x", "runLineOdds.x")] <- c("moneyLine", "runLine", "runLineOdds")
names(dataTemp)[names(dataTemp) %in% c("line.y", "runLine.y", "runLineOdds.y")] <- c("oppMoneyLine", "oppRunLine", "oppRunLineOdds")
# Select columns and assign dataTemp to data:
data <- dataTemp %>%
select(
date, season, team, opponent, runs, oppRuns,
moneyLine, runLine, runLineOdds, total, overOdds, underOdds,
oppMoneyLine, oppRunLine, oppRunLineOdds
)
# Create 'totalRuns' column :
data$totalRuns <- data$runs + data$oppRuns
# Delete oddsDataTemp:
rm(oddsDataTemp)
# Delete dataTemp:
rm(dataTemp)
# View data:
str(data)
## 'data.frame': 45530 obs. of 16 variables:
## $ date : Date, format: "2012-03-28" "2012-03-28" ...
## $ season : num 2012 2012 2012 2012 2012 ...
## $ team : chr "SEA" "OAK" "SEA" "OAK" ...
## $ opponent : chr "OAK" "SEA" "OAK" "SEA" ...
## $ runs : int 3 1 1 4 4 1 1 3 2 0 ...
## $ oppRuns : int 1 3 4 1 1 4 0 2 1 4 ...
## $ moneyLine : int -140 120 -108 -112 160 -180 -111 -144 -139 110 ...
## $ runLine : num -1.5 1.5 1.5 -1.5 1.5 -1.5 -1.5 -1.5 -1.5 1.5 ...
## $ runLineOdds : int NA NA NA NA NA NA NA NA NA NA ...
## $ total : num 7 7 8.5 8.5 7 7 7 7 6.5 7.5 ...
## $ overOdds : int -110 -110 105 105 -125 -125 -105 -120 110 -125 ...
## $ underOdds : int -110 -110 -125 -125 105 105 -115 100 -130 105 ...
## $ oppMoneyLine : int 120 -140 -112 -108 -180 160 -109 124 119 -130 ...
## $ oppRunLine : num 1.5 -1.5 -1.5 1.5 -1.5 1.5 1.5 1.5 1.5 -1.5 ...
## $ oppRunLineOdds: int NA NA NA NA NA NA NA NA NA NA ...
## $ totalRuns : int 4 4 5 5 5 5 1 5 3 4 ...
Before I started my analysis, I used the summary() function to check the data for missing values. There are some NA values in the runLine and runLineOdds columns that I need to account for later on.
summary(data)
## date season team opponent
## Min. :2012-03-28 Min. :2012 Length:45530 Length:45530
## 1st Qu.:2014-06-01 1st Qu.:2014 Class :character Class :character
## Median :2016-08-08 Median :2016 Mode :character Mode :character
## Mean :2016-10-08 Mean :2016
## 3rd Qu.:2019-04-01 3rd Qu.:2019
## Max. :2021-10-03 Max. :2021
##
## runs oppRuns moneyLine runLine
## Min. : 0.000 Min. : 0.000 Min. :-550.00 Min. :-1.5000
## 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.:-141.00 1st Qu.:-1.5000
## Median : 4.000 Median : 4.000 Median :-106.00 Median : 1.5000
## Mean : 4.425 Mean : 4.425 Mean : -19.42 Mean : 0.0001
## 3rd Qu.: 6.000 3rd Qu.: 6.000 3rd Qu.: 126.00 3rd Qu.: 1.5000
## Max. :29.000 Max. :29.000 Max. : 425.00 Max. : 1.5000
## NA's :438
## runLineOdds total overOdds underOdds
## Min. :-270.00 Min. : 4.500 Min. :-142.0 Min. :-142.00
## 1st Qu.:-155.00 1st Qu.: 7.500 1st Qu.:-115.0 1st Qu.:-115.00
## Median :-110.00 Median : 8.500 Median :-110.0 Median :-110.00
## Mean : -20.62 Mean : 8.394 Mean : -67.9 Mean : -71.31
## 3rd Qu.: 135.00 3rd Qu.: 9.000 3rd Qu.:-105.0 3rd Qu.:-105.00
## Max. : 235.00 Max. :15.000 Max. : 122.0 Max. : 122.00
## NA's :9722
## oppMoneyLine oppRunLine oppRunLineOdds totalRuns
## Min. :-550.00 Min. :-1.5000 Min. :-270.00 Min. : 1.000
## 1st Qu.:-141.00 1st Qu.:-1.5000 1st Qu.:-155.00 1st Qu.: 5.000
## Median :-106.00 Median : 1.5000 Median :-110.00 Median : 8.000
## Mean : -19.42 Mean : 0.0001 Mean : -20.62 Mean : 8.849
## 3rd Qu.: 126.00 3rd Qu.: 1.5000 3rd Qu.: 135.00 3rd Qu.:11.000
## Max. : 425.00 Max. : 1.5000 Max. : 235.00 Max. :38.000
## NA's :438 NA's :9722
This gave me a sample of 45,530 rows of data representing each team involved in 22,765 games. I selected a sample row from our data to help explain each type of line. I then started my analysis by calculating the count of each type of line, how often the favorite covered this line, and the percentage of time the line covered. I used this data to create a summary table and a plot for each of our line types.
# Select money line example and create table:
moneyLineExample <- kable(data[10, c(3, 4, 7:15)], format = "pipe")
# Perform calculations for money line and generate HTML table:
moneyLineSumTbl <- data %>%
reframe(
Money_Line = c("favorites", "dogs", "even"),
Count = c(
sum(moneyLine < oppMoneyLine),
sum(moneyLine > oppMoneyLine),
sum(moneyLine == oppMoneyLine)
),
Cover = c(
sum(moneyLine < oppMoneyLine & runs > oppRuns),
sum(moneyLine > oppMoneyLine & runs > oppRuns),
"-"
),
Percentage = c(
round(
sum(runs > oppRuns & moneyLine < oppMoneyLine, na.rm = TRUE) /
sum(moneyLine < oppMoneyLine, na.rm = TRUE) * 100,
1
),
round(
sum(runs > oppRuns & moneyLine > oppMoneyLine, na.rm = TRUE) /
sum(moneyLine > oppMoneyLine, na.rm = TRUE) * 100,
1
),
"-"
)
) %>%
kable("html") %>%
kable_styling(full_width = FALSE, bootstrap_options = "striped", position = "center") %>%
row_spec(0, extra_css = "border-bottom: 2px solid black;") %>%
add_header_above(c("Money Line Summary" = 4))
# Generate money line plot (win percentage by money line):
moneyLinePlot1 <- data %>%
group_by(moneyLine) %>%
summarise(
Count = n(),
Avg_Difference = round(mean(runs - oppRuns), 2),
Std_Deviation = round(sd(runs - oppRuns), 2),
Percentage = mean(runs > oppRuns, na.rm = TRUE)
) %>%
filter(Count >= 20) %>%
ggplot(aes(factor(moneyLine), Percentage)) +
geom_point(color = "dodgerblue") +
labs(x = "Money Line", y = "Win Percentage") +
ggtitle("Win Percentage by Money Line")+
theme_minimal() +
theme(panel.background = element_rect(fill = "honeydew", color = "black", linewidth = 2))+
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10) ) +
scale_x_discrete(breaks = c(-350, -260, -210, -191, -180, -170, -160, -150,
-140, -130, -120, -110, 100, 110, 120, 130, 140,
150, 160, 173, 195, 245))
#######################################################################################
# Select game total example and create table:
totalExample <- kable(data[10021, c(3, 4, 7:15)], format = "pipe")
# game total summary:
totalSummary <- data %>%
filter(runs > oppRuns) %>%
summarise(count = sum(!is.na(total)),
totalOverFav = sum(overOdds < underOdds),
totalUnderFav = sum(overOdds > underOdds),
totalEven = sum(overOdds == underOdds),
overCover = sum(overOdds < underOdds & totalRuns > total),
underCover = sum(underOdds < overOdds & totalRuns < total),
totalCover = sum((overOdds < underOdds & totalRuns > total) | (underOdds < overOdds & totalRuns < total)),
notCover = sum((overOdds < underOdds & totalRuns < total) | (underOdds < overOdds & totalRuns > total)),
push = sum(total == totalRuns),
overCoverPct = round((overCover / totalOverFav)*100, 1),
underCoverPct = round((underCover / totalUnderFav)*100, 1),
coverPct = round((totalCover / (totalCover + notCover + push))*100, 1),
notCoverPct = round((notCover / (totalCover + notCover + push))*100, 1),
pushPct = round((push / (totalCover + notCover + push))*100, 1),
totalLines = totalOverFav + totalUnderFav
)
# Create game total summary table:
totalSummaryTable <- data.frame(
Total = c("over", "under", "even", "total*"),
Count = c(totalSummary$totalOverFav, totalSummary$totalUnderFav,
totalSummary$totalEven, totalSummary$totalLines),
Cover = c(totalSummary$overCover, totalSummary$underCover, "-", totalSummary$totalCover),
Percentage = c(totalSummary$overCoverPct, totalSummary$underCoverPct, "-", totalSummary$coverPct)
)
# Generate html table:
totalSumTbl <- totalSummaryTable %>%
kable("html") %>%
kable_styling(full_width = FALSE, bootstrap_options = "striped", position = "center") %>%
row_spec(0, extra_css = "border-bottom: 2px solid black;") %>%
add_header_above(c("Total Summary" = 4))%>%
add_footnote("excludes even odds.", escape = FALSE)
# Generate total odds plot (cover percentage by total odds)
totalPlot1 <- data %>%
group_by(overOdds) %>%
summarise(
count = n(),
overCover = sum(totalRuns > total),
overMiss = sum(totalRuns < total),
overPush = sum(totalRuns == total)
) %>%
filter(count >= 20) %>%
transmute(
"Odds" = overOdds,
"count" = count,
"Percentage Covered" = round(overCover / (overCover + overMiss + overPush), 2),
"Cover" = overCover,
"Non Cover" = overMiss,
"Push" = overPush,
"Favorite" = "Over" # Add a new column to identify the favorite
) %>%
bind_rows(data %>%
group_by(underOdds) %>%
summarise(
count = n(),
underCover = sum(totalRuns < total),
underMiss = sum(totalRuns > total),
underPush = sum(totalRuns == total)
) %>%
filter(count >= 20) %>%
transmute(
"Odds" = underOdds,
"count" = count,
"Percentage Covered" = round(underCover / (underCover + underMiss + underPush), 2),
"Cover" = underCover,
"Non Cover" = underMiss,
"Push" = underPush,
"Favorite" = "Under"
)) %>%
ggplot(aes(as.factor(Odds), `Percentage Covered`, color = Favorite)) +
geom_point() +
labs(title = "Cover Percentage by Total Odds") +
scale_color_manual(values = c("Over" = "dodgerblue", "Under" = "red")) +
theme_minimal() +
theme(panel.background = element_rect(fill = "honeydew", color = "black", linetype = 1))+
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10))+
scale_x_discrete(labels = function(x) ifelse(seq_along(x) %% 2 == 0, x, ""))+
xlab("Total Odds") +
theme(legend.position = c(0.9, 0.8),
legend.box.background = element_rect(color = "black", linewidth = 1)
)
#######################################################################################
# Select run line example and create table:
runLineExample <- kable(data[20187, c(3, 4, 7:15)], format = "pipe")
runLineSummary <-data %>%
filter(!is.na(runLineOdds)) %>%
summarise(runLineFavCount = sum(runLine < oppRunLine),
count = n(),
runLineFavRunLineOddsFavCount = sum(runLine < oppRunLine & runLineOdds < oppRunLineOdds),
runLineFavRunLineOddsDogCount = sum(runLine < oppRunLine & runLineOdds > oppRunLineOdds),
runLineFavRunLineOddsFavCov = sum(runLine < oppRunLine & runLineOdds < oppRunLineOdds &
((oppRuns - runs) < runLine)),
runLineFavRunLineOddsDogCov = sum(runLine < oppRunLine & runLineOdds > oppRunLineOdds &
((oppRuns - runs) < runLine)),
runLineFavRunLineOddsFavCovPct = round(runLineFavRunLineOddsFavCov / runLineFavRunLineOddsFavCount * 100, 1),
runLineFavRunLineOddsDogCovPct = round(runLineFavRunLineOddsDogCov / runLineFavRunLineOddsDogCount * 100, 1),
runLineFavTotalCoverPct = round(((runLineFavRunLineOddsFavCov + runLineFavRunLineOddsDogCov) /
(runLineFavRunLineOddsFavCount + runLineFavRunLineOddsDogCount)) * 100, 1),
runLineDogCount = sum(runLine > oppRunLine),
runLineDogRunLineOddsFavCount = sum(runLine > oppRunLine & runLineOdds < oppRunLineOdds),
runLineDogRunLineOddsDogCount = sum(runLine > oppRunLine & runLineOdds > oppRunLineOdds),
runLineDogRunLineOddsFavCov = sum(runLine > oppRunLine & runLineOdds < oppRunLineOdds &
((oppRuns - runs) < runLine)),
runLineDogRunLineOddsDogCov = sum(runLine > oppRunLine & runLineOdds > oppRunLineOdds &
((oppRuns - runs) < runLine)),
runLineDogRunLineOddsFavCovPct = round(runLineDogRunLineOddsFavCov / runLineDogRunLineOddsFavCount * 100, 1),
runLineDogRunLineOddsDogCovPct = round(runLineDogRunLineOddsDogCov / runLineDogRunLineOddsDogCount * 100, 1),
runLineDogTotalCoverPct = round(((runLineDogRunLineOddsFavCov + runLineDogRunLineOddsDogCov) /
(runLineDogRunLineOddsFavCount + runLineDogRunLineOddsDogCount)) * 100, 1),
runLineOddsCount = sum(!is.na(runLineOdds)),
runLineOddsFavCount = sum(runLineOdds < oppRunLineOdds),
runLineOddsDogCount = sum(runLineOdds > oppRunLineOdds),
runLineOddsCoverCount = sum((runLineOdds < oppRunLineOdds & (oppRuns - runs) < runLine) |
(runLineOdds > oppRunLineOdds & (oppRuns - runs) < runLine)),
totalRunLineOddsFavCov = sum(runLineOdds < oppRunLineOdds & (oppRuns - runs) < runLine),
totalRunLineOddsDogCov = sum(runLineOdds > oppRunLineOdds & (oppRuns - runs) < runLine),
totalRunLineOddsDogCovPct = round((totalRunLineOddsDogCov / runLineOddsDogCount) * 100, 1),
totalRunLineOddsFavCovPct = round((totalRunLineOddsFavCov / runLineOddsFavCount) * 100, 1),
runLineOddsDogCoverPct = round((runLineFavRunLineOddsDogCov + runLineDogRunLineOddsDogCov) /
runLineOddsDogCount * 100, 1)
)
# Create run line summary table:
runLineSummaryTable <- data.frame(
Run_Line = c("-1.5", "1.5", "-"),
count = c(runLineSummary$runLineFavCount, runLineSummary$runLineDogCount, runLineSummary$runLineOddsCount),
oddsFavorite = c(runLineSummary$runLineFavRunLineOddsFavCount, runLineSummary$runLineDogRunLineOddsFavCount,
(runLineSummary$runLineFavRunLineOddsFavCount + runLineSummary$runLineDogRunLineOddsFavCount)),
oddsFavCovers = c(runLineSummary$runLineFavRunLineOddsFavCov, runLineSummary$runLineDogRunLineOddsFavCov,
(runLineSummary$runLineFavRunLineOddsFavCov + runLineSummary$runLineDogRunLineOddsFavCov)),
oddsFavCoverPct = c(runLineSummary$runLineFavRunLineOddsFavCovPct, runLineSummary$runLineDogRunLineOddsFavCovPct,
runLineSummary$totalRunLineOddsFavCovPct),
oddsDog = c(runLineSummary$runLineFavRunLineOddsDogCount, runLineSummary$runLineDogRunLineOddsDogCount,
(runLineSummary$runLineFavRunLineOddsDogCount + runLineSummary$runLineDogRunLineOddsDogCount)),
oddsDogCovers = c(runLineSummary$runLineFavRunLineOddsDogCov, runLineSummary$runLineDogRunLineOddsDogCov,
(runLineSummary$runLineFavRunLineOddsDogCov + runLineSummary$runLineDogRunLineOddsDogCov)),
oddsDogCoverPct = c(runLineSummary$runLineFavRunLineOddsDogCovPct, runLineSummary$runLineDogRunLineOddsDogCovPct,
runLineSummary$runLineOddsDogCoverPct),
totalCoversPct = c(
round((((runLineSummary$runLineFavRunLineOddsFavCov + runLineSummary$runLineFavRunLineOddsDogCov) /
(runLineSummary$runLineFavRunLineOddsFavCount + runLineSummary$runLineFavRunLineOddsDogCount)) * 100), 1),
round(((runLineSummary$runLineDogRunLineOddsFavCov + runLineSummary$runLineDogRunLineOddsDogCov) /
(runLineSummary$runLineDogRunLineOddsFavCount + runLineSummary$runLineDogRunLineOddsDogCount)) * 100, 1),
round(((runLineSummary$runLineDogRunLineOddsFavCov + runLineSummary$runLineDogRunLineOddsDogCov +
runLineSummary$runLineFavRunLineOddsFavCov + runLineSummary$runLineFavRunLineOddsDogCov) /
(runLineSummary$runLineDogRunLineOddsFavCount + runLineSummary$runLineDogRunLineOddsDogCount +
runLineSummary$runLineDogRunLineOddsFavCount + runLineSummary$runLineDogRunLineOddsDogCount)) *100, 1)
)
)
# Rename columns:
colnames(runLineSummaryTable) <- c("Run Line", "Count", "Favorites", "Cover", "Percentage",
"Dogs", "Cover", "Percentage", "Total Percentage")
# Create html table:
runLineSumTbl <- runLineSummaryTable %>%
kable("html") %>%
kable_styling(full_width = FALSE, bootstrap_options = "striped", position = "center") %>%
row_spec(0, extra_css = "border-bottom: 2px solid black;") %>%
add_header_above(c("Run Line Summary" = 9))
# Plot percentage of run lines covered by run line odds:
runLinePlot1 <- data %>%
filter(!is.na(runLineOdds) & runLine < 0) %>%
group_by(runLineOdds, runLine) %>%
summarise(
Count = n(),
Percentage = mean(oppRuns - runs < runLine, na.rm = TRUE)) %>%
filter(Count >= 20) %>%
transmute(
"Odds" = runLineOdds,
"Count" = Count,
"Percentage Covered" = Percentage,
"Run Line" = "Favorite"
) %>%
bind_rows(data %>%
filter(!is.na(runLineOdds) & runLine > 0) %>%
group_by(runLineOdds, runLine) %>%
summarise(
Count = n(),
Percentage = mean(oppRuns - runs < runLine, na.rm = TRUE)) %>%
filter(Count >= 20) %>%
transmute(
"Odds" = runLineOdds,
"Count" = Count,
"Percentage Covered" = Percentage,
"Run Line" = "Dog"
)) %>%
ggplot(aes(as.factor(Odds), `Percentage Covered`, color = `Run Line`)) +
geom_point() +
labs(title = "Cover Percentage by Run Line Odds") +
scale_color_manual(values = c("Favorite" = "dodgerblue", "Dog" = "red")) +
theme_minimal() +
theme(panel.background = element_rect(fill = "honeydew", color = "black", linetype = 1))+
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10))+
scale_x_discrete(labels = function(x) ifelse(seq_along(x) %% 5 == 0, x, ""))+
xlab("Run Line Odds") +
theme(legend.position = c(0.9, 0.8),
legend.box.background = element_rect(color = "black", linetype = 1)
)
Share
I can then present the results of my analysis in a flexdashboard.
Act
My recommendations for the client are included in the ‘Summary’ tab of my dashboard.