This is part of a series of pages related to EPL Away Wins: Exploratory Data Analysis, Data Modelling (this page) and Betting Strategies.

After the exploratory data analysis, we will now investigate modelling for EPL data (using tidymodels). The available data set contains 6508 matches. We will hold out the most recent 20% of matches as a test set for assessing performance. Here are the first 10 rows of the data.

We remove rows with missing values in the selected features, which loses 2% of the data.

Feature Creation

We create new features for Summed Features, i.e. sum of the last 4 shots on target, corners, fouls, goals scored, goals conceded for the Home team, Away team and their opponents.

Take out Test Data

We remove the final 20% of results to act as the test data. We also randomly define the stratified 5-fold crossvalidation splits for the training data.

Training data rows = 5114 
Proportion of Away Wins = 0.279 

Set up Pre-treatment Recipes

Firstly, we define the variables and model formula.

We have a formula of [AwayWin ~ sum_HST + sum_HC + sum_HGS + sum_HGC + sum_AST + sum_AC + sum_AF + sum_AGS + sum_AGC + sum_HoppST + sum_HoppC + sum_AoppST + sum_AoppC + winpc_H + winpc_A + top6_perfH + top6_perfA + Dist + Date + HomeFin1 + HomeFin2 + AwayFin1 + AwayFin2].

Then we define the pretreatment recipes. This includes removing correlated numeric variables, normalizing all numeric variables and turning categorical variables into dummy variables. As well as Month, we will create dummies for Day of Week.

recipe <-
  train %>%
  recipe(formula) %>%
  step_corr(all_numeric()) %>%
  step_normalize(all_numeric()) %>%
  step_date(Date, features = c("dow", "month")) %>%
  step_rm(Date) %>%
  step_dummy(Date_dow, Date_month)

What does the subsequent training frame look like?

Feature Importance

To estimate the relative importance of the features, we fit an unoptimised random forest model to the training data and determine variable importance from the model.

We can see that all the Date variables have relatively little importance when predicting Away wins. We will remove the Date variables going forward.

Generalized Linear Model

We will create a tuning workflow in order to optimise hyperparameters for our logistic regression model. As explained above, we will first redefine our pretreatment recipe to remove the Date features.

recipe <-
  train %>%
  recipe(formula) %>%
  step_corr(all_numeric()) %>%
  step_normalize(all_numeric()) %>%
  step_rm(Date)

We will be optimising mixture and penalty values for our glmnet model, using crossvalidation results of our training data. AUC for the ROC curve and accuracy metrics will be evaluated for each set of predictions.

model_glm <- 
  logistic_reg(penalty = tune(), mixture = tune()) %>% 
  set_engine("glmnet")

grid_glm <- 
  grid_max_entropy(
    penalty(range = c(0.001, 0.1), trans = NULL),
    mixture(range = c(0, 1)),
    size = 30)

my_metrics <- metric_set(roc_auc, accuracy)

After crossvalidating for a range of mixture and penalty values, let’s plot values of ROC AUC and list the best models.

Now let’s look at the accuracy metrics.

We see some differences between the models. Lower mixture and penalty values seem to give more consistently high AUC results. To obtain a model that generalises well, it is better to choose a higher penalty value whose performance is still close to the best. In terms of AUC, a penalty of around 0.03 seems suitable. We therefore choose mixture = 0.3 and penalty = 0.03, train a model on the training data and use it to predict the test data. How does the ROC curve look for the test set and what is the AUC?

AUC value for best model is 0.756412

Potential Betting Strategies

There are two options for betting on matches, either Bet the Away win (predict it will happen) or Lay the Away win (predict it won’t happen).

Betting Probability Limits

One strategy for betting is to define a betting limit, where bets are placed when the predicted probability of success is greater than the defined limit. If we look at the test data, we can see what the outcome would have been if we had used a particular betting limit for this data. Note that there is no guarantee at all that the same outcome would be achieved for future data.

Away Win Bets

Looking at a range of betting limits for Away Win Bets gives the following results for the test set.

For example, if we had used a betting limit of 0.55, we would have made 86 bets with a precision of 79% and a total profit of 9.38 at a one dollar stake per bet. The total stake would have been 86, so the profit ratio would have been 10.9% of the total stake.

For this data, the best profit would have been made with a betting limit of 0.56 (profit 16.8% of stake).

Away Win Lays

Looking at a range of betting limits for Away Win Lays gives the following results for the test set. Note that historic Lay odds are not generally available, so a correction has been applied to the average Away win odds, because Lay odds (e.g. those available at Betfair) are generally higher than the average Away win odds.

A limit of 0.91 would have given profit of 49.97 from 64 bets with a precision of 98% correct. The total required stake for this profit would have been 1417, so profit ratio would have been 3.5%.

Prediction to Odds Probability Ratio

If we calculate the probability of the result implied by the betting odds and compare with the probability determined by the model, we can examine the ratio of model probability / implied odds probability. This has the potential to show where we may have an advantage over the bookmaker odds.

Away Win Bets

Let’s have a look at how the ratio of my prediction / implied bet probability affects the Bet metrics.

For this data, if we had used a ratio limit of 1.8, we would have made 75 bets for a profit of 36.73, with a precision of 11%. The total stake would have been 75, giving a profit ratio of 49%.

Away Win Lays

Let’s have a look at how the ratio of my prediction / implied lay probability affects the Lay metrics.

If we had used a ratio limit of 1.00, we would have made 593 bets for a profit of 61.35, with a precision of 59%. This would have required a total stake of 1017, giving a profit ratio of 6.0%.


End

---
title: "EPL Data Modelling for Away Win Bets and Lays using Logistic Regression"
output: html_notebook
---

This is part of a series of pages related to EPL Away Wins:  

* [Exploratory Data Analysis](https://rpubs.com/GarethChad/EPL_EDA)
* Data Modelling (this page)
* [Betting Strategies](https://rpubs.com/GarethChad/EPL_strategy)


After the exploratory data analysis, we will now investigate modelling for EPL data (using tidymodels). The available data set contains 6508 matches. We will hold out the most recent 20% of matches as a test set for assessing performance. Here are the first 10 rows of the data.  

```{r Initialise and Load Data, include=FALSE}
rm(list = ls())

library(tidyverse)
library(tidymodels)
library(tune)
library(workflows)

options(tidymodels.dark = TRUE)

load(here::here("data/interim/En_features.RData"))

orig_rows <- nrow(En_features)
En_features <- En_features %>%
  filter(!is.na(top6_perfH),
         !is.na(top6_perfA))

# val_set <- En_features[6001:6508, ]
# En_features <- En_features[1:6000, ]

```

```{r Functions, include=FALSE}
add_features <- function(df){
  
  # add Dist <50 and >310
  # df <- df %>%
  #   mutate(Dist_50 = as.numeric(Dist < 50), 
  #          Dist_310 = as.numeric(Dist > 310))
  
  # summed features
  df <- df %>%
    mutate(sum_HST = LastHST1 + LastHST2 + LastHST3 + LastHST4,
           sum_HC = LastHC1 + LastHC2 + LastHC3 + LastHC4,
           sum_HF = LastHF1 + LastHF2 + LastHF3 + LastHF4,
           sum_HGS = LastHGS1 + LastHGS2 + LastHGS3 + LastHGS4,
           sum_HGC = LastHGC1 + LastHGC2 + LastHGC3 + LastHGC4,
           sum_AST = LastAST1 + LastAST2 + LastAST3 + LastAST4,
           sum_AC = LastAC1 + LastAC2 + LastAC3 + LastAC4,
           sum_AF = LastAF1 + LastAF2 + LastAF3 + LastAF4,
           sum_AGS = LastAGS1 + LastAGS2 + LastAGS3 + LastAGS4,
           sum_AGC = LastAGC1 + LastAGC2 + LastAGC3 + LastAGC4,
           sum_HoppST = LastHoppST1 + LastHoppST2 + LastHoppST3 + LastHoppST4,
           sum_HoppC = LastHoppC1 + LastHoppC2 + LastHoppC3 + LastHoppC4,
           sum_HoppF = LastHoppF1 + LastHoppF2 + LastHoppF3 + LastHoppF4,
           sum_AoppST = LastAoppST1 + LastAoppST2 + LastAoppST3 + LastAoppST4,
           sum_AoppC = LastAoppC1 + LastAoppC2 + LastAoppC3 + LastAoppC4,
           sum_AoppF = LastAoppF1 + LastAoppF2 + LastAoppF3 + LastAoppF4)
  
  return(df)
}

metrics_plot <- function(model_fit, metric){
  
  data <- model_fit$.metrics
  # tuning parameters are the columns whose names do not start with "."
  params <- names(data[[1]])[!grepl("^\\.", names(data[[1]]))]
  
  fit_metrics <- unique(data[[1]]$.metric)
  
  if (!(metric %in% fit_metrics)){
    return("no metric by that name")
  }
  
  mean_data <- data[[1]] %>% select(-.estimate)
  
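  # average each metric across folds; pmap(.f = mean) would pass the folds as
  # separate positional arguments to mean(), which returns only the first fold's value
  mean_data$.estimate <- data %>%
    map(".estimate") %>%
    pmap_dbl(~ mean(c(...)))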
  
  plot_data <- mean_data %>%
    filter(.metric == metric) %>%
    pivot_longer(cols = all_of(params), names_to = "param", values_to = "value")
  
  plot <- ggplot(plot_data, aes(value, .estimate)) + geom_point() +
    theme_bw() + facet_wrap(~ param, scales = "free_x") + labs(y = metric)
  
  return(plot)
}


```

```{r echo=FALSE}
head(En_features %>% select(HomeTeam, AwayTeam, Date, FTHG, FTAG, HS, HST, HF, HC, HY, AS, AST, AF, AC, AY), 10)
```

We remove rows with missing values in the selected features, which loses `r round(((1 - (nrow(En_features) / orig_rows)) * 100), 0)`% of the data.  

## <span style="color:teal;">Feature Creation</span>  

We create new features for **Summed Features**, i.e. sum of the last 4 shots on target, corners, fouls, goals scored, goals conceded for the Home team, Away team and their opponents.  


```{r Filter and New Features, echo=FALSE, message=FALSE, warning=FALSE}
En_features <- En_features %>%
  mutate(Season = index %/% 1000,
         AwayWin = as.factor(AwayWin))

En_features <- add_features(En_features)

```
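
As a compact illustration of the same idea, the sketch below builds one summed feature with `rowSums()` on a toy data frame. The helper name `sum_last4` and the toy values are assumptions made for this example; only the `LastHST1` to `LastHST4` column naming follows the data used here.

```r
# Toy data: shots on target in each of the home team's last 4 matches
# (values are made up for illustration)
toy <- data.frame(
  LastHST1 = c(5, 2),
  LastHST2 = c(3, 4),
  LastHST3 = c(6, 1),
  LastHST4 = c(4, 3)
)

# Hypothetical helper: sum the columns <prefix>1 .. <prefix>4 row by row
sum_last4 <- function(df, prefix) {
  cols <- paste0(prefix, 1:4)
  rowSums(df[, cols])
}

toy$sum_HST <- sum_last4(toy, "LastHST")
toy$sum_HST  # 18 10
```

As with the explicit sums, a missing value in any of the four matches gives a missing summed feature.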

## <span style="color:teal;">Take out Test Data</span>  

We remove the final 20% of results to act as the test data. We also randomly define the stratified 5-fold crossvalidation splits for the training data.    

```{r Remove Test Sets, echo=FALSE}
time_split <- En_features %>%
  initial_time_split(prop = 4/5, lag = 0)

test <- testing(time_split)
train <- training(time_split)
cv <- vfold_cv(train, v = 5, strata = "AwayWin")

cat("Training data rows =", nrow(train), "\n")
cat("Proportion of Away Wins =", round(mean(as.numeric(as.character(train$AwayWin))), 3), "\n")

```
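
To make the time-based split concrete, here is a minimal base R sketch of what holding out the latest 20% means. The toy table and its column `match_id` are made up, and the sketch assumes the number of training rows is rounded down; it is not a substitute for `initial_time_split()`.

```r
# Toy match table, already sorted from oldest to newest
# (values are made up for illustration)
matches <- data.frame(
  match_id = 1:10,
  AwayWin  = c(0, 1, 0, 0, 1, 0, 0, 1, 0, 1)
)

prop    <- 4 / 5
n_train <- floor(prop * nrow(matches))

# The earliest 80% of rows form the training set, the latest 20% the test set
train_toy <- matches[seq_len(n_train), ]
test_toy  <- matches[(n_train + 1):nrow(matches), ]

nrow(train_toy)  # 8
nrow(test_toy)   # 2

# No test match precedes a training match
max(train_toy$match_id) < min(test_toy$match_id)  # TRUE
```

Because the rows are not shuffled, the test set only contains matches played after every training match, which mimics predicting future fixtures.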


## <span style="color:teal;">Set up Pre-treatment Recipes</span>  

Firstly, we define the variables and model formula.  

```{r Included Variables, include=FALSE}

var_set_1 <- c("sum_HST", "sum_HC", "sum_HGS", "sum_HGC", "sum_AST", "sum_AC", "sum_AF", "sum_AGS", "sum_AGC")

var_set_2 <- c("sum_HoppST", "sum_HoppC", "sum_AoppST", "sum_AoppC")

var_set_3 <- c("winpc_H", "winpc_A", "top6_perfH", "top6_perfA", "Dist", "Date", "HomeFin1", "HomeFin2", "AwayFin1", 
               "AwayFin2")

var_set <- c(var_set_1, var_set_2, var_set_3)

formula <- as.formula(paste("AwayWin", paste(var_set, collapse = " + "), sep = " ~ "))

```

We have a formula of [`r paste("AwayWin", paste(var_set, collapse = " + "), sep = " ~ ")`].  

Then we define the pretreatment recipes.  This includes removing correlated numeric variables, normalizing all numeric variables and turning categorical variables into dummy variables. As well as Month, we will create dummies for Day of Week.  

```{r Pretreat 1}
recipe <-
  train %>%
  recipe(formula) %>%
  step_corr(all_numeric()) %>%
  step_normalize(all_numeric()) %>%
  step_date(Date, features = c("dow", "month")) %>%
  step_rm(Date) %>%
  step_dummy(Date_dow, Date_month)

```

What does the subsequent training frame look like?  


```{r echo=FALSE}
train_prep <- recipe %>% prep() %>% juice()
head(train_prep, 10)
```
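
For readers unfamiliar with the normalization step, the sketch below shows the underlying arithmetic in base R for a single numeric predictor. The values are made up; the point is that the mean and standard deviation are estimated on the training data and then reused for new data.

```r
# Toy training and test values for one numeric predictor (made up)
train_x <- c(10, 12, 14, 16, 18)
test_x  <- c(11, 20)

# Centre and scale are estimated on the training data only
mu    <- mean(train_x)
sigma <- sd(train_x)

train_z <- (train_x - mu) / sigma
test_z  <- (test_x - mu) / sigma   # test data reuses the training statistics

round(mean(train_z), 10)  # 0
round(sd(train_z), 10)    # 1
```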

## <span style="color:teal;">Feature Importance</span>  

To estimate the relative importance of the features, we fit an unoptimised random forest model to the training data and determine variable importance from the model.  

```{r Variable Importance, echo=FALSE, message=FALSE}
simple_rf_model <- 
  rand_forest(trees = 200) %>% 
  set_engine("ranger", importance = "impurity") %>% 
  set_mode("classification")

rf_model_fit <- simple_rf_model %>%
  fit(AwayWin ~ ., data = train_prep)

imp_terms <- rf_model_fit$fit$variable.importance

imp_plot_df <- data.frame(variable = names(imp_terms), importance = imp_terms)
imp_plot_df$variable <- as.character(imp_plot_df$variable)

imp_plot <- ggplot(imp_plot_df, aes(variable, importance)) + geom_point() + coord_flip() +
  theme_bw() + ggtitle("Relative Importance of Variables from Simple Random Forest Model")
ggsave(here::here("plots/imp_plot.png"), dpi = 200)

knitr::include_graphics(here::here("plots/imp_plot.png"))
```

We can see that all the Date variables have relatively little importance when predicting Away wins. We will remove the Date variables going forward.  

## <span style="color:teal;">Generalized Linear Model</span>  

We will create a tuning workflow in order to optimise hyperparameters for our logistic regression model. As explained above, we will first redefine our pretreatment recipe to remove the Date features.  

```{r Pretreat 2}
recipe <-
  train %>%
  recipe(formula) %>%
  step_corr(all_numeric()) %>%
  step_normalize(all_numeric()) %>%
  step_rm(Date)

```

We will be optimising **mixture** and **penalty** values for our glmnet model, using crossvalidation results of our training data. AUC for the ROC curve and accuracy metrics will be evaluated for each set of predictions.  

```{r Workflow GLM}
model_glm <- 
  logistic_reg(penalty = tune(), mixture = tune()) %>% 
  set_engine("glmnet")

grid_glm <- 
  grid_max_entropy(
    penalty(range = c(0.001, 0.1), trans = NULL),
    mixture(range = c(0, 1)),
    size = 30)

my_metrics <- metric_set(roc_auc, accuracy)

```
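
For reference, and assuming the standard glmnet parameterisation for a binary outcome, the penalised objective being tuned can be written as below. Here $\lambda$ corresponds to **penalty** and $\alpha$ to **mixture**, so $\alpha = 1$ gives the lasso and $\alpha = 0$ gives ridge regression; $N$, $x_i$ and $y_i$ denote the number of training matches, the predictors and the 0/1 Away win outcome.

$$
\min_{\beta_0, \beta} \; -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \left(\beta_0 + x_i^\top \beta\right) - \log\left(1 + e^{\beta_0 + x_i^\top \beta}\right) \right] + \lambda \left[ \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]
$$
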


```{r Tune GLM, include=FALSE}
workflow_glm <- 
  workflow() %>% 
  add_recipe(recipe) %>%
  add_model(model_glm)

glm_fit <- tune_grid(
  workflow_glm,
  resamples = cv,
  grid = grid_glm,
  metrics = my_metrics,
  control = control_grid(verbose = TRUE)
)
```

After crossvalidating for a range of **mixture** and **penalty** values, let's plot values of ROC AUC and list the best models.

```{r echo=FALSE, message=FALSE}
roc_plot <- metrics_plot(glm_fit, metric = "roc_auc")
ggsave(here::here("plots/plot_1.png"), height = 5, dpi = 200)

knitr::include_graphics(here::here("plots/plot_1.png"))

show_best(glm_fit, metric = "roc_auc")

```

Now let's look at the accuracy metrics.  

```{r echo=FALSE, message=FALSE}
acc_plot <- metrics_plot(glm_fit, metric = "accuracy")
ggsave(here::here("plots/plot_2.png"), height = 5, dpi = 200)

knitr::include_graphics(here::here("plots/plot_2.png"))
show_best(glm_fit, metric = "accuracy")

```

We see some differences between the models. Lower **mixture** and **penalty** values seem to give more consistently high AUC results. To obtain a model that generalises well, it is better to choose a higher penalty value whose performance is still close to the best. In terms of AUC, a penalty of around 0.03 seems suitable. We therefore choose mixture = 0.3 and penalty = 0.03, train a model on the training data and use it to predict the test data. How does the ROC curve look for the test set and what is the AUC?  
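
The values above were chosen by eye from the plots. One way to formalise "a higher penalty with similar performance" is a one-standard-error rule, sketched below in base R on a made-up summary table. The numbers are not results from this analysis and this rule is not the method used on this page; tune also offers `select_by_one_std_err()` for this kind of choice.

```r
# Made-up cross-validation summary: one row per candidate penalty
cv_toy <- data.frame(
  penalty = c(0.001, 0.010, 0.030, 0.060, 0.100),
  mean    = c(0.742, 0.741, 0.739, 0.725, 0.690),
  std_err = c(0.004, 0.004, 0.005, 0.006, 0.007)
)

# Best mean AUC and the tolerance band one standard error below it
best  <- cv_toy[which.max(cv_toy$mean), ]
bound <- best$mean - best$std_err

# Among candidates inside the band, keep the most heavily penalised model
within_band <- cv_toy[cv_toy$mean >= bound, ]
chosen      <- within_band[which.max(within_band$penalty), ]
chosen$penalty  # 0.03
```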


```{r echo=FALSE, message=FALSE}
tuned_glm <-
  workflow_glm %>% 
  # finalize_workflow(select_best(glm_fit, metric = "roc_auc")) %>% 
  finalize_workflow(tibble(penalty = 0.03, mixture = 0.3)) %>% 
  fit(data = train)

pred_glm <- predict(tuned_glm, test, type = "prob") %>%
  bind_cols(predict(tuned_glm, test)) %>%
  bind_cols(test %>% select(AwayWin, index, BbAvA))

metrics_glm <- pred_glm %>%
  metrics(truth = AwayWin, estimate = .pred_class, .pred_1)

roc_results <- pred_glm %>% 
  roc_curve(truth = AwayWin, .pred_1) %>%
  mutate(false_pos = 1 - specificity)

roc_plot <- ggplot(roc_results, aes(false_pos, sensitivity)) + geom_point(size = 0.5) +
  theme_bw() + ggtitle("ROC Curve for Test Data") + labs(x = "1 - specificity") +
  geom_abline(intercept = c(0, 0), slope = 1, linetype = 2)
ggsave(here::here("plots/plot_3.png"), height = 5, width = 5)

cat("AUC value for best model is", metrics_glm$.estimate[which(metrics_glm$.metric == "roc_auc")])

```
<p align="center">
![](D:/Documents/work/Football/plots/plot_3.png){width=500px}
</p>
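
To make the AUC figure concrete: it can be read as the probability that a randomly chosen Away win receives a higher predicted probability than a randomly chosen match that was not an Away win. The base R sketch below computes it from ranks; the helper name `auc_rank` and the toy values are assumptions for illustration.

```r
# Hypothetical helper: AUC from predicted scores and 0/1 labels via ranks
auc_rank <- function(score, label) {
  r  <- rank(score)            # average ranks handle tied scores
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example (made-up values): 3 of the 4 (win, non-win) pairs are ordered correctly
auc_rank(score = c(0.10, 0.40, 0.35, 0.80), label = c(0, 0, 1, 1))  # 0.75
```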

## <span style="color:teal;">Potential Betting Strategies</span>  

There are two options for betting on matches, either *Bet* the Away win (predict it will happen) or *Lay* the Away win (predict it won't happen).  

### <span style="color:steelblue;">Betting Probability Limits</span>  

One strategy for betting is to define a betting limit, where bets are placed when the predicted probability of success is greater than the defined limit. If we look at the test data, we can see what the outcome would have been if we had used a particular betting limit for this data.  Note that there is no guarantee at all that the same outcome would be achieved for future data.  

**Away Win Bets** 

Looking at a range of betting limits for Away Win Bets gives the following results for the test set.  

```{r echo=FALSE, message=FALSE}
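pred_glm_summary <- pred_glm %>%
  mutate(AwayWin = as.numeric(as.character(AwayWin)),
         # estimated lay odds: quadratic correction applied to the average Away win odds (BbAvA)
         AwayLayOdds = (((BbAvA ^ 2) * 0.0050587) + (BbAvA * 1.1828475) - 0.3787671),
         bet_stake = 1,
         # probability of an Away win implied by the average odds
         impl_prob_bet = 1 / BbAvA,
         bet_prob_ratio = .pred_1 / impl_prob_bet, 
         # a lay risks the liability (lay odds - 1) to win one unit
         lay_stake = AwayLayOdds - 1,
         impl_prob_lay = 1 / AwayLayOdds,
         impl_lay = 1 - impl_prob_lay,
         lay_prob_ratio = .pred_0 / impl_lay, 
         # unit-stake bet: win (odds - 1) on an Away win, otherwise lose the stake
         bet_profit = (AwayWin * (BbAvA - 1)) - (1 - AwayWin), 
         # lay: win 0.95 if there is no Away win, otherwise lose the liability
         lay_profit = ((1 - AwayWin) * 0.95) - (AwayWin * (AwayLayOdds - 1)))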

glm_bets_summary <- pred_glm_summary %>%
  arrange(-.pred_1) %>%
  mutate(total_correct = cumsum(AwayWin),
         total_bets = seq_len(nrow(pred_glm)),
         cum_precision = total_correct / total_bets,
         cum_profit = cumsum(bet_profit),
         cum_stake = cumsum(bet_stake),
         prof_per_bet = cum_profit / total_bets)

glm_lays_summary <- pred_glm_summary %>%
  arrange(-.pred_0) %>%
  mutate(total_correct = cumsum(1 - AwayWin),
         total_bets = seq_len(nrow(pred_glm)),
         cum_precision = total_correct / total_bets,
         cum_profit = cumsum(lay_profit),
         cum_stake = cumsum(lay_stake),
         prof_per_bet = cum_profit / total_bets)

base_lays_summary <- pred_glm_summary %>%
  arrange(-impl_lay) %>%
  mutate(total_correct = cumsum(1 - AwayWin),
         total_bets = seq_len(nrow(pred_glm)),
         cum_precision = total_correct / total_bets,
         cum_profit = cumsum(lay_profit),
         prof_per_bet = cum_profit / total_bets)


glm_bets_plot_df <- glm_bets_summary %>%
  pivot_longer(cols = c(total_bets, cum_precision, cum_profit, prof_per_bet), names_to = "metric", values_to = "value")

glm_lays_plot_df <- glm_lays_summary %>%
  pivot_longer(cols = c(total_bets, cum_precision, cum_profit, prof_per_bet), names_to = "metric", values_to = "value")

base_lays_plot_df <- base_lays_summary %>%
  pivot_longer(cols = c(total_bets, cum_precision, cum_profit, prof_per_bet), names_to = "metric", values_to = "value")


```

```{r echo=FALSE, message=FALSE}

glm_bets_plot <- ggplot(glm_bets_plot_df, aes(.pred_1, value)) + geom_point(size = 1) + 
  facet_wrap(~ metric, scales = "free") + theme_bw() + ggtitle("Betting Limit for Bets: Logistic Regression") + 
  labs(x = "Betting Limit")
ggsave(here::here("plots/plot_4.png"), dpi = 200)

knitr::include_graphics(here::here("plots/plot_4.png"))

```

For example, if we had used a betting limit of 0.55, we would have made 86 bets with a precision of 79% and a total profit of 9.38 at a one dollar stake per bet. The total stake would have been 86, so the profit ratio would have been 10.9% of the total stake.  

For this data, the best profit would have been made with a betting limit of 0.56 (profit 16.8% of stake).  
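
The way these figures arise can be sketched on a handful of made-up matches. The helper `bet_summary` and all values below are assumptions for illustration, not results from the test set.

```r
# Made-up predictions for six matches: model probability of an Away win,
# actual outcome (1 = Away win) and decimal Away odds
toy <- data.frame(
  pred_1  = c(0.70, 0.62, 0.58, 0.50, 0.40, 0.30),
  AwayWin = c(1,    1,    0,    1,    0,    0),
  odds    = c(1.80, 2.10, 2.40, 2.60, 3.50, 4.00)
)

# Hypothetical helper: outcome of a unit-stake bet on every match above the limit
bet_summary <- function(df, limit, stake = 1) {
  bets   <- df[df$pred_1 > limit, ]
  profit <- sum(ifelse(bets$AwayWin == 1, stake * (bets$odds - 1), -stake))
  data.frame(
    limit        = limit,
    n_bets       = nrow(bets),
    precision    = mean(bets$AwayWin),
    profit       = profit,
    profit_ratio = profit / (stake * nrow(bets))
  )
}

# Three bets, two of them correct: profit = 0.8 + 1.1 - 1 = 0.9, i.e. 30% of the stake
bet_summary(toy, limit = 0.55)
```

Raising the limit reduces the number of bets, so the profit ratio at high limits rests on few matches and can swing sharply.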


**Away Win Lays**  

Looking at a range of betting limits for Away Win Lays gives the following results for the test set. Note that historic Lay odds are not generally available, so a correction has been applied to the average Away win odds, because Lay odds (e.g. those available at Betfair) are generally higher than the average Away win odds.  

```{r echo=FALSE, message=FALSE}
glm_lays_plot <- ggplot(glm_lays_plot_df, aes(.pred_0, value)) + geom_point(size = 1) + 
  facet_wrap(~ metric, scales = "free") + theme_bw() + ggtitle("Betting Limit for Lays: Logistic Regression") +
  labs(x = "Betting Limit")
ggsave(here::here("plots/plot_5.png"), dpi = 200)

# base_lays_plot <- ggplot(base_lays_plot_df, aes(impl_lay, value)) + geom_point(size = 1) + 
#   facet_wrap(~ metric, scales = "free") + theme_bw() + ggtitle("Prediction Limit for Lays: Using Odds Only")
# ggsave(here::here("plots/plot_3.png"), dpi = 600)

knitr::include_graphics(here::here("plots/plot_5.png"))
# knitr::include_graphics(here::here("plots/plot_3.png"))

```

A limit of 0.91 would have given profit of 49.97 from 64 bets with a precision of 98% correct.  The total required stake for this profit would have been 1417, so profit ratio would have been 3.5%.
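
The required stake is large relative to the number of bets because each lay risks its liability (lay odds minus one) to win less than one unit. The sketch below reproduces that arithmetic for a single match, using the quadratic odds correction from this notebook's code. The function names are made up, and reading the 0.95 factor as a 5% commission on winnings is an assumption.

```r
# Approximate lay odds from average Away win odds, using the quadratic
# correction that appears in this notebook's code
lay_odds <- function(back_odds) {
  (back_odds^2) * 0.0050587 + back_odds * 1.1828475 - 0.3787671
}

# Profit of one lay bet: win 0.95 if the Away team does not win
# (0.95 is assumed here to reflect a 5% commission on winnings),
# lose the liability (lay odds - 1) if it does
lay_profit <- function(away_win, odds_lay) {
  (1 - away_win) * 0.95 - away_win * (odds_lay - 1)
}

ol <- lay_odds(3)   # about 3.215
lay_profit(0, ol)   # 0.95
lay_profit(1, ol)   # about -2.215
```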

### <span style="color:steelblue;">Prediction to Odds Probability Ratio</span>  

If we calculate the probability of the result implied by the betting odds and compare with the probability determined by the model, we can examine the ratio of model probability / implied odds probability. This has the potential to show where we may have an advantage over the bookmaker odds.  
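
As a small worked example of this ratio (all numbers made up): decimal odds of 4.0 imply a probability of 0.25, so a model probability of 0.45 gives a ratio of 1.8. The sketch ignores the bookmaker's margin, and `implied_prob` is a hypothetical helper.

```r
# Implied probability from decimal odds (ignores the bookmaker's margin)
implied_prob <- function(odds) 1 / odds

# Made-up example: Away odds of 4.0 and a model probability of 0.45
odds       <- 4.0
model_prob <- 0.45

ratio <- model_prob / implied_prob(odds)
ratio  # 1.8

# For lays, the model's probability of "no Away win" is compared
# with 1 - 1 / lay odds (lay odds of 4.4 are again made up)
lay_odds  <- 4.4
lay_ratio <- (1 - model_prob) / (1 - 1 / lay_odds)
lay_ratio  # about 0.71
```

A ratio above 1 means the model rates the outcome as more likely than the odds imply.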

**Away Win Bets**  

Let's have a look at how the ratio of my prediction / implied bet probability affects the Bet metrics.

```{r echo=FALSE, message=FALSE}

ratio_bets_summary <- pred_glm_summary %>%
  arrange(-bet_prob_ratio) %>%
  mutate(total_correct = cumsum(AwayWin),
         total_bets = seq_len(nrow(pred_glm)),
         cum_precision = total_correct / total_bets,
         cum_profit = cumsum(bet_profit),
         prof_per_bet = cum_profit / total_bets)

ratio_bets_plot_df <- ratio_bets_summary %>%
  pivot_longer(cols = c(total_bets, cum_precision, cum_profit, prof_per_bet), names_to = "metric", values_to = "value")

ratio_bets_plot <- ggplot(ratio_bets_plot_df, aes(bet_prob_ratio, value)) + geom_point(size = 1) + 
  facet_wrap(~ metric, scales = "free") + theme_bw() + xlim(0, 4) + 
  ggtitle("Probability Ratio Limit for Bets: Logistic Regression") + labs(x = "Probability Ratio")
ggsave(here::here("plots/plot_6.png"), dpi = 200)

knitr::include_graphics(here::here("plots/plot_6.png"))

```

For this data, if we had used a ratio limit of 1.8, we would have made 75 bets for a profit of 36.73, with a precision of 11%. The total stake would have been 75, giving a profit ratio of 49%.  

**Away Win Lays**  

Let's have a look at how the ratio of my prediction / implied lay probability affects the Lay metrics.

```{r echo=FALSE, message=FALSE}

ratio_lays_summary <- pred_glm_summary %>%
  arrange(-lay_prob_ratio) %>%
  mutate(total_correct = cumsum(1 - AwayWin),
         total_bets = seq_len(nrow(pred_glm)),
         cum_precision = total_correct / total_bets,
         cum_profit = cumsum(lay_profit),
         cum_stake = cumsum(lay_stake),
         prof_per_bet = cum_profit / total_bets)

ratio_lays_plot_df <- ratio_lays_summary %>%
  pivot_longer(cols = c(total_bets, cum_precision, cum_profit, prof_per_bet), names_to = "metric", values_to = "value")

ratio_lays_plot <- ggplot(ratio_lays_plot_df, aes(lay_prob_ratio, value)) + geom_point(size = 1) + 
  facet_wrap(~ metric, scales = "free") + theme_bw() + xlim(0, 4) + 
  ggtitle("Probability Ratio Limit for Lays: Logistic Regression") + labs(x = "Probability Ratio")
ggsave(here::here("plots/plot_7.png"), dpi = 200)

knitr::include_graphics(here::here("plots/plot_7.png"))

```

If we had used a ratio limit of 1.00, we would have made 593 bets for a profit of 61.35, with a precision of 59%. This would have required a total stake of 1017, giving a profit ratio of 6.0%.  


***

End


