According to an article published by Front Office Sports, the market for sports betting doubled in 2021 and has surpassed $52 billion.
Sports betting is new and lucrative right now: many states are still in the process of legislating on it, and 11 states legalized sports betting in the last year.
So, what’s a good betting strategy?
Let’s look at historical college basketball data, specifically Division I teams from 2013 to 2021.
High rank correlates with more wins, right?

BARTHAG: power rating (chance of beating an average Division I team)
A risky but potentially profitable strategy is to bet on a volatile team, one that wins unexpectedly.
| TEAM | CONF | G | W | ADJOE | ADJDE | BARTHAG | EFG_O | EFG_D | TOR | TORD | ORB | DRB | FTR | FTRD | X2P_O | X2P_D | X3P_O | X3P_D | ADJ_T | WAB | POSTSEASON | SEED | YEAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| North Carolina | ACC | 40 | 33 | 123.3 | 94.9 | 0.9531 | 52.6 | 48.1 | 15.4 | 18.2 | 40.7 | 30.0 | 32.3 | 30.4 | 53.9 | 44.6 | 32.7 | 36.2 | 71.7 | 8.6 | 2ND | 1 | 2016 |
| Wisconsin | B10 | 40 | 36 | 129.1 | 93.6 | 0.9758 | 54.8 | 47.7 | 12.4 | 15.8 | 32.1 | 23.7 | 36.2 | 22.4 | 54.8 | 44.7 | 36.5 | 37.5 | 59.3 | 11.3 | 2ND | 1 | 2015 |
| Michigan | B10 | 40 | 33 | 114.4 | 90.4 | 0.9375 | 53.9 | 47.7 | 14.0 | 19.5 | 25.5 | 24.9 | 30.7 | 30.0 | 54.7 | 46.8 | 35.2 | 33.2 | 65.9 | 6.9 | 2ND | 3 | 2018 |
| Texas Tech | B12 | 38 | 31 | 115.2 | 85.2 | 0.9696 | 53.5 | 43.0 | 17.7 | 22.8 | 27.4 | 28.7 | 32.9 | 36.6 | 52.8 | 41.9 | 36.5 | 29.7 | 67.5 | 7.0 | 2ND | 3 | 2019 |
| Gonzaga | WCC | 39 | 37 | 117.8 | 86.3 | 0.9728 | 56.6 | 41.1 | 16.2 | 17.1 | 30.0 | 26.2 | 39.0 | 26.9 | 56.3 | 40.0 | 38.2 | 29.0 | 71.5 | 7.7 | 2ND | 1 | 2017 |
| Kentucky | SEC | 40 | 29 | 117.2 | 96.2 | 0.9062 | 49.9 | 46.0 | 18.1 | 16.1 | 42.0 | 29.7 | 51.8 | 36.8 | 50.0 | 44.9 | 33.2 | 32.2 | 65.9 | 3.9 | 2ND | 8 | 2014 |
We can use the power rating from year to year to calculate the volatility of each team. Does a team's power rating, i.e. its performance, vary from year to year? If so, these are the teams we want to bet on. These teams are the underdogs that will earn you the $!
Calculate the volatility by team: $\text{VOL} = \sqrt{\text{variance}} \times \sqrt{\text{years}}$
Classify each team as high or low volatility based on the third quartile: High if its volatility falls in the top quartile, Low otherwise.
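The two steps above can be sketched as follows. This is a minimal illustration in Python rather than the R used for the analysis. The team names and ratings are made up, and two details are assumptions: the variance is taken as the sample variance of a team's BARTHAG across seasons, and a team is labeled High only when its volatility is strictly above the third quartile.

```python
import math
import statistics

# Hypothetical BARTHAG values per team and season (illustrative, not from the dataset).
ratings = {
    "Team A": [0.95, 0.94, 0.96, 0.93],
    "Team B": [0.40, 0.85, 0.55, 0.97],
    "Team C": [0.70, 0.72, 0.68, 0.71],
    "Team D": [0.30, 0.35, 0.33, 0.31],
    "Team E": [0.60, 0.90, 0.50, 0.80],
}


def volatility(values):
    """sqrt(variance) * sqrt(years), using the sample variance (assumption)."""
    return statistics.stdev(values) * math.sqrt(len(values))


vol = {team: volatility(values) for team, values in ratings.items()}

# Third quartile of the volatility scores. method="inclusive" interpolates
# between order statistics, similar to R's default quantile() (assumption).
q3 = statistics.quantiles(list(vol.values()), n=4, method="inclusive")[2]

# Teams strictly above the third quartile are labeled High (assumption).
labels = {team: ("High" if v > q3 else "Low") for team, v in vol.items()}

for team in ratings:
    print(f"{team}: VOL={vol[team]:.4f} -> {labels[team]}")
```

Team B, whose rating swings between 0.40 and 0.97, is the kind of team this index is meant to flag.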
| TEAM | YEAR | BARTHAG | VOL | volatility_index |
|---|---|---|---|---|
| North Carolina | 2016 | 0.9531 | 0.0841434 | Low |
| Wisconsin | 2015 | 0.9758 | 0.1919160 | Low |
| Michigan | 2018 | 0.9375 | 0.1622063 | Low |
| Texas Tech | 2019 | 0.9696 | 0.5517687 | High |
| Gonzaga | 2017 | 0.9728 | 0.0722673 | Low |
| Kentucky | 2014 | 0.9062 | 0.1531929 | Low |
Can we predict high volatility teams well via a random forest?
Summary Statistics of Volatility
| Statistic | Value |
|---|---|
| Min. | 0.0360298 |
| 1st Qu. | 0.2165824 |
| Median | 0.3010490 |
| Mean | 0.3153891 |
| 3rd Qu. | 0.4039958 |
| Max. | 0.7807641 |
Prevalence of Target Variable: Volatility (High or Low)

Recall that High volatility is defined as the top quartile of volatility scores.

| Volatility | Count |
|---|---|
| High | 620 |
| Low | 1833 |
Prevalence = 25.27%
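As a quick check of the arithmetic, the prevalence and the accuracy of a trivial "always predict Low" baseline can be computed from the counts in the table above. This is a small Python sketch; the variable names are illustrative.

```python
# Class counts taken from the prevalence table above.
counts = {"High": 620, "Low": 1833}
total = sum(counts.values())

# Share of team-seasons labeled High volatility (the positive class).
prevalence = counts["High"] / total

# Accuracy of a model that labels every team Low.
baseline_accuracy = counts["Low"] / total

print(f"prevalence: {prevalence:.2%}, always-Low accuracy: {baseline_accuracy:.2%}")
```

Any classifier has to beat that baseline accuracy of roughly 75% before its accuracy says anything useful.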
We want to build a random forest that is really good at predicting highly volatile teams.
Random forest classifier with volatility index (High or Low) as the response.
Use this model to predict and pinpoint highly volatile teams, which would make for risky and successful bets.
Our metric here is true positive rate: we want to be really good at predicting highly volatile teams (low error rate in the positive class).
Metrics of Interest:
| | High | Low | class.error |
|---|---|---|---|
| High | 0 | 434 | 1 |
| Low | 0 | 1284 | 0 |
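74.7% accuracy is misleading: the model predicts every team as Low and never identifies a High volatility team.

OOB error: 25.26%

We want to adjust the classification threshold.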
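To see why the accuracy is misleading, it helps to compute it next to the true positive rate from the same confusion matrix. The sketch below is in Python and assumes the usual randomForest layout, where rows are actual classes and columns are predicted classes.

```python
# OOB confusion matrix from the table above.
# Outer keys are actual classes, inner keys are predicted classes.
confusion = {
    "High": {"High": 0, "Low": 434},
    "Low": {"High": 0, "Low": 1284},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = confusion["High"]["High"] + confusion["Low"]["Low"]

accuracy = correct / total
oob_error = 1 - accuracy

# True positive rate: share of actual High teams that were predicted High.
true_positive_rate = confusion["High"]["High"] / sum(confusion["High"].values())

print(f"accuracy={accuracy:.4f}, OOB error={oob_error:.4f}, TPR={true_positive_rate:.4f}")
```

The accuracy comes out near 75% while the true positive rate is zero, which is the metric we actually care about.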
Random Forest Hyperparameters:
| | High | Low | class.error |
|---|---|---|---|
| High | 157 | 277 | 0.6382488 |
| Low | 172 | 1112 | 0.1339564 |
randomForest(formula = as.factor(volatility_index) ~ ., data = train, ntree = 1000, mtry = 4, replace = TRUE, sampsize = 100, nodesize = 5, importance = TRUE, cutoff = c(0.3, 1 - 0.3))
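The `cutoff` argument is what shifts the threshold. In R's randomForest, the winning class is the one with the largest ratio of vote share to cutoff, so `cutoff = c(0.3, 0.7)` lets a team be called High once more than 30% of the trees vote High. The Python sketch below mimics that rule for a single team. The vote shares are hypothetical, and the mapping of 0.3 to High assumes the factor levels are ordered High, Low as in the confusion tables.

```python
def predict_with_cutoff(vote_fractions, cutoff):
    """Return the class whose vote share divided by its cutoff is largest."""
    return max(vote_fractions, key=lambda cls: vote_fractions[cls] / cutoff[cls])


# Cutoffs matching cutoff = c(0.3, 1 - 0.3), assuming level order High, Low.
cutoff = {"High": 0.3, "Low": 0.7}

# Hypothetical vote shares for one team: 35% of the trees vote High.
votes = {"High": 0.35, "Low": 0.65}

majority = max(votes, key=votes.get)            # plain majority vote
adjusted = predict_with_cutoff(votes, cutoff)   # vote share scaled by cutoff

print(f"majority vote: {majority}, with cutoff: {adjusted}")
```

Lowering the cutoff for High trades false positives for fewer missed High teams, which is why the class errors in the tables move in opposite directions.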
Problematic because we predict nearly all teams as high volatility
Positive class error rate minimized, but not useful for betting
| | High | Low | class.error |
|---|---|---|---|
| High | 430 | 4 | 0.0092166 |
| Low | 1236 | 48 | 0.9626168 |
## Confusion Matrix and Statistics
##
## Actual
## Prediction High Low
## High 30 46
## Low 63 228
##
## Accuracy : 0.703
## 95% CI : (0.6534, 0.7493)
## No Information Rate : 0.7466
## P-Value [Acc > NIR] : 0.9747
##
## Kappa : 0.1646
##
## Mcnemar's Test P-Value : 0.1254
##
## Sensitivity : 0.32258
## Specificity : 0.83212
## Pos Pred Value : 0.39474
## Neg Pred Value : 0.78351
## Precision : 0.39474
## Recall : 0.32258
## F1 : 0.35503
## Prevalence : 0.25341
## Detection Rate : 0.08174
## Detection Prevalence : 0.20708
## Balanced Accuracy : 0.57735
##
## 'Positive' Class : High
##
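The headline statistics in the output above follow directly from the four cells of the test-set confusion matrix. The Python sketch below recomputes them, treating High as the positive class. The variable names are illustrative.

```python
# Cells of the test-set confusion matrix above (rows: prediction, columns: actual).
tp, fp = 30, 46    # predicted High: actually High, actually Low
fn, tn = 63, 228   # predicted Low:  actually High, actually Low

total = tp + fp + fn + tn

accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)    # recall / true positive rate
specificity = tn / (tn + fp)
precision = tp / (tp + fp)      # positive predictive value
f1 = 2 * precision * sensitivity / (precision + sensitivity)
balanced_accuracy = (sensitivity + specificity) / 2

# No-information rate: accuracy of always predicting the larger class.
no_information_rate = max(tp + fn, fp + tn) / total

print(f"accuracy={accuracy:.4f}, NIR={no_information_rate:.4f}")
print(f"sensitivity={sensitivity:.5f}, precision={precision:.5f}, F1={f1:.5f}")
```

Only about a third of the truly High volatility teams are caught, and accuracy sits below the no-information rate.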
Our accuracy (70.3%) is no better than the no-information rate (74.7%), the accuracy we would get by always predicting Low.
Betting and team success aren’t predictable, and that’s why it’s fun!
https://www.kaggle.com/datasets/andrewsundberg/college-basketball-dataset

https://rpubs.com/phamdinhkhanh/389752

https://stackoverflow.com/questions/57939453/building-a-randomforest-with-caret

https://stackoverflow.com/questions/25433805/how-do-you-change-the-cutoff-parameter-in-rs-randomforest

https://www.hackerearth.com/practice/machine-learning/machine-learning-algorithms/tutorial-random-forest-parameter-tuning-r/tutorial/