Since we’ll not be meeting for a couple of weeks, I wanted to provide a couple of questions to further embed some of the recent tidy tools we’ve demonstrated.
The following is a table summarizing college football games. It includes every game between 1869 and 2022.
library(tidyverse)
library(magrittr)
t1 <- "https://github.com/thomasjwood/code_lab/raw/main/data/cfb_tab_1869_2022.rds" %>%
  url %>%
  readRDS
Most important for our purposes, it includes a suite of variables we can use to predict outcomes: the S&P ratings, the teams’ rankings, and betting spreads and moneylines.
Consider dear Ohio State and the ummmm, just meh University of Tennessee. Generate a variable which groups games by decade. For both schools, report their biggest victories and worst losses (by margin), by decade.
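If you're not sure where to start on the decade question, here is one rough sketch on made-up data. The column names season, home, away, home_points, and away_points are assumptions for illustration (check names(t1) for the real ones), and the opponents and scores are invented. The idea is to stack the games so that each school appears once per game with a margin from its own perspective, then round the season down to its decade.

```r
library(dplyr)

schools <- c("Ohio State", "Tennessee")

# Toy game-level data: column names, opponents, and scores are all
# hypothetical, not taken from t1.
games <- tibble(
  season      = c(1968, 1969, 1975, 1979, 2002, 2006),
  home        = c("Ohio State", "Opponent A", "Tennessee",
                  "Opponent B", "Ohio State", "Opponent C"),
  away        = c("Opponent A", "Ohio State", "Opponent B",
                  "Tennessee", "Opponent C", "Ohio State"),
  home_points = c(50, 24, 17, 27, 31, 41),
  away_points = c(14, 12, 14, 17, 24, 14)
)

by_decade <- bind_rows(
  # each game from the home team's perspective...
  games %>% transmute(season, team = home, margin = home_points - away_points),
  # ...and again from the away team's perspective
  games %>% transmute(season, team = away, margin = away_points - home_points)
) %>%
  filter(team %in% schools) %>%
  mutate(decade = floor(season / 10) * 10) %>%   # e.g. 1968 -> 1960
  group_by(team, decade) %>%
  summarise(
    biggest_win = max(margin),
    worst_loss  = min(margin),
    .groups = "drop"
  )

by_decade
```

One caveat with this shortcut: if a school never lost in a decade, worst_loss will just be its narrowest win, so you may want to filter on the sign of margin before summarising.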
Take the variables home_rate and away_rate. For every year since 2013, report which is more correlated with results: the difference in these ratings, or the bookies’ point spread (indicated by home_spread).
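As a hedged starting point for the correlation question, the sketch below computes both correlations within each season on invented numbers. home_rate, away_rate, and home_spread are the variables named above; season and home_margin (home points minus away points) are hypothetical names standing in for whatever t1 actually calls them. I'm also assuming the usual betting convention that a negative home_spread means the home team is favored, which is why the comparison uses absolute values.

```r
library(dplyr)

# Toy data: every value is made up; `season` and `home_margin`
# are assumed column names.
games <- tibble(
  season      = rep(c(2013, 2014), each = 4),
  home_rate   = c(20, 5, 12, -3, 15, 8, 0, 22),
  away_rate   = c(10, 9, 2, 4, 5, 10, 6, 1),
  home_spread = c(-7, 3, -10, 6, -9, 1, 4, -17),
  home_margin = c(14, -3, 7, -10, 10, 3, -7, 28)
)

cors <- games %>%
  filter(season >= 2013) %>%
  mutate(rate_diff = home_rate - away_rate) %>%
  group_by(season) %>%
  summarise(
    cor_rating = cor(rate_diff, home_margin, use = "complete.obs"),
    cor_spread = cor(home_spread, home_margin, use = "complete.obs"),
    .groups = "drop"
  ) %>%
  # compare strength, not sign: the spread should correlate negatively
  mutate(
    stronger = if_else(
      abs(cor_rating) > abs(cor_spread),
      "rating difference",
      "point spread"
    )
  )

cors
```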
Take the variables away_rank_aptop25 and home_rank_aptop25. For games where both teams are ranked, how well does the difference in rankings predict results? Disregard the actual margin; instead, simply report how often the more highly ranked team wins, by different magnitudes in the ranking differences.
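One possible way into the rankings question, sketched on invented data: keep games where both teams are ranked, work out whether the better-ranked team won, and bin the size of the ranking gap. home_rank_aptop25 and away_rank_aptop25 are the variables named above; home_margin is a hypothetical name, and I'm assuming unranked teams are recorded as NA and that a smaller rank number means a better team. The bin edges are arbitrary, so pick your own.

```r
library(dplyr)

# Toy data: ranks and margins are made up; `home_margin` is an assumed name.
games <- tibble(
  home_rank_aptop25 = c(1, 10, 25, 3, NA, 8, 20, 2),
  away_rank_aptop25 = c(5, 4, 7, NA, 12, 11, 6, 24),
  home_margin       = c(10, -7, 3, 21, -3, -14, -10, 35)
)

win_rates <- games %>%
  # both teams ranked, and drop ties so every game has a winner
  filter(
    !is.na(home_rank_aptop25),
    !is.na(away_rank_aptop25),
    home_margin != 0
  ) %>%
  mutate(
    rank_gap       = abs(home_rank_aptop25 - away_rank_aptop25),
    home_is_better = home_rank_aptop25 < away_rank_aptop25,
    # TRUE when the winner is the team with the smaller rank number
    better_won     = (home_margin > 0) == home_is_better,
    gap_bin        = cut(
      rank_gap,
      breaks = c(0, 5, 10, 24),
      labels = c("1-5", "6-10", "11-24")
    )
  ) %>%
  group_by(gap_bin) %>%
  summarise(
    n        = n(),
    win_rate = mean(better_won),
    .groups  = "drop"
  )

win_rates
```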