Review exercises

Since we won’t be meeting for a couple of weeks, here are a few questions to help reinforce some of the tidy tools we’ve demonstrated recently.

The following table summarizes every college football game played between 1869 and 2022.

library(tidyverse)
library(magrittr)

# Open a connection to the hosted .rds file and read it into t1
t1 <- "https://github.com/thomasjwood/code_lab/raw/main/data/cfb_tab_1869_2022.rds" %>% 
  url %>% 
  readRDS

Most importantly for our purposes, it includes a suite of variables we can use to predict outcomes: the S&P ratings, the teams’ rankings, and betting spreads and moneylines.

1. Tennessee is mid

Consider dear Ohio State and the ummmm, just meh University of Tennessee. Generate a variable which groups games by decade. For both schools, report their biggest victories and worst losses (by margin), by decade.
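As a starting point, here is a minimal sketch of the decade grouping on toy data. The column names `season`, `team`, and `margin` are assumptions made for illustration and may not match the columns in `t1`; the margins are invented.

```r
library(dplyr)

# Toy data: `season`, `team`, and `margin` are hypothetical names,
# and the values are made up for illustration.
toy <- tibble(
  season = c(1968, 1975, 1979, 1998, 2002, 2006),
  team   = c("Ohio State", "Ohio State", "Ohio State",
             "Tennessee", "Tennessee", "Tennessee"),
  margin = c(17, -3, 28, 10, -21, 4)  # positive = win, negative = loss
)

by_decade <- toy %>%
  # Integer division drops the final digit of the year: 1975 -> 1970
  mutate(decade = (season %/% 10) * 10) %>%
  group_by(team, decade) %>%
  summarize(
    biggest_win = max(margin),  # largest margin in the decade
    worst_loss  = min(margin),  # smallest margin in the decade
    .groups = "drop"
  )

by_decade
```

Note that if a team has no losses in a decade, `min(margin)` returns its narrowest win, so you may want to filter wins and losses separately in the real data.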

2. How well does S&P predict outcomes?

Take the variables home_rate and away_rate. For every year since 2013, report which is more strongly correlated with game results (for instance, the final scoring margin): the difference in these ratings, or the bookies’ point spread (indicated by home_spread).
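One way to organize the comparison is sketched below on simulated data. Only `home_rate`, `away_rate`, and `home_spread` come from the table described above; `season` and `home_margin` are hypothetical names, and the sign convention for `home_spread` (negative when the home team is favored) is an assumption you should check against the real data.

```r
library(dplyr)

set.seed(1)

# Simulated games. `season` and `home_margin` are hypothetical column names.
toy <- tibble(
  season    = rep(2013:2014, each = 50),
  home_rate = rnorm(100, 0, 10),
  away_rate = rnorm(100, 0, 10)
) %>%
  mutate(
    # Assumed convention: a negative spread means the home team is favored
    home_spread = -(home_rate - away_rate) + rnorm(100, 0, 3),
    home_margin = (home_rate - away_rate) + rnorm(100, 0, 14)
  )

cors <- toy %>%
  mutate(rate_diff = home_rate - away_rate) %>%
  group_by(season) %>%
  summarize(
    cor_rating = cor(rate_diff, home_margin, use = "complete.obs"),
    cor_spread = cor(home_spread, home_margin, use = "complete.obs"),
    .groups = "drop"
  )

cors
```

Because the two predictors may point in opposite directions under that sign convention, compare the absolute values of the correlations rather than the raw numbers.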

3. How well does the AP Top 25 predict outcomes?

Take the variables away_rank_aptop25 and home_rank_aptop25. For games where both teams are ranked, how well does the difference in rankings predict results? Disregard the actual margin; instead, simply report how often the higher-ranked team wins, broken out by the size of the ranking difference.
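A rough sketch of the binning approach on toy data follows. The `home_win` column, the bin boundaries, and the idea that unranked teams appear as `NA` are all assumptions for illustration; the rows are invented.

```r
library(dplyr)

# Toy games. `home_win` is a hypothetical column; NA is assumed to mean unranked.
toy <- tibble(
  home_rank_aptop25 = c(1, 10, 5, 20, 3, 15, NA),
  away_rank_aptop25 = c(4, 2, 25, 18, 12, 14, 7),
  home_win          = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

res <- toy %>%
  # Keep only games where both teams are ranked
  filter(!is.na(home_rank_aptop25), !is.na(away_rank_aptop25)) %>%
  mutate(
    rank_gap       = abs(home_rank_aptop25 - away_rank_aptop25),
    home_is_better = home_rank_aptop25 < away_rank_aptop25,  # lower number = better rank
    better_won     = home_is_better == home_win,
    # Arbitrary bins; intervals are (0,5], (5,10], (10,24]
    gap_bin = cut(rank_gap, breaks = c(0, 5, 10, 24),
                  labels = c("1-5", "6-10", "11-24"))
  ) %>%
  group_by(gap_bin) %>%
  summarize(
    games           = n(),
    better_win_rate = mean(better_won),
    .groups = "drop"
  )

res
```

This sketch ignores tied games, which the real table may contain in older seasons, so decide how to treat them before computing win rates.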