First, I want to remind you what the pnorm function does: it converts z-scores into probabilities (it is the cumulative distribution function of the standard normal). Here's a graph of it.
x <- seq(-4, 4, 0.5)
plot(x, pnorm(x))
Let's use that to create something called margin wins. What we're aiming for is a metric that counts a 1-point win as something less than a 20-point victory. Dividing the margin by 10 means a 10-point win corresponds to a z-score of 1.
MarginWins <- function(margin){pnorm(margin/10)}
Now, let’s check it out:
MarginWins(0)
MarginWins(5)
MarginWins(-5)
MarginWins(20)
MarginWins(10000000)
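Respectively, these should return 0.5, 0.6914625, 0.3085375, 0.9772499, and 1. In other words, a tie counts as half a margin win, a 20-point victory counts as almost a full win, and even an absurd blowout can never be worth more than 1.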
Next, let's make a new RatingAdjust function that takes margin wins into account in addition to actual wins:
RatingAdjust <- function(rating, opp.rating, wins, K=32, loc, HFA=100, margin=0, Kmargin=0){
  # K weights actual wins; Kmargin weights margin-of-victory "wins"
  rating + K*(wins - Ewins(rating, opp.rating, loc, HFA)) +
    Kmargin*(MarginWins(margin) - Ewins(rating, opp.rating, loc, HFA))
}
Notice that by adjusting K and Kmargin we can alter how much weight we give to actual wins and how much weight we give to margin of victory.
It's worth trying this out as well. Let's see how a 1-point victory and a 20-point victory compare if we set both K and Kmargin equal to 20:
RatingAdjust(1500, 1500, wins=1, K=20, loc=0,margin=1, Kmargin=20 )
## [1] 1510.797
RatingAdjust(1500, 1500, wins=1, K=20, loc=0,margin=20, Kmargin=20 )
## [1] 1519.545
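Notice that the 20-point win moves the rating almost twice as far as the 1-point win. To see the other extreme, here is a quick check with K set to zero, so margin of victory alone drives the update (this assumes the same Ewins function from the last lab, which gives 0.5 for two 1500-rated teams on a neutral court):
# With K = 0, only the margin term moves the rating
RatingAdjust(1500, 1500, wins=1, K=0, loc=0, margin=1, Kmargin=40)
## [1] 1501.593
RatingAdjust(1500, 1500, wins=1, K=0, loc=0, margin=20, Kmargin=40)
## [1] 1519.09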
Next, we need to update our updateELO function, since I forgot to include Kmargin as one of the parameters:
updateELO <- function(df, K = 25, Kmargin = 0, start.elos = NA, HFA = 100){
  d <- df
  print(nrow(d))
  d$elo.start <- NA
  d$elo.end <- NA
  d$opp.elo.start <- NA
  d$opp.elo.end <- NA
  d <- d %>% arrange(Daynum)
  if (!is.data.frame(start.elos)) {
    # No starting Elos were supplied, so every team begins at 1500
    d[!duplicated(d$team), "elo.start"] <- 1500
    d[!duplicated(d$team), "opp.elo.start"] <- 1500
  } else {
    # Carry each team's Elo over from start.elos; teams not found there begin at 1500
    n <- start.elos[match(d[is.na(d$elo.end),][!duplicated(d[is.na(d$elo.end),]$team), "team"], start.elos$team),]$elo.end
    n[is.na(n)] <- 1500
    d[is.na(d$elo.end),][!duplicated(d[is.na(d$elo.end),]$team),]$elo.start <- n
    n <- start.elos[match(d[is.na(d$elo.end),][!duplicated(d[is.na(d$elo.end),]$team), "opp.team"], start.elos$team),]$elo.end
    n[is.na(n)] <- 1500
    d[is.na(d$opp.elo.end),][!duplicated(d[is.na(d$opp.elo.end),]$team),]$opp.elo.start <- n
  }
  # Rate each team's first game; the opponent's margin is the negative of the team's margin
  d <- d %>% mutate(elo.end = RatingAdjust(elo.start, opp.elo.start, wins = win, K = K,
                                           margin = score - opp.score, loc = loc, HFA = HFA, Kmargin = Kmargin))
  d <- d %>% mutate(opp.elo.end = RatingAdjust(opp.elo.start, elo.start, wins = 1 - win, K = K,
                                               margin = opp.score - score, loc = loc, HFA = HFA, Kmargin = Kmargin))
  temp.elos <- d %>% filter(!is.na(elo.end)) %>% group_by(team) %>% top_n(1, Daynum) %>% select(elo.end)
  # Keep feeding each team's latest Elo into its next unrated game until every game is rated
  while(sum(is.na(d$elo.end)) > 0){
    print(sum(is.na(d$elo.end)))
    n <- temp.elos[match(d[is.na(d$elo.end),][!duplicated(d[is.na(d$elo.end),]$team), "team"], temp.elos$team),]$elo.end
    d[is.na(d$elo.end),][!duplicated(d[is.na(d$elo.end),]$team),]$elo.start <- n[!is.na(n)]
    n <- temp.elos[match(d[is.na(d$elo.end),][!duplicated(d[is.na(d$elo.end),]$team), "opp.team"], temp.elos$team),]$elo.end
    d[is.na(d$opp.elo.end),][!duplicated(d[is.na(d$opp.elo.end),]$team),]$opp.elo.start <- n[!is.na(n)]
    d <- d %>% mutate(elo.end = RatingAdjust(elo.start, opp.elo.start, wins = win, K = K,
                                             margin = score - opp.score, loc = loc, HFA = HFA, Kmargin = Kmargin))
    d <- d %>% mutate(opp.elo.end = RatingAdjust(opp.elo.start, elo.start, wins = 1 - win, K = K,
                                                 margin = opp.score - score, loc = loc, HFA = HFA, Kmargin = Kmargin))
    temp.elos <- d %>% filter(!is.na(elo.end)) %>% group_by(team) %>% top_n(1, Daynum) %>% select(elo.end)
  }
  final.elos <- d %>% group_by(team) %>% top_n(1, Daynum) %>% select(elo.end)
  return(final.elos %>% arrange(desc(elo.end)))
}
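As a quick sanity check, you can rate a single season on its own. This assumes the reg data frame of regular-season games from the last lab is still loaded; f2012 is just a name I'm using for illustration:
f2012 <- updateELO(reg %>% filter(Season==2012), K=20, Kmargin=25)
head(f2012)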
Let’s also recreate EvaluateELO but with the correct spelling:
EvaluateELO <- function(elo_df, tourney_df, season){
  # Tournament games are neutral-site, and the winning team is listed first, so win = 1
  t_df <- tourney_df %>% filter(Season==season) %>% mutate(win=1, loc=0)
  # Attach each side's end-of-season Elo: elo.end.x is the winner's, elo.end.y the loser's
  t_df <- left_join(t_df, elo_df, by=c("Wteam"="team"))
  t_df <- left_join(t_df, elo_df, by=c("Lteam"="team"))
  t_df <- t_df %>% mutate(prediction = Ewins(elo.end.x, elo.end.y, loc))
  res <- t_df %>% summarize(rmse = RMSE(prediction, win), logloss = LogLoss(prediction, win))
  print(res)
  invisible(res)
}
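You can then score the f2012 ratings from above against that year's tournament (again assuming the tourney data frame from the last lab):
EvaluateELO(f2012, tourney, season=2012)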
Next, check your Environment window (upper right) to see that you still have the functions we rely on from the last lab: Ewins, RMSE, LogLoss, and RegressElos.
If they don’t exist, you’ll need to go back to the last lab and run the code to recreate them.
Let's now load dplyr and create a function that will help us check our predictions faster:
library(dplyr)
CheckELO <- function(K=20, Kmargin=20, R=0.1, start.season=2012, end.season=2016){
  results <- data.frame(season=numeric(), RMSE=numeric(), LogLoss=numeric())
  # Rate the first season from scratch and score it on that year's tournament
  f <- updateELO(reg %>% filter(Season==start.season), K=K, Kmargin=Kmargin)
  res <- EvaluateELO(f, tourney, season=start.season)
  results[1,] <- as.numeric(c(start.season, res))
  if(length(start.season:end.season) > 1){
    for (i in seq_along((start.season:end.season)[-1])){
      # Regress last season's Elos toward 1500, then use them as this season's starting values
      f <- RegressElos(f, R)
      f <- updateELO(reg %>% filter(Season==start.season+i), K=K, Kmargin=Kmargin, start.elos=f)
      res <- EvaluateELO(f, tourney, season=start.season+i)
      results[1+i,] <- as.numeric(c(start.season+i, res))
    }
  }
  return(results)
}
We can use this to create predictions for the 2012 through 2016 seasons as we did before. The Elos in this function build on themselves: the ratings for one season (after being regressed toward 1500) act as the starting values for the next season.
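If you no longer have RegressElos from the last lab, the idea is to shrink each rating back toward 1500 between seasons, with R controlling how far (R = 0 means no regression). A minimal sketch, assuming the column names match updateELO's output, might look like this:
# Sketch: pull each ending Elo a fraction R of the way back toward 1500
RegressElos <- function(elos, R){
  elos %>% mutate(elo.end = elo.end + R*(1500 - elo.end))
}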
eloresults <- CheckELO(K=20, Kmargin=25, R=0, start.season=2012, end.season=2016)
eloresults
We could also go back considerably further, say to 1985 (the first year in our data set):
eloresults <- CheckELO(K=20, Kmargin=25, R=0, start.season=1985, end.season=2016)
Since Elo values depend on prior years, they may need a couple of seasons to build up before showing their quality, so we average them starting in 1988.
eloresults %>% filter(season>=1988) %>% summarize(rmse=mean(RMSE), logloss=mean(LogLoss))
We could also graph our results:
library(ggplot2)
eloresults %>% filter(season>=1988) %>% ggplot(aes(season, LogLoss)) + geom_point()
When we're ready to try a new combination of parameters, we can run this procedure again:
eloresults <- CheckELO(K=40, Kmargin=10, R=0.1, start.season=1985, end.season=2016)
eloresults %>% filter(season>=1988) %>% summarize(rmse=mean(RMSE), logloss=mean(LogLoss))
Find the best combination of parameters.
When comparing the accuracy of different sets of parameters, make sure that you are comparing them using the same set of years (some years tournaments have been easier to predict than others).
While it makes sense to test your models on multiple years (and more data is often better), it could well be that college basketball has changed considerably since 1985, and the model that worked best then isn't the best model now. You will also save time by using some but not all of the years in the data set.
Be scientific: change one (or at most two) parameters at a time and record the results. You may even want to create a Google Spreadsheet where you track different combinations of parameters and the average RMSE and Log Loss. Again, remember to use a consistent set of years.
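If you'd rather let R do the bookkeeping, a small grid search is one way to stay systematic. This is just a sketch: the K, Kmargin, and year values below are arbitrary choices for illustration, not recommendations.
# Sketch: loop over a few (K, Kmargin) combinations on a fixed set of years
params <- expand.grid(K = c(20, 30, 40), Kmargin = c(0, 10, 25))
params$rmse <- NA
params$logloss <- NA
for (i in seq_len(nrow(params))){
  res <- CheckELO(K = params$K[i], Kmargin = params$Kmargin[i], R = 0,
                  start.season = 2005, end.season = 2016)
  # Average each run over the same years so the comparisons are fair
  avg <- res %>% filter(season >= 2008) %>%
    summarize(rmse = mean(RMSE), logloss = mean(LogLoss))
  params$rmse[i] <- avg$rmse
  params$logloss[i] <- avg$logloss
}
params %>% arrange(logloss)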