First, let’s recreate the functions we created in class to estimate expected wins and to adjust ratings based on scores.
Ewins <- function(rating, opp.rating){
1/(1 + 10^((opp.rating - rating)/400))
}
Ewins(1600, 1400)
## [1] 0.7597469
RatingAdjust <- function(rating, opp.rating, wins, K=32){
rating + K*(wins - Ewins(rating, opp.rating))
}
RatingAdjust(1500, 1500, wins=1, K=16)
## [1] 1508
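As a quick sanity check on these two functions (a sketch, not part of the class code), the expected wins for the two sides of a game should sum to 1, and with the same K the points one team gains should equal the points the other loses. The ratings 1600 and 1400 and the names fav.new and dog.new are just illustrative.

```r
Ewins <- function(rating, opp.rating) {
  1 / (1 + 10^((opp.rating - rating) / 400))
}
RatingAdjust <- function(rating, opp.rating, wins, K = 32) {
  rating + K * (wins - Ewins(rating, opp.rating))
}

# Expected wins for the two sides of one game sum to 1
Ewins(1600, 1400) + Ewins(1400, 1600)

# Suppose the 1400-rated team pulls the upset (illustrative ratings)
fav.new <- RatingAdjust(1600, 1400, wins = 0)
dog.new <- RatingAdjust(1400, 1600, wins = 1)

# Rating points are conserved: the two new ratings still total 3000
fav.new + dog.new
```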
Now, let’s read in the data:
reg <- read.csv('/home/rstudioshared/shared_files/data/RegularSeasonCompactResults.csv')
View(reg)
This data set isn’t in quite the right format to compute ELO. Instead we’ll create a data set that has each game listed twice - once from the perspective of each team - and add a win column:
library(dplyr)
reg1 <- reg %>% rename(team = Wteam, opp.team=Lteam) %>% mutate(win=1)
reg2 <- reg %>% rename(team= Lteam, opp.team=Wteam) %>% mutate(win=0)
reg <- rbind(reg1, reg2)
View(reg)
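To see what this reshaping does, here is a small sketch on two made-up games. The team IDs and day numbers are invented for illustration, and only the columns used above are included.

```r
library(dplyr)

# Two made-up games using the same column names as the real file
toy <- data.frame(Season = c(2016, 2016), Daynum = c(11, 12),
                  Wteam = c(1101, 1103), Lteam = c(1102, 1101))

# Same steps as above: one copy from the winner's side, one from the loser's
toy1 <- toy %>% rename(team = Wteam, opp.team = Lteam) %>% mutate(win = 1)
toy2 <- toy %>% rename(team = Lteam, opp.team = Wteam) %>% mutate(win = 0)
toy.long <- rbind(toy1, toy2)  # rbind matches data frame columns by name

toy.long            # 4 rows: each game appears once per team
mean(toy.long$win)  # half of the rows are wins
```

Team 1101 shows up twice in the team column, once as a winner and once as a loser, which is what lets us follow one team through its season.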
Let’s limit our data to the 2016 season and add columns for the starting and ending ELO of both teams.
reg2016 <- reg %>% filter(Season==2016)
reg2016$elo.start <- NA
reg2016$elo.end <- NA
reg2016$opp.elo.start <- NA
reg2016$opp.elo.end <- NA
reg2016 <- reg2016 %>% arrange(Daynum)
Here’s the complicated part. This code runs through each row of the data set in order and calculates ELO ratings for each row in turn. (This is why we sorted the data from earliest games to latest.)
First, it checks whether the team and its opponent have played any previous games. If a team has, the code finds that team’s ending ELO rating from its most recent game and uses it as the team’s starting ELO rating. If a team has not played any previous games, it gets a starting ELO rating of 1500. Next, the code uses the RatingAdjust function to calculate the new ELO rating for each team after this game. Notice that we use a K value of 25 in this code. This may not be the right value, and our team ratings may suffer as a result.
The final line in the for loop gives us an update on the progress of our ELO calculation.
for(i in 1:nrow(reg2016)){
start.elo <- reg2016 %>% filter(team==reg2016[i, "team"], Daynum < reg2016[i, "Daynum"]) %>%
top_n(1, Daynum) %>% select(elo.end) %>% as.numeric()
# For rows where the opponent appears as opp.team, its rating is in opp.elo.end
opp.start.elo <- reg2016 %>% filter(opp.team==reg2016[i, "opp.team"], Daynum<reg2016[i, "Daynum"]) %>%
top_n(1, Daynum) %>% select(opp.elo.end) %>% as.numeric()
if(is.na(start.elo)){start.elo <- 1500}
if(is.na(opp.start.elo)){opp.start.elo <- 1500}
reg2016[i, "elo.start"] <- start.elo
reg2016[i, "opp.elo.start"] <- opp.start.elo
reg2016[i, "elo.end"] <- RatingAdjust(start.elo, opp.start.elo, wins=reg2016[i, "win"], K=25)
reg2016[i, "opp.elo.end"] <- RatingAdjust(opp.start.elo, start.elo, wins=1-reg2016[i, "win"], K=25)
if(i %% 1000 == 0){print(paste("Completed", i, "rows of", nrow(reg2016), "total rows"))}
}
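The loop above filters the whole data set on every row, which is slow. One alternative, sketched here under a few assumptions, is to keep a running table of current ratings and step through the games once, with one row per game. RunElo and the toy games data frame are hypothetical names and data for illustration. Note one difference: this version updates ratings after every game, while the loop above only looks at games from strictly earlier days.

```r
Ewins <- function(rating, opp.rating) {
  1 / (1 + 10^((opp.rating - rating) / 400))
}
RatingAdjust <- function(rating, opp.rating, wins, K = 32) {
  rating + K * (wins - Ewins(rating, opp.rating))
}

# Hypothetical toy data: one row per game, already sorted by Daynum
games <- data.frame(Daynum = c(11, 12, 13),
                    Wteam = c("A", "C", "A"),
                    Lteam = c("B", "A", "B"))

RunElo <- function(games, K = 25, start = 1500) {
  ids <- unique(c(as.character(games$Wteam), as.character(games$Lteam)))
  ratings <- setNames(rep(start, length(ids)), ids)  # everyone starts at 1500
  for (i in seq_len(nrow(games))) {
    w <- as.character(games$Wteam[i])
    l <- as.character(games$Lteam[i])
    # Compute both new ratings from the pregame ratings before storing either
    w.new <- RatingAdjust(ratings[[w]], ratings[[l]], wins = 1, K = K)
    l.new <- RatingAdjust(ratings[[l]], ratings[[w]], wins = 0, K = K)
    ratings[[w]] <- w.new
    ratings[[l]] <- l.new
  }
  sort(ratings, decreasing = TRUE)
}

toy.ratings <- RunElo(games)
toy.ratings
```

On the real data, this would be called on the original one-row-per-game table (filtered to one season and sorted by Daynum) rather than on the doubled table.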
Now, let’s get the final ratings for each team, match them up with team names, and look at the top 10 and bottom 10 teams.
teams <- read.csv('/home/rstudioshared/shared_files/data/teams.csv')
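# ungroup() so that the top_n() calls below rank all teams together, not within each team
final.elos <- reg2016 %>% group_by(team) %>% top_n(1, Daynum) %>% ungroup() %>% select(team, elo.end)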
final.elos <- left_join(final.elos, teams, by=c("team"="Team_Id"))
final.elos %>% top_n(10, elo.end) %>% arrange(desc(elo.end))
final.elos %>% top_n(10, desc(elo.end)) %>% arrange(elo.end)
How does the top 10 by ELO compare to the top teams entering last year’s tournament, according to Vegas?
Using ELO, calculate the chance that Kansas would beat Oregon in a game.
Using ELO, calculate the chance that Kansas would beat Chicago St. in a game. Does this seem too high or too low?
Can you think of any way to improve our team ratings? Put another way, what do we know about these games or about these teams that ELO doesn’t know? And how could we tell it what we know?
Try either writing code to implement one of your ideas or rerunning this code with a different value of K.
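If you try different values of K, you need some way to judge which is better. One common choice, offered here as a suggestion rather than something we covered in class, is the Brier score: the average squared gap between the pregame win probability and what actually happened (lower is better). BrierScore is a hypothetical helper, and the toy ratings below are made up just to show the call.

```r
Ewins <- function(rating, opp.rating) {
  1 / (1 + 10^((opp.rating - rating) / 400))
}

# Mean squared gap between the pregame forecast and the result (lower is better)
BrierScore <- function(games) {
  pred <- Ewins(games$elo.start, games$opp.elo.start)
  mean((games$win - pred)^2)
}

# Toy rows with made-up ratings, using the same column names as reg2016
toy <- data.frame(elo.start = c(1600, 1400, 1500),
                  opp.elo.start = c(1400, 1600, 1500),
                  win = c(1, 0, 1))
toy.score <- BrierScore(toy)
toy.score
```

After the loop finishes, BrierScore(reg2016) would give one number for K = 25; rerunning the loop with another K and comparing the scores is one way to pick between them. A forecast of 50% for every game always scores 0.25, so that is the number to beat.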