The Elo rating system is a statistical method for rating and comparing the relative skill levels of players in games such as chess. The difference in Elo ratings between two players can be used to predict the outcome of a game (i.e., win or loss). The rating difference also determines the number of points each player gains from a win or loses from a loss.
The primary aim of this extra credit assignment is to identify the chess players who over-performed or under-performed the most in a tournament, based on the difference between their expected and actual scores. The latter is the total number of points earned from games won (1 point), drawn (0.5 points), or lost (0 points).
I calculated the expected score using the official US Chess Federation (USCF) formula.1 Because this formula is used to calculate player ratings, I also attempted to reproduce the post-ratings for each player from their pre-rating and game outcomes shown in the tournament cross table.
I read the tournament cross table from my GitHub repository.
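The code in this document assumes the tidyverse packages used throughout (readr for read_lines(), dplyr, and ggplot2) have already been loaded; a minimal setup chunk would be:
# Assumed setup: load the packages used below
library(tidyverse)  # provides readr::read_lines(), the dplyr verbs, and ggplot2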
lines <- read_lines("https://raw.githubusercontent.com/alexandersimon1/Data607/main/Project1/tournamentinfo.txt", skip = 4)
Because this assignment is an extension of Project 1, I skip the detailed explanation of how I extracted data from the cross table text file and prepared it for analysis. If needed, expand the code block in the markdown file and/or see my Project 1 assignment.
To facilitate subsequent calculations, I created a named vector with player (pair) number and pre-rating as key-value pairs. This vector is used to look up a player’s pre-rating given the player number.
get_pre_rating <- data_wide$Player_pre_rating
names(get_pre_rating) <- data_wide$Pair
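As a quick illustration (assuming the pair numbers are stored as 1, 2, 3, …, so the vector names are the strings "1", "2", "3", …), a player's pre-rating can be looked up by player number like this:
# Hypothetical lookup: pre-rating of player (pair) number 1
get_pre_rating[["1"]]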
The actual score for each player was provided in the tournament cross table.
I used the formula for the expected score that was provided in the “Elo Rating System for Chess and Beyond” YouTube video referenced on Blackboard. This is the same formula that the US Chess Federation uses to calculate the “winning expectancy” (\(W_e\)) between a player with rating \(R\) and his/her \(i\)-th opponent with rating \(R_i\).
\[ W_e(R, R_i) = \frac{1}{1 + 10^{-(R - R_i)/400}} \]
winning_expectancy <- function(rating1, rating2) {
  # Expected score of a player rated rating1 against an opponent rated rating2
  expected_score <- 1 / (1 + 10^(-(rating1 - rating2) / 400))
  return(expected_score)
}
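For example, under this formula a player rated 200 points above their opponent has a winning expectancy of about 0.76:
# A 200-point rating advantage gives an expected score of ~0.76
winning_expectancy(1500, 1300)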
data_wide <- data_wide %>%
  rowwise() %>%
  mutate(
    # For each player, sum the winning expectancies against all listed opponents
    Total_expected_score = round(
      sum(
        across(
          .cols = ends_with("_opponent_ID"),
          .fns = ~ winning_expectancy(Player_pre_rating, unname(get_pre_rating[.x]))
        ),
        na.rm = TRUE
      ),
      digits = 2
    )
  )
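As a rough sanity check (a sketch, assuming the games counted here are all between two players in this cross table, so each game contributes \(W_e(A, B) + W_e(B, A) = 1\)), the total expected score across all players should be close to the total actual score:
# Totals should be similar if most counted games are between players in this table
sum(data_wide$Total_expected_score, na.rm = TRUE)
sum(data_wide$Actual_score, na.rm = TRUE)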
To identify the players who played better or worse than expected, I sorted the players by the difference between their actual and expected scores.
data_wide <- data_wide %>%
mutate(
Score_difference = round(Actual_score - Total_expected_score, digits = 2)) %>%
arrange(desc(Score_difference))
I then tidied up the columns.
data_wide <- data_wide %>%
  select(Player_name, Player_pre_rating, Player_post_rating, Actual_score,
         Total_expected_score, Score_difference) %>%
  rename(Expected_score = Total_expected_score,
         Pre_rating = Player_pre_rating,
         Post_rating = Player_post_rating)
The 5 players who over-performed the most relative to their expected score are:
head(data_wide, 5)
## # A tibble: 5 × 6
## # Rowwise:
## Player_name Pre_rating Post_rating Actual_score Expected_score
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 " ADITYA BAJAJ … 1384 1640 6 1.95
## 2 " ZACHARY JAMES HOUGHTON … 1220 1416 4.5 1.37
## 3 " ANVIT RAO … 1365 1544 5 1.94
## 4 " JACOB ALEXANDER LAVALLEY… 377 1076 3 0.04
## 5 " AMIYATOSH PWNANANDAM … 980 1077 3.5 0.77
## # ℹ 1 more variable: Score_difference <dbl>
The 5 players who under-performed the most relative to their expected score are:
tail(data_wide, 5)
## # A tibble: 5 × 6
## # Rowwise:
## Player_name Pre_rating Post_rating Actual_score Expected_score
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 " JOSHUA DAVID LEE … 1438 1392 3.5 4.96
## 2 " RISHI SHETTY … 1494 1444 3.5 5.09
## 3 " JARED GE … 1332 1256 3 5.01
## 4 " GEORGE AVERY JONES … 1522 1444 3.5 6.02
## 5 " LOREN SCHWIEBERT … 1745 1681 3.5 6.28
## # ℹ 1 more variable: Score_difference <dbl>
The US Chess Federation uses several different algorithms to calculate player ratings. Here, I implemented only the standard rating formula. A player’s post-rating \(R_s\) is given by
\[ R_s = R_0 + K(S - E) + B \]
where \(R_0\) is the player’s pre-rating, \(K\) is the development coefficient, \(m\) is the number of games that the player completed in the current event, \(S\) is the player’s total score, \(E\) is the player’s total expected score, and \(B\) is a bonus amount.
From the tournament cross table, we know there were 7 rounds, so \(m = 7\), and we also know \(R_0\) and \(S\) for each player. In addition, \(E\) was calculated in the previous section.
For the remaining variables,
\[ K = \frac{800}{N' + m} \]
where \(N'\) is the “effective number of games”, a measure of how many games a player has previously played. It is defined as the smaller of \(N\), the number of tournament games the player has previously competed in, and
\[ N^* = \frac{50}{\sqrt{0.662 + 0.00000739(2569 - R_0)^2}} \]
Since \(N\) is not known, I assumed that all players in the tournament had played many games in the past (i.e., \(N > N^*\)) and used \(N^*\) in the formula for \(K\).
data_wide <- data_wide %>%
  mutate(
    # Effective number of prior games (N*) and development coefficient K, with m = 7 rounds
    N_star = round(50 / (sqrt(0.662 + (0.00000739 * (2569 - Pre_rating)^2))),
                   digits = 3),
    K = round(800 / (N_star + 7), digits = 3)
  )
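As an illustration of these formulas (using a pre-rating of 1384, the top over-performer identified above; the variable name is only for this example):
# N* is roughly 15.0 and K roughly 36.3 for a pre-rating of 1384
n_star_example <- 50 / sqrt(0.662 + 0.00000739 * (2569 - 1384)^2)
800 / (n_star_example + 7)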
The bonus amount is
\[ B = \max(0,\ K(S - E) - 14\sqrt{m'}) \]
where I used \(m' = 7\), the number of rounds in the tournament.
calc_bonus <- function(K, S, E) {
  # A bonus is awarded only when K * (S - E) exceeds the 14 * sqrt(m') threshold (m' = 7)
  bonus <- K * (S - E) - 14 * sqrt(7)
  if (bonus > 0) {
    return(bonus)
  }
  return(0)
}
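Continuing the example above (a hypothetical spot check: \(K \approx 36.3\) for the pre-rating-1384 player, who scored 6 points against an expected score of 1.95), the bonus works out to roughly 110, consistent with the table below:
# K * (S - E) - 14 * sqrt(7) is about 147.0 - 37.0, i.e. roughly 110
calc_bonus(36.3, 6, 1.95)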
data_wide <- data_wide %>%
mutate(
B = round(calc_bonus(K, Actual_score, Expected_score), digits = 3)
)
As expected, the five players with the largest bonus amounts were the same as those identified based on the difference between their actual and expected scores (see previous section), although the order of the players differed.
overperformers_by_bonus <- data_wide %>%
select(Player_name, Score_difference, B) %>%
arrange(desc(B)) %>%
rename(USCF_Bonus = B)
head(overperformers_by_bonus, 5)
## # A tibble: 5 × 3
## # Rowwise:
## Player_name Score_difference USCF_Bonus
## <chr> <dbl> <dbl>
## 1 " JACOB ALEXANDER LAVALLEY " 2.96 118.
## 2 " ADITYA BAJAJ " 4.05 110.
## 3 " ZACHARY JAMES HOUGHTON " 3.13 86.2
## 4 " AMIYATOSH PWNANANDAM " 2.73 81.8
## 5 " ANVIT RAO " 3.06 75.1
Next, I calculated each player’s rating change and post-rating.
data_wide <- data_wide %>%
mutate(
Rating_change = round(K * (Actual_score - Expected_score) + B, digits = 1),
Calculated_post_rating = floor(Pre_rating + Rating_change)
)
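As a spot check (using the pre-rating-1384 player from the earlier examples, with the slightly more precise rounded values the code above produces, \(K \approx 36.283\) and \(B \approx 109.9\)), the rating change is about \(36.283 \times (6 - 1.95) + 109.9 \approx 256.9\), so the calculated post-rating is \(\lfloor 1384 + 256.9 \rfloor = 1640\):
# Hypothetical spot check; 1640 agrees with this player's post-rating in the cross table
floor(1384 + 36.283 * (6 - 1.95) + 109.906)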
To compare the calculated post-rating and the actual post-rating in the cross table, I calculated the difference between the two ratings.
data_wide <- data_wide %>%
mutate(
Rating_difference = Calculated_post_rating - Post_rating
)
A histogram shows that the calculated post-rating was close to the actual post-rating for most, but not all, players.
ggplot(data_wide, aes(x = Rating_difference)) +
geom_histogram(binwidth = 25) +
xlab("Calculated post-rating - Actual post-rating") +
theme(axis.title.x = element_text(face = "bold")) +
ylab("Count") +
theme(axis.title.y = element_text(face = "bold"))
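To complement the histogram, a quick numeric summary (a sketch) of the rating differences can be printed as well:
# Five-number summary and mean of calculated minus actual post-ratings
summary(data_wide$Rating_difference)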
The differences may have been due to the assumption that I made about the number of games that players had played before the tournament, which was used to calculate the \(K\)-factor. Because \(K\) is a multiplicative factor, changing its value could result in large differences in a player's rating change.
To examine this possibility, I plotted the rating difference vs \(K\) for all players. I also colored the data points according to the bonus amount, \(B\).
ggplot(data_wide, aes(x = Rating_difference, y = K, color = B)) +
geom_point() +
geom_segment(aes(x = -427, y = 49.5, xend = -427, yend = 51.5),
arrow = arrow(length = unit(0.25, "cm")), linewidth=0.5, color = "red") +
geom_curve(aes(x = -20, y = 27, xend = -30, yend = 34),
curvature = -0.5, linetype = 2, color = "purple") +
geom_curve(aes(x = 10, y = 27, xend = 18, yend = 34),
curvature = 0.5, linetype = 2, color = "purple") +
xlab("Calculated post-rating - Actual post-rating") +
theme(axis.title.x = element_text(face = "bold")) +
ylab(expression("more experienced" %<-% "K" %->% "less experienced")) +
theme(axis.title.y = element_text(face = "bold"))
The first observation from this plot is that the player with the most underestimated post-rating (red arrow) had the highest \(K\)-factor. Because \(K\) is higher for less experienced players, this suggests that the assumption that all tournament players had played many games in the past did not hold for this player.
In addition, this player had a relatively large bonus. These observations suggest that this player was relatively inexperienced but got lucky during the tournament (ie, performed much better than expected). Among more experienced players who performed as expected, the difference between calculated and actual post-ratings is much smaller (data points in the purple oval).
These results suggest that the post-ratings in the tournament cross table were derived from the USCF standard rating formula, but there is insufficient information about the players’ experience to reproduce all post-ratings from the tournament cross table alone.
I saved the results of my analyses to a CSV file. The results are sorted by the difference between players’ actual and expected scores since this was the original aim of this assignment.
final_results <- data_wide %>%
select(Player_name, Pre_rating, Actual_score, Expected_score, Score_difference,
Calculated_post_rating, Post_rating, Rating_difference) %>%
rename(Actual_post_rating = Post_rating) %>%
arrange(desc(Score_difference))
write.csv(final_results, file='tournamentinfo_analysis.csv',
row.names = FALSE, quote = FALSE)
I successfully implemented the USCF winning expectancy formula to calculate the expected score for the chess tournament players. Comparing players’ expected score with their actual score showed which players played better or worse than expected.
I also attempted to reproduce the post-ratings shown in the tournament cross table using the USCF standard rating formula. Most of the calculated ratings were close to the actual ratings, but there were some discrepancies that were likely due to incomplete information about the amount of experience players had before the tournament.
Glickman ME, Doan T. The US Chess Rating System. September 2, 2020. Available at: https://new.uschess.org/sites/default/files/media/documents/the-us-chess-rating-system-revised-september-2020.pdf