Assignment 5B: Elo Calculations

Author

Emily El Mouaquite

Approach

To evaluate how the chess players from the Project 1 data performed relative to expectation, I will use the Elo expected score formula as implemented here: https://mattmazzola.medium.com/implementing-the-elo-rating-system-a085f178e065. I will loop through each player to calculate their expected score per game. I can then sum these to get each player's total expected score and take the difference between their actual and expected scores. Finally, I will sort the players by this difference to see who overperformed and underperformed the most.
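As a minimal sketch of the per-game calculation described above, assuming the standard Elo form with base 10 and a scale factor of 400 (the helper name `elo_expected` is hypothetical and not part of the Project 1 code):

```r
# Hypothetical helper: expected score of a single game for a player
# against one opponent, using base 10 and a scale factor of 400.
elo_expected <- function(player_rating, opponent_rating) {
  1 / (1 + 10^((opponent_rating - player_rating) / 400))
}

# Equal ratings give an even expectation.
elo_expected(1500, 1500)  # 0.5

# The function is vectorized, so one call covers several opponents;
# summing the per-game values gives a total expected score.
per_game <- elo_expected(1500, c(1500, 1900))
sum(per_game)
```

Because R arithmetic is vectorized, the same helper could be applied to a whole column of ratings without an explicit loop.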

Code Base

# load dplyr library 
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

The data needed for this assignment can be found in the .CSV file generated from last week’s Project 1.

# read CSV
df <- read.csv("tournament_info.csv")

Elo Formula:

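The loop mentioned in my approach is not necessary because the dataset already includes each player's average opponent rating. The Elo formula can therefore be applied directly to create a new expected score column. Note that this is an approximation: the formula is applied once to the average opponent rating and the result is multiplied by 7, so every player is treated as having played 7 games against an average-rated opponent. Players who played fewer games will have an inflated expected score.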

# Elo formula: expected score per game from the average opponent rating,
# multiplied by 7 to give the expected total over 7 games
df$expected_score <- 1 / (1 + 10^((df$avg_opps_rating - df$pre_rating) / 400)) * 7

head(df)
                 name state total_points pre_rating avg_opps_rating
1            GARY HUA    ON          6.0       1794        1605.286
2     DAKSHESH DARURI    MI          6.0       1553        1469.286
3        ADITYA BAJAJ    MI          6.0       1384        1563.571
4 PATRICK H SCHILLING    MI          5.5       1716        1573.571
5          HANSHI ZUO    MI          5.5       1655        1500.857
6         HANSEN SONG    OH          5.0       1686        1518.714
  expected_score
1       5.233826
2       4.327372
3       1.836577
4       4.859483
5       4.958354
6       5.066018
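To illustrate how the average-based shortcut can differ from summing per-game expectations, here is a small sketch. The ratings are hypothetical and not taken from the tournament data, and the variable names are chosen only for this example.

```r
# Hypothetical ratings, not taken from the tournament data.
player_rating <- 1500
opponent_ratings <- c(1500, 1500, 2300)

# Per-game approach: one expected score per opponent, then sum.
per_game <- 1 / (1 + 10^((opponent_ratings - player_rating) / 400))
total_per_game <- sum(per_game)

# Average-based approach: one expected score from the mean opponent
# rating, multiplied by the number of games actually played.
avg_based <- 1 / (1 + 10^((mean(opponent_ratings) - player_rating) / 400))
total_avg_based <- avg_based * length(opponent_ratings)

c(per_game = total_per_game, avg_based = total_avg_based)
```

The two totals disagree here because the Elo curve is nonlinear, so a single very strong opponent shifts the average more than it shifts the summed expectation. Multiplying by `length(opponent_ratings)` instead of a fixed 7 is one way to handle players who played fewer games, assuming per-game opponent ratings are available.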

A new difference column (actual points minus expected score) is also needed to identify the top 5 overperformers and underperformers.

# difference: actual points minus expected score
df$difference <- df$total_points - df$expected_score

head(df)
                 name state total_points pre_rating avg_opps_rating
1            GARY HUA    ON          6.0       1794        1605.286
2     DAKSHESH DARURI    MI          6.0       1553        1469.286
3        ADITYA BAJAJ    MI          6.0       1384        1563.571
4 PATRICK H SCHILLING    MI          5.5       1716        1573.571
5          HANSHI ZUO    MI          5.5       1655        1500.857
6         HANSEN SONG    OH          5.0       1686        1518.714
  expected_score  difference
1       5.233826  0.76617424
2       4.327372  1.67262801
3       1.836577  4.16342302
4       4.859483  0.64051685
5       4.958354  0.54164582
6       5.066018 -0.06601795

Overperformers:

df %>%
  arrange(desc(difference)) %>%
  slice_head(n = 5)
                      name state total_points pre_rating avg_opps_rating
1             ADITYA BAJAJ    MI          6.0       1384        1563.571
2   ZACHARY JAMES HOUGHTON    MI          4.5       1220        1483.857
3                ANVIT RAO    MI          5.0       1365        1554.143
4 JACOB ALEXANDER LAVALLEY    MI          3.0        377        1357.714
5     AMIYATOSH PWNANANDAM    MI          3.5        980        1384.800
  expected_score difference
1     1.83657698   4.163423
2     1.25738160   3.242618
3     1.76291836   3.237082
4     0.02464793   2.975352
5     0.62055841   2.879442

Underperformers:

df %>%
  arrange(difference) %>%
  slice_head(n = 5)
                name state total_points pre_rating avg_opps_rating
1      ASHWIN BALAJI    MI          1.0       1530        1186.000
2   LOREN SCHWIEBERT    MI          3.5       1745        1363.286
3 GEORGE AVERY JONES    ON          3.5       1522        1144.143
4     GAURAV GIDWANI    MI          3.5       1552        1221.667
5   CHIEDOZIE OKORIE    MI          3.5       1602        1313.500
  expected_score difference
1       6.150935  -5.150935
2       6.300063  -2.800063
3       6.285951  -2.785951
4       6.090469  -2.590469
5       5.882361  -2.382361
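For readers without dplyr, the same top-n and bottom-n selection can be sketched in base R. The toy data frame below uses hypothetical values; only the `difference` column name mirrors the one used above.

```r
# Toy data with hypothetical values; the column name mirrors the one above.
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  difference = c(1.5, -2.0, 0.3, 4.1),
  stringsAsFactors = FALSE
)

# Largest differences first (overperformers).
top <- head(toy[order(toy$difference, decreasing = TRUE), ], 2)

# Smallest differences first (underperformers).
bottom <- head(toy[order(toy$difference), ], 2)

top$name     # "D" "A"
bottom$name  # "B" "C"
```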

Conclusion:

The Elo formula is an interesting method of predicting scores that I did not know existed before this assignment. I did not expect the differences for the top overperformers and underperformers to be so large. It makes me wonder how accurate this rating system is in practice, and whether such large gaps between expected and earned scores are common in chess tournaments. To extend this work, one might redo the calculation with a different scale factor in place of 400. I would predict that a larger scale factor produces less extreme expected scores, because each rating difference is divided by a larger number and so counts for less, pulling the per-game expectation toward 0.5; a smaller scale factor would do the opposite. It would be interesting to see how the values in the difference column are affected.
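The scale-factor experiment suggested above could be sketched as follows. The helper `expected_total` and its `scale` and `games` parameters are hypothetical; the two ratings come from the first row of the `head(df)` output.

```r
# Hypothetical helper with the scale factor exposed as a parameter.
# games / (1 + 10^x) is the per-game expectation multiplied by `games`.
expected_total <- function(pre_rating, avg_opps_rating, scale = 400, games = 7) {
  games / (1 + 10^((avg_opps_rating - pre_rating) / scale))
}

# Ratings from the first row of the head(df) output (pre_rating 1794,
# avg_opps_rating 1605.286), evaluated at three scale factors.
scales <- c(200, 400, 800)
expected_by_scale <- sapply(scales, function(s) expected_total(1794, 1605.286, scale = s))
expected_by_scale
```

At a scale of 400 this reproduces the expected score of about 5.23 from the table, and as the scale factor grows the result moves toward 3.5, which is half of the 7 games.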