Week 5B: Chess ELO Calculations

Author

Pascal Hermann Kouogang Tafo

INTRODUCTION

The goal of this assignment is to compare each chess player’s actual score with the expected score given by the ELO rating system’s probability formula. This comparison makes it possible to identify the players who overperformed the most and those who underperformed the most relative to their expected score.

APPROACH

To reach that goal:

1) Read the CSV file that I obtained from Project 1 into an R data frame. Then use the ELO rating system formula Expected_Score = 1/(1+10^((Rb-Ra)/400)), where Rb is the average opponent rating and Ra is the player’s rating, to calculate a per-game expected score, and multiply it by 7 to get a “baseline” score for the 7-round chess tournament.

2) Use the mutate() function from the dplyr R package to accomplish two tasks:

  • Calculate each player’s expected score from the ELO formula and the existing columns, and store it in a new column of the original data frame.
  • Calculate the difference between each player’s actual score and expected score (Difference = actual score (TotalPoints) - Expected Score) and store it in another new column.

3) Finally, I will either use the slice_max() and slice_min() functions or sort the score-difference column in descending and ascending order to rank the five players who overperformed the most and the five who underperformed the most.
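The two formulas in this approach can be sketched as a small pair of R helpers. This is a minimal sketch rather than the code used below: the function names elo_expected() and tournament_expected() are hypothetical, and the tournament version assumes every player plays all 7 rounds against an opponent rated at their average opponent rating.

```r
# Hypothetical helper: per-game ELO expected score for a player rated `ra`
# against an opponent rated `rb` (vectorized over both arguments)
elo_expected <- function(ra, rb) {
  1 / (1 + 10^((rb - ra) / 400))
}

# Hypothetical helper: baseline tournament score, assuming `rounds` games
# against an opponent rated at the average opponent rating `rb`
tournament_expected <- function(ra, rb, rounds = 7) {
  rounds * elo_expected(ra, rb)
}

# Usage with GARY HUA's ratings from the dataset (PreRating 1794, AvgOpponentRating 1605)
round(tournament_expected(1794, 1605), 2)  # 5.24
```

Because the per-game expectations of two opponents always add up to 1, equally rated players each get 0.5, which is a quick sanity check for the helper.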

Load Library

library(dplyr)
Warning: package 'dplyr' was built under R version 4.5.2

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Load the Data set

Here I will load the final results table that I obtained from the chess tournament analysis in Project 1, stored in my GitHub repo. The dataset contains each player’s pre-tournament rating and the average rating of their opponents, which are needed to calculate each player’s expected score.

url <- "https://raw.githubusercontent.com/Pascaltafo2025/Week-5B--DATA-607/refs/heads/main/Final_tournament_PlayersInfo.csv"

Chess_Players_ratings <- read.csv(url)
head(Chess_Players_ratings,10)
            PlayerName State TotalPoints PreRating AvgOpponentRating
1             GARY HUA    ON         6.0      1794              1605
2      DAKSHESH DARURI    MI         6.0      1553              1469
3         ADITYA BAJAJ    MI         6.0      1384              1564
4  PATRICK H SCHILLING    MI         5.5      1716              1574
5           HANSHI ZUO    MI         5.5      1655              1501
6          HANSEN SONG    OH         5.0      1686              1519
7    GARY DEE SWATHELL    MI         5.0      1649              1372
8     EZEKIEL HOUGHTON    MI         5.0      1641              1468
9          STEFANO LEE    ON         5.0      1411              1523
10           ANVIT RAO    MI         5.0      1365              1554

Compute each player’s expected score and the difference from their actual score.

To perform these calculations, we use the ELO rating system formula Expected_Score = 1/(1+10^((Rb-Ra)/400)), where Rb is the average opponent rating (AvgOpponentRating) and Ra is the player’s pre-tournament rating (PreRating). This formula gives the expected score for a single game, so we multiply it by the number of rounds. Once we have each player’s expected score, we compute the difference from their actual score with a simple subtraction (Difference = actual score (TotalPoints) - Expected Score). NB: keep in mind that the tournament consisted of 7 rounds, so this calculation assumes that every player played all 7 games.
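As a worked check of the formula, here is the arithmetic for GARY HUA (PreRating 1794, AvgOpponentRating 1605), the first player in the table. It reproduces the first Expected_Score (5.24) and Difference (0.76) values printed by the code:

$$
E = \frac{1}{1 + 10^{(R_b - R_a)/400}} = \frac{1}{1 + 10^{(1605 - 1794)/400}} = \frac{1}{1 + 10^{-0.4725}} \approx \frac{1}{1.3369} \approx 0.748
$$

$$
\text{Expected\_Score} = 7 \times 0.748 \approx 5.24, \qquad \text{Difference} = 6.0 - 5.24 = 0.76
$$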

# Expected score over 7 games and round the result to 2 decimal places

Total_rounds <- 7

Expected_Score_Total <- Total_rounds * (1 / (1 + 10^((Chess_Players_ratings$AvgOpponentRating - Chess_Players_ratings$PreRating) / 400)))

Expected_Score <- round(Expected_Score_Total, 2)

head(Expected_Score,10)
 [1] 5.24 4.33 1.83 4.86 4.96 5.06 5.82 5.11 2.41 1.76
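The code above uses one average opponent rating and assumes 7 games for every player. Because the ELO curve is nonlinear, summing the expected score of each game actually played can give a somewhat different baseline. A minimal sketch of that alternative, using a hypothetical helper named expected_from_games() and hypothetical opponent ratings (per-opponent ratings are not columns in the CSV loaded above):

```r
# Hypothetical helper: sum the per-game ELO expected scores over the games
# actually played, given the player's rating and a vector of opponent ratings
expected_from_games <- function(ra, opponent_ratings) {
  sum(1 / (1 + 10^((opponent_ratings - ra) / 400)))
}

# Hypothetical example: a 1500-rated player who played only 3 games
opponents <- c(1300, 1500, 1900)

# Sum of the three per-game expectations
per_game_sum <- expected_from_games(1500, opponents)

# Same games, but using a single average opponent rating as the code above does
average_based <- length(opponents) * (1 / (1 + 10^((mean(opponents) - 1500) / 400)))

round(c(per_game_sum = per_game_sum, average_based = average_based), 2)
# per_game_sum = 1.35, average_based = 1.22
```

With real data, the vector of opponent ratings for each player would come from the round-by-round results of Project 1, and its length would reflect how many games that player actually played.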
# Difference between player's actual score and expected score 

Difference <- Chess_Players_ratings$TotalPoints - Expected_Score

Difference <- round(Difference,2) #Round to 2 decimal places.

head(Difference,10)
 [1]  0.76  1.67  4.17  0.64  0.54 -0.06 -0.82 -0.11  2.59  3.24
# Add the expected score and difference columns to our initial players result dataframe. 

Chess_Players_ratings_Update <- Chess_Players_ratings %>%
  mutate(Expected_Score = Expected_Score,
         Difference = Difference)

head(Chess_Players_ratings_Update,10)
            PlayerName State TotalPoints PreRating AvgOpponentRating
1             GARY HUA    ON         6.0      1794              1605
2      DAKSHESH DARURI    MI         6.0      1553              1469
3         ADITYA BAJAJ    MI         6.0      1384              1564
4  PATRICK H SCHILLING    MI         5.5      1716              1574
5           HANSHI ZUO    MI         5.5      1655              1501
6          HANSEN SONG    OH         5.0      1686              1519
7    GARY DEE SWATHELL    MI         5.0      1649              1372
8     EZEKIEL HOUGHTON    MI         5.0      1641              1468
9          STEFANO LEE    ON         5.0      1411              1523
10           ANVIT RAO    MI         5.0      1365              1554
   Expected_Score Difference
1            5.24       0.76
2            4.33       1.67
3            1.83       4.17
4            4.86       0.64
5            4.96       0.54
6            5.06      -0.06
7            5.82      -0.82
8            5.11      -0.11
9            2.41       2.59
10           1.76       3.24
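The approach described computing both columns with mutate(). A minimal sketch of that variant keeps every calculation inside a single dplyr pipeline instead of building separate vectors first. To stay self-contained, it uses only the first three players from the table above; with the real data, the same pipeline would be applied to Chess_Players_ratings.

```r
library(dplyr)

# First three players from the dataset, copied from the output above
players <- data.frame(
  PlayerName = c("GARY HUA", "DAKSHESH DARURI", "ADITYA BAJAJ"),
  TotalPoints = c(6.0, 6.0, 6.0),
  PreRating = c(1794, 1553, 1384),
  AvgOpponentRating = c(1605, 1469, 1564)
)

Total_rounds <- 7

players_update <- players %>%
  mutate(
    # Per-game ELO expectation times the number of rounds, rounded to 2 decimals
    Expected_Score = round(Total_rounds / (1 + 10^((AvgOpponentRating - PreRating) / 400)), 2),
    # mutate() can use a column created earlier in the same call
    Difference = round(TotalPoints - Expected_Score, 2)
  )

players_update
```

Computing the columns inside mutate() ties them to the data frame’s own columns, so there is no risk of attaching vectors that were computed from a different or re-sorted data frame.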

List of the five players who overperformed the most relative to their expected score

First method

Here I will sort the players by score difference in descending order, then extract the top 5 as my result.

# Sort players' Difference score in a descending order

Overperformers <- Chess_Players_ratings_Update[order(-Chess_Players_ratings_Update$Difference), ]

Top5_Overperformers <- head(Overperformers, 5)

# List of Final results

Top5_Overperformers[, -2]
                 PlayerName TotalPoints PreRating AvgOpponentRating
3              ADITYA BAJAJ         6.0      1384              1564
10                ANVIT RAO         5.0      1365              1554
15   ZACHARY JAMES HOUGHTON         4.5      1220              1484
46 JACOB ALEXANDER LAVALLEY         3.0       377              1358
37     AMIYATOSH PWNANANDAM         3.5       980              1385
   Expected_Score Difference
3            1.83       4.17
10           1.76       3.24
15           1.26       3.24
46           0.02       2.98
37           0.62       2.88

Second method

We will extract the five players who overperformed the most using the “slice_max()” function.

# Extract Top five Overperformers

Top5_Overperformers <- Chess_Players_ratings_Update %>%
  slice_max(Difference, n = 5)

Top5_Overperformers[, -2]
                PlayerName TotalPoints PreRating AvgOpponentRating
1             ADITYA BAJAJ         6.0      1384              1564
2                ANVIT RAO         5.0      1365              1554
3   ZACHARY JAMES HOUGHTON         4.5      1220              1484
4 JACOB ALEXANDER LAVALLEY         3.0       377              1358
5     AMIYATOSH PWNANANDAM         3.5       980              1385
  Expected_Score Difference
1           1.83       4.17
2           1.76       3.24
3           1.26       3.24
4           0.02       2.98
5           0.62       2.88

Interpretation

The players who overperformed the most clearly punched above their weight. For instance, ADITYA BAJAJ’s pre-tournament rating (1384) was 180 points, nearly 200, below his average opponent rating (1564). According to the logistic curve behind the ELO formula used here, if two players’ ratings differ by 200 points, the expected score of the stronger player is 76%, which leaves about 24% for the weaker one. With a 180-point deficit, Bajaj’s expected score was only about 0.26 per game (1.83 over 7 rounds), yet he scored 6/7, which makes him a clear statistical outlier.

Source: https://en.wikipedia.org/wiki/Elo_rating_system
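The figures quoted in this interpretation can be checked directly with the same formula. A minimal sketch; the helper name elo_expected() is hypothetical:

```r
# Hypothetical helper: per-game ELO expected score of a player rated `ra`
# against an opponent rated `rb`
elo_expected <- function(ra, rb) 1 / (1 + 10^((rb - ra) / 400))

# Stronger player with a 200-point edge
round(elo_expected(1700, 1500), 2)   # 0.76

# ADITYA BAJAJ (1384) against his average opponent (1564)
bajaj_per_game <- elo_expected(1384, 1564)
round(bajaj_per_game, 2)             # 0.26 expected per game
round(7 * bajaj_per_game, 2)         # 1.83 expected over 7 rounds
round(6 / 7, 2)                      # 0.86 actual per-game score
```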

List of the five players who underperformed the most relative to their expected score

First method

Here I will sort the players by score difference in ascending order, then extract the top 5 as my result. Alternatively, I can reuse the descending ranking from the previous section and take the last five players with “tail”.

# Extract top 5 of the players' score difference in an ascending order

underperformers <- Chess_Players_ratings_Update[order(Chess_Players_ratings_Update$Difference), ]
Top5_underperformers <- head(underperformers, 5)

Top5_underperformers[, -2]
           PlayerName TotalPoints PreRating AvgOpponentRating Expected_Score
62      ASHWIN BALAJI         1.0      1530              1186           6.15
25   LOREN SCHWIEBERT         3.5      1745              1363           6.30
30 GEORGE AVERY JONES         3.5      1522              1144           6.29
27     GAURAV GIDWANI         3.5      1552              1222           6.09
29   CHIEDOZIE OKORIE         3.5      1602              1314           5.88
   Difference
62      -5.15
25      -2.80
30      -2.79
27      -2.59
29      -2.38
# Take the last 5 players from the descending ranking computed in the previous section

Top5_underperformers <- tail(Overperformers, 5)

Top5_underperformers[, -2]
           PlayerName TotalPoints PreRating AvgOpponentRating Expected_Score
35   JOSHUA DAVID LEE         3.5      1438              1150           5.88
27     GAURAV GIDWANI         3.5      1552              1222           6.09
30 GEORGE AVERY JONES         3.5      1522              1144           6.29
25   LOREN SCHWIEBERT         3.5      1745              1363           6.30
62      ASHWIN BALAJI         1.0      1530              1186           6.15
   Difference
35      -2.38
27      -2.59
30      -2.79
25      -2.80
62      -5.15

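Second method

We will extract the five players who underperformed the most using the “slice_min()” function. Note that CHIEDOZIE OKORIE and JOSHUA DAVID LEE are tied at -2.38: this is why the head() and tail() approaches above return different fifth players, and why slice_min() returns six rows, since it keeps ties by default (with_ties = TRUE).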

# Extract Top 5 Underperformers

Top5_underperformers <- Chess_Players_ratings_Update %>%
  slice_min(Difference, n = 5)

Top5_underperformers[, -2]
          PlayerName TotalPoints PreRating AvgOpponentRating Expected_Score
1      ASHWIN BALAJI         1.0      1530              1186           6.15
2   LOREN SCHWIEBERT         3.5      1745              1363           6.30
3 GEORGE AVERY JONES         3.5      1522              1144           6.29
4     GAURAV GIDWANI         3.5      1552              1222           6.09
5   CHIEDOZIE OKORIE         3.5      1602              1314           5.88
6   JOSHUA DAVID LEE         3.5      1438              1150           5.88
  Difference
1      -5.15
2      -2.80
3      -2.79
4      -2.59
5      -2.38
6      -2.38

Interpretation

The result here reveals a dramatic outlier, ASHWIN BALAJI. His pre-tournament rating was 1530, whereas the average rating of the opponents he faced was 1186, a difference of 344 points. According to the ELO expected-score formula used above, which relates the rating difference to the probability of success, Ashwin Balaji had an expected score of about 0.88 per game (close to 90%), yet he scored just 1/7.

Source: https://en.wikipedia.org/wiki/Performance_rating_(chess)
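The arithmetic behind this interpretation, using the same formula and ASHWIN BALAJI’s ratings (1530 vs. 1186), reproduces the Expected_Score (6.15) and Difference (-5.15) values in the table:

$$
E = \frac{1}{1 + 10^{(1186 - 1530)/400}} = \frac{1}{1 + 10^{-0.86}} \approx \frac{1}{1.1380} \approx 0.879
$$

$$
7 \times 0.879 \approx 6.15, \qquad \text{Difference} = 1.0 - 6.15 = -5.15
$$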

CONCLUSION:

These results show that the five largest overperformers all entered the tournament rated below the average of their opponents, yet racked up points against stronger opposition far more often than the ELO probabilities would predict. Meanwhile, every player in the bottom five was rated roughly 290 to 380 points above their average opponent and was expected to cruise through a lower-rated field, but finished with a score well below what their rating suggested.