For this assignment, I am working with the chess tournament data from Project 1. The dataset includes each player’s pre-tournament rating, their opponents in each round, and their final total score. The goal is to calculate each player’s expected score from the rating differences between them and their opponents, compare that expected score with the actual score, and identify which players overperformed and which underperformed the most.
2. Understanding the Elo Formula
To calculate the expected score, I will use the standard Elo expected score formula, which is widely used in chess and is explained in the video linked below. For a single game, it converts the rating difference between a player and an opponent into the player’s expected score: the probability of winning plus half the probability of drawing, a number between 0 and 1.
Link: https://www.youtube.com/watch?v=AsYfbmp0To0
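For reference, the standard form of the formula is below, written for a player A with rating $R_A$ facing an opponent B with rating $R_B$. The 400 in the exponent is the conventional Elo scale constant. The second line expresses the plan in section 3: a player’s total expected score is the sum of the single-game values over the rounds they actually played.

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad E_B = 1 - E_A

E_{\text{total}} = \sum_{r \in \text{played rounds}} \frac{1}{1 + 10^{(R_{\text{opp},r} - R_{\text{player}})/400}}
```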
If ratings are equal, expected score is 0.5. If a player is rated higher, expected score is greater than 0.5. If rated lower, expected score is less than 0.5.
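A minimal R version of the same formula, offered as a sketch: the function name `elo_expected` is my own label and not necessarily what the original code used. It is vectorised, so passing a vector of opponent ratings returns one expected score per game.

```r
# Elo expected score of a player rated `r_player` against an opponent rated
# `r_opponent` (hypothetical helper name; vectorised over both arguments).
elo_expected <- function(r_player, r_opponent) {
  1 / (1 + 10^((r_opponent - r_player) / 400))
}

elo_expected(1600, 1600)  # equal ratings: 0.5
elo_expected(1800, 1600)  # rated 200 points higher: about 0.76
elo_expected(1400, 1600)  # rated 200 points lower: about 0.24
```

The two players’ expected scores in a game always add up to 1, which matches the three cases above.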
3. Data Preparation Plan
From the tournament file, I will extract:
1. Player name
2. Player pre-rating
3. Opponents in each round
4. Actual total score
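A base-R sketch of this extraction follows. It assumes the usual Project 1 crosstable layout: each player occupies two `|`-delimited lines, a results line (pair number, name, total points, then one cell per round such as `W   2` or `H`) followed by a rating line that contains `R: <pre-rating>`. The three sample players are made up for illustration, and `parse_crosstable` is a hypothetical helper name.

```r
# Made-up sample in the assumed Project 1 layout. Header and "-----" separator
# lines are assumed to be removed already, so lines alternate: results, rating.
raw <- c(
  "    1 | PLAYER ONE                      |2.0  |W   2|D   3|H    |",
  "   ON | 11111111 / R: 1700   ->1710     |N:2  |W    |B    |     |",
  "    2 | PLAYER TWO                      |1.0  |L   1|U    |W   3|",
  "   MI | 22222222 / R: 1500P10->1512P13  |     |B    |     |W    |",
  "    3 | PLAYER THREE                    |0.5  |U    |D   1|L   2|",
  "   ON | 33333333 / R: 1600   ->1590     |     |     |W    |B    |"
)

parse_crosstable <- function(raw) {
  player_lines <- raw[seq(1, length(raw), by = 2)]
  rating_lines <- raw[seq(2, length(raw), by = 2)]

  # Split each results line on "|" and trim the padding around every field
  fields <- lapply(strsplit(player_lines, "|", fixed = TRUE), trimws)

  players <- data.frame(
    pair       = as.integer(vapply(fields, `[`, character(1), 1)),
    name       = vapply(fields, `[`, character(1), 2),
    total      = as.numeric(vapply(fields, `[`, character(1), 3)),
    # Pre-rating = first number after "R:"; a provisional suffix like "P10" is ignored
    pre_rating = as.integer(sub(".*R:\\s*(\\d+).*", "\\1", rating_lines)),
    stringsAsFactors = FALSE
  )

  # Round cells start at field 4; only W/L/D cells name an opponent.
  # Byes and unplayed rounds (e.g. "H", "U") become NA.
  players$opponents <- lapply(fields, function(f) {
    rounds <- f[-(1:3)]
    played <- grepl("^[WLD]\\s+\\d+$", rounds)
    opp <- rep(NA_integer_, length(rounds))
    opp[played] <- as.integer(sub("^[WLD]\\s+", "", rounds[played]))
    opp
  })
  players
}

players <- parse_crosstable(raw)
players[, c("pair", "name", "total", "pre_rating")]
players$opponents[[1]]  # opponents of pair 1: 2, 3, then NA for the bye
```

Keeping opponents as a list column (one integer vector per player) avoids hard-coding the number of rounds.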
Then, for each match:
1. Identify the opponent’s rating.
2. Calculate the expected score for that round.
3. Repeat for all rounds.
4. Sum all expected scores to get the total expected score.
For a single player, this gives something like:
1. Expected score = 4.3
2. Actual score = 4.0
3. Difference = Actual - Expected (here, 4.0 - 4.3 = -0.3)
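The per-match steps listed above can be sketched in base R on a made-up three-player table (pair numbers, pre-ratings, and one opponent pair number per round, with `NA` for a round not played). Skipping `NA` rounds is my assumption about how non-games are handled, and `elo_expected` is a hypothetical helper repeated here so the block runs on its own.

```r
elo_expected <- function(r_player, r_opponent) {
  1 / (1 + 10^((r_opponent - r_player) / 400))
}

# Hypothetical event: 1 beat 2, 1 drew 3, 2 beat 3 (NA = round not played)
players <- data.frame(
  pair       = 1:3,
  total      = c(1.5, 1.0, 0.5),
  pre_rating = c(1700, 1500, 1600)
)
players$opponents <- list(c(2L, 3L, NA), c(1L, NA, 3L), c(NA, 1L, 2L))

players$expected <- vapply(seq_len(nrow(players)), function(i) {
  opp <- players$opponents[[i]]
  opp <- opp[!is.na(opp)]                                     # keep only rounds actually played
  opp_rating <- players$pre_rating[match(opp, players$pair)]  # step 1: look up opponent ratings
  sum(elo_expected(players$pre_rating[i], opp_rating))        # steps 2-4: per-round scores, summed
}, numeric(1))

players$difference <- players$total - players$expected       # Actual - Expected
print(players[, c("pair", "total", "expected", "difference")], digits = 3)
```

A quick sanity check: each game splits exactly 1 expected point between its two players, so the expected totals across all players add up to the number of games played (3 here).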
4. Overperformance and Underperformance
After calculating expected and actual scores, I will:
1. Calculate the difference between the actual and expected score.
2. Sort players by this difference.
3. List:
   - Top 5 players who most overperformed.
   - Top 5 players who most underperformed.
Overperformed means: Actual score is higher than expected score.
Underperformed means: Actual score is lower than expected score.
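Sorting and picking the extremes can be sketched with base R’s `order()`, using made-up scores for seven placeholder players. `top_by_difference` is a hypothetical helper; `n = 3` keeps the toy example readable, whereas the assignment itself uses `n = 5`. With the tidyverse loaded, `dplyr::slice_max()` and `slice_min()` on the difference column would do the same job.

```r
# Made-up actual and expected totals for seven placeholder players
scores <- data.frame(
  name     = c("A", "B", "C", "D", "E", "F", "G"),
  actual   = c(5.0, 4.0, 3.5, 2.0, 4.5, 1.0, 3.0),
  expected = c(4.3, 4.4, 2.1, 3.6, 3.0, 2.9, 3.0),
  stringsAsFactors = FALSE
)
scores$difference <- scores$actual - scores$expected  # > 0 overperformed, < 0 underperformed

# Return the n rows with the largest (decreasing = TRUE) or smallest differences
top_by_difference <- function(df, n = 5, decreasing = TRUE) {
  head(df[order(df$difference, decreasing = decreasing), ], n)
}

over  <- top_by_difference(scores, n = 3, decreasing = TRUE)
under <- top_by_difference(scores, n = 3, decreasing = FALSE)
over$name   # "E" "C" "A"
under$name  # "F" "D" "B"
```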
5. Final Analysis Plan
In the final section, I will:
Present a table with expected score, actual score, and difference.
Highlight the top 5 overperformers.
Highlight the top 5 underperformers.
Briefly explain what this means.
Code Base
1. Load Libraries
library(tidyverse)
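The tournament file is read with readLines() from https://raw.githubusercontent.com/sinemkilicdere/Data607/refs/heads/main/Project1/tournamentinfo.txt. R warns that the file has an incomplete final line (it does not end with a newline); the last line is still read, so the warning does not affect the results.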
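`readLines()` emits an "incomplete final line" warning when a file does not end with a newline, and its `warn` argument turns that warning off. A minimal sketch of reading the crosstable and keeping only player and rating lines is below; `read_crosstable` is a hypothetical helper, and the line filter assumes results lines start with a pair number and rating lines with a two-letter state code, as in the Project 1 file.

```r
# Read the raw crosstable and keep only the player/rating line pairs.
# warn = FALSE silences the "incomplete final line" warning; the last line is still returned.
read_crosstable <- function(path) {
  raw <- readLines(path, warn = FALSE)
  # Keep lines that begin with a pair number (results line) or a two-letter
  # state code (rating line); this drops the header rows and "-----" separators.
  raw[grepl("^\\s*(\\d+|[A-Z]{2})\\s*\\|", raw)]
}

# Usage with the assignment's file (requires network access):
# raw <- read_crosstable(
#   "https://raw.githubusercontent.com/sinemkilicdere/Data607/refs/heads/main/Project1/tournamentinfo.txt"
# )
```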
Top 5 underperformers (Difference = Total_Pts - Expected_Score):
Pair Name Total_Pts Player_Rating Expected_Score Difference
<int> <chr> <dbl> <dbl> <dbl> <dbl>
1 62 ASHWIN BALAJI 1 1530 6.15 -5.15
2 25 LOREN SCHWIEBERT 3.5 1745 6.28 -2.78
3 30 GEORGE AVERY JONES 3.5 1522 6.02 -2.52
4 29 CHIEDOZIE OKORIE 3.5 1602 5.56 -2.06
5 42 JARED GE 3 1332 5.01 -2.01
Analysis
Using the standard Elo expected score formula, I calculated each player’s expected score based on the rating difference between them and their opponents. I then compared this expected score to their actual total points from the tournament.
Players with a positive difference (actual minus expected) scored more points than the Elo formula predicted, so they overperformed. Players with a negative difference scored fewer points than their ratings predicted, so they underperformed.
The top five overperformers are the players with the largest positive differences. They scored more than their ratings predicted, typically by beating or drawing higher-rated opponents.
The top five underperformers, listed in the table above, are the players with the largest negative differences. They may have lost to lower-rated opponents, or they generally scored below the level their ratings suggested.
This analysis shows how Elo ratings can estimate performance expectations, and how actual tournament results can differ from statistical predictions.