Overview

This project checks whether the chess players in our tournament performed the way their pre-tournament ratings suggested they should. Using the Elo formula, I calculated the score each player was expected to get based on who they played, then compared that to what they actually scored. The players with the largest positive gap (actual minus expected) overperformed: they beat the odds. The ones with the largest negative gap underperformed: their rating said they should have done better.


The Elo Formula

The expected score for Player A against a single opponent Player B is:

\[E = \frac{1}{1 + 10^{(R_B - R_A) / 400}}\]

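Here R_A and R_B are the two players’ pre-tournament ratings. If two players have the same rating, each expects 0.5, a coin flip. A 400-point gap gives the stronger player an expected score of about 0.91 per game (a win counts as 1 and a draw as 0.5, so this is an average score rather than a pure win rate). I add up the expected scores across all the games a player actually played in the 7 rounds to get that player’s total expected score for the tournament. Byes and unplayed rounds are skipped since there’s no opponent to calculate against.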
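
As a quick check of those two cases, plugging a rating gap of 0 points and of 400 points (with the stronger player as A) into the formula gives the values below. The subscripts are labels added here for illustration only.

\[E_{\text{equal}} = \frac{1}{1 + 10^{0/400}} = \frac{1}{2} = 0.5, \qquad E_{+400} = \frac{1}{1 + 10^{-400/400}} = \frac{1}{1.1} \approx 0.909\]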

Formula source: https://www.chess.com/terms/elo-rating-chess


Step 1: Load Libraries and Parse Tournament Data

library(stringr)
library(dplyr)

# Read the tournament file (same as Project 1)
raw_lines <- readLines("/Users/mark/Desktop/Coding/607/Project 1/tournamentinfo.txt", warn = FALSE)

# Drop the dashed separator lines, then the two header rows
player_lines <- raw_lines[!grepl("^-", raw_lines)]
player_lines <- player_lines[3:length(player_lines)]

# Each player takes up two consecutive rows: row1 is the first, row2 the second
row1 <- player_lines[seq(1, length(player_lines), by = 2)]
row2 <- player_lines[seq(2, length(player_lines), by = 2)]

# Extract fields: name, total points, and pair number from row1; pre-rating from row2
player_name <- str_trim(str_extract(row1, "(?<=\\|)[^|]+(?=\\|)"))
total_pts   <- as.numeric(str_trim(str_extract(row1, "(?<=\\|)\\s*\\d+\\.\\d+\\s*(?=\\|)")))
pre_rating  <- as.numeric(str_extract(str_extract(row2, "R:\\s*\\d+"), "\\d+"))
pair_nums   <- as.numeric(str_trim(str_extract(row1, "^\\s*\\d+")))

# Build lookup and opponents list
rating_lookup <- setNames(pre_rating, pair_nums)

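# Pull the opponent pair numbers out of a player's first row. A played round
# is a result letter (W, L, D or X) followed by a number; rounds without a
# number after the letter never match, so they are skipped.
get_opponents <- function(line) {
  as.numeric(na.omit(str_trim(str_extract_all(line, "(?<=[WLDX])\\s*(\\d+)")[[1]])))
}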
opponents <- lapply(row1, get_opponents)
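
To see what the opponent extraction returns, here is a minimal sketch that runs `get_opponents()` on two made-up rows. Both rows are hypothetical and only imitate the layout the parser assumes. The bye and unplayed codes `H`, `U` and `B` are also an assumption about the file format, not something shown in this write-up.

```r
library(stringr)

# Same helper as in this step, repeated so the snippet runs on its own
get_opponents <- function(line) {
  as.numeric(na.omit(str_trim(str_extract_all(line, "(?<=[WLDX])\\s*(\\d+)")[[1]])))
}

# Hypothetical rows (not real tournament data)
played_all <- "   98 | EXAMPLE PLAYER ONE              |4.5  |W  12|L   3|D   8|W  20|W  31|L   5|W  44|"
# Assumed codes for rounds without an opponent: H, U, B
with_byes  <- "   99 | EXAMPLE PLAYER TWO              |3.0  |W  12|H    |L   3|U    |D   8|B    |L  20|"

opps_all  <- get_opponents(played_all)  # all seven rounds have a numbered opponent
opps_byes <- get_opponents(with_byes)   # only the four rounds with a numbered opponent

print(opps_all)
print(opps_byes)
```

The second row yields four opponents rather than seven, which is why a player’s expected total is summed over fewer games when they have byes.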

Step 2: Calculate Elo Expected Score

# Elo expected score formula
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# Calculate expected score for each player across all their games.
# elo_expected() is vectorized, so one call covers all of a player's opponents
# (and a player with no opponents sums to 0 instead of raising an error).
expected_score <- sapply(seq_along(opponents), function(i) {
  opp_ids <- opponents[[i]]
  opp_ratings <- rating_lookup[as.character(opp_ids)]
  sum(elo_expected(pre_rating[i], opp_ratings))
})

expected_score <- round(expected_score, 2)

head(data.frame(player_name, total_pts, expected_score))
##           player_name total_pts expected_score
## 1            GARY HUA       6.0           5.16
## 2     DAKSHESH DARURI       6.0           3.78
## 3        ADITYA BAJAJ       6.0           1.95
## 4 PATRICK H SCHILLING       5.5           4.74
## 5          HANSHI ZUO       5.5           4.38
## 6         HANSEN SONG       5.0           4.94
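
As a sanity check on the summation, the sketch below applies the same formula to a tiny lookup table. The ratings and the name `toy_lookup` are hypothetical, chosen so the answer is easy to verify by hand. One equal opponent contributes 0.5, and opponents 400 points above and below contribute about 0.091 and 0.909, so the total should come to 1.5.

```r
# Elo expected score, as defined in this step
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# Hypothetical ratings keyed by pair number (illustration only, not tournament data)
toy_lookup <- c("1" = 1500, "2" = 1500, "3" = 1900, "4" = 1100)

# Player 1 faces players 2, 3 and 4; the formula is vectorized over opponents
per_game <- elo_expected(toy_lookup[["1"]], toy_lookup[c("2", "3", "4")])
total    <- sum(per_game)

print(round(per_game, 3))
print(total)
```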

Step 3: Compare Actual vs Expected and Find Top 5

# Build the comparison data frame
elo_df <- data.frame(
  Player_Name    = player_name,
  Actual_Score   = total_pts,
  Expected_Score = expected_score,
  Difference     = round(total_pts - expected_score, 2)
)

# Top 5 overperformers
overperformers <- elo_df %>% arrange(desc(Difference)) %>% head(5)

# Top 5 underperformers
underperformers <- elo_df %>% arrange(Difference) %>% head(5)

cat("TOP 5 OVERPERFORMERS\n")
## TOP 5 OVERPERFORMERS
print(overperformers)
##                Player_Name Actual_Score Expected_Score Difference
## 1             ADITYA BAJAJ          6.0           1.95       4.05
## 2   ZACHARY JAMES HOUGHTON          4.5           1.37       3.13
## 3                ANVIT RAO          5.0           1.94       3.06
## 4 JACOB ALEXANDER LAVALLEY          3.0           0.04       2.96
## 5     AMIYATOSH PWNANANDAM          3.5           0.77       2.73
cat("\nTOP 5 UNDERPERFORMERS\n")
## 
## TOP 5 UNDERPERFORMERS
print(underperformers)
##          Player_Name Actual_Score Expected_Score Difference
## 1   LOREN SCHWIEBERT          3.5           6.28      -2.78
## 2 GEORGE AVERY JONES          3.5           6.02      -2.52
## 3           JARED GE          3.0           5.01      -2.01
## 4       RISHI SHETTY          3.5           5.09      -1.59
## 5   JOSHUA DAVID LEE          3.5           4.96      -1.46
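
The ranking step can also be written with dplyr’s `slice_max()` and `slice_min()`. This is an alternative sketch on hypothetical data, not the code that produced the tables above. It mainly makes tie handling explicit: with `with_ties = FALSE` the result is capped at exactly `n` rows.

```r
library(dplyr)

# Hypothetical players and scores (illustration only, not tournament data)
toy_df <- data.frame(
  Player_Name    = c("A", "B", "C", "D"),
  Actual_Score   = c(4.0, 3.0, 5.0, 2.0),
  Expected_Score = c(2.5, 3.5, 4.0, 4.5)
)
toy_df$Difference <- toy_df$Actual_Score - toy_df$Expected_Score

# with_ties = FALSE caps each result at exactly n rows
top_over  <- slice_max(toy_df, Difference, n = 2, with_ties = FALSE)
top_under <- slice_min(toy_df, Difference, n = 2, with_ties = FALSE)

print(top_over)
print(top_under)
```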

Discussion

Aditya Bajaj was the biggest surprise: rated 1384 going in, he scored 6 points over the 7 rounds against much stronger players. The formula only expected him to score about 2 points (1.95), so he overperformed by about 4. Jacob Alexander Lavalley is even wilder: the formula gave him an expected total of just 0.04 points, essentially zero, but he still managed 3.

On the flip side, Loren Schwiebert had the highest rating of any underperformer at 1745 but only scored 3.5, which is 2.78 points below the 6.28 his rating predicted. That’s a rough tournament.

The takeaway is that Elo tells you probabilities, not results. Seven rounds is too few games for actual scores to settle near their expected values, so a couple of upsets can move a player’s total by several points.
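
To put a rough number on that variance, here is a back-of-the-envelope sketch. It assumes the games are independent and that there are no draws, which is a simplification and not part of the analysis above. Under that assumption the standard deviation of a total score is the square root of the sum of p(1 - p) over the games. Allowing draws can only shrink the spread, so the result is a ceiling. The function name `score_sd_upper` is hypothetical.

```r
# Upper bound on the standard deviation of a total score.
# ASSUMPTION: games are independent and have no draws (win = 1, loss = 0).
score_sd_upper <- function(p) {
  sqrt(sum(p * (1 - p)))
}

# Hypothetical player expected to score 0.5 in each of 7 games
even_games <- rep(0.5, 7)
sd_even <- score_sd_upper(even_games)

print(round(sd_even, 2))
```

For this evenly matched hypothetical player the bound is about 1.32 points, so finishing a point or more away from the expected total over seven games would not be unusual.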