This assignment processes tournament data from a text file containing player information including names, states, points, pre-ratings, and opponent information. The goal is to calculate each player’s expected score using the ELO formula and identify which players outperformed and underperformed relative to their expectations.
ELO formula from the given assignment will be used: E = 1/(1 + 10^((R_opp - R_player)/400))
Load an definednd Read Data
This part is largely carried over from project 1. From the tournament txt file, we first identify separator lines and remove them.
# Read the data filefile_path <-"tournamentinfo.txt"data <-readLines(file_path, warn =FALSE) # Remove separator lines (dash lines)separator_indices <-grep("^-+", data) data_lines <- data[-separator_indices]# Remove empty linesdata_lines <- data_lines[data_lines !=""]data_lines <-trimws(data_lines)# Remove header linesheader_indices <-grep("Pair|Player Name|USCF ID|Num", data_lines, ignore.case =TRUE)data_lines <- data_lines[-header_indices]# Display first few lines to verifyhead(data_lines, 10)
Same way from project 1, I create couple for loops to extract player info such as name, state, total points, and opponents’ info. When handling provisional ratings, we only take digit after “R:” and end when digit ends.
# Initialize empty data frameplayers <-data.frame()# Parse data in pairs (each player has 2 lines of information)for (i inseq(1, length(data_lines), by =2)) {if (i +1<=length(data_lines)) { line1 <- data_lines[i] # Player info line line2 <- data_lines[i +1] # State/Rating line# Split by pipe delimiter parts1 <-strsplit(line1, "\\|")[[1]] parts2 <-strsplit(line2, "\\|")[[1]]# Extract player name player_name <-trimws(parts1[2])# Extract total points total_points <-trimws(parts1[3])# Extract opponent numbers opponents <-c()for (j in4:length(parts1)) { result <-trimws(parts1[j])if (result ==""|| result ==" ") {next } opp_num <-str_extract(result, "\\d+")if (!is.na(opp_num)) { opponents <-c(opponents, as.numeric(opp_num)) } }# Extract state (first two characters) state_info <-trimws(parts2[1]) state <-substr(state_info, 1, 2)# Extract pre-rating (handle "P" for provisional ratings) rating_text <- parts2[2] pattern <-"R:\\s*(\\d+)" match <-regexpr(pattern, rating_text)if (match !=-1) { pre_rating <-as.numeric(regmatches(rating_text, regexec(pattern, rating_text))[[1]][2]) } else { pre_rating <-NA }# Create player data frame player_data <-data.frame(Name = player_name,State = state,Points =as.numeric(total_points),PreRating = pre_rating,stringsAsFactors =FALSE )# Add opponents list player_data$Opponents <-list(opponents)# Append to players data frame players <-rbind(players, player_data) }}# Display structurestr(players)
'data.frame': 64 obs. of 5 variables:
$ Name : chr "GARY HUA" "DAKSHESH DARURI" "ADITYA BAJAJ" "PATRICK H SCHILLING" ...
$ State : chr "ON" "MI" "MI" "MI" ...
$ Points : num 6 6 6 5.5 5.5 5 5 5 5 5 ...
$ PreRating: num 1794 1553 1384 1716 1655 ...
$ Opponents:List of 64
..$ : num 39 21 18 14 7 12 4
..$ : num 63 58 4 17 16 20 7
..$ : num 8 61 25 21 11 13 12
..$ : num 23 28 2 26 5 19 1
..$ : num 45 37 12 13 4 14 17
..$ : num 34 29 11 35 10 27 21
..$ : num 57 46 13 11 1 9 2
..$ : num 3 32 14 9 47 28 19
..$ : num 25 18 59 8 26 7 20
..$ : num 16 19 55 31 6 25 18
..$ : num 38 56 6 7 3 34 26
..$ : num 42 33 5 38 1 3
..$ : num 36 27 7 5 33 3 32
..$ : num 54 44 8 1 27 5 31
..$ : num 19 16 30 22 54 33 38
..$ : num 10 15 39 2 36
..$ : num 48 41 26 2 23 22 5
..$ : num 47 9 1 32 19 38 10
..$ : num 15 10 52 28 18 4 8
..$ : num 40 49 23 41 28 2 9
..$ : num 43 1 47 3 40 39 6
..$ : num 64 52 28 15 17 40
..$ : num 4 43 20 58 17 37 46
..$ : num 28 47 43 25 60 44 39
..$ : num 9 53 3 24 34 10 47
..$ : num 49 40 17 4 9 32 11
..$ : num 51 13 46 37 14 6
..$ : num 24 4 22 19 20 8 36
..$ : num 50 6 38 34 52 48
..$ : num 52 64 15 55 31 61 50
..$ : num 58 55 64 10 30 50 14
..$ : num 61 8 44 18 51 26 13
..$ : num 60 12 50 36 13 15 51
..$ : num 6 60 37 29 25 11 52
..$ : num 46 38 56 6 57 52 48
..$ : num 13 57 51 33 16 28
..$ : num 5 34 27 23 61
..$ : num 11 35 29 12 18 15
..$ : num 1 54 40 16 44 21 24
..$ : num 20 26 39 59 21 56 22
..$ : num 59 17 58 20
..$ : num 12 50 57 60 61 64 56
..$ : num 21 23 24 63 59 46 55
..$ : num 14 32 53 39 24 59
..$ : num 5 51 60 56 63 55 58
..$ : num 35 7 27 50 64 43 23
..$ : num 18 24 21 61 8 51 25
..$ : num 17 63 52 29 35
..$ : num 26 20 63 64 58
..$ : num 29 42 33 46 31 30
..$ : num 27 45 36 57 32 47 33
..$ : num 30 22 19 48 29 35 34
..$ : num 25 44 57
..$ : num 14 39 61 15 59 64
..$ : num 62 31 10 30 45 43
..$ : num 11 35 45 40 42
..$ : num 7 36 42 51 35 53
..$ : num 31 2 41 23 49 45
..$ : num 41 9 40 43 54 44
..$ : num 33 34 45 42 24
..$ : num 32 3 54 47 42 30 37
..$ : num 55
..$ : num 2 48 49 43 45
..$ : num 22 30 31 49 46 42 54
Calculate Average Opponent Rating
Few functions are created to look up opponent, and take mean of all the opponent’s pre-ratings. This part is also largely carried over from project 1.
# Create lookup table for ratings by player numberrating_lookup <-setNames(players$PreRating, 1:nrow(players))# Function to calculate average opponent ratingcalc_avg_opponent_rating <-function(opponent_numbers, rating_lookup) {if (length(opponent_numbers) ==0) {return(NA) }# Get ratings for opponents opponent_ratings <-c()for (opp in opponent_numbers) { rating <- rating_lookup[as.character(opp)]# Handle "P" ratings (extract numeric part)if (is.character(rating) &&grepl("P", rating)) { rating <-as.numeric(gsub("P.*$", "", rating)) }if (!is.na(rating)) { opponent_ratings <-c(opponent_ratings, rating) } }# Calculate meanif (length(opponent_ratings) >0) {return(mean(opponent_ratings)) } else {return(NA) }}# Apply function to all playersplayers$AvgOppRating <-sapply(players$Opponents, calc_avg_opponent_rating, rating_lookup = rating_lookup)# Round to nearest integerplayers$AvgOppRating <-round(players$AvgOppRating, 0)# Create final data frame with required columnsfinal_data <- players[, c("Name", "State", "Points", "PreRating", "AvgOppRating")]# Format points to show one decimal placefinal_data$Points <-sprintf("%.1f", final_data$Points)# Display first few rowshead(final_data, 10)
Name State Points PreRating AvgOppRating
1 GARY HUA ON 6.0 1794 1605
2 DAKSHESH DARURI MI 6.0 1553 1469
3 ADITYA BAJAJ MI 6.0 1384 1564
4 PATRICK H SCHILLING MI 5.5 1716 1574
5 HANSHI ZUO MI 5.5 1655 1501
6 HANSEN SONG OH 5.0 1686 1519
7 GARY DEE SWATHELL MI 5.0 1649 1372
8 EZEKIEL HOUGHTON MI 5.0 1641 1468
9 STEFANO LEE ON 5.0 1411 1523
10 ANVIT RAO MI 5.0 1365 1554
ELO Score Calculation
From the ELO formula, we create function to plug in player and opponents to get their expected score. Performance difference is calculated between actual and expected. This different would provide key insight for the requirement of this assignment - identify top players who outperformed and under performed during this tournament.
elo_expected <-function(player_rating, opponent_rating) {1/ (1+10^((opponent_rating - player_rating) /400))}# Convert points back to numericplayers$ActualScore <-as.numeric(final_data$Points)# create columnsplayers$ExpectedScore <-NAplayers$GamesWithRatings <-0players$PerformanceDiff <-NA# Calculate expected score for each playerfor (i in1:nrow(players)) { player_rating <- players$PreRating[i] opponent_numbers <- players$Opponents[[i]]# for players without ratingif (is.na(player_rating) ||length(opponent_numbers) ==0) { players$ExpectedScore[i] <-NA players$PerformanceDiff[i] <-NAnext }# Calculate expected score for each opponent expected_sum <-0 valid_games <-0for (opp_num in opponent_numbers) {# opponent's pre-rating opp_rating <- rating_lookup[as.character(opp_num)]# Handle "P" ratings - use the numeric part onlyif (is.character(opp_rating) &&grepl("P", opp_rating)) { opp_rating <-as.numeric(gsub("P.*$", "", opp_rating)) }# Only count games where both players have ratingsif (!is.na(player_rating) &&!is.na(opp_rating)) { expected <-elo_expected(player_rating, opp_rating) expected_sum <- expected_sum + expected valid_games <- valid_games +1 } }# save resultsif (valid_games >0) { players$ExpectedScore[i] <-round(expected_sum, 2) players$GamesWithRatings[i] <- valid_games# Calculate performance difference (Actual - Expected) actual_score <- players$ActualScore[i]if (!is.na(actual_score)) { players$PerformanceDiff[i] <-round(actual_score - expected_sum, 2) } }}
# Create enhanced final data with ELO resultsfinal_with_elo <- final_datafinal_with_elo$ActualScore <-as.numeric(final_data$Points)final_with_elo$ExpectedScore <-NAfinal_with_elo$PerformanceDiff <-NA# Match with players datafor (i in1:nrow(final_with_elo)) { player_name <- final_with_elo$Name[i] player_idx <-which(players$Name == player_name)[1]if (!is.na(player_idx)) { final_with_elo$ExpectedScore[i] <- players$ExpectedScore[player_idx] final_with_elo$PerformanceDiff[i] <- players$PerformanceDiff[player_idx] }}# Reorder columnsfinal_with_elo <- final_with_elo[, c("Name", "State", "Points", "PreRating", "AvgOppRating", "ExpectedScore", "PerformanceDiff")]# Save CSV fileswrite.csv(final_with_elo, "chess_results_with_elo.csv", row.names =FALSE, quote =TRUE)
Summary
ELO formula used pregame rating of each player to predict wining chances and expected result for the match. Compare to actual result, we can get an idea who outperformed our under performed. From this result, Adita Bajaj, Zachary James Houghton, Anvit Rao, Jacob Alexander Lavalley and Amitatosh Pwnanandam were ranked top players who exceeded expectation. On the other hand, we have Loren Schwiebert, George Avery Jones, Jared GE, Rishi Shety and Joshua David Lee performed under expectation.