For this assignment, I plan to use the cleaned Project 1 tournament data, especially the player ratings, opponent matchups, and actual scores, to calculate each player’s expected score. I will use a standard ELO expected score formula and cite the source I use, since the assignment allows for small differences depending on the formula. My plan is to first organize the data so each player is matched with the ratings of the opponents they played. Then I will calculate the expected result for each matchup and add those together to get each player’s total expected score. After that, I will compare the expected score to the actual score to see who performed above or below expectation. The main challenge will be making sure the opponent matching is correct and that the actual scores are carried over consistently from the Project 1 data. Once that is done, I will rank players by the difference between actual score and expected score, then identify the five players who most overperformed and the five who most underperformed.
# Using the same raw tournament text file from Project 1
input_file <- "C:/Users/hkim1/OneDrive/Desktop/tournamentinfo.txt"
lines <- readLines(input_file, warn = FALSE, encoding = "UTF-8")
lines <- str_replace_all(lines, "\r", "")
lines <- lines[nzchar(str_trim(lines))]
player_idx <- which(str_detect(lines, "^\\s*\\d+\\s*\\|"))
player_idx <- player_idx[player_idx < length(lines)]
header_lines <- lines[player_idx]
detail_lines <- lines[player_idx + 1]
header_mat <- str_split(header_lines, "\\|", simplify = TRUE)
header_df <- as.data.frame(header_mat, stringsAsFactors = FALSE)
while (ncol(header_df) > 0 && all(str_trim(header_df[[ncol(header_df)]]) == "")) {
header_df <- header_df[, -ncol(header_df), drop = FALSE]
}
round_count <- ncol(header_df) - 3
round_names <- paste0("r", seq_len(round_count))
colnames(header_df) <- c("pair_raw", "name_raw", "total_raw", round_names)
players <- header_df %>%
transmute(
pair_num = as.integer(str_trim(pair_raw)),
name = str_squish(name_raw) %>% str_to_lower() %>% str_to_title(),
total_pts = as.numeric(str_trim(total_raw)),
across(all_of(round_names), ~ str_squish(.x))
)
detail_mat <- str_split(detail_lines, "\\|", simplify = TRUE)
detail_df <- as.data.frame(detail_mat, stringsAsFactors = FALSE)
state <- str_squish(detail_df[[1]])
pre_rating <- as.integer(str_match(detail_df[[2]], "R:\\s*(\\d+)")[, 2])
players <- players %>%
mutate(
state = state,
pre_rating = pre_rating
)
players
## pair_num name total_pts r1 r2 r3 r4 r5 r6
## 1 1 Gary Hua 6.0 W 39 W 21 W 18 W 14 W 7 D 12
## 2 2 Dakshesh Daruri 6.0 W 63 W 58 L 4 W 17 W 16 W 20
## 3 3 Aditya Bajaj 6.0 L 8 W 61 W 25 W 21 W 11 W 13
## 4 4 Patrick H Schilling 5.5 W 23 D 28 W 2 W 26 D 5 W 19
## 5 5 Hanshi Zuo 5.5 W 45 W 37 D 12 D 13 D 4 W 14
## 6 6 Hansen Song 5.0 W 34 D 29 L 11 W 35 D 10 W 27
## 7 7 Gary Dee Swathell 5.0 W 57 W 46 W 13 W 11 L 1 W 9
## 8 8 Ezekiel Houghton 5.0 W 3 W 32 L 14 L 9 W 47 W 28
## 9 9 Stefano Lee 5.0 W 25 L 18 W 59 W 8 W 26 L 7
## 10 10 Anvit Rao 5.0 D 16 L 19 W 55 W 31 D 6 W 25
## 11 11 Cameron William Mc Leman 4.5 D 38 W 56 W 6 L 7 L 3 W 34
## 12 12 Kenneth J Tack 4.5 W 42 W 33 D 5 W 38 H D 1
## 13 13 Torrance Henry Jr 4.5 W 36 W 27 L 7 D 5 W 33 L 3
## 14 14 Bradley Shaw 4.5 W 54 W 44 W 8 L 1 D 27 L 5
## 15 15 Zachary James Houghton 4.5 D 19 L 16 W 30 L 22 W 54 W 33
## 16 16 Mike Nikitin 4.0 D 10 W 15 H W 39 L 2 W 36
## 17 17 Ronald Grzegorczyk 4.0 W 48 W 41 L 26 L 2 W 23 W 22
## 18 18 David Sundeen 4.0 W 47 W 9 L 1 W 32 L 19 W 38
## 19 19 Dipankar Roy 4.0 D 15 W 10 W 52 D 28 W 18 L 4
## 20 20 Jason Zheng 4.0 L 40 W 49 W 23 W 41 W 28 L 2
## 21 21 Dinh Dang Bui 4.0 W 43 L 1 W 47 L 3 W 40 W 39
## 22 22 Eugene L Mcclure 4.0 W 64 D 52 L 28 W 15 H L 17
## 23 23 Alan Bui 4.0 L 4 W 43 L 20 W 58 L 17 W 37
## 24 24 Michael R Aldrich 4.0 L 28 L 47 W 43 L 25 W 60 W 44
## 25 25 Loren Schwiebert 3.5 L 9 W 53 L 3 W 24 D 34 L 10
## 26 26 Max Zhu 3.5 W 49 W 40 W 17 L 4 L 9 D 32
## 27 27 Gaurav Gidwani 3.5 W 51 L 13 W 46 W 37 D 14 L 6
## 28 28 Sofia Adina Stanescu-Bellu 3.5 W 24 D 4 W 22 D 19 L 20 L 8
## 29 29 Chiedozie Okorie 3.5 W 50 D 6 L 38 L 34 W 52 W 48
## 30 30 George Avery Jones 3.5 L 52 D 64 L 15 W 55 L 31 W 61
## 31 31 Rishi Shetty 3.5 L 58 D 55 W 64 L 10 W 30 W 50
## 32 32 Joshua Philip Mathews 3.5 W 61 L 8 W 44 L 18 W 51 D 26
## 33 33 Jade Ge 3.5 W 60 L 12 W 50 D 36 L 13 L 15
## 34 34 Michael Jeffery Thomas 3.5 L 6 W 60 L 37 W 29 D 25 L 11
## 35 35 Joshua David Lee 3.5 L 46 L 38 W 56 L 6 W 57 D 52
## 36 36 Siddharth Jha 3.5 L 13 W 57 W 51 D 33 H L 16
## 37 37 Amiyatosh Pwnanandam 3.5 B L 5 W 34 L 27 H L 23
## 38 38 Brian Liu 3.0 D 11 W 35 W 29 L 12 H L 18
## 39 39 Joel R Hendon 3.0 L 1 W 54 W 40 L 16 W 44 L 21
## 40 40 Forest Zhang 3.0 W 20 L 26 L 39 W 59 L 21 W 56
## 41 41 Kyle William Murphy 3.0 W 59 L 17 W 58 L 20 X U
## 42 42 Jared Ge 3.0 L 12 L 50 L 57 D 60 D 61 W 64
## 43 43 Robert Glen Vasey 3.0 L 21 L 23 L 24 W 63 W 59 L 46
## 44 44 Justin D Schilling 3.0 B L 14 L 32 W 53 L 39 L 24
## 45 45 Derek Yan 3.0 L 5 L 51 D 60 L 56 W 63 D 55
## 46 46 Jacob Alexander Lavalley 3.0 W 35 L 7 L 27 L 50 W 64 W 43
## 47 47 Eric Wright 2.5 L 18 W 24 L 21 W 61 L 8 D 51
## 48 48 Daniel Khain 2.5 L 17 W 63 H D 52 H L 29
## 49 49 Michael J Martin 2.5 L 26 L 20 D 63 D 64 W 58 H
## 50 50 Shivam Jha 2.5 L 29 W 42 L 33 W 46 H L 31
## 51 51 Tejas Ayyagari 2.5 L 27 W 45 L 36 W 57 L 32 D 47
## 52 52 Ethan Guo 2.5 W 30 D 22 L 19 D 48 L 29 D 35
## 53 53 Jose C Ybarra 2.0 H L 25 H L 44 U W 57
## 54 54 Larry Hodge 2.0 L 14 L 39 L 61 B L 15 L 59
## 55 55 Alex Kong 2.0 L 62 D 31 L 10 L 30 B D 45
## 56 56 Marisa Ricci 2.0 H L 11 L 35 W 45 H L 40
## 57 57 Michael Lu 2.0 L 7 L 36 W 42 L 51 L 35 L 53
## 58 58 Viraj Mohile 2.0 W 31 L 2 L 41 L 23 L 49 B
## 59 59 Sean M Mc Cormick 2.0 L 41 B L 9 L 40 L 43 W 54
## 60 60 Julia Shen 1.5 L 33 L 34 D 45 D 42 L 24 H
## 61 61 Jezzel Farkas 1.5 L 32 L 3 W 54 L 47 D 42 L 30
## 62 62 Ashwin Balaji 1.0 W 55 U U U U U
## 63 63 Thomas Joseph Hosmer 1.0 L 2 L 48 D 49 L 43 L 45 H
## 64 64 Ben Li 1.0 L 22 D 30 L 31 D 49 L 46 L 42
## r7 state pre_rating
## 1 D 4 ON 1794
## 2 W 7 MI 1553
## 3 W 12 MI 1384
## 4 D 1 MI 1716
## 5 W 17 MI 1655
## 6 W 21 OH 1686
## 7 L 2 MI 1649
## 8 W 19 MI 1641
## 9 W 20 ON 1411
## 10 W 18 MI 1365
## 11 W 26 MI 1712
## 12 L 3 MI 1663
## 13 W 32 MI 1666
## 14 W 31 MI 1610
## 15 W 38 MI 1220
## 16 U MI 1604
## 17 L 5 MI 1629
## 18 L 10 MI 1600
## 19 L 8 MI 1564
## 20 L 9 MI 1595
## 21 L 6 ON 1563
## 22 W 40 MI 1555
## 23 W 46 ON 1363
## 24 W 39 MI 1229
## 25 W 47 MI 1745
## 26 L 11 ON 1579
## 27 U MI 1552
## 28 D 36 MI 1507
## 29 U MI 1602
## 30 W 50 ON 1522
## 31 L 14 MI 1494
## 32 L 13 ON 1441
## 33 W 51 MI 1449
## 34 W 52 MI 1399
## 35 W 48 MI 1438
## 36 D 28 MI 1355
## 37 W 61 MI 980
## 38 L 15 MI 1423
## 39 L 24 MI 1436
## 40 L 22 MI 1348
## 41 U MI 1403
## 42 W 56 MI 1332
## 43 W 55 MI 1283
## 44 W 59 MI 1199
## 45 W 58 MI 1242
## 46 L 23 MI 377
## 47 L 25 MI 1362
## 48 L 35 MI 1382
## 49 U MI 1291
## 50 L 30 MI 1056
## 51 L 33 MI 1011
## 52 L 34 MI 935
## 53 U MI 1393
## 54 W 64 MI 1270
## 55 L 43 MI 1186
## 56 L 42 MI 1153
## 57 B MI 1092
## 58 L 45 MI 917
## 59 L 44 MI 853
## 60 U MI 967
## 61 L 37 ON 955
## 62 U MI 1530
## 63 U MI 1175
## 64 L 54 MI 1163
games_long <- players %>%
select(pair_num, name, pre_rating, all_of(round_names)) %>%
pivot_longer(
cols = all_of(round_names),
names_to = "round",
values_to = "result"
) %>%
mutate(
result_code = str_extract(result, "^[WDLHBFXU]"),
opp_pair = suppressWarnings(as.integer(str_extract(result, "\\d+"))),
actual_score = case_when(
result_code == "W" ~ 1,
result_code == "D" ~ 0.5,
result_code == "L" ~ 0,
result_code %in% c("H", "B", "F", "X") ~ 1,
result_code == "U" ~ 0,
TRUE ~ NA_real_
)
) %>%
filter(!is.na(opp_pair), !is.na(actual_score))
games_long
## # A tibble: 408 × 8
## pair_num name pre_rating round result result_code opp_pair actual_score
## <int> <chr> <int> <chr> <chr> <chr> <int> <dbl>
## 1 1 Gary Hua 1794 r1 W 39 W 39 1
## 2 1 Gary Hua 1794 r2 W 21 W 21 1
## 3 1 Gary Hua 1794 r3 W 18 W 18 1
## 4 1 Gary Hua 1794 r4 W 14 W 14 1
## 5 1 Gary Hua 1794 r5 W 7 W 7 1
## 6 1 Gary Hua 1794 r6 D 12 D 12 0.5
## 7 1 Gary Hua 1794 r7 D 4 D 4 0.5
## 8 2 Dakshesh … 1553 r1 W 63 W 63 1
## 9 2 Dakshesh … 1553 r2 W 58 W 58 1
## 10 2 Dakshesh … 1553 r3 L 4 L 4 0
## # ℹ 398 more rows
opponent_lookup <- players %>%
select(
opp_pair = pair_num,
opp_name = name,
opp_rating = pre_rating
)
games_elo <- games_long %>%
left_join(opponent_lookup, by = "opp_pair") %>%
filter(!is.na(opp_rating)) %>%
mutate(
expected_score = 1 / (1 + 10 ^ ((opp_rating - pre_rating) / 400)),
score_diff = actual_score - expected_score
)
games_elo
## # A tibble: 408 × 12
## pair_num name pre_rating round result result_code opp_pair actual_score
## <int> <chr> <int> <chr> <chr> <chr> <int> <dbl>
## 1 1 Gary Hua 1794 r1 W 39 W 39 1
## 2 1 Gary Hua 1794 r2 W 21 W 21 1
## 3 1 Gary Hua 1794 r3 W 18 W 18 1
## 4 1 Gary Hua 1794 r4 W 14 W 14 1
## 5 1 Gary Hua 1794 r5 W 7 W 7 1
## 6 1 Gary Hua 1794 r6 D 12 D 12 0.5
## 7 1 Gary Hua 1794 r7 D 4 D 4 0.5
## 8 2 Dakshesh … 1553 r1 W 63 W 63 1
## 9 2 Dakshesh … 1553 r2 W 58 W 58 1
## 10 2 Dakshesh … 1553 r3 L 4 L 4 0
## # ℹ 398 more rows
## # ℹ 4 more variables: opp_name <chr>, opp_rating <int>, expected_score <dbl>,
## # score_diff <dbl>
player_elo_summary <- games_elo %>%
group_by(pair_num, name, pre_rating) %>%
summarise(
actual_score = sum(actual_score, na.rm = TRUE),
expected_score = sum(expected_score, na.rm = TRUE),
performance_diff = actual_score - expected_score,
.groups = "drop"
) %>%
arrange(desc(performance_diff))
player_elo_summary
## # A tibble: 64 × 6
## pair_num name pre_rating actual_score expected_score performance_diff
## <int> <chr> <int> <dbl> <dbl> <dbl>
## 1 3 Aditya Bajaj 1384 6 1.95 4.05
## 2 15 Zachary Jam… 1220 4.5 1.37 3.13
## 3 10 Anvit Rao 1365 5 1.94 3.06
## 4 46 Jacob Alexa… 377 3 0.0432 2.96
## 5 9 Stefano Lee 1411 5 2.29 2.71
## 6 2 Dakshesh Da… 1553 6 3.78 2.22
## 7 52 Ethan Guo 935 2.5 0.295 2.20
## 8 51 Tejas Ayyag… 1011 2.5 1.03 1.47
## 9 24 Michael R A… 1229 4 2.55 1.45
## 10 37 Amiyatosh P… 980 2 0.773 1.23
## # ℹ 54 more rows
top_5_overperformed <- player_elo_summary %>%
slice_max(order_by = performance_diff, n = 5)
top_5_overperformed
## # A tibble: 5 × 6
## pair_num name pre_rating actual_score expected_score performance_diff
## <int> <chr> <int> <dbl> <dbl> <dbl>
## 1 3 Aditya Bajaj 1384 6 1.95 4.05
## 2 15 Zachary Jame… 1220 4.5 1.37 3.13
## 3 10 Anvit Rao 1365 5 1.94 3.06
## 4 46 Jacob Alexan… 377 3 0.0432 2.96
## 5 9 Stefano Lee 1411 5 2.29 2.71
top_5_underperformed <- player_elo_summary %>%
slice_min(order_by = performance_diff, n = 5)
top_5_underperformed
## # A tibble: 5 × 6
## pair_num name pre_rating actual_score expected_score performance_diff
## <int> <chr> <int> <dbl> <dbl> <dbl>
## 1 25 Loren Schwie… 1745 3.5 6.28 -2.78
## 2 30 George Avery… 1522 3.5 6.02 -2.52
## 3 54 Larry Hodge 1270 1 3.40 -2.40
## 4 42 Jared Ge 1332 3 5.01 -2.01
## 5 31 Rishi Shetty 1494 3.5 5.09 -1.59
plot_data <- player_elo_summary %>%
slice_max(order_by = performance_diff, n = 5) %>%
bind_rows(
player_elo_summary %>%
slice_min(order_by = performance_diff, n = 5)
) %>%
arrange(performance_diff) %>%
mutate(name = factor(name, levels = name))
ggplot(plot_data, aes(x = name, y = performance_diff)) +
geom_col() +
coord_flip() +
labs(
title = "Top 5 Overperformed and Underperformed Players",
x = "Player",
y = "Actual - Expected"
)
write_csv(player_elo_summary, "player_elo_summary.csv")
write_csv(top_5_overperformed, "top_5_overperformed.csv")
write_csv(top_5_underperformed, "top_5_underperformed.csv")
The Elo calculation shows how each player performed compared to what would normally be expected based on the ratings of the opponents they faced. After comparing actual scores to expected scores, the results make it clear that some players finished well above expectation while others finished below it. The tables above highlight the five strongest overperformers and the five strongest underperformers, showing which players most exceeded or fell short of their projected results
cat(
"Using the Elo expected score formula, each player's expected score was calculated from the ratings of the opponents they actually faced in each round.",
"The tables above show the five players who most overperformed and the five players who most underperformed relative to expectation."
)
## Using the Elo expected score formula, each player's expected score was calculated from the ratings of the opponents they actually faced in each round. The tables above show the five players who most overperformed and the five players who most underperformed relative to expectation.