5B approach

Approach

For this assignment, I plan to use the cleaned Project 1 tournament data, especially the player ratings, opponent matchups, and actual scores, to calculate each player’s expected score. I will use a standard ELO expected score formula and cite the source I use, since the assignment allows for small differences depending on the formula. My plan is to first organize the data so each player is matched with the ratings of the opponents they played. Then I will calculate the expected result for each matchup and add those together to get each player’s total expected score. After that, I will compare the expected score to the actual score to see who performed above or below expectation. The main challenge will be making sure the opponent matching is correct and that the actual scores are carried over consistently from the Project 1 data. Once that is done, I will rank players by the difference between actual score and expected score, then identify the five players who most overperformed and the five who most underperformed.

Load and Parse Tournament Data

# Using the same raw tournament text file from Project 1
input_file <- "C:/Users/hkim1/OneDrive/Desktop/tournamentinfo.txt"

lines <- readLines(input_file, warn = FALSE, encoding = "UTF-8")
lines <- str_replace_all(lines, "\r", "")
lines <- lines[nzchar(str_trim(lines))]

player_idx <- which(str_detect(lines, "^\\s*\\d+\\s*\\|"))
player_idx <- player_idx[player_idx < length(lines)]

header_lines <- lines[player_idx]
detail_lines <- lines[player_idx + 1]

header_mat <- str_split(header_lines, "\\|", simplify = TRUE)
header_df <- as.data.frame(header_mat, stringsAsFactors = FALSE)

while (ncol(header_df) > 0 && all(str_trim(header_df[[ncol(header_df)]]) == "")) {
  header_df <- header_df[, -ncol(header_df), drop = FALSE]
}

round_count <- ncol(header_df) - 3
round_names <- paste0("r", seq_len(round_count))
colnames(header_df) <- c("pair_raw", "name_raw", "total_raw", round_names)

players <- header_df %>%
  transmute(
    pair_num  = as.integer(str_trim(pair_raw)),
    name      = str_squish(name_raw) %>% str_to_lower() %>% str_to_title(),
    total_pts = as.numeric(str_trim(total_raw)),
    across(all_of(round_names), ~ str_squish(.x))
  )

detail_mat <- str_split(detail_lines, "\\|", simplify = TRUE)
detail_df <- as.data.frame(detail_mat, stringsAsFactors = FALSE)

state <- str_squish(detail_df[[1]])
pre_rating <- as.integer(str_match(detail_df[[2]], "R:\\s*(\\d+)")[, 2])

players <- players %>%
  mutate(
    state = state,
    pre_rating = pre_rating
  )

players

##    pair_num                       name total_pts   r1   r2   r3   r4   r5   r6
## 1         1                   Gary Hua       6.0 W 39 W 21 W 18 W 14  W 7 D 12
## 2         2            Dakshesh Daruri       6.0 W 63 W 58  L 4 W 17 W 16 W 20
## 3         3               Aditya Bajaj       6.0  L 8 W 61 W 25 W 21 W 11 W 13
## 4         4        Patrick H Schilling       5.5 W 23 D 28  W 2 W 26  D 5 W 19
## 5         5                 Hanshi Zuo       5.5 W 45 W 37 D 12 D 13  D 4 W 14
## 6         6                Hansen Song       5.0 W 34 D 29 L 11 W 35 D 10 W 27
## 7         7          Gary Dee Swathell       5.0 W 57 W 46 W 13 W 11  L 1  W 9
## 8         8           Ezekiel Houghton       5.0  W 3 W 32 L 14  L 9 W 47 W 28
## 9         9                Stefano Lee       5.0 W 25 L 18 W 59  W 8 W 26  L 7
## 10       10                  Anvit Rao       5.0 D 16 L 19 W 55 W 31  D 6 W 25
## 11       11   Cameron William Mc Leman       4.5 D 38 W 56  W 6  L 7  L 3 W 34
## 12       12             Kenneth J Tack       4.5 W 42 W 33  D 5 W 38    H  D 1
## 13       13          Torrance Henry Jr       4.5 W 36 W 27  L 7  D 5 W 33  L 3
## 14       14               Bradley Shaw       4.5 W 54 W 44  W 8  L 1 D 27  L 5
## 15       15     Zachary James Houghton       4.5 D 19 L 16 W 30 L 22 W 54 W 33
## 16       16               Mike Nikitin       4.0 D 10 W 15    H W 39  L 2 W 36
## 17       17         Ronald Grzegorczyk       4.0 W 48 W 41 L 26  L 2 W 23 W 22
## 18       18              David Sundeen       4.0 W 47  W 9  L 1 W 32 L 19 W 38
## 19       19               Dipankar Roy       4.0 D 15 W 10 W 52 D 28 W 18  L 4
## 20       20                Jason Zheng       4.0 L 40 W 49 W 23 W 41 W 28  L 2
## 21       21              Dinh Dang Bui       4.0 W 43  L 1 W 47  L 3 W 40 W 39
## 22       22           Eugene L Mcclure       4.0 W 64 D 52 L 28 W 15    H L 17
## 23       23                   Alan Bui       4.0  L 4 W 43 L 20 W 58 L 17 W 37
## 24       24          Michael R Aldrich       4.0 L 28 L 47 W 43 L 25 W 60 W 44
## 25       25           Loren Schwiebert       3.5  L 9 W 53  L 3 W 24 D 34 L 10
## 26       26                    Max Zhu       3.5 W 49 W 40 W 17  L 4  L 9 D 32
## 27       27             Gaurav Gidwani       3.5 W 51 L 13 W 46 W 37 D 14  L 6
## 28       28 Sofia Adina Stanescu-Bellu       3.5 W 24  D 4 W 22 D 19 L 20  L 8
## 29       29           Chiedozie Okorie       3.5 W 50  D 6 L 38 L 34 W 52 W 48
## 30       30         George Avery Jones       3.5 L 52 D 64 L 15 W 55 L 31 W 61
## 31       31               Rishi Shetty       3.5 L 58 D 55 W 64 L 10 W 30 W 50
## 32       32      Joshua Philip Mathews       3.5 W 61  L 8 W 44 L 18 W 51 D 26
## 33       33                    Jade Ge       3.5 W 60 L 12 W 50 D 36 L 13 L 15
## 34       34     Michael Jeffery Thomas       3.5  L 6 W 60 L 37 W 29 D 25 L 11
## 35       35           Joshua David Lee       3.5 L 46 L 38 W 56  L 6 W 57 D 52
## 36       36              Siddharth Jha       3.5 L 13 W 57 W 51 D 33    H L 16
## 37       37       Amiyatosh Pwnanandam       3.5    B  L 5 W 34 L 27    H L 23
## 38       38                  Brian Liu       3.0 D 11 W 35 W 29 L 12    H L 18
## 39       39              Joel R Hendon       3.0  L 1 W 54 W 40 L 16 W 44 L 21
## 40       40               Forest Zhang       3.0 W 20 L 26 L 39 W 59 L 21 W 56
## 41       41        Kyle William Murphy       3.0 W 59 L 17 W 58 L 20    X    U
## 42       42                   Jared Ge       3.0 L 12 L 50 L 57 D 60 D 61 W 64
## 43       43          Robert Glen Vasey       3.0 L 21 L 23 L 24 W 63 W 59 L 46
## 44       44         Justin D Schilling       3.0    B L 14 L 32 W 53 L 39 L 24
## 45       45                  Derek Yan       3.0  L 5 L 51 D 60 L 56 W 63 D 55
## 46       46   Jacob Alexander Lavalley       3.0 W 35  L 7 L 27 L 50 W 64 W 43
## 47       47                Eric Wright       2.5 L 18 W 24 L 21 W 61  L 8 D 51
## 48       48               Daniel Khain       2.5 L 17 W 63    H D 52    H L 29
## 49       49           Michael J Martin       2.5 L 26 L 20 D 63 D 64 W 58    H
## 50       50                 Shivam Jha       2.5 L 29 W 42 L 33 W 46    H L 31
## 51       51             Tejas Ayyagari       2.5 L 27 W 45 L 36 W 57 L 32 D 47
## 52       52                  Ethan Guo       2.5 W 30 D 22 L 19 D 48 L 29 D 35
## 53       53              Jose C Ybarra       2.0    H L 25    H L 44    U W 57
## 54       54                Larry Hodge       2.0 L 14 L 39 L 61    B L 15 L 59
## 55       55                  Alex Kong       2.0 L 62 D 31 L 10 L 30    B D 45
## 56       56               Marisa Ricci       2.0    H L 11 L 35 W 45    H L 40
## 57       57                 Michael Lu       2.0  L 7 L 36 W 42 L 51 L 35 L 53
## 58       58               Viraj Mohile       2.0 W 31  L 2 L 41 L 23 L 49    B
## 59       59          Sean M Mc Cormick       2.0 L 41    B  L 9 L 40 L 43 W 54
## 60       60                 Julia Shen       1.5 L 33 L 34 D 45 D 42 L 24    H
## 61       61              Jezzel Farkas       1.5 L 32  L 3 W 54 L 47 D 42 L 30
## 62       62              Ashwin Balaji       1.0 W 55    U    U    U    U    U
## 63       63       Thomas Joseph Hosmer       1.0  L 2 L 48 D 49 L 43 L 45    H
## 64       64                     Ben Li       1.0 L 22 D 30 L 31 D 49 L 46 L 42
##      r7 state pre_rating
## 1   D 4    ON       1794
## 2   W 7    MI       1553
## 3  W 12    MI       1384
## 4   D 1    MI       1716
## 5  W 17    MI       1655
## 6  W 21    OH       1686
## 7   L 2    MI       1649
## 8  W 19    MI       1641
## 9  W 20    ON       1411
## 10 W 18    MI       1365
## 11 W 26    MI       1712
## 12  L 3    MI       1663
## 13 W 32    MI       1666
## 14 W 31    MI       1610
## 15 W 38    MI       1220
## 16    U    MI       1604
## 17  L 5    MI       1629
## 18 L 10    MI       1600
## 19  L 8    MI       1564
## 20  L 9    MI       1595
## 21  L 6    ON       1563
## 22 W 40    MI       1555
## 23 W 46    ON       1363
## 24 W 39    MI       1229
## 25 W 47    MI       1745
## 26 L 11    ON       1579
## 27    U    MI       1552
## 28 D 36    MI       1507
## 29    U    MI       1602
## 30 W 50    ON       1522
## 31 L 14    MI       1494
## 32 L 13    ON       1441
## 33 W 51    MI       1449
## 34 W 52    MI       1399
## 35 W 48    MI       1438
## 36 D 28    MI       1355
## 37 W 61    MI        980
## 38 L 15    MI       1423
## 39 L 24    MI       1436
## 40 L 22    MI       1348
## 41    U    MI       1403
## 42 W 56    MI       1332
## 43 W 55    MI       1283
## 44 W 59    MI       1199
## 45 W 58    MI       1242
## 46 L 23    MI        377
## 47 L 25    MI       1362
## 48 L 35    MI       1382
## 49    U    MI       1291
## 50 L 30    MI       1056
## 51 L 33    MI       1011
## 52 L 34    MI        935
## 53    U    MI       1393
## 54 W 64    MI       1270
## 55 L 43    MI       1186
## 56 L 42    MI       1153
## 57    B    MI       1092
## 58 L 45    MI        917
## 59 L 44    MI        853
## 60    U    MI        967
## 61 L 37    ON        955
## 62    U    MI       1530
## 63    U    MI       1175
## 64 L 54    MI       1163

Convert to Game-Level Data

games_long <- players %>%
  select(pair_num, name, pre_rating, all_of(round_names)) %>%
  pivot_longer(
    cols = all_of(round_names),
    names_to = "round",
    values_to = "result"
  ) %>%
  mutate(
    result_code = str_extract(result, "^[WDLHBFXU]"),
    opp_pair = suppressWarnings(as.integer(str_extract(result, "\\d+"))),
    actual_score = case_when(
      result_code == "W" ~ 1,
      result_code == "D" ~ 0.5,
      result_code == "L" ~ 0,
      result_code %in% c("H", "B", "F", "X") ~ 1,
      result_code == "U" ~ 0,
      TRUE ~ NA_real_
    )
  ) %>%
  filter(!is.na(opp_pair), !is.na(actual_score))

games_long

## # A tibble: 408 × 8
##    pair_num name       pre_rating round result result_code opp_pair actual_score
##       <int> <chr>           <int> <chr> <chr>  <chr>          <int>        <dbl>
##  1        1 Gary Hua         1794 r1    W 39   W                 39          1  
##  2        1 Gary Hua         1794 r2    W 21   W                 21          1  
##  3        1 Gary Hua         1794 r3    W 18   W                 18          1  
##  4        1 Gary Hua         1794 r4    W 14   W                 14          1  
##  5        1 Gary Hua         1794 r5    W 7    W                  7          1  
##  6        1 Gary Hua         1794 r6    D 12   D                 12          0.5
##  7        1 Gary Hua         1794 r7    D 4    D                  4          0.5
##  8        2 Dakshesh …       1553 r1    W 63   W                 63          1  
##  9        2 Dakshesh …       1553 r2    W 58   W                 58          1  
## 10        2 Dakshesh …       1553 r3    L 4    L                  4          0  
## # ℹ 398 more rows

Add Opponent Ratings

opponent_lookup <- players %>%
  select(
    opp_pair = pair_num,
    opp_name = name,
    opp_rating = pre_rating
  )

games_elo <- games_long %>%
  left_join(opponent_lookup, by = "opp_pair") %>%
  filter(!is.na(opp_rating)) %>%
  mutate(
    expected_score = 1 / (1 + 10 ^ ((opp_rating - pre_rating) / 400)),
    score_diff = actual_score - expected_score
  )

games_elo

## # A tibble: 408 × 12
##    pair_num name       pre_rating round result result_code opp_pair actual_score
##       <int> <chr>           <int> <chr> <chr>  <chr>          <int>        <dbl>
##  1        1 Gary Hua         1794 r1    W 39   W                 39          1  
##  2        1 Gary Hua         1794 r2    W 21   W                 21          1  
##  3        1 Gary Hua         1794 r3    W 18   W                 18          1  
##  4        1 Gary Hua         1794 r4    W 14   W                 14          1  
##  5        1 Gary Hua         1794 r5    W 7    W                  7          1  
##  6        1 Gary Hua         1794 r6    D 12   D                 12          0.5
##  7        1 Gary Hua         1794 r7    D 4    D                  4          0.5
##  8        2 Dakshesh …       1553 r1    W 63   W                 63          1  
##  9        2 Dakshesh …       1553 r2    W 58   W                 58          1  
## 10        2 Dakshesh …       1553 r3    L 4    L                  4          0  
## # ℹ 398 more rows
## # ℹ 4 more variables: opp_name <chr>, opp_rating <int>, expected_score <dbl>,
## #   score_diff <dbl>

Summarize by Player

player_elo_summary <- games_elo %>%
  group_by(pair_num, name, pre_rating) %>%
  summarise(
    actual_score = sum(actual_score, na.rm = TRUE),
    expected_score = sum(expected_score, na.rm = TRUE),
    performance_diff = actual_score - expected_score,
    .groups = "drop"
  ) %>%
  arrange(desc(performance_diff))

player_elo_summary

## # A tibble: 64 × 6
##    pair_num name         pre_rating actual_score expected_score performance_diff
##       <int> <chr>             <int>        <dbl>          <dbl>            <dbl>
##  1        3 Aditya Bajaj       1384          6           1.95               4.05
##  2       15 Zachary Jam…       1220          4.5         1.37               3.13
##  3       10 Anvit Rao          1365          5           1.94               3.06
##  4       46 Jacob Alexa…        377          3           0.0432             2.96
##  5        9 Stefano Lee        1411          5           2.29               2.71
##  6        2 Dakshesh Da…       1553          6           3.78               2.22
##  7       52 Ethan Guo           935          2.5         0.295              2.20
##  8       51 Tejas Ayyag…       1011          2.5         1.03               1.47
##  9       24 Michael R A…       1229          4           2.55               1.45
## 10       37 Amiyatosh P…        980          2           0.773              1.23
## # ℹ 54 more rows

Five Most Overperformed

top_5_overperformed <- player_elo_summary %>%
  slice_max(order_by = performance_diff, n = 5)

top_5_overperformed

## # A tibble: 5 × 6
##   pair_num name          pre_rating actual_score expected_score performance_diff
##      <int> <chr>              <int>        <dbl>          <dbl>            <dbl>
## 1        3 Aditya Bajaj        1384          6           1.95               4.05
## 2       15 Zachary Jame…       1220          4.5         1.37               3.13
## 3       10 Anvit Rao           1365          5           1.94               3.06
## 4       46 Jacob Alexan…        377          3           0.0432             2.96
## 5        9 Stefano Lee         1411          5           2.29               2.71

Five Most Underperformed

top_5_underperformed <- player_elo_summary %>%
  slice_min(order_by = performance_diff, n = 5)

top_5_underperformed

## # A tibble: 5 × 6
##   pair_num name          pre_rating actual_score expected_score performance_diff
##      <int> <chr>              <int>        <dbl>          <dbl>            <dbl>
## 1       25 Loren Schwie…       1745          3.5           6.28            -2.78
## 2       30 George Avery…       1522          3.5           6.02            -2.52
## 3       54 Larry Hodge         1270          1             3.40            -2.40
## 4       42 Jared Ge            1332          3             5.01            -2.01
## 5       31 Rishi Shetty        1494          3.5           5.09            -1.59

Chart

plot_data <- player_elo_summary %>%
  slice_max(order_by = performance_diff, n = 5) %>%
  bind_rows(
    player_elo_summary %>%
      slice_min(order_by = performance_diff, n = 5)
  ) %>%
  arrange(performance_diff) %>%
  mutate(name = factor(name, levels = name))

ggplot(plot_data, aes(x = name, y = performance_diff)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 5 Overperformed and Underperformed Players",
    x = "Player",
    y = "Actual - Expected"
  )

Save Output Files

write_csv(player_elo_summary, "player_elo_summary.csv")
write_csv(top_5_overperformed, "top_5_overperformed.csv")
write_csv(top_5_underperformed, "top_5_underperformed.csv")

Conclusion

The Elo calculation shows how each player performed compared to what would normally be expected based on the ratings of the opponents they faced. After comparing actual scores to expected scores, the results make it clear that some players finished well above expectation while others finished below it. The tables above highlight the five strongest overperformers and the five strongest underperformers, showing which players most exceeded or fell short of their projected results

cat(
  "Using the Elo expected score formula, each player's expected score was calculated from the ratings of the opponents they actually faced in each round.",
  "The tables above show the five players who most overperformed and the five players who most underperformed relative to expectation."
)

## Using the Elo expected score formula, each player's expected score was calculated from the ratings of the opponents they actually faced in each round. The tables above show the five players who most overperformed and the five players who most underperformed relative to expectation.