DATA 607 Project 1

Silma Khan

SPRING 2025

For this project, we are tasked with needing to read a .txt file titled (tournamentinfo.txt) that contains records for each chess player. These records are broken up to using two line for each chess player and for this project, we need to extract:

  • Player’s Name : which is from the first line
  • Player’s State : which is in abbreviated form found in the second line
  • Total Number of Points : which is from the first line
  • Player’s Pre-Rating : which is from the second line, specifically in the rating information
  • Average Pre Chess Rating of Opponents : which we have to compute using the opponents pair numbers listed in the rounds (found in the first line) and then looking for each opponents pre-rating from their own record, and then averaging those ratings

So with this new information, the first record should look like:

  • Gary Hua, ON, 6.0, 1794, 1605

In order to complete these steps, we need to first import the necessary libraries in order to precess the string and to manipulate data

library(stringr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Now that we have our necessary libraries, we now need to read and clean the data

First we need to read the tournamentinfo.txt file as a plain text, then we need to remove the header and separator lines, so only the player records are present - Remember that the player info is given in two lines

To make this accessible to others, I added the file to my github and I am able to load it without the empty spaces or the first row that contains only dashed lines

df <- read.csv("https://raw.githubusercontent.com/silmaxk/DATA607_CUNYSPS/refs/heads/main/Project%231/tournamentinfo.txt", header = F, skip = 1, strip.white = T)
head(df)
##                                                                                          V1
## 1  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round|
## 2  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  |
## 3 -----------------------------------------------------------------------------------------
## 4     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 5    ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 6 -----------------------------------------------------------------------------------------

Now taking a look at the dataframe, we can see that there are still dashed lines every third line, so we can remove those as well

df <- data.frame(df[-seq(0,nrow(df),3), ], "")

head(df)
##                                                                 df..seq.0..nrow.df...3....
## 1 Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round|
## 2 Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  |
## 3    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 4   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 5    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|
## 6   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |
##   X..
## 1    
## 2    
## 3    
## 4    
## 5    
## 6

Now that we removed the dashed lines, we can move onto dealing with the players information being split into two lines

There are multiple ways at handling this, but we can concatenate the two lines together, to make it one row for each players information:

# Step 4: Combine the two lines for each player into one record.
# First, split the dataframe into odd and even rows.
odd_rows <- df[(seq(nrow(df)) %% 2) == 1, ]
even_rows <- df[(seq(nrow(df)) %% 2) == 0, ]

# Combine the corresponding odd and even rows side by side.
df1 <- data.frame(cbind(odd_rows, even_rows))

# Remove any extra columns (if present) that came from the read process.
if("X...1" %in% names(df1)) df1$X...1 <- NULL
if("X.." %in% names(df1)) df1$X.. <- NULL

# Concatenate the two columns into a single character vector.
df1 <- paste(df1[,1], df1[,2])

df1
##  [1] "Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  |"
##  [2] "1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4| ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"     
##  [3] "2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7| MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |"     
##  [4] "3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12| MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |"     
##  [5] "4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1| MI | 12616049 / R: 1716   ->1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |"     
##  [6] "5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17| MI | 14601533 / R: 1655   ->1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |"     
##  [7] "6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21| OH | 15055204 / R: 1686   ->1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |"     
##  [8] "7 | GARY DEE SWATHELL               |5.0  |W  57|W  46|W  13|W  11|L   1|W   9|L   2| MI | 11146376 / R: 1649   ->1673     |N:3  |W    |B    |W    |B    |B    |W    |W    |"     
##  [9] "8 | EZEKIEL HOUGHTON                |5.0  |W   3|W  32|L  14|L   9|W  47|W  28|W  19| MI | 15142253 / R: 1641P17->1657P24  |N:3  |B    |W    |B    |W    |B    |W    |W    |"     
## [10] "9 | STEFANO LEE                     |5.0  |W  25|L  18|W  59|W   8|W  26|L   7|W  20| ON | 14954524 / R: 1411   ->1564     |N:2  |W    |B    |W    |B    |W    |B    |B    |"     
## [11] "10 | ANVIT RAO                       |5.0  |D  16|L  19|W  55|W  31|D   6|W  25|W  18| MI | 14150362 / R: 1365   ->1544     |N:3  |W    |W    |B    |B    |W    |B    |W    |"    
## [12] "11 | CAMERON WILLIAM MC LEMAN        |4.5  |D  38|W  56|W   6|L   7|L   3|W  34|W  26| MI | 12581589 / R: 1712   ->1696     |N:3  |B    |W    |B    |W    |B    |W    |B    |"    
## [13] "12 | KENNETH J TACK                  |4.5  |W  42|W  33|D   5|W  38|H    |D   1|L   3| MI | 12681257 / R: 1663   ->1670     |N:3  |W    |B    |W    |B    |     |W    |B    |"    
## [14] "13 | TORRANCE HENRY JR               |4.5  |W  36|W  27|L   7|D   5|W  33|L   3|W  32| MI | 15082995 / R: 1666   ->1662     |N:3  |B    |W    |B    |B    |W    |W    |B    |"    
## [15] "14 | BRADLEY SHAW                    |4.5  |W  54|W  44|W   8|L   1|D  27|L   5|W  31| MI | 10131499 / R: 1610   ->1618     |N:3  |W    |B    |W    |W    |B    |B    |W    |"    
## [16] "15 | ZACHARY JAMES HOUGHTON          |4.5  |D  19|L  16|W  30|L  22|W  54|W  33|W  38| MI | 15619130 / R: 1220P13->1416P20  |N:3  |B    |B    |W    |W    |B    |B    |W    |"    
## [17] "16 | MIKE NIKITIN                    |4.0  |D  10|W  15|H    |W  39|L   2|W  36|U    | MI | 10295068 / R: 1604   ->1613     |N:3  |B    |W    |     |B    |W    |B    |     |"    
## [18] "17 | RONALD GRZEGORCZYK              |4.0  |W  48|W  41|L  26|L   2|W  23|W  22|L   5| MI | 10297702 / R: 1629   ->1610     |N:3  |W    |B    |W    |B    |W    |B    |W    |"    
## [19] "18 | DAVID SUNDEEN                   |4.0  |W  47|W   9|L   1|W  32|L  19|W  38|L  10| MI | 11342094 / R: 1600   ->1600     |N:3  |B    |W    |B    |W    |B    |W    |B    |"    
## [20] "19 | DIPANKAR ROY                    |4.0  |D  15|W  10|W  52|D  28|W  18|L   4|L   8| MI | 14862333 / R: 1564   ->1570     |N:3  |W    |B    |W    |B    |W    |W    |B    |"    
## [21] "20 | JASON ZHENG                     |4.0  |L  40|W  49|W  23|W  41|W  28|L   2|L   9| MI | 14529060 / R: 1595   ->1569     |N:4  |W    |B    |W    |B    |W    |B    |W    |"    
## [22] "21 | DINH DANG BUI                   |4.0  |W  43|L   1|W  47|L   3|W  40|W  39|L   6| ON | 15495066 / R: 1563P22->1562     |N:3  |B    |W    |B    |W    |W    |B    |W    |"    
## [23] "22 | EUGENE L MCCLURE                |4.0  |W  64|D  52|L  28|W  15|H    |L  17|W  40| MI | 12405534 / R: 1555   ->1529     |N:4  |W    |B    |W    |B    |     |W    |B    |"    
## [24] "23 | ALAN BUI                        |4.0  |L   4|W  43|L  20|W  58|L  17|W  37|W  46| ON | 15030142 / R: 1363   ->1371     |     |B    |W    |B    |W    |B    |W    |B    |"    
## [25] "24 | MICHAEL R ALDRICH               |4.0  |L  28|L  47|W  43|L  25|W  60|W  44|W  39| MI | 13469010 / R: 1229   ->1300     |N:4  |B    |W    |B    |B    |W    |W    |B    |"    
## [26] "25 | LOREN SCHWIEBERT                |3.5  |L   9|W  53|L   3|W  24|D  34|L  10|W  47| MI | 12486656 / R: 1745   ->1681     |N:4  |B    |W    |B    |W    |B    |W    |B    |"    
## [27] "26 | MAX ZHU                         |3.5  |W  49|W  40|W  17|L   4|L   9|D  32|L  11| ON | 15131520 / R: 1579   ->1564     |N:4  |B    |W    |B    |W    |B    |W    |W    |"    
## [28] "27 | GAURAV GIDWANI                  |3.5  |W  51|L  13|W  46|W  37|D  14|L   6|U    | MI | 14476567 / R: 1552   ->1539     |N:4  |W    |B    |W    |B    |W    |B    |     |"    
## [29] "28 | SOFIA ADINA STANESCU-BELLU      |3.5  |W  24|D   4|W  22|D  19|L  20|L   8|D  36| MI | 14882954 / R: 1507   ->1513     |N:3  |W    |W    |B    |W    |B    |B    |W    |"    
## [30] "29 | CHIEDOZIE OKORIE                |3.5  |W  50|D   6|L  38|L  34|W  52|W  48|U    | MI | 15323285 / R: 1602P6 ->1508P12  |N:4  |B    |W    |B    |W    |W    |B    |     |"    
## [31] "30 | GEORGE AVERY JONES              |3.5  |L  52|D  64|L  15|W  55|L  31|W  61|W  50| ON | 12577178 / R: 1522   ->1444     |     |W    |B    |B    |W    |W    |B    |B    |"    
## [32] "31 | RISHI SHETTY                    |3.5  |L  58|D  55|W  64|L  10|W  30|W  50|L  14| MI | 15131618 / R: 1494   ->1444     |     |B    |W    |B    |W    |B    |W    |B    |"    
## [33] "32 | JOSHUA PHILIP MATHEWS           |3.5  |W  61|L   8|W  44|L  18|W  51|D  26|L  13| ON | 14073750 / R: 1441   ->1433     |N:4  |W    |B    |W    |B    |W    |B    |W    |"    
## [34] "33 | JADE GE                         |3.5  |W  60|L  12|W  50|D  36|L  13|L  15|W  51| MI | 14691842 / R: 1449   ->1421     |     |B    |W    |B    |W    |B    |W    |B    |"    
## [35] "34 | MICHAEL JEFFERY THOMAS          |3.5  |L   6|W  60|L  37|W  29|D  25|L  11|W  52| MI | 15051807 / R: 1399   ->1400     |     |B    |W    |B    |B    |W    |B    |W    |"    
## [36] "35 | JOSHUA DAVID LEE                |3.5  |L  46|L  38|W  56|L   6|W  57|D  52|W  48| MI | 14601397 / R: 1438   ->1392     |     |W    |W    |B    |W    |B    |B    |W    |"    
## [37] "36 | SIDDHARTH JHA                   |3.5  |L  13|W  57|W  51|D  33|H    |L  16|D  28| MI | 14773163 / R: 1355   ->1367     |N:4  |W    |B    |W    |B    |     |W    |B    |"    
## [38] "37 | AMIYATOSH PWNANANDAM            |3.5  |B    |L   5|W  34|L  27|H    |L  23|W  61| MI | 15489571 / R:  980P12->1077P17  |     |     |B    |W    |W    |     |B    |W    |"    
## [39] "38 | BRIAN LIU                       |3.0  |D  11|W  35|W  29|L  12|H    |L  18|L  15| MI | 15108523 / R: 1423   ->1439     |N:4  |W    |B    |W    |W    |     |B    |B    |"    
## [40] "39 | JOEL R HENDON                   |3.0  |L   1|W  54|W  40|L  16|W  44|L  21|L  24| MI | 12923035 / R: 1436P23->1413     |N:4  |B    |W    |B    |W    |B    |W    |W    |"    
## [41] "40 | FOREST ZHANG                    |3.0  |W  20|L  26|L  39|W  59|L  21|W  56|L  22| MI | 14892710 / R: 1348   ->1346     |     |B    |B    |W    |W    |B    |W    |W    |"    
## [42] "41 | KYLE WILLIAM MURPHY             |3.0  |W  59|L  17|W  58|L  20|X    |U    |U    | MI | 15761443 / R: 1403P5 ->1341P9   |     |B    |W    |B    |W    |     |     |     |"    
## [43] "42 | JARED GE                        |3.0  |L  12|L  50|L  57|D  60|D  61|W  64|W  56| MI | 14462326 / R: 1332   ->1256     |     |B    |W    |B    |B    |W    |W    |B    |"    
## [44] "43 | ROBERT GLEN VASEY               |3.0  |L  21|L  23|L  24|W  63|W  59|L  46|W  55| MI | 14101068 / R: 1283   ->1244     |     |W    |B    |W    |W    |B    |B    |W    |"    
## [45] "44 | JUSTIN D SCHILLING              |3.0  |B    |L  14|L  32|W  53|L  39|L  24|W  59| MI | 15323504 / R: 1199   ->1199     |     |     |W    |B    |B    |W    |B    |W    |"    
## [46] "45 | DEREK YAN                       |3.0  |L   5|L  51|D  60|L  56|W  63|D  55|W  58| MI | 15372807 / R: 1242   ->1191     |     |W    |B    |W    |B    |W    |B    |W    |"    
## [47] "46 | JACOB ALEXANDER LAVALLEY        |3.0  |W  35|L   7|L  27|L  50|W  64|W  43|L  23| MI | 15490981 / R:  377P3 ->1076P10  |     |B    |W    |B    |W    |B    |W    |W    |"    
## [48] "47 | ERIC WRIGHT                     |2.5  |L  18|W  24|L  21|W  61|L   8|D  51|L  25| MI | 12533115 / R: 1362   ->1341     |     |W    |B    |W    |B    |W    |B    |W    |"    
## [49] "48 | DANIEL KHAIN                    |2.5  |L  17|W  63|H    |D  52|H    |L  29|L  35| MI | 14369165 / R: 1382   ->1335     |     |B    |W    |     |B    |     |W    |B    |"    
## [50] "49 | MICHAEL J MARTIN                |2.5  |L  26|L  20|D  63|D  64|W  58|H    |U    | MI | 12531685 / R: 1291P12->1259P17  |     |W    |W    |B    |W    |B    |     |     |"    
## [51] "50 | SHIVAM JHA                      |2.5  |L  29|W  42|L  33|W  46|H    |L  31|L  30| MI | 14773178 / R: 1056   ->1111     |     |W    |B    |W    |B    |     |B    |W    |"    
## [52] "51 | TEJAS AYYAGARI                  |2.5  |L  27|W  45|L  36|W  57|L  32|D  47|L  33| MI | 15205474 / R: 1011   ->1097     |     |B    |W    |B    |W    |B    |W    |W    |"    
## [53] "52 | ETHAN GUO                       |2.5  |W  30|D  22|L  19|D  48|L  29|D  35|L  34| MI | 14918803 / R:  935   ->1092     |N:4  |B    |W    |B    |W    |B    |W    |B    |"    
## [54] "53 | JOSE C YBARRA                   |2.0  |H    |L  25|H    |L  44|U    |W  57|U    | MI | 12578849 / R: 1393   ->1359     |     |     |B    |     |W    |     |W    |     |"    
## [55] "54 | LARRY HODGE                     |2.0  |L  14|L  39|L  61|B    |L  15|L  59|W  64| MI | 12836773 / R: 1270   ->1200     |     |B    |B    |W    |     |W    |B    |W    |"    
## [56] "55 | ALEX KONG                       |2.0  |L  62|D  31|L  10|L  30|B    |D  45|L  43| MI | 15412571 / R: 1186   ->1163     |     |W    |B    |W    |B    |     |W    |B    |"    
## [57] "56 | MARISA RICCI                    |2.0  |H    |L  11|L  35|W  45|H    |L  40|L  42| MI | 14679887 / R: 1153   ->1140     |     |     |B    |W    |W    |     |B    |W    |"    
## [58] "57 | MICHAEL LU                      |2.0  |L   7|L  36|W  42|L  51|L  35|L  53|B    | MI | 15113330 / R: 1092   ->1079     |     |B    |W    |W    |B    |W    |B    |     |"    
## [59] "58 | VIRAJ MOHILE                    |2.0  |W  31|L   2|L  41|L  23|L  49|B    |L  45| MI | 14700365 / R:  917   -> 941     |     |W    |B    |W    |B    |W    |     |B    |"    
## [60] "59 | SEAN M MC CORMICK               |2.0  |L  41|B    |L   9|L  40|L  43|W  54|L  44| MI | 12841036 / R:  853   -> 878     |     |W    |     |B    |B    |W    |W    |B    |"    
## [61] "60 | JULIA SHEN                      |1.5  |L  33|L  34|D  45|D  42|L  24|H    |U    | MI | 14579262 / R:  967   -> 984     |     |W    |B    |B    |W    |B    |     |     |"    
## [62] "61 | JEZZEL FARKAS                   |1.5  |L  32|L   3|W  54|L  47|D  42|L  30|L  37| ON | 15771592 / R:  955P11-> 979P18  |     |B    |W    |B    |W    |B    |W    |B    |"    
## [63] "62 | ASHWIN BALAJI                   |1.0  |W  55|U    |U    |U    |U    |U    |U    | MI | 15219542 / R: 1530   ->1535     |     |B    |     |     |     |     |     |     |"    
## [64] "63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    | MI | 15057092 / R: 1175   ->1125     |     |W    |B    |W    |B    |B    |     |     |"    
## [65] "64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54| MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |"

Now with the player informations all being in a single row, we can now move onto extracting the player information

Looking at the dataframe, we can see that each field in each record is separated by the “|” character. Using this information we can split each string and trim the white space

Also it seems like the original header is taking up space as the first row, which can hinder our results, so we need to remove those as well

df1 <- df1[-1]
#split each record by the pipe symbol
split_data <- str_split(df1, "\\|")

#create a data frame extracting the fields we need.
players <- data.frame(
  pair = as.numeric(trimws(sapply(split_data, `[`, 1))),
  name = trimws(sapply(split_data, `[`, 2)),
  total_points = as.numeric(trimws(sapply(split_data, `[`, 3))),
  state = trimws(sapply(split_data, `[`, 11)),
  rating_info = trimws(sapply(split_data, `[`, 12)),
  stringsAsFactors = FALSE
)

players
##    pair                       name total_points state
## 1     1                   GARY HUA          6.0    ON
## 2     2            DAKSHESH DARURI          6.0    MI
## 3     3               ADITYA BAJAJ          6.0    MI
## 4     4        PATRICK H SCHILLING          5.5    MI
## 5     5                 HANSHI ZUO          5.5    MI
## 6     6                HANSEN SONG          5.0    OH
## 7     7          GARY DEE SWATHELL          5.0    MI
## 8     8           EZEKIEL HOUGHTON          5.0    MI
## 9     9                STEFANO LEE          5.0    ON
## 10   10                  ANVIT RAO          5.0    MI
## 11   11   CAMERON WILLIAM MC LEMAN          4.5    MI
## 12   12             KENNETH J TACK          4.5    MI
## 13   13          TORRANCE HENRY JR          4.5    MI
## 14   14               BRADLEY SHAW          4.5    MI
## 15   15     ZACHARY JAMES HOUGHTON          4.5    MI
## 16   16               MIKE NIKITIN          4.0    MI
## 17   17         RONALD GRZEGORCZYK          4.0    MI
## 18   18              DAVID SUNDEEN          4.0    MI
## 19   19               DIPANKAR ROY          4.0    MI
## 20   20                JASON ZHENG          4.0    MI
## 21   21              DINH DANG BUI          4.0    ON
## 22   22           EUGENE L MCCLURE          4.0    MI
## 23   23                   ALAN BUI          4.0    ON
## 24   24          MICHAEL R ALDRICH          4.0    MI
## 25   25           LOREN SCHWIEBERT          3.5    MI
## 26   26                    MAX ZHU          3.5    ON
## 27   27             GAURAV GIDWANI          3.5    MI
## 28   28 SOFIA ADINA STANESCU-BELLU          3.5    MI
## 29   29           CHIEDOZIE OKORIE          3.5    MI
## 30   30         GEORGE AVERY JONES          3.5    ON
## 31   31               RISHI SHETTY          3.5    MI
## 32   32      JOSHUA PHILIP MATHEWS          3.5    ON
## 33   33                    JADE GE          3.5    MI
## 34   34     MICHAEL JEFFERY THOMAS          3.5    MI
## 35   35           JOSHUA DAVID LEE          3.5    MI
## 36   36              SIDDHARTH JHA          3.5    MI
## 37   37       AMIYATOSH PWNANANDAM          3.5    MI
## 38   38                  BRIAN LIU          3.0    MI
## 39   39              JOEL R HENDON          3.0    MI
## 40   40               FOREST ZHANG          3.0    MI
## 41   41        KYLE WILLIAM MURPHY          3.0    MI
## 42   42                   JARED GE          3.0    MI
## 43   43          ROBERT GLEN VASEY          3.0    MI
## 44   44         JUSTIN D SCHILLING          3.0    MI
## 45   45                  DEREK YAN          3.0    MI
## 46   46   JACOB ALEXANDER LAVALLEY          3.0    MI
## 47   47                ERIC WRIGHT          2.5    MI
## 48   48               DANIEL KHAIN          2.5    MI
## 49   49           MICHAEL J MARTIN          2.5    MI
## 50   50                 SHIVAM JHA          2.5    MI
## 51   51             TEJAS AYYAGARI          2.5    MI
## 52   52                  ETHAN GUO          2.5    MI
## 53   53              JOSE C YBARRA          2.0    MI
## 54   54                LARRY HODGE          2.0    MI
## 55   55                  ALEX KONG          2.0    MI
## 56   56               MARISA RICCI          2.0    MI
## 57   57                 MICHAEL LU          2.0    MI
## 58   58               VIRAJ MOHILE          2.0    MI
## 59   59          SEAN M MC CORMICK          2.0    MI
## 60   60                 JULIA SHEN          1.5    MI
## 61   61              JEZZEL FARKAS          1.5    ON
## 62   62              ASHWIN BALAJI          1.0    MI
## 63   63       THOMAS JOSEPH HOSMER          1.0    MI
## 64   64                     BEN LI          1.0    MI
##                       rating_info
## 1     15445895 / R: 1794   ->1817
## 2     14598900 / R: 1553   ->1663
## 3     14959604 / R: 1384   ->1640
## 4     12616049 / R: 1716   ->1744
## 5     14601533 / R: 1655   ->1690
## 6     15055204 / R: 1686   ->1687
## 7     11146376 / R: 1649   ->1673
## 8  15142253 / R: 1641P17->1657P24
## 9     14954524 / R: 1411   ->1564
## 10    14150362 / R: 1365   ->1544
## 11    12581589 / R: 1712   ->1696
## 12    12681257 / R: 1663   ->1670
## 13    15082995 / R: 1666   ->1662
## 14    10131499 / R: 1610   ->1618
## 15 15619130 / R: 1220P13->1416P20
## 16    10295068 / R: 1604   ->1613
## 17    10297702 / R: 1629   ->1610
## 18    11342094 / R: 1600   ->1600
## 19    14862333 / R: 1564   ->1570
## 20    14529060 / R: 1595   ->1569
## 21    15495066 / R: 1563P22->1562
## 22    12405534 / R: 1555   ->1529
## 23    15030142 / R: 1363   ->1371
## 24    13469010 / R: 1229   ->1300
## 25    12486656 / R: 1745   ->1681
## 26    15131520 / R: 1579   ->1564
## 27    14476567 / R: 1552   ->1539
## 28    14882954 / R: 1507   ->1513
## 29 15323285 / R: 1602P6 ->1508P12
## 30    12577178 / R: 1522   ->1444
## 31    15131618 / R: 1494   ->1444
## 32    14073750 / R: 1441   ->1433
## 33    14691842 / R: 1449   ->1421
## 34    15051807 / R: 1399   ->1400
## 35    14601397 / R: 1438   ->1392
## 36    14773163 / R: 1355   ->1367
## 37 15489571 / R:  980P12->1077P17
## 38    15108523 / R: 1423   ->1439
## 39    12923035 / R: 1436P23->1413
## 40    14892710 / R: 1348   ->1346
## 41  15761443 / R: 1403P5 ->1341P9
## 42    14462326 / R: 1332   ->1256
## 43    14101068 / R: 1283   ->1244
## 44    15323504 / R: 1199   ->1199
## 45    15372807 / R: 1242   ->1191
## 46 15490981 / R:  377P3 ->1076P10
## 47    12533115 / R: 1362   ->1341
## 48    14369165 / R: 1382   ->1335
## 49 12531685 / R: 1291P12->1259P17
## 50    14773178 / R: 1056   ->1111
## 51    15205474 / R: 1011   ->1097
## 52    14918803 / R:  935   ->1092
## 53    12578849 / R: 1393   ->1359
## 54    12836773 / R: 1270   ->1200
## 55    15412571 / R: 1186   ->1163
## 56    14679887 / R: 1153   ->1140
## 57    15113330 / R: 1092   ->1079
## 58    14700365 / R:  917   -> 941
## 59    12841036 / R:  853   -> 878
## 60    14579262 / R:  967   -> 984
## 61 15771592 / R:  955P11-> 979P18
## 62    15219542 / R: 1530   ->1535
## 63    15057092 / R: 1175   ->1125
## 64    15006561 / R: 1163   ->1112

By creating a new dataframe called players in this case, we were able to extract the fields that we need

Now we can move onto extracting the pre-rating from the rating information

The rating info is a little awkward since it contains text like 15445895 / R: 1794 ->1817 so we need to make sure we extract the number that comes right after “R:”

players <- players %>%
  mutate(pre_rating = as.numeric(str_match(rating_info, "R:\\s*(\\d+)")[,2]))

Using this regex method, it looks for digits after the “R:” and any optional spaces, it than converts the extracted text to numerical values

Now we can move onto extracting the opponent pair numbers from the round data

Looking at the data, we can see that the round data contains results like “W 39” or “D 12”. We only need to extract the numerical part for each round. To do this, we need to add a new list column to our data frame

#we can create a new list column by extrcating opponents IDs
players$opponents <- lapply(split_data, function(x) {
  #extract fields 4 to 10 these contain the roud results for each players
  round_fields <- x[4:10]
  #extract the numeric part for each round result
  opps <- sapply(round_fields, function(field) as.numeric(str_extract(field, "\\d+")))
  #remove any null/empty values
  opps <- opps[!is.na(opps)]
  return(opps)
})

print(players)
##    pair                       name total_points state
## 1     1                   GARY HUA          6.0    ON
## 2     2            DAKSHESH DARURI          6.0    MI
## 3     3               ADITYA BAJAJ          6.0    MI
## 4     4        PATRICK H SCHILLING          5.5    MI
## 5     5                 HANSHI ZUO          5.5    MI
## 6     6                HANSEN SONG          5.0    OH
## 7     7          GARY DEE SWATHELL          5.0    MI
## 8     8           EZEKIEL HOUGHTON          5.0    MI
## 9     9                STEFANO LEE          5.0    ON
## 10   10                  ANVIT RAO          5.0    MI
## 11   11   CAMERON WILLIAM MC LEMAN          4.5    MI
## 12   12             KENNETH J TACK          4.5    MI
## 13   13          TORRANCE HENRY JR          4.5    MI
## 14   14               BRADLEY SHAW          4.5    MI
## 15   15     ZACHARY JAMES HOUGHTON          4.5    MI
## 16   16               MIKE NIKITIN          4.0    MI
## 17   17         RONALD GRZEGORCZYK          4.0    MI
## 18   18              DAVID SUNDEEN          4.0    MI
## 19   19               DIPANKAR ROY          4.0    MI
## 20   20                JASON ZHENG          4.0    MI
## 21   21              DINH DANG BUI          4.0    ON
## 22   22           EUGENE L MCCLURE          4.0    MI
## 23   23                   ALAN BUI          4.0    ON
## 24   24          MICHAEL R ALDRICH          4.0    MI
## 25   25           LOREN SCHWIEBERT          3.5    MI
## 26   26                    MAX ZHU          3.5    ON
## 27   27             GAURAV GIDWANI          3.5    MI
## 28   28 SOFIA ADINA STANESCU-BELLU          3.5    MI
## 29   29           CHIEDOZIE OKORIE          3.5    MI
## 30   30         GEORGE AVERY JONES          3.5    ON
## 31   31               RISHI SHETTY          3.5    MI
## 32   32      JOSHUA PHILIP MATHEWS          3.5    ON
## 33   33                    JADE GE          3.5    MI
## 34   34     MICHAEL JEFFERY THOMAS          3.5    MI
## 35   35           JOSHUA DAVID LEE          3.5    MI
## 36   36              SIDDHARTH JHA          3.5    MI
## 37   37       AMIYATOSH PWNANANDAM          3.5    MI
## 38   38                  BRIAN LIU          3.0    MI
## 39   39              JOEL R HENDON          3.0    MI
## 40   40               FOREST ZHANG          3.0    MI
## 41   41        KYLE WILLIAM MURPHY          3.0    MI
## 42   42                   JARED GE          3.0    MI
## 43   43          ROBERT GLEN VASEY          3.0    MI
## 44   44         JUSTIN D SCHILLING          3.0    MI
## 45   45                  DEREK YAN          3.0    MI
## 46   46   JACOB ALEXANDER LAVALLEY          3.0    MI
## 47   47                ERIC WRIGHT          2.5    MI
## 48   48               DANIEL KHAIN          2.5    MI
## 49   49           MICHAEL J MARTIN          2.5    MI
## 50   50                 SHIVAM JHA          2.5    MI
## 51   51             TEJAS AYYAGARI          2.5    MI
## 52   52                  ETHAN GUO          2.5    MI
## 53   53              JOSE C YBARRA          2.0    MI
## 54   54                LARRY HODGE          2.0    MI
## 55   55                  ALEX KONG          2.0    MI
## 56   56               MARISA RICCI          2.0    MI
## 57   57                 MICHAEL LU          2.0    MI
## 58   58               VIRAJ MOHILE          2.0    MI
## 59   59          SEAN M MC CORMICK          2.0    MI
## 60   60                 JULIA SHEN          1.5    MI
## 61   61              JEZZEL FARKAS          1.5    ON
## 62   62              ASHWIN BALAJI          1.0    MI
## 63   63       THOMAS JOSEPH HOSMER          1.0    MI
## 64   64                     BEN LI          1.0    MI
##                       rating_info pre_rating                  opponents
## 1     15445895 / R: 1794   ->1817       1794   39, 21, 18, 14, 7, 12, 4
## 2     14598900 / R: 1553   ->1663       1553   63, 58, 4, 17, 16, 20, 7
## 3     14959604 / R: 1384   ->1640       1384  8, 61, 25, 21, 11, 13, 12
## 4     12616049 / R: 1716   ->1744       1716    23, 28, 2, 26, 5, 19, 1
## 5     14601533 / R: 1655   ->1690       1655  45, 37, 12, 13, 4, 14, 17
## 6     15055204 / R: 1686   ->1687       1686 34, 29, 11, 35, 10, 27, 21
## 7     11146376 / R: 1649   ->1673       1649    57, 46, 13, 11, 1, 9, 2
## 8  15142253 / R: 1641P17->1657P24       1641   3, 32, 14, 9, 47, 28, 19
## 9     14954524 / R: 1411   ->1564       1411   25, 18, 59, 8, 26, 7, 20
## 10    14150362 / R: 1365   ->1544       1365  16, 19, 55, 31, 6, 25, 18
## 11    12581589 / R: 1712   ->1696       1712    38, 56, 6, 7, 3, 34, 26
## 12    12681257 / R: 1663   ->1670       1663        42, 33, 5, 38, 1, 3
## 13    15082995 / R: 1666   ->1662       1666    36, 27, 7, 5, 33, 3, 32
## 14    10131499 / R: 1610   ->1618       1610    54, 44, 8, 1, 27, 5, 31
## 15 15619130 / R: 1220P13->1416P20       1220 19, 16, 30, 22, 54, 33, 38
## 16    10295068 / R: 1604   ->1613       1604          10, 15, 39, 2, 36
## 17    10297702 / R: 1629   ->1610       1629   48, 41, 26, 2, 23, 22, 5
## 18    11342094 / R: 1600   ->1600       1600   47, 9, 1, 32, 19, 38, 10
## 19    14862333 / R: 1564   ->1570       1564   15, 10, 52, 28, 18, 4, 8
## 20    14529060 / R: 1595   ->1569       1595   40, 49, 23, 41, 28, 2, 9
## 21    15495066 / R: 1563P22->1562       1563    43, 1, 47, 3, 40, 39, 6
## 22    12405534 / R: 1555   ->1529       1555     64, 52, 28, 15, 17, 40
## 23    15030142 / R: 1363   ->1371       1363  4, 43, 20, 58, 17, 37, 46
## 24    13469010 / R: 1229   ->1300       1229 28, 47, 43, 25, 60, 44, 39
## 25    12486656 / R: 1745   ->1681       1745   9, 53, 3, 24, 34, 10, 47
## 26    15131520 / R: 1579   ->1564       1579   49, 40, 17, 4, 9, 32, 11
## 27    14476567 / R: 1552   ->1539       1552      51, 13, 46, 37, 14, 6
## 28    14882954 / R: 1507   ->1513       1507   24, 4, 22, 19, 20, 8, 36
## 29 15323285 / R: 1602P6 ->1508P12       1602      50, 6, 38, 34, 52, 48
## 30    12577178 / R: 1522   ->1444       1522 52, 64, 15, 55, 31, 61, 50
## 31    15131618 / R: 1494   ->1444       1494 58, 55, 64, 10, 30, 50, 14
## 32    14073750 / R: 1441   ->1433       1441  61, 8, 44, 18, 51, 26, 13
## 33    14691842 / R: 1449   ->1421       1449 60, 12, 50, 36, 13, 15, 51
## 34    15051807 / R: 1399   ->1400       1399  6, 60, 37, 29, 25, 11, 52
## 35    14601397 / R: 1438   ->1392       1438  46, 38, 56, 6, 57, 52, 48
## 36    14773163 / R: 1355   ->1367       1355     13, 57, 51, 33, 16, 28
## 37 15489571 / R:  980P12->1077P17        980          5, 34, 27, 23, 61
## 38    15108523 / R: 1423   ->1439       1423     11, 35, 29, 12, 18, 15
## 39    12923035 / R: 1436P23->1413       1436  1, 54, 40, 16, 44, 21, 24
## 40    14892710 / R: 1348   ->1346       1348 20, 26, 39, 59, 21, 56, 22
## 41  15761443 / R: 1403P5 ->1341P9       1403             59, 17, 58, 20
## 42    14462326 / R: 1332   ->1256       1332 12, 50, 57, 60, 61, 64, 56
## 43    14101068 / R: 1283   ->1244       1283 21, 23, 24, 63, 59, 46, 55
## 44    15323504 / R: 1199   ->1199       1199     14, 32, 53, 39, 24, 59
## 45    15372807 / R: 1242   ->1191       1242  5, 51, 60, 56, 63, 55, 58
## 46 15490981 / R:  377P3 ->1076P10        377  35, 7, 27, 50, 64, 43, 23
## 47    12533115 / R: 1362   ->1341       1362  18, 24, 21, 61, 8, 51, 25
## 48    14369165 / R: 1382   ->1335       1382         17, 63, 52, 29, 35
## 49 12531685 / R: 1291P12->1259P17       1291         26, 20, 63, 64, 58
## 50    14773178 / R: 1056   ->1111       1056     29, 42, 33, 46, 31, 30
## 51    15205474 / R: 1011   ->1097       1011 27, 45, 36, 57, 32, 47, 33
## 52    14918803 / R:  935   ->1092        935 30, 22, 19, 48, 29, 35, 34
## 53    12578849 / R: 1393   ->1359       1393                 25, 44, 57
## 54    12836773 / R: 1270   ->1200       1270     14, 39, 61, 15, 59, 64
## 55    15412571 / R: 1186   ->1163       1186     62, 31, 10, 30, 45, 43
## 56    14679887 / R: 1153   ->1140       1153         11, 35, 45, 40, 42
## 57    15113330 / R: 1092   ->1079       1092      7, 36, 42, 51, 35, 53
## 58    14700365 / R:  917   -> 941        917      31, 2, 41, 23, 49, 45
## 59    12841036 / R:  853   -> 878        853      41, 9, 40, 43, 54, 44
## 60    14579262 / R:  967   -> 984        967         33, 34, 45, 42, 24
## 61 15771592 / R:  955P11-> 979P18        955  32, 3, 54, 47, 42, 30, 37
## 62    15219542 / R: 1530   ->1535       1530                         55
## 63    15057092 / R: 1175   ->1125       1175          2, 48, 49, 43, 45
## 64    15006561 / R: 1163   ->1112       1163 22, 30, 31, 49, 46, 42, 54

Now we need to ocmpute the Average Pre-Chess rating for the opponents using this information

For each player we need to look up each opponents pre rating by using their unique pair number and compute the average. We can do this by using a loop

#initialize a new column for average opponent rating
players$opp_avg <- NA_real_

#we can loop through each player
for(i in 1:nrow(players)){
  opp_ids <- players$opponents[[i]]
  if(length(opp_ids) > 0) {
    #for each opponent we need to find the corresponding pre_rating by matching the pair number
    opp_ratings <- sapply(opp_ids, function(id) {
      idx <- which(players$pair == id)
      if(length(idx) > 0) {
        return(players$pre_rating[idx])
      } else {
        return(NA_real_)
      }
    })
    #compute the average opponent pre-rating
    players$opp_avg[i] <- mean(opp_ratings, na.rm = TRUE)
  }
}

head(players)
##   pair                name total_points state                 rating_info
## 1    1            GARY HUA          6.0    ON 15445895 / R: 1794   ->1817
## 2    2     DAKSHESH DARURI          6.0    MI 14598900 / R: 1553   ->1663
## 3    3        ADITYA BAJAJ          6.0    MI 14959604 / R: 1384   ->1640
## 4    4 PATRICK H SCHILLING          5.5    MI 12616049 / R: 1716   ->1744
## 5    5          HANSHI ZUO          5.5    MI 14601533 / R: 1655   ->1690
## 6    6         HANSEN SONG          5.0    OH 15055204 / R: 1686   ->1687
##   pre_rating                  opponents  opp_avg
## 1       1794   39, 21, 18, 14, 7, 12, 4 1605.286
## 2       1553   63, 58, 4, 17, 16, 20, 7 1469.286
## 3       1384  8, 61, 25, 21, 11, 13, 12 1563.571
## 4       1716    23, 28, 2, 26, 5, 19, 1 1573.571
## 5       1655  45, 37, 12, 13, 4, 14, 17 1500.857
## 6       1686 34, 29, 11, 35, 10, 27, 21 1518.714

Now that we have all the necessary columns and calculations complete, we can now create the final output and write it to a CSV

#select the final output columns
final_output <- players %>%
  select(name, state, total_points, pre_rating, opp_avg)

#write the final output to a CSV file
write.csv(final_output, "tournament_results.csv", row.names = FALSE)

final_output
##                          name state total_points pre_rating  opp_avg
## 1                    GARY HUA    ON          6.0       1794 1605.286
## 2             DAKSHESH DARURI    MI          6.0       1553 1469.286
## 3                ADITYA BAJAJ    MI          6.0       1384 1563.571
## 4         PATRICK H SCHILLING    MI          5.5       1716 1573.571
## 5                  HANSHI ZUO    MI          5.5       1655 1500.857
## 6                 HANSEN SONG    OH          5.0       1686 1518.714
## 7           GARY DEE SWATHELL    MI          5.0       1649 1372.143
## 8            EZEKIEL HOUGHTON    MI          5.0       1641 1468.429
## 9                 STEFANO LEE    ON          5.0       1411 1523.143
## 10                  ANVIT RAO    MI          5.0       1365 1554.143
## 11   CAMERON WILLIAM MC LEMAN    MI          4.5       1712 1467.571
## 12             KENNETH J TACK    MI          4.5       1663 1506.167
## 13          TORRANCE HENRY JR    MI          4.5       1666 1497.857
## 14               BRADLEY SHAW    MI          4.5       1610 1515.000
## 15     ZACHARY JAMES HOUGHTON    MI          4.5       1220 1483.857
## 16               MIKE NIKITIN    MI          4.0       1604 1385.800
## 17         RONALD GRZEGORCZYK    MI          4.0       1629 1498.571
## 18              DAVID SUNDEEN    MI          4.0       1600 1480.000
## 19               DIPANKAR ROY    MI          4.0       1564 1426.286
## 20                JASON ZHENG    MI          4.0       1595 1410.857
## 21              DINH DANG BUI    ON          4.0       1563 1470.429
## 22           EUGENE L MCCLURE    MI          4.0       1555 1300.333
## 23                   ALAN BUI    ON          4.0       1363 1213.857
## 24          MICHAEL R ALDRICH    MI          4.0       1229 1357.000
## 25           LOREN SCHWIEBERT    MI          3.5       1745 1363.286
## 26                    MAX ZHU    ON          3.5       1579 1506.857
## 27             GAURAV GIDWANI    MI          3.5       1552 1221.667
## 28 SOFIA ADINA STANESCU-BELLU    MI          3.5       1507 1522.143
## 29           CHIEDOZIE OKORIE    MI          3.5       1602 1313.500
## 30         GEORGE AVERY JONES    ON          3.5       1522 1144.143
## 31               RISHI SHETTY    MI          3.5       1494 1259.857
## 32      JOSHUA PHILIP MATHEWS    ON          3.5       1441 1378.714
## 33                    JADE GE    MI          3.5       1449 1276.857
## 34     MICHAEL JEFFERY THOMAS    MI          3.5       1399 1375.286
## 35           JOSHUA DAVID LEE    MI          3.5       1438 1149.714
## 36              SIDDHARTH JHA    MI          3.5       1355 1388.167
## 37       AMIYATOSH PWNANANDAM    MI          3.5        980 1384.800
## 38                  BRIAN LIU    MI          3.0       1423 1539.167
## 39              JOEL R HENDON    MI          3.0       1436 1429.571
## 40               FOREST ZHANG    MI          3.0       1348 1390.571
## 41        KYLE WILLIAM MURPHY    MI          3.0       1403 1248.500
## 42                   JARED GE    MI          3.0       1332 1149.857
## 43          ROBERT GLEN VASEY    MI          3.0       1283 1106.571
## 44         JUSTIN D SCHILLING    MI          3.0       1199 1327.000
## 45                  DEREK YAN    MI          3.0       1242 1152.000
## 46   JACOB ALEXANDER LAVALLEY    MI          3.0        377 1357.714
## 47                ERIC WRIGHT    MI          2.5       1362 1392.000
## 48               DANIEL KHAIN    MI          2.5       1382 1355.800
## 49           MICHAEL J MARTIN    MI          2.5       1291 1285.800
## 50                 SHIVAM JHA    MI          2.5       1056 1296.000
## 51             TEJAS AYYAGARI    MI          2.5       1011 1356.143
## 52                  ETHAN GUO    MI          2.5        935 1494.571
## 53              JOSE C YBARRA    MI          2.0       1393 1345.333
## 54                LARRY HODGE    MI          2.0       1270 1206.167
## 55                  ALEX KONG    MI          2.0       1186 1406.000
## 56               MARISA RICCI    MI          2.0       1153 1414.400
## 57                 MICHAEL LU    MI          2.0       1092 1363.000
## 58               VIRAJ MOHILE    MI          2.0        917 1391.000
## 59          SEAN M MC CORMICK    MI          2.0        853 1319.000
## 60                 JULIA SHEN    MI          1.5        967 1330.200
## 61              JEZZEL FARKAS    ON          1.5        955 1327.286
## 62              ASHWIN BALAJI    MI          1.0       1530 1186.000
## 63       THOMAS JOSEPH HOSMER    MI          1.0       1175 1350.200
## 64                     BEN LI    MI          1.0       1163 1263.000