For this project, we are tasked with needing to read a .txt file titled (tournamentinfo.txt) that contains records for each chess player. These records are broken up to using two line for each chess player and for this project, we need to extract:
So with this new information, the first record should look like:
In order to complete these steps, we need to first import the necessary libraries in order to precess the string and to manipulate data
library(stringr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Now that we have our necessary libraries, we now need to read and clean the data
First we need to read the tournamentinfo.txt file as a plain text, then we need to remove the header and separator lines, so only the player records are present - Remember that the player info is given in two lines
To make this accessible to others, I added the file to my github and I am able to load it without the empty spaces or the first row that contains only dashed lines
df <- read.csv("https://raw.githubusercontent.com/silmaxk/DATA607_CUNYSPS/refs/heads/main/Project%231/tournamentinfo.txt", header = F, skip = 1, strip.white = T)
head(df)
## V1
## 1 Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round|
## 2 Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
## 3 -----------------------------------------------------------------------------------------
## 4 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|
## 5 ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |
## 6 -----------------------------------------------------------------------------------------
Now taking a look at the dataframe, we can see that there are still dashed lines every third line, so we can remove those as well
df <- data.frame(df[-seq(0,nrow(df),3), ], "")
head(df)
## df..seq.0..nrow.df...3....
## 1 Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round|
## 2 Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
## 3 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|
## 4 ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |
## 5 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|
## 6 MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |
## X..
## 1
## 2
## 3
## 4
## 5
## 6
Now that we removed the dashed lines, we can move onto dealing with the players information being split into two lines
There are multiple ways at handling this, but we can concatenate the two lines together, to make it one row for each players information:
# Step 4: Combine the two lines for each player into one record.
# First, split the dataframe into odd and even rows.
odd_rows <- df[(seq(nrow(df)) %% 2) == 1, ]
even_rows <- df[(seq(nrow(df)) %% 2) == 0, ]
# Combine the corresponding odd and even rows side by side.
df1 <- data.frame(cbind(odd_rows, even_rows))
# Remove any extra columns (if present) that came from the read process.
if("X...1" %in% names(df1)) df1$X...1 <- NULL
if("X.." %in% names(df1)) df1$X.. <- NULL
# Concatenate the two columns into a single character vector.
df1 <- paste(df1[,1], df1[,2])
df1
## [1] "Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 |"
## [2] "1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4| ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
## [3] "2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7| MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |"
## [4] "3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12| MI | 14959604 / R: 1384 ->1640 |N:2 |W |B |W |B |W |B |W |"
## [5] "4 | PATRICK H SCHILLING |5.5 |W 23|D 28|W 2|W 26|D 5|W 19|D 1| MI | 12616049 / R: 1716 ->1744 |N:2 |W |B |W |B |W |B |B |"
## [6] "5 | HANSHI ZUO |5.5 |W 45|W 37|D 12|D 13|D 4|W 14|W 17| MI | 14601533 / R: 1655 ->1690 |N:2 |B |W |B |W |B |W |B |"
## [7] "6 | HANSEN SONG |5.0 |W 34|D 29|L 11|W 35|D 10|W 27|W 21| OH | 15055204 / R: 1686 ->1687 |N:3 |W |B |W |B |B |W |B |"
## [8] "7 | GARY DEE SWATHELL |5.0 |W 57|W 46|W 13|W 11|L 1|W 9|L 2| MI | 11146376 / R: 1649 ->1673 |N:3 |W |B |W |B |B |W |W |"
## [9] "8 | EZEKIEL HOUGHTON |5.0 |W 3|W 32|L 14|L 9|W 47|W 28|W 19| MI | 15142253 / R: 1641P17->1657P24 |N:3 |B |W |B |W |B |W |W |"
## [10] "9 | STEFANO LEE |5.0 |W 25|L 18|W 59|W 8|W 26|L 7|W 20| ON | 14954524 / R: 1411 ->1564 |N:2 |W |B |W |B |W |B |B |"
## [11] "10 | ANVIT RAO |5.0 |D 16|L 19|W 55|W 31|D 6|W 25|W 18| MI | 14150362 / R: 1365 ->1544 |N:3 |W |W |B |B |W |B |W |"
## [12] "11 | CAMERON WILLIAM MC LEMAN |4.5 |D 38|W 56|W 6|L 7|L 3|W 34|W 26| MI | 12581589 / R: 1712 ->1696 |N:3 |B |W |B |W |B |W |B |"
## [13] "12 | KENNETH J TACK |4.5 |W 42|W 33|D 5|W 38|H |D 1|L 3| MI | 12681257 / R: 1663 ->1670 |N:3 |W |B |W |B | |W |B |"
## [14] "13 | TORRANCE HENRY JR |4.5 |W 36|W 27|L 7|D 5|W 33|L 3|W 32| MI | 15082995 / R: 1666 ->1662 |N:3 |B |W |B |B |W |W |B |"
## [15] "14 | BRADLEY SHAW |4.5 |W 54|W 44|W 8|L 1|D 27|L 5|W 31| MI | 10131499 / R: 1610 ->1618 |N:3 |W |B |W |W |B |B |W |"
## [16] "15 | ZACHARY JAMES HOUGHTON |4.5 |D 19|L 16|W 30|L 22|W 54|W 33|W 38| MI | 15619130 / R: 1220P13->1416P20 |N:3 |B |B |W |W |B |B |W |"
## [17] "16 | MIKE NIKITIN |4.0 |D 10|W 15|H |W 39|L 2|W 36|U | MI | 10295068 / R: 1604 ->1613 |N:3 |B |W | |B |W |B | |"
## [18] "17 | RONALD GRZEGORCZYK |4.0 |W 48|W 41|L 26|L 2|W 23|W 22|L 5| MI | 10297702 / R: 1629 ->1610 |N:3 |W |B |W |B |W |B |W |"
## [19] "18 | DAVID SUNDEEN |4.0 |W 47|W 9|L 1|W 32|L 19|W 38|L 10| MI | 11342094 / R: 1600 ->1600 |N:3 |B |W |B |W |B |W |B |"
## [20] "19 | DIPANKAR ROY |4.0 |D 15|W 10|W 52|D 28|W 18|L 4|L 8| MI | 14862333 / R: 1564 ->1570 |N:3 |W |B |W |B |W |W |B |"
## [21] "20 | JASON ZHENG |4.0 |L 40|W 49|W 23|W 41|W 28|L 2|L 9| MI | 14529060 / R: 1595 ->1569 |N:4 |W |B |W |B |W |B |W |"
## [22] "21 | DINH DANG BUI |4.0 |W 43|L 1|W 47|L 3|W 40|W 39|L 6| ON | 15495066 / R: 1563P22->1562 |N:3 |B |W |B |W |W |B |W |"
## [23] "22 | EUGENE L MCCLURE |4.0 |W 64|D 52|L 28|W 15|H |L 17|W 40| MI | 12405534 / R: 1555 ->1529 |N:4 |W |B |W |B | |W |B |"
## [24] "23 | ALAN BUI |4.0 |L 4|W 43|L 20|W 58|L 17|W 37|W 46| ON | 15030142 / R: 1363 ->1371 | |B |W |B |W |B |W |B |"
## [25] "24 | MICHAEL R ALDRICH |4.0 |L 28|L 47|W 43|L 25|W 60|W 44|W 39| MI | 13469010 / R: 1229 ->1300 |N:4 |B |W |B |B |W |W |B |"
## [26] "25 | LOREN SCHWIEBERT |3.5 |L 9|W 53|L 3|W 24|D 34|L 10|W 47| MI | 12486656 / R: 1745 ->1681 |N:4 |B |W |B |W |B |W |B |"
## [27] "26 | MAX ZHU |3.5 |W 49|W 40|W 17|L 4|L 9|D 32|L 11| ON | 15131520 / R: 1579 ->1564 |N:4 |B |W |B |W |B |W |W |"
## [28] "27 | GAURAV GIDWANI |3.5 |W 51|L 13|W 46|W 37|D 14|L 6|U | MI | 14476567 / R: 1552 ->1539 |N:4 |W |B |W |B |W |B | |"
## [29] "28 | SOFIA ADINA STANESCU-BELLU |3.5 |W 24|D 4|W 22|D 19|L 20|L 8|D 36| MI | 14882954 / R: 1507 ->1513 |N:3 |W |W |B |W |B |B |W |"
## [30] "29 | CHIEDOZIE OKORIE |3.5 |W 50|D 6|L 38|L 34|W 52|W 48|U | MI | 15323285 / R: 1602P6 ->1508P12 |N:4 |B |W |B |W |W |B | |"
## [31] "30 | GEORGE AVERY JONES |3.5 |L 52|D 64|L 15|W 55|L 31|W 61|W 50| ON | 12577178 / R: 1522 ->1444 | |W |B |B |W |W |B |B |"
## [32] "31 | RISHI SHETTY |3.5 |L 58|D 55|W 64|L 10|W 30|W 50|L 14| MI | 15131618 / R: 1494 ->1444 | |B |W |B |W |B |W |B |"
## [33] "32 | JOSHUA PHILIP MATHEWS |3.5 |W 61|L 8|W 44|L 18|W 51|D 26|L 13| ON | 14073750 / R: 1441 ->1433 |N:4 |W |B |W |B |W |B |W |"
## [34] "33 | JADE GE |3.5 |W 60|L 12|W 50|D 36|L 13|L 15|W 51| MI | 14691842 / R: 1449 ->1421 | |B |W |B |W |B |W |B |"
## [35] "34 | MICHAEL JEFFERY THOMAS |3.5 |L 6|W 60|L 37|W 29|D 25|L 11|W 52| MI | 15051807 / R: 1399 ->1400 | |B |W |B |B |W |B |W |"
## [36] "35 | JOSHUA DAVID LEE |3.5 |L 46|L 38|W 56|L 6|W 57|D 52|W 48| MI | 14601397 / R: 1438 ->1392 | |W |W |B |W |B |B |W |"
## [37] "36 | SIDDHARTH JHA |3.5 |L 13|W 57|W 51|D 33|H |L 16|D 28| MI | 14773163 / R: 1355 ->1367 |N:4 |W |B |W |B | |W |B |"
## [38] "37 | AMIYATOSH PWNANANDAM |3.5 |B |L 5|W 34|L 27|H |L 23|W 61| MI | 15489571 / R: 980P12->1077P17 | | |B |W |W | |B |W |"
## [39] "38 | BRIAN LIU |3.0 |D 11|W 35|W 29|L 12|H |L 18|L 15| MI | 15108523 / R: 1423 ->1439 |N:4 |W |B |W |W | |B |B |"
## [40] "39 | JOEL R HENDON |3.0 |L 1|W 54|W 40|L 16|W 44|L 21|L 24| MI | 12923035 / R: 1436P23->1413 |N:4 |B |W |B |W |B |W |W |"
## [41] "40 | FOREST ZHANG |3.0 |W 20|L 26|L 39|W 59|L 21|W 56|L 22| MI | 14892710 / R: 1348 ->1346 | |B |B |W |W |B |W |W |"
## [42] "41 | KYLE WILLIAM MURPHY |3.0 |W 59|L 17|W 58|L 20|X |U |U | MI | 15761443 / R: 1403P5 ->1341P9 | |B |W |B |W | | | |"
## [43] "42 | JARED GE |3.0 |L 12|L 50|L 57|D 60|D 61|W 64|W 56| MI | 14462326 / R: 1332 ->1256 | |B |W |B |B |W |W |B |"
## [44] "43 | ROBERT GLEN VASEY |3.0 |L 21|L 23|L 24|W 63|W 59|L 46|W 55| MI | 14101068 / R: 1283 ->1244 | |W |B |W |W |B |B |W |"
## [45] "44 | JUSTIN D SCHILLING |3.0 |B |L 14|L 32|W 53|L 39|L 24|W 59| MI | 15323504 / R: 1199 ->1199 | | |W |B |B |W |B |W |"
## [46] "45 | DEREK YAN |3.0 |L 5|L 51|D 60|L 56|W 63|D 55|W 58| MI | 15372807 / R: 1242 ->1191 | |W |B |W |B |W |B |W |"
## [47] "46 | JACOB ALEXANDER LAVALLEY |3.0 |W 35|L 7|L 27|L 50|W 64|W 43|L 23| MI | 15490981 / R: 377P3 ->1076P10 | |B |W |B |W |B |W |W |"
## [48] "47 | ERIC WRIGHT |2.5 |L 18|W 24|L 21|W 61|L 8|D 51|L 25| MI | 12533115 / R: 1362 ->1341 | |W |B |W |B |W |B |W |"
## [49] "48 | DANIEL KHAIN |2.5 |L 17|W 63|H |D 52|H |L 29|L 35| MI | 14369165 / R: 1382 ->1335 | |B |W | |B | |W |B |"
## [50] "49 | MICHAEL J MARTIN |2.5 |L 26|L 20|D 63|D 64|W 58|H |U | MI | 12531685 / R: 1291P12->1259P17 | |W |W |B |W |B | | |"
## [51] "50 | SHIVAM JHA |2.5 |L 29|W 42|L 33|W 46|H |L 31|L 30| MI | 14773178 / R: 1056 ->1111 | |W |B |W |B | |B |W |"
## [52] "51 | TEJAS AYYAGARI |2.5 |L 27|W 45|L 36|W 57|L 32|D 47|L 33| MI | 15205474 / R: 1011 ->1097 | |B |W |B |W |B |W |W |"
## [53] "52 | ETHAN GUO |2.5 |W 30|D 22|L 19|D 48|L 29|D 35|L 34| MI | 14918803 / R: 935 ->1092 |N:4 |B |W |B |W |B |W |B |"
## [54] "53 | JOSE C YBARRA |2.0 |H |L 25|H |L 44|U |W 57|U | MI | 12578849 / R: 1393 ->1359 | | |B | |W | |W | |"
## [55] "54 | LARRY HODGE |2.0 |L 14|L 39|L 61|B |L 15|L 59|W 64| MI | 12836773 / R: 1270 ->1200 | |B |B |W | |W |B |W |"
## [56] "55 | ALEX KONG |2.0 |L 62|D 31|L 10|L 30|B |D 45|L 43| MI | 15412571 / R: 1186 ->1163 | |W |B |W |B | |W |B |"
## [57] "56 | MARISA RICCI |2.0 |H |L 11|L 35|W 45|H |L 40|L 42| MI | 14679887 / R: 1153 ->1140 | | |B |W |W | |B |W |"
## [58] "57 | MICHAEL LU |2.0 |L 7|L 36|W 42|L 51|L 35|L 53|B | MI | 15113330 / R: 1092 ->1079 | |B |W |W |B |W |B | |"
## [59] "58 | VIRAJ MOHILE |2.0 |W 31|L 2|L 41|L 23|L 49|B |L 45| MI | 14700365 / R: 917 -> 941 | |W |B |W |B |W | |B |"
## [60] "59 | SEAN M MC CORMICK |2.0 |L 41|B |L 9|L 40|L 43|W 54|L 44| MI | 12841036 / R: 853 -> 878 | |W | |B |B |W |W |B |"
## [61] "60 | JULIA SHEN |1.5 |L 33|L 34|D 45|D 42|L 24|H |U | MI | 14579262 / R: 967 -> 984 | |W |B |B |W |B | | |"
## [62] "61 | JEZZEL FARKAS |1.5 |L 32|L 3|W 54|L 47|D 42|L 30|L 37| ON | 15771592 / R: 955P11-> 979P18 | |B |W |B |W |B |W |B |"
## [63] "62 | ASHWIN BALAJI |1.0 |W 55|U |U |U |U |U |U | MI | 15219542 / R: 1530 ->1535 | |B | | | | | | |"
## [64] "63 | THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U | MI | 15057092 / R: 1175 ->1125 | |W |B |W |B |B | | |"
## [65] "64 | BEN LI |1.0 |L 22|D 30|L 31|D 49|L 46|L 42|L 54| MI | 15006561 / R: 1163 ->1112 | |B |W |W |B |W |B |B |"
Now with the player informations all being in a single row, we can now move onto extracting the player information
Looking at the dataframe, we can see that each field in each record is separated by the “|” character. Using this information we can split each string and trim the white space
Also it seems like the original header is taking up space as the first row, which can hinder our results, so we need to remove those as well
df1 <- df1[-1]
#split each record by the pipe symbol
split_data <- str_split(df1, "\\|")
#create a data frame extracting the fields we need.
players <- data.frame(
pair = as.numeric(trimws(sapply(split_data, `[`, 1))),
name = trimws(sapply(split_data, `[`, 2)),
total_points = as.numeric(trimws(sapply(split_data, `[`, 3))),
state = trimws(sapply(split_data, `[`, 11)),
rating_info = trimws(sapply(split_data, `[`, 12)),
stringsAsFactors = FALSE
)
players
## pair name total_points state
## 1 1 GARY HUA 6.0 ON
## 2 2 DAKSHESH DARURI 6.0 MI
## 3 3 ADITYA BAJAJ 6.0 MI
## 4 4 PATRICK H SCHILLING 5.5 MI
## 5 5 HANSHI ZUO 5.5 MI
## 6 6 HANSEN SONG 5.0 OH
## 7 7 GARY DEE SWATHELL 5.0 MI
## 8 8 EZEKIEL HOUGHTON 5.0 MI
## 9 9 STEFANO LEE 5.0 ON
## 10 10 ANVIT RAO 5.0 MI
## 11 11 CAMERON WILLIAM MC LEMAN 4.5 MI
## 12 12 KENNETH J TACK 4.5 MI
## 13 13 TORRANCE HENRY JR 4.5 MI
## 14 14 BRADLEY SHAW 4.5 MI
## 15 15 ZACHARY JAMES HOUGHTON 4.5 MI
## 16 16 MIKE NIKITIN 4.0 MI
## 17 17 RONALD GRZEGORCZYK 4.0 MI
## 18 18 DAVID SUNDEEN 4.0 MI
## 19 19 DIPANKAR ROY 4.0 MI
## 20 20 JASON ZHENG 4.0 MI
## 21 21 DINH DANG BUI 4.0 ON
## 22 22 EUGENE L MCCLURE 4.0 MI
## 23 23 ALAN BUI 4.0 ON
## 24 24 MICHAEL R ALDRICH 4.0 MI
## 25 25 LOREN SCHWIEBERT 3.5 MI
## 26 26 MAX ZHU 3.5 ON
## 27 27 GAURAV GIDWANI 3.5 MI
## 28 28 SOFIA ADINA STANESCU-BELLU 3.5 MI
## 29 29 CHIEDOZIE OKORIE 3.5 MI
## 30 30 GEORGE AVERY JONES 3.5 ON
## 31 31 RISHI SHETTY 3.5 MI
## 32 32 JOSHUA PHILIP MATHEWS 3.5 ON
## 33 33 JADE GE 3.5 MI
## 34 34 MICHAEL JEFFERY THOMAS 3.5 MI
## 35 35 JOSHUA DAVID LEE 3.5 MI
## 36 36 SIDDHARTH JHA 3.5 MI
## 37 37 AMIYATOSH PWNANANDAM 3.5 MI
## 38 38 BRIAN LIU 3.0 MI
## 39 39 JOEL R HENDON 3.0 MI
## 40 40 FOREST ZHANG 3.0 MI
## 41 41 KYLE WILLIAM MURPHY 3.0 MI
## 42 42 JARED GE 3.0 MI
## 43 43 ROBERT GLEN VASEY 3.0 MI
## 44 44 JUSTIN D SCHILLING 3.0 MI
## 45 45 DEREK YAN 3.0 MI
## 46 46 JACOB ALEXANDER LAVALLEY 3.0 MI
## 47 47 ERIC WRIGHT 2.5 MI
## 48 48 DANIEL KHAIN 2.5 MI
## 49 49 MICHAEL J MARTIN 2.5 MI
## 50 50 SHIVAM JHA 2.5 MI
## 51 51 TEJAS AYYAGARI 2.5 MI
## 52 52 ETHAN GUO 2.5 MI
## 53 53 JOSE C YBARRA 2.0 MI
## 54 54 LARRY HODGE 2.0 MI
## 55 55 ALEX KONG 2.0 MI
## 56 56 MARISA RICCI 2.0 MI
## 57 57 MICHAEL LU 2.0 MI
## 58 58 VIRAJ MOHILE 2.0 MI
## 59 59 SEAN M MC CORMICK 2.0 MI
## 60 60 JULIA SHEN 1.5 MI
## 61 61 JEZZEL FARKAS 1.5 ON
## 62 62 ASHWIN BALAJI 1.0 MI
## 63 63 THOMAS JOSEPH HOSMER 1.0 MI
## 64 64 BEN LI 1.0 MI
## rating_info
## 1 15445895 / R: 1794 ->1817
## 2 14598900 / R: 1553 ->1663
## 3 14959604 / R: 1384 ->1640
## 4 12616049 / R: 1716 ->1744
## 5 14601533 / R: 1655 ->1690
## 6 15055204 / R: 1686 ->1687
## 7 11146376 / R: 1649 ->1673
## 8 15142253 / R: 1641P17->1657P24
## 9 14954524 / R: 1411 ->1564
## 10 14150362 / R: 1365 ->1544
## 11 12581589 / R: 1712 ->1696
## 12 12681257 / R: 1663 ->1670
## 13 15082995 / R: 1666 ->1662
## 14 10131499 / R: 1610 ->1618
## 15 15619130 / R: 1220P13->1416P20
## 16 10295068 / R: 1604 ->1613
## 17 10297702 / R: 1629 ->1610
## 18 11342094 / R: 1600 ->1600
## 19 14862333 / R: 1564 ->1570
## 20 14529060 / R: 1595 ->1569
## 21 15495066 / R: 1563P22->1562
## 22 12405534 / R: 1555 ->1529
## 23 15030142 / R: 1363 ->1371
## 24 13469010 / R: 1229 ->1300
## 25 12486656 / R: 1745 ->1681
## 26 15131520 / R: 1579 ->1564
## 27 14476567 / R: 1552 ->1539
## 28 14882954 / R: 1507 ->1513
## 29 15323285 / R: 1602P6 ->1508P12
## 30 12577178 / R: 1522 ->1444
## 31 15131618 / R: 1494 ->1444
## 32 14073750 / R: 1441 ->1433
## 33 14691842 / R: 1449 ->1421
## 34 15051807 / R: 1399 ->1400
## 35 14601397 / R: 1438 ->1392
## 36 14773163 / R: 1355 ->1367
## 37 15489571 / R: 980P12->1077P17
## 38 15108523 / R: 1423 ->1439
## 39 12923035 / R: 1436P23->1413
## 40 14892710 / R: 1348 ->1346
## 41 15761443 / R: 1403P5 ->1341P9
## 42 14462326 / R: 1332 ->1256
## 43 14101068 / R: 1283 ->1244
## 44 15323504 / R: 1199 ->1199
## 45 15372807 / R: 1242 ->1191
## 46 15490981 / R: 377P3 ->1076P10
## 47 12533115 / R: 1362 ->1341
## 48 14369165 / R: 1382 ->1335
## 49 12531685 / R: 1291P12->1259P17
## 50 14773178 / R: 1056 ->1111
## 51 15205474 / R: 1011 ->1097
## 52 14918803 / R: 935 ->1092
## 53 12578849 / R: 1393 ->1359
## 54 12836773 / R: 1270 ->1200
## 55 15412571 / R: 1186 ->1163
## 56 14679887 / R: 1153 ->1140
## 57 15113330 / R: 1092 ->1079
## 58 14700365 / R: 917 -> 941
## 59 12841036 / R: 853 -> 878
## 60 14579262 / R: 967 -> 984
## 61 15771592 / R: 955P11-> 979P18
## 62 15219542 / R: 1530 ->1535
## 63 15057092 / R: 1175 ->1125
## 64 15006561 / R: 1163 ->1112
By creating a new dataframe called players in this case, we were able to extract the fields that we need
Now we can move onto extracting the pre-rating from the rating information
The rating info is a little awkward since it contains text like 15445895 / R: 1794 ->1817 so we need to make sure we extract the number that comes right after “R:”
players <- players %>%
mutate(pre_rating = as.numeric(str_match(rating_info, "R:\\s*(\\d+)")[,2]))
Using this regex method, it looks for digits after the “R:” and any optional spaces, it than converts the extracted text to numerical values
Now we can move onto extracting the opponent pair numbers from the round data
Looking at the data, we can see that the round data contains results like “W 39” or “D 12”. We only need to extract the numerical part for each round. To do this, we need to add a new list column to our data frame
#we can create a new list column by extrcating opponents IDs
players$opponents <- lapply(split_data, function(x) {
#extract fields 4 to 10 these contain the roud results for each players
round_fields <- x[4:10]
#extract the numeric part for each round result
opps <- sapply(round_fields, function(field) as.numeric(str_extract(field, "\\d+")))
#remove any null/empty values
opps <- opps[!is.na(opps)]
return(opps)
})
print(players)
## pair name total_points state
## 1 1 GARY HUA 6.0 ON
## 2 2 DAKSHESH DARURI 6.0 MI
## 3 3 ADITYA BAJAJ 6.0 MI
## 4 4 PATRICK H SCHILLING 5.5 MI
## 5 5 HANSHI ZUO 5.5 MI
## 6 6 HANSEN SONG 5.0 OH
## 7 7 GARY DEE SWATHELL 5.0 MI
## 8 8 EZEKIEL HOUGHTON 5.0 MI
## 9 9 STEFANO LEE 5.0 ON
## 10 10 ANVIT RAO 5.0 MI
## 11 11 CAMERON WILLIAM MC LEMAN 4.5 MI
## 12 12 KENNETH J TACK 4.5 MI
## 13 13 TORRANCE HENRY JR 4.5 MI
## 14 14 BRADLEY SHAW 4.5 MI
## 15 15 ZACHARY JAMES HOUGHTON 4.5 MI
## 16 16 MIKE NIKITIN 4.0 MI
## 17 17 RONALD GRZEGORCZYK 4.0 MI
## 18 18 DAVID SUNDEEN 4.0 MI
## 19 19 DIPANKAR ROY 4.0 MI
## 20 20 JASON ZHENG 4.0 MI
## 21 21 DINH DANG BUI 4.0 ON
## 22 22 EUGENE L MCCLURE 4.0 MI
## 23 23 ALAN BUI 4.0 ON
## 24 24 MICHAEL R ALDRICH 4.0 MI
## 25 25 LOREN SCHWIEBERT 3.5 MI
## 26 26 MAX ZHU 3.5 ON
## 27 27 GAURAV GIDWANI 3.5 MI
## 28 28 SOFIA ADINA STANESCU-BELLU 3.5 MI
## 29 29 CHIEDOZIE OKORIE 3.5 MI
## 30 30 GEORGE AVERY JONES 3.5 ON
## 31 31 RISHI SHETTY 3.5 MI
## 32 32 JOSHUA PHILIP MATHEWS 3.5 ON
## 33 33 JADE GE 3.5 MI
## 34 34 MICHAEL JEFFERY THOMAS 3.5 MI
## 35 35 JOSHUA DAVID LEE 3.5 MI
## 36 36 SIDDHARTH JHA 3.5 MI
## 37 37 AMIYATOSH PWNANANDAM 3.5 MI
## 38 38 BRIAN LIU 3.0 MI
## 39 39 JOEL R HENDON 3.0 MI
## 40 40 FOREST ZHANG 3.0 MI
## 41 41 KYLE WILLIAM MURPHY 3.0 MI
## 42 42 JARED GE 3.0 MI
## 43 43 ROBERT GLEN VASEY 3.0 MI
## 44 44 JUSTIN D SCHILLING 3.0 MI
## 45 45 DEREK YAN 3.0 MI
## 46 46 JACOB ALEXANDER LAVALLEY 3.0 MI
## 47 47 ERIC WRIGHT 2.5 MI
## 48 48 DANIEL KHAIN 2.5 MI
## 49 49 MICHAEL J MARTIN 2.5 MI
## 50 50 SHIVAM JHA 2.5 MI
## 51 51 TEJAS AYYAGARI 2.5 MI
## 52 52 ETHAN GUO 2.5 MI
## 53 53 JOSE C YBARRA 2.0 MI
## 54 54 LARRY HODGE 2.0 MI
## 55 55 ALEX KONG 2.0 MI
## 56 56 MARISA RICCI 2.0 MI
## 57 57 MICHAEL LU 2.0 MI
## 58 58 VIRAJ MOHILE 2.0 MI
## 59 59 SEAN M MC CORMICK 2.0 MI
## 60 60 JULIA SHEN 1.5 MI
## 61 61 JEZZEL FARKAS 1.5 ON
## 62 62 ASHWIN BALAJI 1.0 MI
## 63 63 THOMAS JOSEPH HOSMER 1.0 MI
## 64 64 BEN LI 1.0 MI
## rating_info pre_rating opponents
## 1 15445895 / R: 1794 ->1817 1794 39, 21, 18, 14, 7, 12, 4
## 2 14598900 / R: 1553 ->1663 1553 63, 58, 4, 17, 16, 20, 7
## 3 14959604 / R: 1384 ->1640 1384 8, 61, 25, 21, 11, 13, 12
## 4 12616049 / R: 1716 ->1744 1716 23, 28, 2, 26, 5, 19, 1
## 5 14601533 / R: 1655 ->1690 1655 45, 37, 12, 13, 4, 14, 17
## 6 15055204 / R: 1686 ->1687 1686 34, 29, 11, 35, 10, 27, 21
## 7 11146376 / R: 1649 ->1673 1649 57, 46, 13, 11, 1, 9, 2
## 8 15142253 / R: 1641P17->1657P24 1641 3, 32, 14, 9, 47, 28, 19
## 9 14954524 / R: 1411 ->1564 1411 25, 18, 59, 8, 26, 7, 20
## 10 14150362 / R: 1365 ->1544 1365 16, 19, 55, 31, 6, 25, 18
## 11 12581589 / R: 1712 ->1696 1712 38, 56, 6, 7, 3, 34, 26
## 12 12681257 / R: 1663 ->1670 1663 42, 33, 5, 38, 1, 3
## 13 15082995 / R: 1666 ->1662 1666 36, 27, 7, 5, 33, 3, 32
## 14 10131499 / R: 1610 ->1618 1610 54, 44, 8, 1, 27, 5, 31
## 15 15619130 / R: 1220P13->1416P20 1220 19, 16, 30, 22, 54, 33, 38
## 16 10295068 / R: 1604 ->1613 1604 10, 15, 39, 2, 36
## 17 10297702 / R: 1629 ->1610 1629 48, 41, 26, 2, 23, 22, 5
## 18 11342094 / R: 1600 ->1600 1600 47, 9, 1, 32, 19, 38, 10
## 19 14862333 / R: 1564 ->1570 1564 15, 10, 52, 28, 18, 4, 8
## 20 14529060 / R: 1595 ->1569 1595 40, 49, 23, 41, 28, 2, 9
## 21 15495066 / R: 1563P22->1562 1563 43, 1, 47, 3, 40, 39, 6
## 22 12405534 / R: 1555 ->1529 1555 64, 52, 28, 15, 17, 40
## 23 15030142 / R: 1363 ->1371 1363 4, 43, 20, 58, 17, 37, 46
## 24 13469010 / R: 1229 ->1300 1229 28, 47, 43, 25, 60, 44, 39
## 25 12486656 / R: 1745 ->1681 1745 9, 53, 3, 24, 34, 10, 47
## 26 15131520 / R: 1579 ->1564 1579 49, 40, 17, 4, 9, 32, 11
## 27 14476567 / R: 1552 ->1539 1552 51, 13, 46, 37, 14, 6
## 28 14882954 / R: 1507 ->1513 1507 24, 4, 22, 19, 20, 8, 36
## 29 15323285 / R: 1602P6 ->1508P12 1602 50, 6, 38, 34, 52, 48
## 30 12577178 / R: 1522 ->1444 1522 52, 64, 15, 55, 31, 61, 50
## 31 15131618 / R: 1494 ->1444 1494 58, 55, 64, 10, 30, 50, 14
## 32 14073750 / R: 1441 ->1433 1441 61, 8, 44, 18, 51, 26, 13
## 33 14691842 / R: 1449 ->1421 1449 60, 12, 50, 36, 13, 15, 51
## 34 15051807 / R: 1399 ->1400 1399 6, 60, 37, 29, 25, 11, 52
## 35 14601397 / R: 1438 ->1392 1438 46, 38, 56, 6, 57, 52, 48
## 36 14773163 / R: 1355 ->1367 1355 13, 57, 51, 33, 16, 28
## 37 15489571 / R: 980P12->1077P17 980 5, 34, 27, 23, 61
## 38 15108523 / R: 1423 ->1439 1423 11, 35, 29, 12, 18, 15
## 39 12923035 / R: 1436P23->1413 1436 1, 54, 40, 16, 44, 21, 24
## 40 14892710 / R: 1348 ->1346 1348 20, 26, 39, 59, 21, 56, 22
## 41 15761443 / R: 1403P5 ->1341P9 1403 59, 17, 58, 20
## 42 14462326 / R: 1332 ->1256 1332 12, 50, 57, 60, 61, 64, 56
## 43 14101068 / R: 1283 ->1244 1283 21, 23, 24, 63, 59, 46, 55
## 44 15323504 / R: 1199 ->1199 1199 14, 32, 53, 39, 24, 59
## 45 15372807 / R: 1242 ->1191 1242 5, 51, 60, 56, 63, 55, 58
## 46 15490981 / R: 377P3 ->1076P10 377 35, 7, 27, 50, 64, 43, 23
## 47 12533115 / R: 1362 ->1341 1362 18, 24, 21, 61, 8, 51, 25
## 48 14369165 / R: 1382 ->1335 1382 17, 63, 52, 29, 35
## 49 12531685 / R: 1291P12->1259P17 1291 26, 20, 63, 64, 58
## 50 14773178 / R: 1056 ->1111 1056 29, 42, 33, 46, 31, 30
## 51 15205474 / R: 1011 ->1097 1011 27, 45, 36, 57, 32, 47, 33
## 52 14918803 / R: 935 ->1092 935 30, 22, 19, 48, 29, 35, 34
## 53 12578849 / R: 1393 ->1359 1393 25, 44, 57
## 54 12836773 / R: 1270 ->1200 1270 14, 39, 61, 15, 59, 64
## 55 15412571 / R: 1186 ->1163 1186 62, 31, 10, 30, 45, 43
## 56 14679887 / R: 1153 ->1140 1153 11, 35, 45, 40, 42
## 57 15113330 / R: 1092 ->1079 1092 7, 36, 42, 51, 35, 53
## 58 14700365 / R: 917 -> 941 917 31, 2, 41, 23, 49, 45
## 59 12841036 / R: 853 -> 878 853 41, 9, 40, 43, 54, 44
## 60 14579262 / R: 967 -> 984 967 33, 34, 45, 42, 24
## 61 15771592 / R: 955P11-> 979P18 955 32, 3, 54, 47, 42, 30, 37
## 62 15219542 / R: 1530 ->1535 1530 55
## 63 15057092 / R: 1175 ->1125 1175 2, 48, 49, 43, 45
## 64 15006561 / R: 1163 ->1112 1163 22, 30, 31, 49, 46, 42, 54
Now we need to ocmpute the Average Pre-Chess rating for the opponents using this information
For each player we need to look up each opponents pre rating by using their unique pair number and compute the average. We can do this by using a loop
#initialize a new column for average opponent rating
players$opp_avg <- NA_real_
#we can loop through each player
for(i in 1:nrow(players)){
opp_ids <- players$opponents[[i]]
if(length(opp_ids) > 0) {
#for each opponent we need to find the corresponding pre_rating by matching the pair number
opp_ratings <- sapply(opp_ids, function(id) {
idx <- which(players$pair == id)
if(length(idx) > 0) {
return(players$pre_rating[idx])
} else {
return(NA_real_)
}
})
#compute the average opponent pre-rating
players$opp_avg[i] <- mean(opp_ratings, na.rm = TRUE)
}
}
head(players)
## pair name total_points state rating_info
## 1 1 GARY HUA 6.0 ON 15445895 / R: 1794 ->1817
## 2 2 DAKSHESH DARURI 6.0 MI 14598900 / R: 1553 ->1663
## 3 3 ADITYA BAJAJ 6.0 MI 14959604 / R: 1384 ->1640
## 4 4 PATRICK H SCHILLING 5.5 MI 12616049 / R: 1716 ->1744
## 5 5 HANSHI ZUO 5.5 MI 14601533 / R: 1655 ->1690
## 6 6 HANSEN SONG 5.0 OH 15055204 / R: 1686 ->1687
## pre_rating opponents opp_avg
## 1 1794 39, 21, 18, 14, 7, 12, 4 1605.286
## 2 1553 63, 58, 4, 17, 16, 20, 7 1469.286
## 3 1384 8, 61, 25, 21, 11, 13, 12 1563.571
## 4 1716 23, 28, 2, 26, 5, 19, 1 1573.571
## 5 1655 45, 37, 12, 13, 4, 14, 17 1500.857
## 6 1686 34, 29, 11, 35, 10, 27, 21 1518.714
Now that we have all the necessary columns and calculations complete, we can now create the final output and write it to a CSV
#select the final output columns
final_output <- players %>%
select(name, state, total_points, pre_rating, opp_avg)
#write the final output to a CSV file
write.csv(final_output, "tournament_results.csv", row.names = FALSE)
final_output
## name state total_points pre_rating opp_avg
## 1 GARY HUA ON 6.0 1794 1605.286
## 2 DAKSHESH DARURI MI 6.0 1553 1469.286
## 3 ADITYA BAJAJ MI 6.0 1384 1563.571
## 4 PATRICK H SCHILLING MI 5.5 1716 1573.571
## 5 HANSHI ZUO MI 5.5 1655 1500.857
## 6 HANSEN SONG OH 5.0 1686 1518.714
## 7 GARY DEE SWATHELL MI 5.0 1649 1372.143
## 8 EZEKIEL HOUGHTON MI 5.0 1641 1468.429
## 9 STEFANO LEE ON 5.0 1411 1523.143
## 10 ANVIT RAO MI 5.0 1365 1554.143
## 11 CAMERON WILLIAM MC LEMAN MI 4.5 1712 1467.571
## 12 KENNETH J TACK MI 4.5 1663 1506.167
## 13 TORRANCE HENRY JR MI 4.5 1666 1497.857
## 14 BRADLEY SHAW MI 4.5 1610 1515.000
## 15 ZACHARY JAMES HOUGHTON MI 4.5 1220 1483.857
## 16 MIKE NIKITIN MI 4.0 1604 1385.800
## 17 RONALD GRZEGORCZYK MI 4.0 1629 1498.571
## 18 DAVID SUNDEEN MI 4.0 1600 1480.000
## 19 DIPANKAR ROY MI 4.0 1564 1426.286
## 20 JASON ZHENG MI 4.0 1595 1410.857
## 21 DINH DANG BUI ON 4.0 1563 1470.429
## 22 EUGENE L MCCLURE MI 4.0 1555 1300.333
## 23 ALAN BUI ON 4.0 1363 1213.857
## 24 MICHAEL R ALDRICH MI 4.0 1229 1357.000
## 25 LOREN SCHWIEBERT MI 3.5 1745 1363.286
## 26 MAX ZHU ON 3.5 1579 1506.857
## 27 GAURAV GIDWANI MI 3.5 1552 1221.667
## 28 SOFIA ADINA STANESCU-BELLU MI 3.5 1507 1522.143
## 29 CHIEDOZIE OKORIE MI 3.5 1602 1313.500
## 30 GEORGE AVERY JONES ON 3.5 1522 1144.143
## 31 RISHI SHETTY MI 3.5 1494 1259.857
## 32 JOSHUA PHILIP MATHEWS ON 3.5 1441 1378.714
## 33 JADE GE MI 3.5 1449 1276.857
## 34 MICHAEL JEFFERY THOMAS MI 3.5 1399 1375.286
## 35 JOSHUA DAVID LEE MI 3.5 1438 1149.714
## 36 SIDDHARTH JHA MI 3.5 1355 1388.167
## 37 AMIYATOSH PWNANANDAM MI 3.5 980 1384.800
## 38 BRIAN LIU MI 3.0 1423 1539.167
## 39 JOEL R HENDON MI 3.0 1436 1429.571
## 40 FOREST ZHANG MI 3.0 1348 1390.571
## 41 KYLE WILLIAM MURPHY MI 3.0 1403 1248.500
## 42 JARED GE MI 3.0 1332 1149.857
## 43 ROBERT GLEN VASEY MI 3.0 1283 1106.571
## 44 JUSTIN D SCHILLING MI 3.0 1199 1327.000
## 45 DEREK YAN MI 3.0 1242 1152.000
## 46 JACOB ALEXANDER LAVALLEY MI 3.0 377 1357.714
## 47 ERIC WRIGHT MI 2.5 1362 1392.000
## 48 DANIEL KHAIN MI 2.5 1382 1355.800
## 49 MICHAEL J MARTIN MI 2.5 1291 1285.800
## 50 SHIVAM JHA MI 2.5 1056 1296.000
## 51 TEJAS AYYAGARI MI 2.5 1011 1356.143
## 52 ETHAN GUO MI 2.5 935 1494.571
## 53 JOSE C YBARRA MI 2.0 1393 1345.333
## 54 LARRY HODGE MI 2.0 1270 1206.167
## 55 ALEX KONG MI 2.0 1186 1406.000
## 56 MARISA RICCI MI 2.0 1153 1414.400
## 57 MICHAEL LU MI 2.0 1092 1363.000
## 58 VIRAJ MOHILE MI 2.0 917 1391.000
## 59 SEAN M MC CORMICK MI 2.0 853 1319.000
## 60 JULIA SHEN MI 1.5 967 1330.200
## 61 JEZZEL FARKAS ON 1.5 955 1327.286
## 62 ASHWIN BALAJI MI 1.0 1530 1186.000
## 63 THOMAS JOSEPH HOSMER MI 1.0 1175 1350.200
## 64 BEN LI MI 1.0 1163 1263.000