This is the R Markdown document for my 607 Project 1. The task was to read a text file of chess tournament data (tournamentinfo.txt) and produce a new CSV with 5 columns built from that file: each player’s name, state, total number of points, pre-rating, and the average pre-rating of their opponents.
To accomplish this, I saved each of the 5 requested columns in its own vector and combined the vectors into a data frame at the end. To get the data for each vector, I relied heavily on str_extract and regular expressions.
The way the data was laid out (with | between columns) was a big help in writing regexes that target specific sections of each line. Because str_extract returns NA for every line that does not match, each result also had to be filtered with !is.na().
library(stringr)
library(readr)
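The extract-then-filter pattern used throughout this document can be sketched on a tiny input. The three `lines` below are made up for illustration and only imitate the layout of the file. `str_extract` returns one value per input element, with `NA` wherever the pattern does not match, which is why every vector is filtered with `!is.na()` afterward.

```r
library(stringr)

# Three illustrative lines: a separator, a player row, and a state/rating row.
lines <- c(
  "-----------------",
  "    1 | GARY HUA   |6.0  |W  39|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |"
)

# One result per input line; lines without a match come back as NA.
ratings <- str_extract(lines, "R: +[0-9]+")
ratings

# Keep only the lines that actually matched.
ratings <- ratings[!is.na(ratings)]
ratings
```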
The first step was to load tournamentinfo.txt, the text file with the needed data. Below is what the raw data from that file looks like:
txt_data <- readLines("tournamentinfo.txt")
## Warning in readLines("tournamentinfo.txt"): incomplete final line found on
## 'tournamentinfo.txt'
txt_data
## [1] "-----------------------------------------------------------------------------------------"
## [2] " Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | "
## [4] "-----------------------------------------------------------------------------------------"
## [5] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [6] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
## [7] "-----------------------------------------------------------------------------------------"
## [8] " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
## [9] " MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |"
## [10] "-----------------------------------------------------------------------------------------"
## [11] " 3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12|"
## [12] " MI | 14959604 / R: 1384 ->1640 |N:2 |W |B |W |B |W |B |W |"
## [13] "-----------------------------------------------------------------------------------------"
## [14] " 4 | PATRICK H SCHILLING |5.5 |W 23|D 28|W 2|W 26|D 5|W 19|D 1|"
## [15] " MI | 12616049 / R: 1716 ->1744 |N:2 |W |B |W |B |W |B |B |"
## [16] "-----------------------------------------------------------------------------------------"
## [17] " 5 | HANSHI ZUO |5.5 |W 45|W 37|D 12|D 13|D 4|W 14|W 17|"
## [18] " MI | 14601533 / R: 1655 ->1690 |N:2 |B |W |B |W |B |W |B |"
## [19] "-----------------------------------------------------------------------------------------"
## [20] " 6 | HANSEN SONG |5.0 |W 34|D 29|L 11|W 35|D 10|W 27|W 21|"
## [21] " OH | 15055204 / R: 1686 ->1687 |N:3 |W |B |W |B |B |W |B |"
## [22] "-----------------------------------------------------------------------------------------"
## [23] " 7 | GARY DEE SWATHELL |5.0 |W 57|W 46|W 13|W 11|L 1|W 9|L 2|"
## [24] " MI | 11146376 / R: 1649 ->1673 |N:3 |W |B |W |B |B |W |W |"
## [25] "-----------------------------------------------------------------------------------------"
## [26] " 8 | EZEKIEL HOUGHTON |5.0 |W 3|W 32|L 14|L 9|W 47|W 28|W 19|"
## [27] " MI | 15142253 / R: 1641P17->1657P24 |N:3 |B |W |B |W |B |W |W |"
## [28] "-----------------------------------------------------------------------------------------"
## [29] " 9 | STEFANO LEE |5.0 |W 25|L 18|W 59|W 8|W 26|L 7|W 20|"
## [30] " ON | 14954524 / R: 1411 ->1564 |N:2 |W |B |W |B |W |B |B |"
## [31] "-----------------------------------------------------------------------------------------"
## [32] " 10 | ANVIT RAO |5.0 |D 16|L 19|W 55|W 31|D 6|W 25|W 18|"
## [33] " MI | 14150362 / R: 1365 ->1544 |N:3 |W |W |B |B |W |B |W |"
## [34] "-----------------------------------------------------------------------------------------"
## [35] " 11 | CAMERON WILLIAM MC LEMAN |4.5 |D 38|W 56|W 6|L 7|L 3|W 34|W 26|"
## [36] " MI | 12581589 / R: 1712 ->1696 |N:3 |B |W |B |W |B |W |B |"
## [37] "-----------------------------------------------------------------------------------------"
## [38] " 12 | KENNETH J TACK |4.5 |W 42|W 33|D 5|W 38|H |D 1|L 3|"
## [39] " MI | 12681257 / R: 1663 ->1670 |N:3 |W |B |W |B | |W |B |"
## [40] "-----------------------------------------------------------------------------------------"
## [41] " 13 | TORRANCE HENRY JR |4.5 |W 36|W 27|L 7|D 5|W 33|L 3|W 32|"
## [42] " MI | 15082995 / R: 1666 ->1662 |N:3 |B |W |B |B |W |W |B |"
## [43] "-----------------------------------------------------------------------------------------"
## [44] " 14 | BRADLEY SHAW |4.5 |W 54|W 44|W 8|L 1|D 27|L 5|W 31|"
## [45] " MI | 10131499 / R: 1610 ->1618 |N:3 |W |B |W |W |B |B |W |"
## [46] "-----------------------------------------------------------------------------------------"
## [47] " 15 | ZACHARY JAMES HOUGHTON |4.5 |D 19|L 16|W 30|L 22|W 54|W 33|W 38|"
## [48] " MI | 15619130 / R: 1220P13->1416P20 |N:3 |B |B |W |W |B |B |W |"
## [49] "-----------------------------------------------------------------------------------------"
## [50] " 16 | MIKE NIKITIN |4.0 |D 10|W 15|H |W 39|L 2|W 36|U |"
## [51] " MI | 10295068 / R: 1604 ->1613 |N:3 |B |W | |B |W |B | |"
## [52] "-----------------------------------------------------------------------------------------"
## [53] " 17 | RONALD GRZEGORCZYK |4.0 |W 48|W 41|L 26|L 2|W 23|W 22|L 5|"
## [54] " MI | 10297702 / R: 1629 ->1610 |N:3 |W |B |W |B |W |B |W |"
## [55] "-----------------------------------------------------------------------------------------"
## [56] " 18 | DAVID SUNDEEN |4.0 |W 47|W 9|L 1|W 32|L 19|W 38|L 10|"
## [57] " MI | 11342094 / R: 1600 ->1600 |N:3 |B |W |B |W |B |W |B |"
## [58] "-----------------------------------------------------------------------------------------"
## [59] " 19 | DIPANKAR ROY |4.0 |D 15|W 10|W 52|D 28|W 18|L 4|L 8|"
## [60] " MI | 14862333 / R: 1564 ->1570 |N:3 |W |B |W |B |W |W |B |"
## [61] "-----------------------------------------------------------------------------------------"
## [62] " 20 | JASON ZHENG |4.0 |L 40|W 49|W 23|W 41|W 28|L 2|L 9|"
## [63] " MI | 14529060 / R: 1595 ->1569 |N:4 |W |B |W |B |W |B |W |"
## [64] "-----------------------------------------------------------------------------------------"
## [65] " 21 | DINH DANG BUI |4.0 |W 43|L 1|W 47|L 3|W 40|W 39|L 6|"
## [66] " ON | 15495066 / R: 1563P22->1562 |N:3 |B |W |B |W |W |B |W |"
## [67] "-----------------------------------------------------------------------------------------"
## [68] " 22 | EUGENE L MCCLURE |4.0 |W 64|D 52|L 28|W 15|H |L 17|W 40|"
## [69] " MI | 12405534 / R: 1555 ->1529 |N:4 |W |B |W |B | |W |B |"
## [70] "-----------------------------------------------------------------------------------------"
## [71] " 23 | ALAN BUI |4.0 |L 4|W 43|L 20|W 58|L 17|W 37|W 46|"
## [72] " ON | 15030142 / R: 1363 ->1371 | |B |W |B |W |B |W |B |"
## [73] "-----------------------------------------------------------------------------------------"
## [74] " 24 | MICHAEL R ALDRICH |4.0 |L 28|L 47|W 43|L 25|W 60|W 44|W 39|"
## [75] " MI | 13469010 / R: 1229 ->1300 |N:4 |B |W |B |B |W |W |B |"
## [76] "-----------------------------------------------------------------------------------------"
## [77] " 25 | LOREN SCHWIEBERT |3.5 |L 9|W 53|L 3|W 24|D 34|L 10|W 47|"
## [78] " MI | 12486656 / R: 1745 ->1681 |N:4 |B |W |B |W |B |W |B |"
## [79] "-----------------------------------------------------------------------------------------"
## [80] " 26 | MAX ZHU |3.5 |W 49|W 40|W 17|L 4|L 9|D 32|L 11|"
## [81] " ON | 15131520 / R: 1579 ->1564 |N:4 |B |W |B |W |B |W |W |"
## [82] "-----------------------------------------------------------------------------------------"
## [83] " 27 | GAURAV GIDWANI |3.5 |W 51|L 13|W 46|W 37|D 14|L 6|U |"
## [84] " MI | 14476567 / R: 1552 ->1539 |N:4 |W |B |W |B |W |B | |"
## [85] "-----------------------------------------------------------------------------------------"
## [86] " 28 | SOFIA ADINA STANESCU-BELLU |3.5 |W 24|D 4|W 22|D 19|L 20|L 8|D 36|"
## [87] " MI | 14882954 / R: 1507 ->1513 |N:3 |W |W |B |W |B |B |W |"
## [88] "-----------------------------------------------------------------------------------------"
## [89] " 29 | CHIEDOZIE OKORIE |3.5 |W 50|D 6|L 38|L 34|W 52|W 48|U |"
## [90] " MI | 15323285 / R: 1602P6 ->1508P12 |N:4 |B |W |B |W |W |B | |"
## [91] "-----------------------------------------------------------------------------------------"
## [92] " 30 | GEORGE AVERY JONES |3.5 |L 52|D 64|L 15|W 55|L 31|W 61|W 50|"
## [93] " ON | 12577178 / R: 1522 ->1444 | |W |B |B |W |W |B |B |"
## [94] "-----------------------------------------------------------------------------------------"
## [95] " 31 | RISHI SHETTY |3.5 |L 58|D 55|W 64|L 10|W 30|W 50|L 14|"
## [96] " MI | 15131618 / R: 1494 ->1444 | |B |W |B |W |B |W |B |"
## [97] "-----------------------------------------------------------------------------------------"
## [98] " 32 | JOSHUA PHILIP MATHEWS |3.5 |W 61|L 8|W 44|L 18|W 51|D 26|L 13|"
## [99] " ON | 14073750 / R: 1441 ->1433 |N:4 |W |B |W |B |W |B |W |"
## [100] "-----------------------------------------------------------------------------------------"
## [101] " 33 | JADE GE |3.5 |W 60|L 12|W 50|D 36|L 13|L 15|W 51|"
## [102] " MI | 14691842 / R: 1449 ->1421 | |B |W |B |W |B |W |B |"
## [103] "-----------------------------------------------------------------------------------------"
## [104] " 34 | MICHAEL JEFFERY THOMAS |3.5 |L 6|W 60|L 37|W 29|D 25|L 11|W 52|"
## [105] " MI | 15051807 / R: 1399 ->1400 | |B |W |B |B |W |B |W |"
## [106] "-----------------------------------------------------------------------------------------"
## [107] " 35 | JOSHUA DAVID LEE |3.5 |L 46|L 38|W 56|L 6|W 57|D 52|W 48|"
## [108] " MI | 14601397 / R: 1438 ->1392 | |W |W |B |W |B |B |W |"
## [109] "-----------------------------------------------------------------------------------------"
## [110] " 36 | SIDDHARTH JHA |3.5 |L 13|W 57|W 51|D 33|H |L 16|D 28|"
## [111] " MI | 14773163 / R: 1355 ->1367 |N:4 |W |B |W |B | |W |B |"
## [112] "-----------------------------------------------------------------------------------------"
## [113] " 37 | AMIYATOSH PWNANANDAM |3.5 |B |L 5|W 34|L 27|H |L 23|W 61|"
## [114] " MI | 15489571 / R: 980P12->1077P17 | | |B |W |W | |B |W |"
## [115] "-----------------------------------------------------------------------------------------"
## [116] " 38 | BRIAN LIU |3.0 |D 11|W 35|W 29|L 12|H |L 18|L 15|"
## [117] " MI | 15108523 / R: 1423 ->1439 |N:4 |W |B |W |W | |B |B |"
## [118] "-----------------------------------------------------------------------------------------"
## [119] " 39 | JOEL R HENDON |3.0 |L 1|W 54|W 40|L 16|W 44|L 21|L 24|"
## [120] " MI | 12923035 / R: 1436P23->1413 |N:4 |B |W |B |W |B |W |W |"
## [121] "-----------------------------------------------------------------------------------------"
## [122] " 40 | FOREST ZHANG |3.0 |W 20|L 26|L 39|W 59|L 21|W 56|L 22|"
## [123] " MI | 14892710 / R: 1348 ->1346 | |B |B |W |W |B |W |W |"
## [124] "-----------------------------------------------------------------------------------------"
## [125] " 41 | KYLE WILLIAM MURPHY |3.0 |W 59|L 17|W 58|L 20|X |U |U |"
## [126] " MI | 15761443 / R: 1403P5 ->1341P9 | |B |W |B |W | | | |"
## [127] "-----------------------------------------------------------------------------------------"
## [128] " 42 | JARED GE |3.0 |L 12|L 50|L 57|D 60|D 61|W 64|W 56|"
## [129] " MI | 14462326 / R: 1332 ->1256 | |B |W |B |B |W |W |B |"
## [130] "-----------------------------------------------------------------------------------------"
## [131] " 43 | ROBERT GLEN VASEY |3.0 |L 21|L 23|L 24|W 63|W 59|L 46|W 55|"
## [132] " MI | 14101068 / R: 1283 ->1244 | |W |B |W |W |B |B |W |"
## [133] "-----------------------------------------------------------------------------------------"
## [134] " 44 | JUSTIN D SCHILLING |3.0 |B |L 14|L 32|W 53|L 39|L 24|W 59|"
## [135] " MI | 15323504 / R: 1199 ->1199 | | |W |B |B |W |B |W |"
## [136] "-----------------------------------------------------------------------------------------"
## [137] " 45 | DEREK YAN |3.0 |L 5|L 51|D 60|L 56|W 63|D 55|W 58|"
## [138] " MI | 15372807 / R: 1242 ->1191 | |W |B |W |B |W |B |W |"
## [139] "-----------------------------------------------------------------------------------------"
## [140] " 46 | JACOB ALEXANDER LAVALLEY |3.0 |W 35|L 7|L 27|L 50|W 64|W 43|L 23|"
## [141] " MI | 15490981 / R: 377P3 ->1076P10 | |B |W |B |W |B |W |W |"
## [142] "-----------------------------------------------------------------------------------------"
## [143] " 47 | ERIC WRIGHT |2.5 |L 18|W 24|L 21|W 61|L 8|D 51|L 25|"
## [144] " MI | 12533115 / R: 1362 ->1341 | |W |B |W |B |W |B |W |"
## [145] "-----------------------------------------------------------------------------------------"
## [146] " 48 | DANIEL KHAIN |2.5 |L 17|W 63|H |D 52|H |L 29|L 35|"
## [147] " MI | 14369165 / R: 1382 ->1335 | |B |W | |B | |W |B |"
## [148] "-----------------------------------------------------------------------------------------"
## [149] " 49 | MICHAEL J MARTIN |2.5 |L 26|L 20|D 63|D 64|W 58|H |U |"
## [150] " MI | 12531685 / R: 1291P12->1259P17 | |W |W |B |W |B | | |"
## [151] "-----------------------------------------------------------------------------------------"
## [152] " 50 | SHIVAM JHA |2.5 |L 29|W 42|L 33|W 46|H |L 31|L 30|"
## [153] " MI | 14773178 / R: 1056 ->1111 | |W |B |W |B | |B |W |"
## [154] "-----------------------------------------------------------------------------------------"
## [155] " 51 | TEJAS AYYAGARI |2.5 |L 27|W 45|L 36|W 57|L 32|D 47|L 33|"
## [156] " MI | 15205474 / R: 1011 ->1097 | |B |W |B |W |B |W |W |"
## [157] "-----------------------------------------------------------------------------------------"
## [158] " 52 | ETHAN GUO |2.5 |W 30|D 22|L 19|D 48|L 29|D 35|L 34|"
## [159] " MI | 14918803 / R: 935 ->1092 |N:4 |B |W |B |W |B |W |B |"
## [160] "-----------------------------------------------------------------------------------------"
## [161] " 53 | JOSE C YBARRA |2.0 |H |L 25|H |L 44|U |W 57|U |"
## [162] " MI | 12578849 / R: 1393 ->1359 | | |B | |W | |W | |"
## [163] "-----------------------------------------------------------------------------------------"
## [164] " 54 | LARRY HODGE |2.0 |L 14|L 39|L 61|B |L 15|L 59|W 64|"
## [165] " MI | 12836773 / R: 1270 ->1200 | |B |B |W | |W |B |W |"
## [166] "-----------------------------------------------------------------------------------------"
## [167] " 55 | ALEX KONG |2.0 |L 62|D 31|L 10|L 30|B |D 45|L 43|"
## [168] " MI | 15412571 / R: 1186 ->1163 | |W |B |W |B | |W |B |"
## [169] "-----------------------------------------------------------------------------------------"
## [170] " 56 | MARISA RICCI |2.0 |H |L 11|L 35|W 45|H |L 40|L 42|"
## [171] " MI | 14679887 / R: 1153 ->1140 | | |B |W |W | |B |W |"
## [172] "-----------------------------------------------------------------------------------------"
## [173] " 57 | MICHAEL LU |2.0 |L 7|L 36|W 42|L 51|L 35|L 53|B |"
## [174] " MI | 15113330 / R: 1092 ->1079 | |B |W |W |B |W |B | |"
## [175] "-----------------------------------------------------------------------------------------"
## [176] " 58 | VIRAJ MOHILE |2.0 |W 31|L 2|L 41|L 23|L 49|B |L 45|"
## [177] " MI | 14700365 / R: 917 -> 941 | |W |B |W |B |W | |B |"
## [178] "-----------------------------------------------------------------------------------------"
## [179] " 59 | SEAN M MC CORMICK |2.0 |L 41|B |L 9|L 40|L 43|W 54|L 44|"
## [180] " MI | 12841036 / R: 853 -> 878 | |W | |B |B |W |W |B |"
## [181] "-----------------------------------------------------------------------------------------"
## [182] " 60 | JULIA SHEN |1.5 |L 33|L 34|D 45|D 42|L 24|H |U |"
## [183] " MI | 14579262 / R: 967 -> 984 | |W |B |B |W |B | | |"
## [184] "-----------------------------------------------------------------------------------------"
## [185] " 61 | JEZZEL FARKAS |1.5 |L 32|L 3|W 54|L 47|D 42|L 30|L 37|"
## [186] " ON | 15771592 / R: 955P11-> 979P18 | |B |W |B |W |B |W |B |"
## [187] "-----------------------------------------------------------------------------------------"
## [188] " 62 | ASHWIN BALAJI |1.0 |W 55|U |U |U |U |U |U |"
## [189] " MI | 15219542 / R: 1530 ->1535 | |B | | | | | | |"
## [190] "-----------------------------------------------------------------------------------------"
## [191] " 63 | THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U |"
## [192] " MI | 15057092 / R: 1175 ->1125 | |W |B |W |B |B | | |"
## [193] "-----------------------------------------------------------------------------------------"
## [194] " 64 | BEN LI |1.0 |L 22|D 30|L 31|D 49|L 46|L 42|L 54|"
## [195] " MI | 15006561 / R: 1163 ->1112 | |B |W |W |B |W |B |B |"
## [196] "-----------------------------------------------------------------------------------------"
For the players’ names, I used str_extract with the regular expression “\| [A-Z]+(.)+[A-Z]+”. The (.)+ and the additional spaces after it were added in order to catch the full names of people with more than 2 words, e.g. “THOMAS JOSEPH HOSMER”.
This is the only vector that ended up with an extra entry: the header label “USCF ID” at the top of the txt file also matched the pattern, so it had to be removed from player_names manually.
player_names <- str_extract(txt_data, "\\| [A-Z]+(.)+[A-Z]+ ")
player_names <- player_names[!is.na(player_names)]
player_names <- str_extract(player_names, "[A-Z]+(.)+[A-Z]+")
player_names <- player_names[player_names != "USCF ID"]
player_names
## [1] "GARY HUA" "DAKSHESH DARURI"
## [3] "ADITYA BAJAJ" "PATRICK H SCHILLING"
## [5] "HANSHI ZUO" "HANSEN SONG"
## [7] "GARY DEE SWATHELL" "EZEKIEL HOUGHTON"
## [9] "STEFANO LEE" "ANVIT RAO"
## [11] "CAMERON WILLIAM MC LEMAN" "KENNETH J TACK"
## [13] "TORRANCE HENRY JR" "BRADLEY SHAW"
## [15] "ZACHARY JAMES HOUGHTON" "MIKE NIKITIN"
## [17] "RONALD GRZEGORCZYK" "DAVID SUNDEEN"
## [19] "DIPANKAR ROY" "JASON ZHENG"
## [21] "DINH DANG BUI" "EUGENE L MCCLURE"
## [23] "ALAN BUI" "MICHAEL R ALDRICH"
## [25] "LOREN SCHWIEBERT" "MAX ZHU"
## [27] "GAURAV GIDWANI" "SOFIA ADINA STANESCU-BELLU"
## [29] "CHIEDOZIE OKORIE" "GEORGE AVERY JONES"
## [31] "RISHI SHETTY" "JOSHUA PHILIP MATHEWS"
## [33] "JADE GE" "MICHAEL JEFFERY THOMAS"
## [35] "JOSHUA DAVID LEE" "SIDDHARTH JHA"
## [37] "AMIYATOSH PWNANANDAM" "BRIAN LIU"
## [39] "JOEL R HENDON" "FOREST ZHANG"
## [41] "KYLE WILLIAM MURPHY" "JARED GE"
## [43] "ROBERT GLEN VASEY" "JUSTIN D SCHILLING"
## [45] "DEREK YAN" "JACOB ALEXANDER LAVALLEY"
## [47] "ERIC WRIGHT" "DANIEL KHAIN"
## [49] "MICHAEL J MARTIN" "SHIVAM JHA"
## [51] "TEJAS AYYAGARI" "ETHAN GUO"
## [53] "JOSE C YBARRA" "LARRY HODGE"
## [55] "ALEX KONG" "MARISA RICCI"
## [57] "MICHAEL LU" "VIRAJ MOHILE"
## [59] "SEAN M MC CORMICK" "JULIA SHEN"
## [61] "JEZZEL FARKAS" "ASHWIN BALAJI"
## [63] "THOMAS JOSEPH HOSMER" "BEN LI"
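As an alternative to matching the name with a regex, the pipe-delimited layout also allows splitting each player row on | and trimming the second field. The sketch below is not the method used in this document; it runs on made-up sample rows, and the names `rows`, `player_rows`, `fields`, and `sample_names` are hypothetical. It assumes player rows are the ones that begin with a pair number.

```r
library(stringr)

# Illustrative rows in the same layout as the file: two player rows and one
# state/rating row that should be ignored.
rows <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |",
  "   63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|"
)

# Keep only rows whose first field is a pair number.
player_rows <- rows[str_detect(rows, "^\\s*[0-9]+\\s*\\|")]

# Split each row into 3 pieces on "|"; the name is the second piece.
fields <- str_split_fixed(player_rows, "\\|", n = 3)
sample_names <- str_trim(fields[, 2])
sample_names
```

Because the header row with “USCF ID” does not start with a number, this approach would not need the manual removal step.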
For the players’ states, I used str_extract with the regular expression “[A-Z][A-Z] \|”. This one was simpler than the players’ names, as state codes are always 2 letters.
player_states <- str_extract(txt_data, "[A-Z][A-Z] \\|")
player_states <- player_states[!is.na(player_states)]
player_states <- str_extract(player_states, "[A-Z][A-Z]")
player_states
## [1] "ON" "MI" "MI" "MI" "MI" "OH" "MI" "MI" "ON" "MI" "MI" "MI" "MI" "MI" "MI"
## [16] "MI" "MI" "MI" "MI" "MI" "ON" "MI" "ON" "MI" "MI" "ON" "MI" "MI" "MI" "ON"
## [31] "MI" "ON" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI"
## [46] "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI"
## [61] "ON" "MI" "MI" "MI"
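The two-step extraction (match the letters plus the trailing " |", then strip the delimiter) could also be written as one step with a lookahead, which checks for the delimiter without including it in the match. This is a minimal sketch on made-up rows, not the code used above; `rows` and `states` are hypothetical names.

```r
library(stringr)

# Illustrative rows: two state/rating rows and one player row.
rows <- c(
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |",
  "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |",
  "    2 | DAKSHESH DARURI                 |6.0  |W  63|"
)

# (?= \\|) requires " |" right after the two letters but leaves it out of
# the returned match, so no second str_extract is needed.
states <- str_extract(rows, "[A-Z]{2}(?= \\|)")
states <- states[!is.na(states)]
states
```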
For the total points, I used str_extract with the regular expression “\|[0-9]\.[0-9]”. This one was simple, like the players’ states, as the value is always in the form #.#. The dot is escaped so that it matches only a literal decimal point rather than any character. I could also write the regex as “\|[0-9]\.[05]”, as the 2nd digit was always 0 or 5.
points_total <- str_extract(txt_data, "\\|[0-9]\\.[0-9]")
points_total <- points_total[!is.na(points_total)]
points_total <- str_extract(points_total, "[0-9]\\.[0-9]")
points_total
## [1] "6.0" "6.0" "6.0" "5.5" "5.5" "5.0" "5.0" "5.0" "5.0" "5.0" "4.5" "4.5"
## [13] "4.5" "4.5" "4.5" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0"
## [25] "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5"
## [37] "3.5" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0" "2.5" "2.5"
## [49] "2.5" "2.5" "2.5" "2.5" "2.0" "2.0" "2.0" "2.0" "2.0" "2.0" "2.0" "1.5"
## [61] "1.5" "1.0" "1.0" "1.0"
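In a regular expression an unescaped `.` matches any character, while `\\.` matches only a literal decimal point. The sketch below illustrates the difference on a made-up malformed string and then shows how the extracted character values could be converted to numbers if arithmetic on them were needed. The names `loose`, `strict`, `points_chr`, and `points_num` are hypothetical.

```r
library(stringr)

# A made-up, malformed field: "x" where the decimal point should be.
bad_field <- "|6x0  |"

# Unescaped "." accepts any character, so the malformed value matches.
loose <- str_extract(bad_field, "\\|[0-9].[0-9]")
# Escaped "\\." accepts only a literal ".", so there is no match.
strict <- str_extract(bad_field, "\\|[0-9]\\.[0-9]")

# Extracted points are character strings; convert before doing arithmetic.
points_chr <- c("6.0", "5.5", "1.0")
points_num <- as.numeric(points_chr)
sum(points_num)
```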
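For the players’ pre-ratings, I used str_extract with the regular expression “R: +[0-9][0-9][0-9]+”. The pattern matches “R:”, one or more spaces, and then at least 3 digits, so it stops before the provisional suffix in ratings such as “1641P17” and still catches 3-digit ratings such as 980.
player_prerating <- str_extract(txt_data, "R: +[0-9][0-9][0-9]+")
player_prerating <- player_prerating[!is.na(player_prerating)]
player_prerating <- str_extract(player_prerating, "[0-9][0-9][0-9]+")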
player_prerating
## [1] "1794" "1553" "1384" "1716" "1655" "1686" "1649" "1641" "1411" "1365"
## [11] "1712" "1663" "1666" "1610" "1220" "1604" "1629" "1600" "1564" "1595"
## [21] "1563" "1555" "1363" "1229" "1745" "1579" "1552" "1507" "1602" "1522"
## [31] "1494" "1441" "1449" "1399" "1438" "1355" "980" "1423" "1436" "1348"
## [41] "1403" "1332" "1283" "1199" "1242" "377" "1362" "1382" "1291" "1056"
## [51] "1011" "935" "1393" "1270" "1186" "1153" "1092" "917" "853" "967"
## [61] "955" "1530" "1175" "1163"
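The pre-rating extraction has to cope with provisional ratings (a “P” suffix) and with 3-digit ratings padded by an extra space. One way to sketch this in a single step is with a capture group and `str_match`, which returns the full match in column 1 and the first group in column 2. The rows below are illustrative, and `rating_rows`, `pre`, and `pre_num` are hypothetical names rather than part of the code above.

```r
library(stringr)

# Illustrative rows: a regular rating, a provisional rating, and a
# 3-digit provisional rating.
rating_rows <- c(
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |",
  "   MI | 15142253 / R: 1641P17->1657P24  |N:3  |",
  "   MI | 15489571 / R:  980P12->1077P17  |     |"
)

# Column 2 of the str_match result holds the digits captured by ([0-9]+);
# the match stops at "P" or a space, so the provisional suffix is dropped.
pre <- str_match(rating_rows, "R: +([0-9]+)")[, 2]
pre_num <- as.integer(pre)
pre_num
```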
This last vector took a lot more work than the others, as I first had to extract the section of each player’s line that holds their opponents’ ids. It took 2 extraction steps to isolate the opponent ids (collected in one character vector per player): str_extract to grab the round results, then str_extract_all to pull out the numbers. Below is the code, with examples of what an element of opponents_prerating looks like after each step.
opponents_prerating <- str_extract(txt_data, "(\\|[LWD]+ +[0-9]+).*\\|")
opponents_prerating <- opponents_prerating[!is.na(opponents_prerating)]
opponents_prerating[1]
## [1] "|W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
opponents_prerating <- str_extract_all(opponents_prerating,"[0-9]+")
opponents_prerating[1]
## [[1]]
## [1] "39" "21" "18" "14" "7" "12" "4"
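Unlike `str_extract`, `str_extract_all` returns a list with one character vector per input string, and rounds without an opponent (for example H or U) contribute no number at all. The sketch below shows this on two made-up result strings and converts the ids to integers; `results` and `ids` are hypothetical names.

```r
library(stringr)

# Illustrative round results: the second player has two rounds with no opponent.
results <- c(
  "|W  39|W  21|D   4|",
  "|W  42|H    |D   1|U    |"
)

# One list element per player, each holding that player's opponent ids.
ids <- lapply(str_extract_all(results, "[0-9]+"), as.integer)
ids

# Number of opponents actually played by each player.
lengths(ids)
```

This is why the average later divides by the number of ids found for each player rather than by a fixed 7 rounds.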
After getting it down to just the opponents’ id numbers for each player, I needed to find the mean of each player’s opponents’ pre-ratings. To do this, I used 2 for loops: the outer one loops through the players and the inner one loops through the opponent ids for that player. For each opponent id, I looked up the opponent’s pre-rating in the player_prerating vector made earlier, which works because an opponent’s id is also their position in that vector. After the last opponent id for a player, I divided the combined ratings by the number of opponents and stored the result in avg_opponents_prerating. Two temporary variables, reset for every player, hold the number of opponents and the combined pre-ratings of those opponents.
# One average per player, filled in by the outer loop.
avg_opponents_prerating <- numeric(length(opponents_prerating))
for (i in seq_along(opponents_prerating)) {
  # Reset the running count and total for each player.
  opponents_num <- 0
  scores_combined <- 0
  for (j in opponents_prerating[[i]]) {
    opponents_num <- opponents_num + 1
    # An opponent's id is also their index in player_prerating.
    scores_combined <- scores_combined + as.numeric(player_prerating[as.numeric(j)])
  }
  avg_opponents_prerating[i] <- scores_combined / opponents_num
}
avg_opponents_prerating <- round(avg_opponents_prerating, 0)
avg_opponents_prerating
## [1] 1605 1469 1564 1574 1501 1519 1372 1468 1523 1554 1468 1506 1498 1515 1484
## [16] 1386 1499 1480 1426 1411 1470 1300 1214 1357 1363 1507 1222 1522 1314 1144
## [31] 1260 1379 1277 1375 1150 1388 1385 1539 1430 1391 1248 1150 1107 1327 1152
## [46] 1358 1392 1356 1286 1296 1356 1495 1345 1206 1406 1414 1363 1391 1319 1330
## [61] 1327 1186 1350 1263
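The same averaging can be sketched more compactly with `sapply`, indexing the rating vector directly by the opponent ids. The toy data below is made up (4 players, hypothetical names `prerating`, `opponents`, and `avg`); it is not the code used above. The last lines check the first player by hand: Gary Hua’s opponents were players 39, 21, 18, 14, 7, 12, and 4, whose pre-ratings appear in the output above.

```r
# Toy pre-ratings, indexed by pair number.
prerating <- c(1794, 1553, 1384, 1716)
# Made-up opponent pair numbers for each of the 4 players.
opponents <- list(c(2, 3, 4), c(1, 4), c(1), c(1, 2, 3))

# For each player, look up the opponents' ratings and average them.
avg <- sapply(opponents, function(ids) round(mean(prerating[ids])))
avg

# Hand check for player 1 in the real data:
# pre-ratings of players 39, 21, 18, 14, 7, 12, and 4.
gary_opponents <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)
gary_avg <- round(mean(gary_opponents))
gary_avg
```

The hand check gives 11237 / 7, which rounds to 1605 and agrees with the first value of avg_opponents_prerating.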
With all 5 vectors created and filled, it is now time to build the csv_data data frame that will be saved as a CSV later. Below are the final results collected from tournamentinfo.txt:
csv_data <- data.frame(player_names, player_states, points_total, player_prerating, avg_opponents_prerating)
# Replace the vector names with the requested column headers.
colnames(csv_data) <- c(
  "Player’s Name",
  "Player’s State",
  "Total Number of Points",
  "Player’s Pre-Rating",
  "Average Pre Chess Rating of Opponents"
)
csv_data
## Player’s Name Player’s State Total Number of Points
## 1 GARY HUA ON 6.0
## 2 DAKSHESH DARURI MI 6.0
## 3 ADITYA BAJAJ MI 6.0
## 4 PATRICK H SCHILLING MI 5.5
## 5 HANSHI ZUO MI 5.5
## 6 HANSEN SONG OH 5.0
## 7 GARY DEE SWATHELL MI 5.0
## 8 EZEKIEL HOUGHTON MI 5.0
## 9 STEFANO LEE ON 5.0
## 10 ANVIT RAO MI 5.0
## 11 CAMERON WILLIAM MC LEMAN MI 4.5
## 12 KENNETH J TACK MI 4.5
## 13 TORRANCE HENRY JR MI 4.5
## 14 BRADLEY SHAW MI 4.5
## 15 ZACHARY JAMES HOUGHTON MI 4.5
## 16 MIKE NIKITIN MI 4.0
## 17 RONALD GRZEGORCZYK MI 4.0
## 18 DAVID SUNDEEN MI 4.0
## 19 DIPANKAR ROY MI 4.0
## 20 JASON ZHENG MI 4.0
## 21 DINH DANG BUI ON 4.0
## 22 EUGENE L MCCLURE MI 4.0
## 23 ALAN BUI ON 4.0
## 24 MICHAEL R ALDRICH MI 4.0
## 25 LOREN SCHWIEBERT MI 3.5
## 26 MAX ZHU ON 3.5
## 27 GAURAV GIDWANI MI 3.5
## 28 SOFIA ADINA STANESCU-BELLU MI 3.5
## 29 CHIEDOZIE OKORIE MI 3.5
## 30 GEORGE AVERY JONES ON 3.5
## 31 RISHI SHETTY MI 3.5
## 32 JOSHUA PHILIP MATHEWS ON 3.5
## 33 JADE GE MI 3.5
## 34 MICHAEL JEFFERY THOMAS MI 3.5
## 35 JOSHUA DAVID LEE MI 3.5
## 36 SIDDHARTH JHA MI 3.5
## 37 AMIYATOSH PWNANANDAM MI 3.5
## 38 BRIAN LIU MI 3.0
## 39 JOEL R HENDON MI 3.0
## 40 FOREST ZHANG MI 3.0
## 41 KYLE WILLIAM MURPHY MI 3.0
## 42 JARED GE MI 3.0
## 43 ROBERT GLEN VASEY MI 3.0
## 44 JUSTIN D SCHILLING MI 3.0
## 45 DEREK YAN MI 3.0
## 46 JACOB ALEXANDER LAVALLEY MI 3.0
## 47 ERIC WRIGHT MI 2.5
## 48 DANIEL KHAIN MI 2.5
## 49 MICHAEL J MARTIN MI 2.5
## 50 SHIVAM JHA MI 2.5
## 51 TEJAS AYYAGARI MI 2.5
## 52 ETHAN GUO MI 2.5
## 53 JOSE C YBARRA MI 2.0
## 54 LARRY HODGE MI 2.0
## 55 ALEX KONG MI 2.0
## 56 MARISA RICCI MI 2.0
## 57 MICHAEL LU MI 2.0
## 58 VIRAJ MOHILE MI 2.0
## 59 SEAN M MC CORMICK MI 2.0
## 60 JULIA SHEN MI 1.5
## 61 JEZZEL FARKAS ON 1.5
## 62 ASHWIN BALAJI MI 1.0
## 63 THOMAS JOSEPH HOSMER MI 1.0
## 64 BEN LI MI 1.0
## Player’s Pre-Rating Average Pre Chess Rating of Opponents
## 1 1794 1605
## 2 1553 1469
## 3 1384 1564
## 4 1716 1574
## 5 1655 1501
## 6 1686 1519
## 7 1649 1372
## 8 1641 1468
## 9 1411 1523
## 10 1365 1554
## 11 1712 1468
## 12 1663 1506
## 13 1666 1498
## 14 1610 1515
## 15 1220 1484
## 16 1604 1386
## 17 1629 1499
## 18 1600 1480
## 19 1564 1426
## 20 1595 1411
## 21 1563 1470
## 22 1555 1300
## 23 1363 1214
## 24 1229 1357
## 25 1745 1363
## 26 1579 1507
## 27 1552 1222
## 28 1507 1522
## 29 1602 1314
## 30 1522 1144
## 31 1494 1260
## 32 1441 1379
## 33 1449 1277
## 34 1399 1375
## 35 1438 1150
## 36 1355 1388
## 37 980 1385
## 38 1423 1539
## 39 1436 1430
## 40 1348 1391
## 41 1403 1248
## 42 1332 1150
## 43 1283 1107
## 44 1199 1327
## 45 1242 1152
## 46 377 1358
## 47 1362 1392
## 48 1382 1356
## 49 1291 1286
## 50 1056 1296
## 51 1011 1356
## 52 935 1495
## 53 1393 1345
## 54 1270 1206
## 55 1186 1406
## 56 1153 1414
## 57 1092 1363
## 58 917 1391
## 59 853 1319
## 60 967 1330
## 61 955 1327
## 62 1530 1186
## 63 1175 1350
## 64 1163 1263
For the final step, using the readr library and the function write_csv, we can write csv_data into tournamentinfo.csv.
write_csv(csv_data, "tournamentinfo.csv")
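A quick way to confirm that a file was written as expected is to read it back and compare it with the original. The sketch below does this with a small made-up data frame and a temporary file, so it does not touch tournamentinfo.csv; `demo`, `out_path`, and `check` are hypothetical names, and `show_col_types = FALSE` assumes a recent readr version (2.0 or later).

```r
library(readr)

# A small made-up data frame standing in for csv_data.
demo <- data.frame(
  name = c("GARY HUA", "DAKSHESH DARURI"),
  pre_rating = c(1794, 1553)
)

# Write to a temporary file, then read it back in.
out_path <- tempfile(fileext = ".csv")
write_csv(demo, out_path)
check <- read_csv(out_path, show_col_types = FALSE)

# The round trip should preserve the number of rows and the values.
nrow(check)
check$name
```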