Introduction

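This is the R Markdown document for my 607 Project 1. The task was to read a text file of chess tournament data (tournamentinfo.txt) and produce a new CSV with five columns built from it: Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents.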

To do this, I stored each of the five requested columns in its own vector and later combined the vectors into a data frame. To fill the vectors, I relied heavily on str_extract and regular expressions.

In the regular expressions, the | separators between columns in the file were a big help for targeting specific sections of each line. Because str_extract returns NA for every line that does not match, each result also had to be filtered with !is.na().

library(stringr)
library(readr)
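
The extract-then-filter pattern described above can be sketched on a few sample lines. This is a minimal illustration, not the project code: the vector name sample_lines and the shortened rows are assumptions made up for the example.

```r
library(stringr)

# Three sample lines in the same layout as tournamentinfo.txt (shortened)
sample_lines <- c(
  "-----------------------------------------",
  "    1 | GARY HUA                        |6.0  |W  39|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |"
)

# str_extract() returns one result per input line, NA where nothing matches
raw_matches <- str_extract(sample_lines, "\\|[0-9]\\.[0-9]")
raw_matches
# NA "|6.0" NA

# Keep only the lines that actually matched
matches <- raw_matches[!is.na(raw_matches)]
matches
# "|6.0"
```

Because every non-matching line yields NA, the filtered vector lines up with the players only if exactly one line per player matches the pattern.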

Load txt file

The first step was to load tournamentinfo.txt, the text file with the needed data. Below is what the raw data from that file looks like:

txt_data <- readLines("tournamentinfo.txt")
## Warning in readLines("tournamentinfo.txt"): incomplete final line found on
## 'tournamentinfo.txt'
txt_data
##   [1] "-----------------------------------------------------------------------------------------" 
##   [2] " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| "
##   [3] " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | "
##   [4] "-----------------------------------------------------------------------------------------" 
##   [5] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|" 
##   [6] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |" 
##   [7] "-----------------------------------------------------------------------------------------" 
##   [8] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|" 
##   [9] "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |" 
##  [10] "-----------------------------------------------------------------------------------------" 
##  [11] "    3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|" 
##  [12] "   MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |" 
##  [13] "-----------------------------------------------------------------------------------------" 
##  [14] "    4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|" 
##  [15] "   MI | 12616049 / R: 1716   ->1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |" 
##  [16] "-----------------------------------------------------------------------------------------" 
##  [17] "    5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|" 
##  [18] "   MI | 14601533 / R: 1655   ->1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |" 
##  [19] "-----------------------------------------------------------------------------------------" 
##  [20] "    6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21|" 
##  [21] "   OH | 15055204 / R: 1686   ->1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |" 
##  [22] "-----------------------------------------------------------------------------------------" 
##  [23] "    7 | GARY DEE SWATHELL               |5.0  |W  57|W  46|W  13|W  11|L   1|W   9|L   2|" 
##  [24] "   MI | 11146376 / R: 1649   ->1673     |N:3  |W    |B    |W    |B    |B    |W    |W    |" 
##  [25] "-----------------------------------------------------------------------------------------" 
##  [26] "    8 | EZEKIEL HOUGHTON                |5.0  |W   3|W  32|L  14|L   9|W  47|W  28|W  19|" 
##  [27] "   MI | 15142253 / R: 1641P17->1657P24  |N:3  |B    |W    |B    |W    |B    |W    |W    |" 
##  [28] "-----------------------------------------------------------------------------------------" 
##  [29] "    9 | STEFANO LEE                     |5.0  |W  25|L  18|W  59|W   8|W  26|L   7|W  20|" 
##  [30] "   ON | 14954524 / R: 1411   ->1564     |N:2  |W    |B    |W    |B    |W    |B    |B    |" 
##  [31] "-----------------------------------------------------------------------------------------" 
##  [32] "   10 | ANVIT RAO                       |5.0  |D  16|L  19|W  55|W  31|D   6|W  25|W  18|" 
##  [33] "   MI | 14150362 / R: 1365   ->1544     |N:3  |W    |W    |B    |B    |W    |B    |W    |" 
##  [34] "-----------------------------------------------------------------------------------------" 
##  [35] "   11 | CAMERON WILLIAM MC LEMAN        |4.5  |D  38|W  56|W   6|L   7|L   3|W  34|W  26|" 
##  [36] "   MI | 12581589 / R: 1712   ->1696     |N:3  |B    |W    |B    |W    |B    |W    |B    |" 
##  [37] "-----------------------------------------------------------------------------------------" 
##  [38] "   12 | KENNETH J TACK                  |4.5  |W  42|W  33|D   5|W  38|H    |D   1|L   3|" 
##  [39] "   MI | 12681257 / R: 1663   ->1670     |N:3  |W    |B    |W    |B    |     |W    |B    |" 
##  [40] "-----------------------------------------------------------------------------------------" 
##  [41] "   13 | TORRANCE HENRY JR               |4.5  |W  36|W  27|L   7|D   5|W  33|L   3|W  32|" 
##  [42] "   MI | 15082995 / R: 1666   ->1662     |N:3  |B    |W    |B    |B    |W    |W    |B    |" 
##  [43] "-----------------------------------------------------------------------------------------" 
##  [44] "   14 | BRADLEY SHAW                    |4.5  |W  54|W  44|W   8|L   1|D  27|L   5|W  31|" 
##  [45] "   MI | 10131499 / R: 1610   ->1618     |N:3  |W    |B    |W    |W    |B    |B    |W    |" 
##  [46] "-----------------------------------------------------------------------------------------" 
##  [47] "   15 | ZACHARY JAMES HOUGHTON          |4.5  |D  19|L  16|W  30|L  22|W  54|W  33|W  38|" 
##  [48] "   MI | 15619130 / R: 1220P13->1416P20  |N:3  |B    |B    |W    |W    |B    |B    |W    |" 
##  [49] "-----------------------------------------------------------------------------------------" 
##  [50] "   16 | MIKE NIKITIN                    |4.0  |D  10|W  15|H    |W  39|L   2|W  36|U    |" 
##  [51] "   MI | 10295068 / R: 1604   ->1613     |N:3  |B    |W    |     |B    |W    |B    |     |" 
##  [52] "-----------------------------------------------------------------------------------------" 
##  [53] "   17 | RONALD GRZEGORCZYK              |4.0  |W  48|W  41|L  26|L   2|W  23|W  22|L   5|" 
##  [54] "   MI | 10297702 / R: 1629   ->1610     |N:3  |W    |B    |W    |B    |W    |B    |W    |" 
##  [55] "-----------------------------------------------------------------------------------------" 
##  [56] "   18 | DAVID SUNDEEN                   |4.0  |W  47|W   9|L   1|W  32|L  19|W  38|L  10|" 
##  [57] "   MI | 11342094 / R: 1600   ->1600     |N:3  |B    |W    |B    |W    |B    |W    |B    |" 
##  [58] "-----------------------------------------------------------------------------------------" 
##  [59] "   19 | DIPANKAR ROY                    |4.0  |D  15|W  10|W  52|D  28|W  18|L   4|L   8|" 
##  [60] "   MI | 14862333 / R: 1564   ->1570     |N:3  |W    |B    |W    |B    |W    |W    |B    |" 
##  [61] "-----------------------------------------------------------------------------------------" 
##  [62] "   20 | JASON ZHENG                     |4.0  |L  40|W  49|W  23|W  41|W  28|L   2|L   9|" 
##  [63] "   MI | 14529060 / R: 1595   ->1569     |N:4  |W    |B    |W    |B    |W    |B    |W    |" 
##  [64] "-----------------------------------------------------------------------------------------" 
##  [65] "   21 | DINH DANG BUI                   |4.0  |W  43|L   1|W  47|L   3|W  40|W  39|L   6|" 
##  [66] "   ON | 15495066 / R: 1563P22->1562     |N:3  |B    |W    |B    |W    |W    |B    |W    |" 
##  [67] "-----------------------------------------------------------------------------------------" 
##  [68] "   22 | EUGENE L MCCLURE                |4.0  |W  64|D  52|L  28|W  15|H    |L  17|W  40|" 
##  [69] "   MI | 12405534 / R: 1555   ->1529     |N:4  |W    |B    |W    |B    |     |W    |B    |" 
##  [70] "-----------------------------------------------------------------------------------------" 
##  [71] "   23 | ALAN BUI                        |4.0  |L   4|W  43|L  20|W  58|L  17|W  37|W  46|" 
##  [72] "   ON | 15030142 / R: 1363   ->1371     |     |B    |W    |B    |W    |B    |W    |B    |" 
##  [73] "-----------------------------------------------------------------------------------------" 
##  [74] "   24 | MICHAEL R ALDRICH               |4.0  |L  28|L  47|W  43|L  25|W  60|W  44|W  39|" 
##  [75] "   MI | 13469010 / R: 1229   ->1300     |N:4  |B    |W    |B    |B    |W    |W    |B    |" 
##  [76] "-----------------------------------------------------------------------------------------" 
##  [77] "   25 | LOREN SCHWIEBERT                |3.5  |L   9|W  53|L   3|W  24|D  34|L  10|W  47|" 
##  [78] "   MI | 12486656 / R: 1745   ->1681     |N:4  |B    |W    |B    |W    |B    |W    |B    |" 
##  [79] "-----------------------------------------------------------------------------------------" 
##  [80] "   26 | MAX ZHU                         |3.5  |W  49|W  40|W  17|L   4|L   9|D  32|L  11|" 
##  [81] "   ON | 15131520 / R: 1579   ->1564     |N:4  |B    |W    |B    |W    |B    |W    |W    |" 
##  [82] "-----------------------------------------------------------------------------------------" 
##  [83] "   27 | GAURAV GIDWANI                  |3.5  |W  51|L  13|W  46|W  37|D  14|L   6|U    |" 
##  [84] "   MI | 14476567 / R: 1552   ->1539     |N:4  |W    |B    |W    |B    |W    |B    |     |" 
##  [85] "-----------------------------------------------------------------------------------------" 
##  [86] "   28 | SOFIA ADINA STANESCU-BELLU      |3.5  |W  24|D   4|W  22|D  19|L  20|L   8|D  36|" 
##  [87] "   MI | 14882954 / R: 1507   ->1513     |N:3  |W    |W    |B    |W    |B    |B    |W    |" 
##  [88] "-----------------------------------------------------------------------------------------" 
##  [89] "   29 | CHIEDOZIE OKORIE                |3.5  |W  50|D   6|L  38|L  34|W  52|W  48|U    |" 
##  [90] "   MI | 15323285 / R: 1602P6 ->1508P12  |N:4  |B    |W    |B    |W    |W    |B    |     |" 
##  [91] "-----------------------------------------------------------------------------------------" 
##  [92] "   30 | GEORGE AVERY JONES              |3.5  |L  52|D  64|L  15|W  55|L  31|W  61|W  50|" 
##  [93] "   ON | 12577178 / R: 1522   ->1444     |     |W    |B    |B    |W    |W    |B    |B    |" 
##  [94] "-----------------------------------------------------------------------------------------" 
##  [95] "   31 | RISHI SHETTY                    |3.5  |L  58|D  55|W  64|L  10|W  30|W  50|L  14|" 
##  [96] "   MI | 15131618 / R: 1494   ->1444     |     |B    |W    |B    |W    |B    |W    |B    |" 
##  [97] "-----------------------------------------------------------------------------------------" 
##  [98] "   32 | JOSHUA PHILIP MATHEWS           |3.5  |W  61|L   8|W  44|L  18|W  51|D  26|L  13|" 
##  [99] "   ON | 14073750 / R: 1441   ->1433     |N:4  |W    |B    |W    |B    |W    |B    |W    |" 
## [100] "-----------------------------------------------------------------------------------------" 
## [101] "   33 | JADE GE                         |3.5  |W  60|L  12|W  50|D  36|L  13|L  15|W  51|" 
## [102] "   MI | 14691842 / R: 1449   ->1421     |     |B    |W    |B    |W    |B    |W    |B    |" 
## [103] "-----------------------------------------------------------------------------------------" 
## [104] "   34 | MICHAEL JEFFERY THOMAS          |3.5  |L   6|W  60|L  37|W  29|D  25|L  11|W  52|" 
## [105] "   MI | 15051807 / R: 1399   ->1400     |     |B    |W    |B    |B    |W    |B    |W    |" 
## [106] "-----------------------------------------------------------------------------------------" 
## [107] "   35 | JOSHUA DAVID LEE                |3.5  |L  46|L  38|W  56|L   6|W  57|D  52|W  48|" 
## [108] "   MI | 14601397 / R: 1438   ->1392     |     |W    |W    |B    |W    |B    |B    |W    |" 
## [109] "-----------------------------------------------------------------------------------------" 
## [110] "   36 | SIDDHARTH JHA                   |3.5  |L  13|W  57|W  51|D  33|H    |L  16|D  28|" 
## [111] "   MI | 14773163 / R: 1355   ->1367     |N:4  |W    |B    |W    |B    |     |W    |B    |" 
## [112] "-----------------------------------------------------------------------------------------" 
## [113] "   37 | AMIYATOSH PWNANANDAM            |3.5  |B    |L   5|W  34|L  27|H    |L  23|W  61|" 
## [114] "   MI | 15489571 / R:  980P12->1077P17  |     |     |B    |W    |W    |     |B    |W    |" 
## [115] "-----------------------------------------------------------------------------------------" 
## [116] "   38 | BRIAN LIU                       |3.0  |D  11|W  35|W  29|L  12|H    |L  18|L  15|" 
## [117] "   MI | 15108523 / R: 1423   ->1439     |N:4  |W    |B    |W    |W    |     |B    |B    |" 
## [118] "-----------------------------------------------------------------------------------------" 
## [119] "   39 | JOEL R HENDON                   |3.0  |L   1|W  54|W  40|L  16|W  44|L  21|L  24|" 
## [120] "   MI | 12923035 / R: 1436P23->1413     |N:4  |B    |W    |B    |W    |B    |W    |W    |" 
## [121] "-----------------------------------------------------------------------------------------" 
## [122] "   40 | FOREST ZHANG                    |3.0  |W  20|L  26|L  39|W  59|L  21|W  56|L  22|" 
## [123] "   MI | 14892710 / R: 1348   ->1346     |     |B    |B    |W    |W    |B    |W    |W    |" 
## [124] "-----------------------------------------------------------------------------------------" 
## [125] "   41 | KYLE WILLIAM MURPHY             |3.0  |W  59|L  17|W  58|L  20|X    |U    |U    |" 
## [126] "   MI | 15761443 / R: 1403P5 ->1341P9   |     |B    |W    |B    |W    |     |     |     |" 
## [127] "-----------------------------------------------------------------------------------------" 
## [128] "   42 | JARED GE                        |3.0  |L  12|L  50|L  57|D  60|D  61|W  64|W  56|" 
## [129] "   MI | 14462326 / R: 1332   ->1256     |     |B    |W    |B    |B    |W    |W    |B    |" 
## [130] "-----------------------------------------------------------------------------------------" 
## [131] "   43 | ROBERT GLEN VASEY               |3.0  |L  21|L  23|L  24|W  63|W  59|L  46|W  55|" 
## [132] "   MI | 14101068 / R: 1283   ->1244     |     |W    |B    |W    |W    |B    |B    |W    |" 
## [133] "-----------------------------------------------------------------------------------------" 
## [134] "   44 | JUSTIN D SCHILLING              |3.0  |B    |L  14|L  32|W  53|L  39|L  24|W  59|" 
## [135] "   MI | 15323504 / R: 1199   ->1199     |     |     |W    |B    |B    |W    |B    |W    |" 
## [136] "-----------------------------------------------------------------------------------------" 
## [137] "   45 | DEREK YAN                       |3.0  |L   5|L  51|D  60|L  56|W  63|D  55|W  58|" 
## [138] "   MI | 15372807 / R: 1242   ->1191     |     |W    |B    |W    |B    |W    |B    |W    |" 
## [139] "-----------------------------------------------------------------------------------------" 
## [140] "   46 | JACOB ALEXANDER LAVALLEY        |3.0  |W  35|L   7|L  27|L  50|W  64|W  43|L  23|" 
## [141] "   MI | 15490981 / R:  377P3 ->1076P10  |     |B    |W    |B    |W    |B    |W    |W    |" 
## [142] "-----------------------------------------------------------------------------------------" 
## [143] "   47 | ERIC WRIGHT                     |2.5  |L  18|W  24|L  21|W  61|L   8|D  51|L  25|" 
## [144] "   MI | 12533115 / R: 1362   ->1341     |     |W    |B    |W    |B    |W    |B    |W    |" 
## [145] "-----------------------------------------------------------------------------------------" 
## [146] "   48 | DANIEL KHAIN                    |2.5  |L  17|W  63|H    |D  52|H    |L  29|L  35|" 
## [147] "   MI | 14369165 / R: 1382   ->1335     |     |B    |W    |     |B    |     |W    |B    |" 
## [148] "-----------------------------------------------------------------------------------------" 
## [149] "   49 | MICHAEL J MARTIN                |2.5  |L  26|L  20|D  63|D  64|W  58|H    |U    |" 
## [150] "   MI | 12531685 / R: 1291P12->1259P17  |     |W    |W    |B    |W    |B    |     |     |" 
## [151] "-----------------------------------------------------------------------------------------" 
## [152] "   50 | SHIVAM JHA                      |2.5  |L  29|W  42|L  33|W  46|H    |L  31|L  30|" 
## [153] "   MI | 14773178 / R: 1056   ->1111     |     |W    |B    |W    |B    |     |B    |W    |" 
## [154] "-----------------------------------------------------------------------------------------" 
## [155] "   51 | TEJAS AYYAGARI                  |2.5  |L  27|W  45|L  36|W  57|L  32|D  47|L  33|" 
## [156] "   MI | 15205474 / R: 1011   ->1097     |     |B    |W    |B    |W    |B    |W    |W    |" 
## [157] "-----------------------------------------------------------------------------------------" 
## [158] "   52 | ETHAN GUO                       |2.5  |W  30|D  22|L  19|D  48|L  29|D  35|L  34|" 
## [159] "   MI | 14918803 / R:  935   ->1092     |N:4  |B    |W    |B    |W    |B    |W    |B    |" 
## [160] "-----------------------------------------------------------------------------------------" 
## [161] "   53 | JOSE C YBARRA                   |2.0  |H    |L  25|H    |L  44|U    |W  57|U    |" 
## [162] "   MI | 12578849 / R: 1393   ->1359     |     |     |B    |     |W    |     |W    |     |" 
## [163] "-----------------------------------------------------------------------------------------" 
## [164] "   54 | LARRY HODGE                     |2.0  |L  14|L  39|L  61|B    |L  15|L  59|W  64|" 
## [165] "   MI | 12836773 / R: 1270   ->1200     |     |B    |B    |W    |     |W    |B    |W    |" 
## [166] "-----------------------------------------------------------------------------------------" 
## [167] "   55 | ALEX KONG                       |2.0  |L  62|D  31|L  10|L  30|B    |D  45|L  43|" 
## [168] "   MI | 15412571 / R: 1186   ->1163     |     |W    |B    |W    |B    |     |W    |B    |" 
## [169] "-----------------------------------------------------------------------------------------" 
## [170] "   56 | MARISA RICCI                    |2.0  |H    |L  11|L  35|W  45|H    |L  40|L  42|" 
## [171] "   MI | 14679887 / R: 1153   ->1140     |     |     |B    |W    |W    |     |B    |W    |" 
## [172] "-----------------------------------------------------------------------------------------" 
## [173] "   57 | MICHAEL LU                      |2.0  |L   7|L  36|W  42|L  51|L  35|L  53|B    |" 
## [174] "   MI | 15113330 / R: 1092   ->1079     |     |B    |W    |W    |B    |W    |B    |     |" 
## [175] "-----------------------------------------------------------------------------------------" 
## [176] "   58 | VIRAJ MOHILE                    |2.0  |W  31|L   2|L  41|L  23|L  49|B    |L  45|" 
## [177] "   MI | 14700365 / R:  917   -> 941     |     |W    |B    |W    |B    |W    |     |B    |" 
## [178] "-----------------------------------------------------------------------------------------" 
## [179] "   59 | SEAN M MC CORMICK               |2.0  |L  41|B    |L   9|L  40|L  43|W  54|L  44|" 
## [180] "   MI | 12841036 / R:  853   -> 878     |     |W    |     |B    |B    |W    |W    |B    |" 
## [181] "-----------------------------------------------------------------------------------------" 
## [182] "   60 | JULIA SHEN                      |1.5  |L  33|L  34|D  45|D  42|L  24|H    |U    |" 
## [183] "   MI | 14579262 / R:  967   -> 984     |     |W    |B    |B    |W    |B    |     |     |" 
## [184] "-----------------------------------------------------------------------------------------" 
## [185] "   61 | JEZZEL FARKAS                   |1.5  |L  32|L   3|W  54|L  47|D  42|L  30|L  37|" 
## [186] "   ON | 15771592 / R:  955P11-> 979P18  |     |B    |W    |B    |W    |B    |W    |B    |" 
## [187] "-----------------------------------------------------------------------------------------" 
## [188] "   62 | ASHWIN BALAJI                   |1.0  |W  55|U    |U    |U    |U    |U    |U    |" 
## [189] "   MI | 15219542 / R: 1530   ->1535     |     |B    |     |     |     |     |     |     |" 
## [190] "-----------------------------------------------------------------------------------------" 
## [191] "   63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |" 
## [192] "   MI | 15057092 / R: 1175   ->1125     |     |W    |B    |W    |B    |B    |     |     |" 
## [193] "-----------------------------------------------------------------------------------------" 
## [194] "   64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|" 
## [195] "   MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |" 
## [196] "-----------------------------------------------------------------------------------------"
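
The listing shows a regular layout: four header lines, then three lines per player (a name row, a state/rating row, and a dashed separator). As a hedged alternative to filtering out NA values, the two kinds of rows could be picked out by position. The sketch below uses a shortened stand-in called sample_data (an assumption for illustration) rather than the real file.

```r
# A shortened stand-in for txt_data with the same layout (header + 2 players)
sample_data <- c(
  "---------",
  " Pair | Player Name ...",
  " Num  | USCF ID / Rtg (Pre->Post) ...",
  "---------",
  "    1 | GARY HUA ...",
  "   ON | 15445895 / R: 1794   ->1817 ...",
  "---------",
  "    2 | DAKSHESH DARURI ...",
  "   MI | 14598900 / R: 1553   ->1663 ...",
  "---------"
)

# After the 4 header lines, every player occupies 3 lines,
# so name rows start at line 5 and state/rating rows at line 6
name_rows  <- sample_data[seq(5, length(sample_data), by = 3)]
state_rows <- sample_data[seq(6, length(sample_data), by = 3)]

name_rows
state_rows
```

The warning printed above only means the file does not end with a newline; readLines() accepts warn = FALSE to suppress it.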

Make and fill a vector for Player’s Name

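I used str_extract with the regular expression “\| [A-Z]+(.)+[A-Z]+” followed by five spaces. The (.)+ in the middle matches anything between the first and last runs of capital letters, so names with more than two words, e.g. “THOMAS JOSEPH HOSMER”, are caught in full; the trailing spaces tie the end of the match to the padding after the name. A second str_extract then strips the leading | and the padding.
This was the only vector that picked up an extra entry: the “USCF ID” label from the header at the top of the txt file was also returned in player_names, so it had to be removed manually.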

player_names <- str_extract(txt_data, "\\| [A-Z]+(.)+[A-Z]+     ")
player_names <- player_names[!is.na(player_names)]
player_names <- str_extract(player_names, "[A-Z]+(.)+[A-Z]+")
player_names <- player_names[player_names != "USCF ID"]
player_names
##  [1] "GARY HUA"                   "DAKSHESH DARURI"           
##  [3] "ADITYA BAJAJ"               "PATRICK H SCHILLING"       
##  [5] "HANSHI ZUO"                 "HANSEN SONG"               
##  [7] "GARY DEE SWATHELL"          "EZEKIEL HOUGHTON"          
##  [9] "STEFANO LEE"                "ANVIT RAO"                 
## [11] "CAMERON WILLIAM MC LEMAN"   "KENNETH J TACK"            
## [13] "TORRANCE HENRY JR"          "BRADLEY SHAW"              
## [15] "ZACHARY JAMES HOUGHTON"     "MIKE NIKITIN"              
## [17] "RONALD GRZEGORCZYK"         "DAVID SUNDEEN"             
## [19] "DIPANKAR ROY"               "JASON ZHENG"               
## [21] "DINH DANG BUI"              "EUGENE L MCCLURE"          
## [23] "ALAN BUI"                   "MICHAEL R ALDRICH"         
## [25] "LOREN SCHWIEBERT"           "MAX ZHU"                   
## [27] "GAURAV GIDWANI"             "SOFIA ADINA STANESCU-BELLU"
## [29] "CHIEDOZIE OKORIE"           "GEORGE AVERY JONES"        
## [31] "RISHI SHETTY"               "JOSHUA PHILIP MATHEWS"     
## [33] "JADE GE"                    "MICHAEL JEFFERY THOMAS"    
## [35] "JOSHUA DAVID LEE"           "SIDDHARTH JHA"             
## [37] "AMIYATOSH PWNANANDAM"       "BRIAN LIU"                 
## [39] "JOEL R HENDON"              "FOREST ZHANG"              
## [41] "KYLE WILLIAM MURPHY"        "JARED GE"                  
## [43] "ROBERT GLEN VASEY"          "JUSTIN D SCHILLING"        
## [45] "DEREK YAN"                  "JACOB ALEXANDER LAVALLEY"  
## [47] "ERIC WRIGHT"                "DANIEL KHAIN"              
## [49] "MICHAEL J MARTIN"           "SHIVAM JHA"                
## [51] "TEJAS AYYAGARI"             "ETHAN GUO"                 
## [53] "JOSE C YBARRA"              "LARRY HODGE"               
## [55] "ALEX KONG"                  "MARISA RICCI"              
## [57] "MICHAEL LU"                 "VIRAJ MOHILE"              
## [59] "SEAN M MC CORMICK"          "JULIA SHEN"                
## [61] "JEZZEL FARKAS"              "ASHWIN BALAJI"             
## [63] "THOMAS JOSEPH HOSMER"       "BEN LI"
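
To see how the two extraction steps behave, the sketch below runs the same two patterns on three shortened rows (the vector name_rows is made up for the example). It also shows one possible alternative, assumed here rather than taken from the project: splitting each row on | and trimming the second field.

```r
library(stringr)

name_rows <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|",
  "   12 | KENNETH J TACK                  |4.5  |W  38|H    |",
  "   28 | SOFIA ADINA STANESCU-BELLU      |3.5  |W  24|D   4|"
)

# Step 1: pipe, space, then text ending in a capital letter followed by five spaces
padded <- str_extract(name_rows, "\\| [A-Z]+(.)+[A-Z]+     ")
# Step 2: keep only the span from the first to the last capital letter
names_only <- str_extract(padded, "[A-Z]+(.)+[A-Z]+")
names_only
# "GARY HUA" "KENNETH J TACK" "SOFIA ADINA STANESCU-BELLU"

# Alternative sketch: take the second |-separated field and trim the padding
names_split <- str_trim(str_split_fixed(name_rows, "\\|", 3)[, 2])
identical(names_only, names_split)
# TRUE
```

The five trailing spaces matter in the second row: the bye entry "H" is followed by only four spaces before the next |, so the match cannot run on into the round columns.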

Make and fill a vector for Player’s State

I used str_extract with the regular expression “[A-Z][A-Z] \|”. This one was simpler than the player names because a state is always two capital letters, followed in the file by a space and the | separator.

player_states <- str_extract(txt_data, "[A-Z][A-Z] \\|")
player_states <- player_states[!is.na(player_states)]
player_states <- str_extract(player_states, "[A-Z][A-Z]")
player_states
##  [1] "ON" "MI" "MI" "MI" "MI" "OH" "MI" "MI" "ON" "MI" "MI" "MI" "MI" "MI" "MI"
## [16] "MI" "MI" "MI" "MI" "MI" "ON" "MI" "ON" "MI" "MI" "ON" "MI" "MI" "MI" "ON"
## [31] "MI" "ON" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI"
## [46] "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI"
## [61] "ON" "MI" "MI" "MI"
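
A slightly stricter variant, shown here only as a sketch, anchors the pattern at the start of the line and uses a capture group so that no second extraction is needed. It assumes the state code is always the first non-blank text on its row, which holds for the rows listed above; the vector rows is a made-up sample.

```r
library(stringr)

rows <- c(
  "    1 | GARY HUA                        |6.0  |W  39|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |",
  "-----------------------------------------"
)

# Start of line, leading spaces, two capitals (captured), a space, then the pipe
state_match <- str_match(rows, "^\\s+([A-Z]{2}) \\|")

# Column 2 of the str_match() result holds the first capture group
states <- state_match[, 2]
states <- states[!is.na(states)]
states
# "ON"
```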

Make and fill a vector for Total Number of Points

I used str_extract with the regular expression “\|[0-9]\.[0-9]”. This one was simple, like the player’s state, because the total always has the form #.#. The dot is escaped so that it matches only a literal decimal point rather than any character. The regex could also be written as “\|[0-9]\.[05]”, since the second digit is always 0 or 5.

points_total <- str_extract(txt_data, "\\|[0-9]\\.[0-9]")
points_total <- points_total[!is.na(points_total)]
points_total <- str_extract(points_total, "[0-9]\\.[0-9]")
points_total
##  [1] "6.0" "6.0" "6.0" "5.5" "5.5" "5.0" "5.0" "5.0" "5.0" "5.0" "4.5" "4.5"
## [13] "4.5" "4.5" "4.5" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0"
## [25] "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5"
## [37] "3.5" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0" "2.5" "2.5"
## [49] "2.5" "2.5" "2.5" "2.5" "2.0" "2.0" "2.0" "2.0" "2.0" "2.0" "2.0" "1.5"
## [61] "1.5" "1.0" "1.0" "1.0"
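
Two regex details are easy to get wrong in a pattern for point totals: an unescaped . matches any character, and a comma inside a character class is a literal comma. The short sketch below demonstrates both on made-up strings and also shows converting an extracted total to a number.

```r
library(stringr)

# "." matches any character, so "6x0" is accepted; "\\." matches only a literal dot
loose_dot  <- str_detect("|6x0", "\\|[0-9].[0-9]")
strict_dot <- str_detect("|6x0", "\\|[0-9]\\.[0-9]")
c(loose_dot, strict_dot)
# TRUE FALSE

# Inside a character class a comma is literal: "[0,5]" matches "0", "," or "5"
with_comma    <- str_detect("|6.,", "\\|[0-9]\\.[0,5]")
without_comma <- str_detect("|6.,", "\\|[0-9]\\.[05]")
c(with_comma, without_comma)
# TRUE FALSE

# The extracted text can be converted to a number for later arithmetic
points <- as.numeric(str_extract("|5.5  |W  45|", "[0-9]\\.[05]"))
points
# 5.5
```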

Make and fill a vector for Player’s Pre-Rating

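I used str_extract with the regular expression “R: +[0-9][0-9][0-9]+”. It matches “R:”, one or more spaces, and then at least three digits, so three-digit ratings such as 980 are still caught and provisional suffixes such as P17 are left out.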

player_prerating <- str_extract(txt_data, "R: +[0-9][0-9][0-9]+")
player_prerating <- player_prerating[!is.na(player_prerating)]
player_prerating <- str_extract(player_prerating, "[0-9][0-9][0-9]+")
player_prerating
##  [1] "1794" "1553" "1384" "1716" "1655" "1686" "1649" "1641" "1411" "1365"
## [11] "1712" "1663" "1666" "1610" "1220" "1604" "1629" "1600" "1564" "1595"
## [21] "1563" "1555" "1363" "1229" "1745" "1579" "1552" "1507" "1602" "1522"
## [31] "1494" "1441" "1449" "1399" "1438" "1355" "980"  "1423" "1436" "1348"
## [41] "1403" "1332" "1283" "1199" "1242" "377"  "1362" "1382" "1291" "1056"
## [51] "1011" "935"  "1393" "1270" "1186" "1153" "1092" "917"  "853"  "967" 
## [61] "955"  "1530" "1175" "1163"
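
The pre-rating column has a few irregular cases: provisional ratings carry a suffix such as P17, and three-digit ratings are padded with an extra space. The sketch below, using made-up sample rows in rating_rows, shows one way to handle them with a single capture group and convert the result to integers. It is an illustration, not the code used above.

```r
library(stringr)

rating_rows <- c(
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |",
  "   MI | 15142253 / R: 1641P17->1657P24  |N:3  |",
  "   MI | 15489571 / R:  980P12->1077P17  |     |"
)

# "R:", optional spaces, then the digits of the pre-rating (captured);
# the match stops at the first non-digit, so "P17" is not included
pre <- as.integer(str_match(rating_rows, "R:\\s*([0-9]+)")[, 2])
pre
# 1794 1641 980
```

Storing the ratings as integers rather than strings avoids repeated as.numeric() calls when they are averaged later.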

Make and fill a vector for Average Pre Chess Rating of Opponents

This last vector took far more work than the others, because I first had to extract the part of each player’s row that lists their opponents. It took two extraction steps to isolate the opponents’ ids (their pair numbers), collected in one character vector per player. Below is the code, with an example of what an entry of opponents_prerating looks like after each step.

opponents_prerating <- str_extract(txt_data, "(\\|[LWD]+ +[0-9]+).*\\|")
opponents_prerating <- opponents_prerating[!is.na(opponents_prerating)]
opponents_prerating[1]
## [1] "|W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
opponents_prerating <- str_extract_all(opponents_prerating,"[0-9]+")
opponents_prerating[1]
## [[1]]
## [1] "39" "21" "18" "14" "7"  "12" "4"
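
Rounds without an opponent (for example H or U entries) carry no pair number, so players can end up with fewer than seven opponent ids. The sketch below shows this on three rows copied from the listing above; the pattern is written with the character class [LWD], a small variation assumed for the example.

```r
library(stringr)

round_rows <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   12 | KENNETH J TACK                  |4.5  |W  42|W  33|D   5|W  38|H    |D   1|L   3|",
  "   62 | ASHWIN BALAJI                   |1.0  |W  55|U    |U    |U    |U    |U    |U    |"
)

# From the first played round (W, L or D plus a pair number) to the last pipe
rounds <- str_extract(round_rows, "\\|[LWD] +[0-9]+.*\\|")

# All digit runs in that span are opponent pair numbers; one list element per player
opponents <- str_extract_all(rounds, "[0-9]+")
lengths(opponents)
# 7 6 1
```

Since the number of opponents differs per player, the average has to divide by the count of ids actually found, not by the number of rounds.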

With just the opponents’ id numbers for each player, the next step was to find the mean pre-rating of each player’s opponents. I used two nested for loops: the outer one loops over the players and the inner one over that player’s opponent ids. For each opponent id, I looked up the opponent’s pre-rating in the player_prerating vector built earlier, which works because position i in player_prerating holds the rating of the player with pair number i. After the last opponent id, I divided the combined ratings by the number of opponents and stored the result in avg_opponents_prerating. Two temporary variables, reset for every player, hold the number of opponents and the combined pre-ratings of those opponents.

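# One slot per player for the average opponent pre-rating
avg_opponents_prerating <- numeric(length(opponents_prerating))
for (i in seq_along(opponents_prerating)) {
  opponents_num <- 0    # number of opponents this player faced
  scores_combined <- 0  # running sum of those opponents' pre-ratings
  for (j in opponents_prerating[[i]]) {
    opponents_num <- opponents_num + 1
    # j is an opponent's pair number, which is also their index in player_prerating
    scores_combined <- scores_combined + as.numeric(player_prerating[as.numeric(j)])
  }
  avg_opponents_prerating[i] <- scores_combined / opponents_num
}
avg_opponents_prerating <- round(avg_opponents_prerating, 0)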
avg_opponents_prerating
##  [1] 1605 1469 1564 1574 1501 1519 1372 1468 1523 1554 1468 1506 1498 1515 1484
## [16] 1386 1499 1480 1426 1411 1470 1300 1214 1357 1363 1507 1222 1522 1314 1144
## [31] 1260 1379 1277 1375 1150 1388 1385 1539 1430 1391 1248 1150 1107 1327 1152
## [46] 1358 1392 1356 1286 1296 1356 1495 1345 1206 1406 1414 1363 1391 1319 1330
## [61] 1327 1186 1350 1263
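
The same lookup-and-average idea can also be written without explicit counters. The sketch below is one possible vectorized form using vapply() and mean() on toy data; pre_ratings and opponents are made-up stand-ins for player_prerating and opponents_prerating, not the tournament values.

```r
# Toy data: pre-ratings for 4 players, indexed by pair number
pre_ratings <- c(1794, 1553, 1384, 1716)

# Opponent pair numbers per player, as character vectors
# (the same shape str_extract_all() returns)
opponents <- list(c("2", "3", "4"), c("1", "4"), c("1"), c("1", "2", "3"))

# For each player, index the ratings by opponent number and average them
avg_opp <- vapply(
  opponents,
  function(ids) mean(pre_ratings[as.integer(ids)]),
  numeric(1)
)
round(avg_opp)
# 1551 1755 1794 1577
```

mean() divides by the number of ids in each vector, which plays the role of the opponents_num counter in the loop version.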

Turn the 5 vectors into a dataframe

With all five vectors created and filled, it is time to build the csv_data data frame that will be saved as a CSV later. Below are the final results of the data collected from tournamentinfo.txt.

csv_data <- data.frame(player_names,player_states,points_total,player_prerating,avg_opponents_prerating)

colnames(csv_data)[1] <- "Player’s Name"
colnames(csv_data)[2] <- "Player’s State"
colnames(csv_data)[3] <- "Total Number of Points"
colnames(csv_data)[4] <- "Player’s Pre-Rating"
colnames(csv_data)[5] <- "Average Pre Chess Rating of Opponents"
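
As a hedged alternative, the column names and types could be set in the data.frame() call itself. The sketch below uses two made-up rows and a hypothetical object name, csv_sketch; it writes the apostrophes as plain ' characters and converts the text columns to numbers, both of which are choices made for the example rather than part of the code above.

```r
# Two-row stand-in for the five vectors built in the earlier sections
csv_sketch <- data.frame(
  "Player's Name"          = c("GARY HUA", "DAKSHESH DARURI"),
  "Player's State"         = c("ON", "MI"),
  "Total Number of Points" = as.numeric(c("6.0", "6.0")),
  "Player's Pre-Rating"    = as.integer(c("1794", "1553")),
  "Average Pre Chess Rating of Opponents" = c(1605, 1469),
  check.names = FALSE  # keep spaces and apostrophes in the column names
)

names(csv_sketch)
str(csv_sketch)
```

Without check.names = FALSE, data.frame() would rewrite such names into syntactically valid ones, replacing the spaces and apostrophes with dots.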

csv_data
##                 Player’s Name Player’s State Total Number of Points
## 1                    GARY HUA             ON                    6.0
## 2             DAKSHESH DARURI             MI                    6.0
## 3                ADITYA BAJAJ             MI                    6.0
## 4         PATRICK H SCHILLING             MI                    5.5
## 5                  HANSHI ZUO             MI                    5.5
## 6                 HANSEN SONG             OH                    5.0
## 7           GARY DEE SWATHELL             MI                    5.0
## 8            EZEKIEL HOUGHTON             MI                    5.0
## 9                 STEFANO LEE             ON                    5.0
## 10                  ANVIT RAO             MI                    5.0
## 11   CAMERON WILLIAM MC LEMAN             MI                    4.5
## 12             KENNETH J TACK             MI                    4.5
## 13          TORRANCE HENRY JR             MI                    4.5
## 14               BRADLEY SHAW             MI                    4.5
## 15     ZACHARY JAMES HOUGHTON             MI                    4.5
## 16               MIKE NIKITIN             MI                    4.0
## 17         RONALD GRZEGORCZYK             MI                    4.0
## 18              DAVID SUNDEEN             MI                    4.0
## 19               DIPANKAR ROY             MI                    4.0
## 20                JASON ZHENG             MI                    4.0
## 21              DINH DANG BUI             ON                    4.0
## 22           EUGENE L MCCLURE             MI                    4.0
## 23                   ALAN BUI             ON                    4.0
## 24          MICHAEL R ALDRICH             MI                    4.0
## 25           LOREN SCHWIEBERT             MI                    3.5
## 26                    MAX ZHU             ON                    3.5
## 27             GAURAV GIDWANI             MI                    3.5
## 28 SOFIA ADINA STANESCU-BELLU             MI                    3.5
## 29           CHIEDOZIE OKORIE             MI                    3.5
## 30         GEORGE AVERY JONES             ON                    3.5
## 31               RISHI SHETTY             MI                    3.5
## 32      JOSHUA PHILIP MATHEWS             ON                    3.5
## 33                    JADE GE             MI                    3.5
## 34     MICHAEL JEFFERY THOMAS             MI                    3.5
## 35           JOSHUA DAVID LEE             MI                    3.5
## 36              SIDDHARTH JHA             MI                    3.5
## 37       AMIYATOSH PWNANANDAM             MI                    3.5
## 38                  BRIAN LIU             MI                    3.0
## 39              JOEL R HENDON             MI                    3.0
## 40               FOREST ZHANG             MI                    3.0
## 41        KYLE WILLIAM MURPHY             MI                    3.0
## 42                   JARED GE             MI                    3.0
## 43          ROBERT GLEN VASEY             MI                    3.0
## 44         JUSTIN D SCHILLING             MI                    3.0
## 45                  DEREK YAN             MI                    3.0
## 46   JACOB ALEXANDER LAVALLEY             MI                    3.0
## 47                ERIC WRIGHT             MI                    2.5
## 48               DANIEL KHAIN             MI                    2.5
## 49           MICHAEL J MARTIN             MI                    2.5
## 50                 SHIVAM JHA             MI                    2.5
## 51             TEJAS AYYAGARI             MI                    2.5
## 52                  ETHAN GUO             MI                    2.5
## 53              JOSE C YBARRA             MI                    2.0
## 54                LARRY HODGE             MI                    2.0
## 55                  ALEX KONG             MI                    2.0
## 56               MARISA RICCI             MI                    2.0
## 57                 MICHAEL LU             MI                    2.0
## 58               VIRAJ MOHILE             MI                    2.0
## 59          SEAN M MC CORMICK             MI                    2.0
## 60                 JULIA SHEN             MI                    1.5
## 61              JEZZEL FARKAS             ON                    1.5
## 62              ASHWIN BALAJI             MI                    1.0
## 63       THOMAS JOSEPH HOSMER             MI                    1.0
## 64                     BEN LI             MI                    1.0
##    Player’s Pre-Rating Average Pre Chess Rating of Opponents
## 1                 1794                                  1605
## 2                 1553                                  1469
## 3                 1384                                  1564
## 4                 1716                                  1574
## 5                 1655                                  1501
## 6                 1686                                  1519
## 7                 1649                                  1372
## 8                 1641                                  1468
## 9                 1411                                  1523
## 10                1365                                  1554
## 11                1712                                  1468
## 12                1663                                  1506
## 13                1666                                  1498
## 14                1610                                  1515
## 15                1220                                  1484
## 16                1604                                  1386
## 17                1629                                  1499
## 18                1600                                  1480
## 19                1564                                  1426
## 20                1595                                  1411
## 21                1563                                  1470
## 22                1555                                  1300
## 23                1363                                  1214
## 24                1229                                  1357
## 25                1745                                  1363
## 26                1579                                  1507
## 27                1552                                  1222
## 28                1507                                  1522
## 29                1602                                  1314
## 30                1522                                  1144
## 31                1494                                  1260
## 32                1441                                  1379
## 33                1449                                  1277
## 34                1399                                  1375
## 35                1438                                  1150
## 36                1355                                  1388
## 37                 980                                  1385
## 38                1423                                  1539
## 39                1436                                  1430
## 40                1348                                  1391
## 41                1403                                  1248
## 42                1332                                  1150
## 43                1283                                  1107
## 44                1199                                  1327
## 45                1242                                  1152
## 46                 377                                  1358
## 47                1362                                  1392
## 48                1382                                  1356
## 49                1291                                  1286
## 50                1056                                  1296
## 51                1011                                  1356
## 52                 935                                  1495
## 53                1393                                  1345
## 54                1270                                  1206
## 55                1186                                  1406
## 56                1153                                  1414
## 57                1092                                  1363
## 58                 917                                  1391
## 59                 853                                  1319
## 60                 967                                  1330
## 61                 955                                  1327
## 62                1530                                  1186
## 63                1175                                  1350
## 64                1163                                  1263

Write dataframe to csv

For the final step, the write_csv function from the readr library writes csv_data to tournamentinfo.csv.

write_csv(csv_data, "tournamentinfo.csv")
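
As a quick sanity check, the written file can be read back and compared against the dataframe. The sketch below is only an illustration: it uses a small stand-in dataframe (sample_data, with three rows copied from the printed output above and simplified, hypothetical column names) and a temporary file, so it runs on its own and does not touch tournamentinfo.csv. It assumes a readr version that supports the show_col_types argument of read_csv.

```r
library(readr)

# Stand-in for csv_data: three rows taken from the printed output above.
# The column names are simplified placeholders, not the report's real ones.
sample_data <- data.frame(
  name = c("GARY HUA", "DAKSHESH DARURI", "KENNETH J TACK"),
  state = c("ON", "MI", "MI"),
  points = c(6.0, 6.0, 4.5),
  pre_rating = c(1794, 1553, 1663),
  opp_avg = c(1605, 1469, 1506)
)

# Write to a temporary file so the real tournamentinfo.csv is left alone.
tmp <- tempfile(fileext = ".csv")
write_csv(sample_data, tmp)

# Read the file back and confirm the shape and headers survived the round trip.
check <- read_csv(tmp, show_col_types = FALSE)
stopifnot(
  nrow(check) == nrow(sample_data),
  identical(names(check), names(sample_data))
)
```

The same comparison should work with csv_data and "tournamentinfo.csv" in place of sample_data and tmp.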