First we must load the data from a .txt file. The hyphen in Stanescu-Bellu’s name caused trouble, so I replaced it with an underscore from the command line (Windows 10 PowerShell):
Get-Content .\tournamentinfo1.txt | %{$_ -replace "-", "_"} | Set-Content .\tournamentinfo2.txt
I’m more familiar with sed on Linux, so I had to install PowerShell and look up the syntax at https://stackoverflow.com/questions/15295958/get-content-multiple-replacements
The regular expression looks for a hyphen between word boundaries and replaces it with an underscore. This can be reversed at a later time with the same command, with the underscore and hyphen switched. The input file name has a 1 added because I worked on a copy of the data: if the replacement had not behaved as expected, I could re-copy the original. Since this worked, I’ll use tournamentinfo2.txt from here on out.
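For anyone who would rather stay in R, the same word-boundary replacement can be sketched with gsub(). This is an illustration rather than the command I ran: the sample lines are typed in by hand and abbreviated, and fix_hyphens is a name made up for this sketch.

```r
# Hypothetical helper: replace a hyphen only when it sits between two word
# characters, so separator rows and the "->" arrows are left alone.
fix_hyphens <- function(lines) {
  gsub("\\b-\\b", "_", lines, perl = TRUE)
}

sample_lines <- c(
  "28 | SOFIA ADINA STANESCU-BELLU |3.5 |",
  "MI | 14882954 / R: 1507 ->1513 |N:3 |",
  "-----------------------------------"
)
fixed <- fix_hyphens(sample_lines)
print(fixed)
```

Only the hyphen inside the name has a word character on both sides, so the other two lines should come back unchanged.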
You can change the separator used by the read.csv function, and trial and error indicated that the hyphen, which is used to separate rows, was the best character to use. This is what led me to change the hyphenated name above, since the hyphen was pushing that person’s data into a different column.
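A rough illustration of why the hyphenated name caused trouble: splitting on "-" cuts a line wherever a hyphen appears. The sketch uses strsplit() instead of read.csv() just to show the pieces, and the two sample lines are abbreviated by hand.

```r
# Split two abbreviated lines on "-" to see where the column breaks fall.
rating_line <- "MI | 15445895 / R: 1794 ->1817 |N:2 |"
name_line   <- "28 | SOFIA ADINA STANESCU-BELLU |3.5 |"

rating_parts <- strsplit(rating_line, "-", fixed = TRUE)[[1]]
name_parts   <- strsplit(name_line, "-", fixed = TRUE)[[1]]

print(rating_parts)  # pre-tournament rating on the left, ">1817 ..." on the right
print(name_parts)    # the name is cut in two at the hyphen
```

The split in the rating line is the one I want; the split in the name line is the one the underscore avoids.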
library(readtext)
## Warning: package 'readtext' was built under R version 3.4.1
chess_data_raw <- read.csv("C:\\Users\\Nate\\Documents\\DataSet\\tournamentinfo2.txt", sep = "-")
#chess_data_raw
chess_data <- data.frame(chess_data_raw$X,chess_data_raw$X.1)
chess_data
## chess_data_raw.X
## 1 Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round|
## 2 Num | USCF ID / Rtg (Pre
## 3
## 4 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|
## 5 ON | 15445895 / R: 1794
## 6
## 7 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|
## 8 MI | 14598900 / R: 1553
## 9
## 10 3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12|
## 11 MI | 14959604 / R: 1384
## 12
## 13 4 | PATRICK H SCHILLING |5.5 |W 23|D 28|W 2|W 26|D 5|W 19|D 1|
## 14 MI | 12616049 / R: 1716
## 15
## 16 5 | HANSHI ZUO |5.5 |W 45|W 37|D 12|D 13|D 4|W 14|W 17|
## 17 MI | 14601533 / R: 1655
## 18
## 19 6 | HANSEN SONG |5.0 |W 34|D 29|L 11|W 35|D 10|W 27|W 21|
## 20 OH | 15055204 / R: 1686
## 21
## 22 7 | GARY DEE SWATHELL |5.0 |W 57|W 46|W 13|W 11|L 1|W 9|L 2|
## 23 MI | 11146376 / R: 1649
## 24
## 25 8 | EZEKIEL HOUGHTON |5.0 |W 3|W 32|L 14|L 9|W 47|W 28|W 19|
## 26 MI | 15142253 / R: 1641P17
## 27
## 28 9 | STEFANO LEE |5.0 |W 25|L 18|W 59|W 8|W 26|L 7|W 20|
## 29 ON | 14954524 / R: 1411
## 30
## 31 10 | ANVIT RAO |5.0 |D 16|L 19|W 55|W 31|D 6|W 25|W 18|
## 32 MI | 14150362 / R: 1365
## 33
## 34 11 | CAMERON WILLIAM MC LEMAN |4.5 |D 38|W 56|W 6|L 7|L 3|W 34|W 26|
## 35 MI | 12581589 / R: 1712
## 36
## 37 12 | KENNETH J TACK |4.5 |W 42|W 33|D 5|W 38|H |D 1|L 3|
## 38 MI | 12681257 / R: 1663
## 39
## 40 13 | TORRANCE HENRY JR |4.5 |W 36|W 27|L 7|D 5|W 33|L 3|W 32|
## 41 MI | 15082995 / R: 1666
## 42
## 43 14 | BRADLEY SHAW |4.5 |W 54|W 44|W 8|L 1|D 27|L 5|W 31|
## 44 MI | 10131499 / R: 1610
## 45
## 46 15 | ZACHARY JAMES HOUGHTON |4.5 |D 19|L 16|W 30|L 22|W 54|W 33|W 38|
## 47 MI | 15619130 / R: 1220P13
## 48
## 49 16 | MIKE NIKITIN |4.0 |D 10|W 15|H |W 39|L 2|W 36|U |
## 50 MI | 10295068 / R: 1604
## 51
## 52 17 | RONALD GRZEGORCZYK |4.0 |W 48|W 41|L 26|L 2|W 23|W 22|L 5|
## 53 MI | 10297702 / R: 1629
## 54
## 55 18 | DAVID SUNDEEN |4.0 |W 47|W 9|L 1|W 32|L 19|W 38|L 10|
## 56 MI | 11342094 / R: 1600
## 57
## 58 19 | DIPANKAR ROY |4.0 |D 15|W 10|W 52|D 28|W 18|L 4|L 8|
## 59 MI | 14862333 / R: 1564
## 60
## 61 20 | JASON ZHENG |4.0 |L 40|W 49|W 23|W 41|W 28|L 2|L 9|
## 62 MI | 14529060 / R: 1595
## 63
## 64 21 | DINH DANG BUI |4.0 |W 43|L 1|W 47|L 3|W 40|W 39|L 6|
## 65 ON | 15495066 / R: 1563P22
## 66
## 67 22 | EUGENE L MCCLURE |4.0 |W 64|D 52|L 28|W 15|H |L 17|W 40|
## 68 MI | 12405534 / R: 1555
## 69
## 70 23 | ALAN BUI |4.0 |L 4|W 43|L 20|W 58|L 17|W 37|W 46|
## 71 ON | 15030142 / R: 1363
## 72
## 73 24 | MICHAEL R ALDRICH |4.0 |L 28|L 47|W 43|L 25|W 60|W 44|W 39|
## 74 MI | 13469010 / R: 1229
## 75
## 76 25 | LOREN SCHWIEBERT |3.5 |L 9|W 53|L 3|W 24|D 34|L 10|W 47|
## 77 MI | 12486656 / R: 1745
## 78
## 79 26 | MAX ZHU |3.5 |W 49|W 40|W 17|L 4|L 9|D 32|L 11|
## 80 ON | 15131520 / R: 1579
## 81
## 82 27 | GAURAV GIDWANI |3.5 |W 51|L 13|W 46|W 37|D 14|L 6|U |
## 83 MI | 14476567 / R: 1552
## 84
## 85 28 | SOFIA ADINA STANESCU_BELLU |3.5 |W 24|D 4|W 22|D 19|L 20|L 8|D 36|
## 86 MI | 14882954 / R: 1507
## 87
## 88 29 | CHIEDOZIE OKORIE |3.5 |W 50|D 6|L 38|L 34|W 52|W 48|U |
## 89 MI | 15323285 / R: 1602P6
## 90
## 91 30 | GEORGE AVERY JONES |3.5 |L 52|D 64|L 15|W 55|L 31|W 61|W 50|
## 92 ON | 12577178 / R: 1522
## 93
## 94 31 | RISHI SHETTY |3.5 |L 58|D 55|W 64|L 10|W 30|W 50|L 14|
## 95 MI | 15131618 / R: 1494
## 96
## 97 32 | JOSHUA PHILIP MATHEWS |3.5 |W 61|L 8|W 44|L 18|W 51|D 26|L 13|
## 98 ON | 14073750 / R: 1441
## 99
## 100 33 | JADE GE |3.5 |W 60|L 12|W 50|D 36|L 13|L 15|W 51|
## 101 MI | 14691842 / R: 1449
## 102
## 103 34 | MICHAEL JEFFERY THOMAS |3.5 |L 6|W 60|L 37|W 29|D 25|L 11|W 52|
## 104 MI | 15051807 / R: 1399
## 105
## 106 35 | JOSHUA DAVID LEE |3.5 |L 46|L 38|W 56|L 6|W 57|D 52|W 48|
## 107 MI | 14601397 / R: 1438
## 108
## 109 36 | SIDDHARTH JHA |3.5 |L 13|W 57|W 51|D 33|H |L 16|D 28|
## 110 MI | 14773163 / R: 1355
## 111
## 112 37 | AMIYATOSH PWNANANDAM |3.5 |B |L 5|W 34|L 27|H |L 23|W 61|
## 113 MI | 15489571 / R: 980P12
## 114
## 115 38 | BRIAN LIU |3.0 |D 11|W 35|W 29|L 12|H |L 18|L 15|
## 116 MI | 15108523 / R: 1423
## 117
## 118 39 | JOEL R HENDON |3.0 |L 1|W 54|W 40|L 16|W 44|L 21|L 24|
## 119 MI | 12923035 / R: 1436P23
## 120
## 121 40 | FOREST ZHANG |3.0 |W 20|L 26|L 39|W 59|L 21|W 56|L 22|
## 122 MI | 14892710 / R: 1348
## 123
## 124 41 | KYLE WILLIAM MURPHY |3.0 |W 59|L 17|W 58|L 20|X |U |U |
## 125 MI | 15761443 / R: 1403P5
## 126
## 127 42 | JARED GE |3.0 |L 12|L 50|L 57|D 60|D 61|W 64|W 56|
## 128 MI | 14462326 / R: 1332
## 129
## 130 43 | ROBERT GLEN VASEY |3.0 |L 21|L 23|L 24|W 63|W 59|L 46|W 55|
## 131 MI | 14101068 / R: 1283
## 132
## 133 44 | JUSTIN D SCHILLING |3.0 |B |L 14|L 32|W 53|L 39|L 24|W 59|
## 134 MI | 15323504 / R: 1199
## 135
## 136 45 | DEREK YAN |3.0 |L 5|L 51|D 60|L 56|W 63|D 55|W 58|
## 137 MI | 15372807 / R: 1242
## 138
## 139 46 | JACOB ALEXANDER LAVALLEY |3.0 |W 35|L 7|L 27|L 50|W 64|W 43|L 23|
## 140 MI | 15490981 / R: 377P3
## 141
## 142 47 | ERIC WRIGHT |2.5 |L 18|W 24|L 21|W 61|L 8|D 51|L 25|
## 143 MI | 12533115 / R: 1362
## 144
## 145 48 | DANIEL KHAIN |2.5 |L 17|W 63|H |D 52|H |L 29|L 35|
## 146 MI | 14369165 / R: 1382
## 147
## 148 49 | MICHAEL J MARTIN |2.5 |L 26|L 20|D 63|D 64|W 58|H |U |
## 149 MI | 12531685 / R: 1291P12
## 150
## 151 50 | SHIVAM JHA |2.5 |L 29|W 42|L 33|W 46|H |L 31|L 30|
## 152 MI | 14773178 / R: 1056
## 153
## 154 51 | TEJAS AYYAGARI |2.5 |L 27|W 45|L 36|W 57|L 32|D 47|L 33|
## 155 MI | 15205474 / R: 1011
## 156
## 157 52 | ETHAN GUO |2.5 |W 30|D 22|L 19|D 48|L 29|D 35|L 34|
## 158 MI | 14918803 / R: 935
## 159
## 160 53 | JOSE C YBARRA |2.0 |H |L 25|H |L 44|U |W 57|U |
## 161 MI | 12578849 / R: 1393
## 162
## 163 54 | LARRY HODGE |2.0 |L 14|L 39|L 61|B |L 15|L 59|W 64|
## 164 MI | 12836773 / R: 1270
## 165
## 166 55 | ALEX KONG |2.0 |L 62|D 31|L 10|L 30|B |D 45|L 43|
## 167 MI | 15412571 / R: 1186
## 168
## 169 56 | MARISA RICCI |2.0 |H |L 11|L 35|W 45|H |L 40|L 42|
## 170 MI | 14679887 / R: 1153
## 171
## 172 57 | MICHAEL LU |2.0 |L 7|L 36|W 42|L 51|L 35|L 53|B |
## 173 MI | 15113330 / R: 1092
## 174
## 175 58 | VIRAJ MOHILE |2.0 |W 31|L 2|L 41|L 23|L 49|B |L 45|
## 176 MI | 14700365 / R: 917
## 177
## 178 59 | SEAN M MC CORMICK |2.0 |L 41|B |L 9|L 40|L 43|W 54|L 44|
## 179 MI | 12841036 / R: 853
## 180
## 181 60 | JULIA SHEN |1.5 |L 33|L 34|D 45|D 42|L 24|H |U |
## 182 MI | 14579262 / R: 967
## 183
## 184 61 | JEZZEL FARKAS |1.5 |L 32|L 3|W 54|L 47|D 42|L 30|L 37|
## 185 ON | 15771592 / R: 955P11
## 186
## 187 62 | ASHWIN BALAJI |1.0 |W 55|U |U |U |U |U |U |
## 188 MI | 15219542 / R: 1530
## 189
## 190 63 | THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U |
## 191 MI | 15057092 / R: 1175
## 192
## 193 64 | BEN LI |1.0 |L 22|D 30|L 31|D 49|L 46|L 42|L 54|
## 194 MI | 15006561 / R: 1163
## 195
## chess_data_raw.X.1
## 1
## 2 >Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
## 3
## 4
## 5 >1817 |N:2 |W |B |W |B |W |B |W |
## 6
## 7
## 8 >1663 |N:2 |B |W |B |W |B |W |B |
## 9
## 10
## 11 >1640 |N:2 |W |B |W |B |W |B |W |
## 12
## 13
## 14 >1744 |N:2 |W |B |W |B |W |B |B |
## 15
## 16
## 17 >1690 |N:2 |B |W |B |W |B |W |B |
## 18
## 19
## 20 >1687 |N:3 |W |B |W |B |B |W |B |
## 21
## 22
## 23 >1673 |N:3 |W |B |W |B |B |W |W |
## 24
## 25
## 26 >1657P24 |N:3 |B |W |B |W |B |W |W |
## 27
## 28
## 29 >1564 |N:2 |W |B |W |B |W |B |B |
## 30
## 31
## 32 >1544 |N:3 |W |W |B |B |W |B |W |
## 33
## 34
## 35 >1696 |N:3 |B |W |B |W |B |W |B |
## 36
## 37
## 38 >1670 |N:3 |W |B |W |B | |W |B |
## 39
## 40
## 41 >1662 |N:3 |B |W |B |B |W |W |B |
## 42
## 43
## 44 >1618 |N:3 |W |B |W |W |B |B |W |
## 45
## 46
## 47 >1416P20 |N:3 |B |B |W |W |B |B |W |
## 48
## 49
## 50 >1613 |N:3 |B |W | |B |W |B | |
## 51
## 52
## 53 >1610 |N:3 |W |B |W |B |W |B |W |
## 54
## 55
## 56 >1600 |N:3 |B |W |B |W |B |W |B |
## 57
## 58
## 59 >1570 |N:3 |W |B |W |B |W |W |B |
## 60
## 61
## 62 >1569 |N:4 |W |B |W |B |W |B |W |
## 63
## 64
## 65 >1562 |N:3 |B |W |B |W |W |B |W |
## 66
## 67
## 68 >1529 |N:4 |W |B |W |B | |W |B |
## 69
## 70
## 71 >1371 | |B |W |B |W |B |W |B |
## 72
## 73
## 74 >1300 |N:4 |B |W |B |B |W |W |B |
## 75
## 76
## 77 >1681 |N:4 |B |W |B |W |B |W |B |
## 78
## 79
## 80 >1564 |N:4 |B |W |B |W |B |W |W |
## 81
## 82
## 83 >1539 |N:4 |W |B |W |B |W |B | |
## 84
## 85
## 86 >1513 |N:3 |W |W |B |W |B |B |W |
## 87
## 88
## 89 >1508P12 |N:4 |B |W |B |W |W |B | |
## 90
## 91
## 92 >1444 | |W |B |B |W |W |B |B |
## 93
## 94
## 95 >1444 | |B |W |B |W |B |W |B |
## 96
## 97
## 98 >1433 |N:4 |W |B |W |B |W |B |W |
## 99
## 100
## 101 >1421 | |B |W |B |W |B |W |B |
## 102
## 103
## 104 >1400 | |B |W |B |B |W |B |W |
## 105
## 106
## 107 >1392 | |W |W |B |W |B |B |W |
## 108
## 109
## 110 >1367 |N:4 |W |B |W |B | |W |B |
## 111
## 112
## 113 >1077P17 | | |B |W |W | |B |W |
## 114
## 115
## 116 >1439 |N:4 |W |B |W |W | |B |B |
## 117
## 118
## 119 >1413 |N:4 |B |W |B |W |B |W |W |
## 120
## 121
## 122 >1346 | |B |B |W |W |B |W |W |
## 123
## 124
## 125 >1341P9 | |B |W |B |W | | | |
## 126
## 127
## 128 >1256 | |B |W |B |B |W |W |B |
## 129
## 130
## 131 >1244 | |W |B |W |W |B |B |W |
## 132
## 133
## 134 >1199 | | |W |B |B |W |B |W |
## 135
## 136
## 137 >1191 | |W |B |W |B |W |B |W |
## 138
## 139
## 140 >1076P10 | |B |W |B |W |B |W |W |
## 141
## 142
## 143 >1341 | |W |B |W |B |W |B |W |
## 144
## 145
## 146 >1335 | |B |W | |B | |W |B |
## 147
## 148
## 149 >1259P17 | |W |W |B |W |B | | |
## 150
## 151
## 152 >1111 | |W |B |W |B | |B |W |
## 153
## 154
## 155 >1097 | |B |W |B |W |B |W |W |
## 156
## 157
## 158 >1092 |N:4 |B |W |B |W |B |W |B |
## 159
## 160
## 161 >1359 | | |B | |W | |W | |
## 162
## 163
## 164 >1200 | |B |B |W | |W |B |W |
## 165
## 166
## 167 >1163 | |W |B |W |B | |W |B |
## 168
## 169
## 170 >1140 | | |B |W |W | |B |W |
## 171
## 172
## 173 >1079 | |B |W |W |B |W |B | |
## 174
## 175
## 176 > 941 | |W |B |W |B |W | |B |
## 177
## 178
## 179 > 878 | |W | |B |B |W |W |B |
## 180
## 181
## 182 > 984 | |W |B |B |W |B | | |
## 183
## 184
## 185 > 979P18 | |B |W |B |W |B |W |B |
## 186
## 187
## 188 >1535 | |B | | | | | | |
## 189
## 190
## 191 >1125 | |W |B |W |B |B | | |
## 192
## 193
## 194 >1112 | |B |W |W |B |W |B |B |
## 195
Now to use stringr to pull the desired data out of the data frame and put it into a better-organized data frame. Using str_extract() will produce a lot of NAs. In this application an NA just means an entry did not match our regular expression, so I use !is.na() to remove them. Also note that the regular expression that captures the names also captures a row header, so I subset that vector to omit the unwanted entry. The regular expressions (regex) used are explained in the code comments.
I adapted the NA-removal step from: https://stackoverflow.com/questions/8184483/how-to-remove-all-the-na-from-a-vector
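Here is a small sketch of the extract-then-filter pattern on three hand-typed, abbreviated lines, using the points and pre-rating patterns from this section. The variable names are only for illustration.

```r
library(stringr)

# Three abbreviated lines: a player row, a rating row, and a blank separator row.
rows <- c(
  "1 | GARY HUA |6.0 |W 39|W 21|",
  "ON | 15445895 / R: 1794",
  ""
)

# str_extract() returns NA wherever the pattern does not match.
points_raw <- str_extract(rows, "[0-9]\\.[0-9]")
rating_raw <- str_extract(rows, ": +[0-9]+")

# Keep only the entries that matched.
points <- points_raw[!is.na(points_raw)]
rating <- rating_raw[!is.na(rating_raw)]

print(points_raw)
print(points)
print(rating)
```

Each pattern matches exactly one of the three lines, so each filtered vector ends up with a single entry.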
library(stringr)
## Warning: package 'stringr' was built under R version 3.4.1
#Names are upper-case letters separated by spaces, ending in a run of upper-case letters with a possible underscore.
chess_names <- str_extract(chess_data$chess_data_raw.X, "[[:upper:][:blank:]]{4,}[[:upper:]][_[:upper:]]+")
chess_names <- chess_names[!is.na(chess_names)]
chess_names <- chess_names[2:65]
#States were all Michigan or Ontario, plus one from Ohio. I felt comfortable being more specific with these strings since I did not want to accidentally match part of a name. The string is surrounded by whitespace, which is also reflected in the regex.
states <- str_extract(chess_data$chess_data_raw.X, " [MI]{2} | [ONH]{2} ")
states <- states[!is.na(states)]
#Points were decimal numbers so we look for digits with a period in between.
chess_points <- str_extract(chess_data$chess_data_raw.X, "([0-9]{1}\\.{1}[0-9]{1})")
chess_points <- chess_points[!is.na(chess_points)]
#Pre-tourney ELOs are preceded by a ":" and whitespace, followed by digits.
pre_rating <- str_extract(chess_data$chess_data_raw.X, ": +[0-9]+")
pre_rating <- pre_rating[!is.na(pre_rating)]
#We can now put together the data frame with column titles
chess_df <- data.frame("Names" = chess_names, "State" = states, "Points" = chess_points, "Pre-ELO" = pre_rating)
#Clean up the data by dropping the colons picked up in the str_extract
chess_df[,4] <- str_replace(chess_df[,4], pattern = ": ", replacement = "")
#Since the hyphen is no longer a problem, we can replace the underscore
chess_df[,1] <- str_replace(chess_df[,1], pattern = "_", replacement = "-")
# We also need to clean the data by converting from factors to the appropriate data types: numeric for Points and integer for ELO.
chess_df[,3] <- as.numeric(as.character(chess_df[,3]))
chess_df[,4] <- as.integer(chess_df[,4])
chess_df
## Names State Points Pre.ELO
## 1 GARY HUA ON 6.0 1794
## 2 DAKSHESH DARURI MI 6.0 1553
## 3 ADITYA BAJAJ MI 6.0 1384
## 4 PATRICK H SCHILLING MI 5.5 1716
## 5 HANSHI ZUO MI 5.5 1655
## 6 HANSEN SONG OH 5.0 1686
## 7 GARY DEE SWATHELL MI 5.0 1649
## 8 EZEKIEL HOUGHTON MI 5.0 1641
## 9 STEFANO LEE ON 5.0 1411
## 10 ANVIT RAO MI 5.0 1365
## 11 CAMERON WILLIAM MC LEMAN MI 4.5 1712
## 12 KENNETH J TACK MI 4.5 1663
## 13 TORRANCE HENRY JR MI 4.5 1666
## 14 BRADLEY SHAW MI 4.5 1610
## 15 ZACHARY JAMES HOUGHTON MI 4.5 1220
## 16 MIKE NIKITIN MI 4.0 1604
## 17 RONALD GRZEGORCZYK MI 4.0 1629
## 18 DAVID SUNDEEN MI 4.0 1600
## 19 DIPANKAR ROY MI 4.0 1564
## 20 JASON ZHENG MI 4.0 1595
## 21 DINH DANG BUI ON 4.0 1563
## 22 EUGENE L MCCLURE MI 4.0 1555
## 23 ALAN BUI ON 4.0 1363
## 24 MICHAEL R ALDRICH MI 4.0 1229
## 25 LOREN SCHWIEBERT MI 3.5 1745
## 26 MAX ZHU ON 3.5 1579
## 27 GAURAV GIDWANI MI 3.5 1552
## 28 SOFIA ADINA STANESCU-BELLU MI 3.5 1507
## 29 CHIEDOZIE OKORIE MI 3.5 1602
## 30 GEORGE AVERY JONES ON 3.5 1522
## 31 RISHI SHETTY MI 3.5 1494
## 32 JOSHUA PHILIP MATHEWS ON 3.5 1441
## 33 JADE GE MI 3.5 1449
## 34 MICHAEL JEFFERY THOMAS MI 3.5 1399
## 35 JOSHUA DAVID LEE MI 3.5 1438
## 36 SIDDHARTH JHA MI 3.5 1355
## 37 AMIYATOSH PWNANANDAM MI 3.5 980
## 38 BRIAN LIU MI 3.0 1423
## 39 JOEL R HENDON MI 3.0 1436
## 40 FOREST ZHANG MI 3.0 1348
## 41 KYLE WILLIAM MURPHY MI 3.0 1403
## 42 JARED GE MI 3.0 1332
## 43 ROBERT GLEN VASEY MI 3.0 1283
## 44 JUSTIN D SCHILLING MI 3.0 1199
## 45 DEREK YAN MI 3.0 1242
## 46 JACOB ALEXANDER LAVALLEY MI 3.0 377
## 47 ERIC WRIGHT MI 2.5 1362
## 48 DANIEL KHAIN MI 2.5 1382
## 49 MICHAEL J MARTIN MI 2.5 1291
## 50 SHIVAM JHA MI 2.5 1056
## 51 TEJAS AYYAGARI MI 2.5 1011
## 52 ETHAN GUO MI 2.5 935
## 53 JOSE C YBARRA MI 2.0 1393
## 54 LARRY HODGE MI 2.0 1270
## 55 ALEX KONG MI 2.0 1186
## 56 MARISA RICCI MI 2.0 1153
## 57 MICHAEL LU MI 2.0 1092
## 58 VIRAJ MOHILE MI 2.0 917
## 59 SEAN M MC CORMICK MI 2.0 853
## 60 JULIA SHEN MI 1.5 967
## 61 JEZZEL FARKAS ON 1.5 955
## 62 ASHWIN BALAJI MI 1.0 1530
## 63 THOMAS JOSEPH HOSMER MI 1.0 1175
## 64 BEN LI MI 1.0 1163
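One caveat on the state pattern: the character class [MI]{2} also accepts "IM", "MM" and "II", and [ONH]{2} accepts nine two-letter combinations. That was harmless for this file, but a stricter alternative can be sketched with explicit alternation. This is an assumption about what a tighter pattern might look like, not what I ran; strict_state is a made-up name and the third sample line is invented to show the difference.

```r
# Hypothetical stricter pattern: only the three codes seen in this file,
# each surrounded by a space as in the original regex.
strict_state <- " (MI|ON|OH) "

rows <- c(
  " MI | 14598900 / R: 1553",
  " OH | 15055204 / R: 1686",
  " IM | not a real state code"
)

loose_hits  <- grepl(" [MI]{2} | [ONH]{2} ", rows)
strict_hits <- grepl(strict_state, rows)

print(loose_hits)
print(strict_hits)
```

The loose pattern accepts all three lines, while the alternation rejects the invented one.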
Just out of curiosity I want to know the distribution of the score and the ELO ratings.
hist(chess_df[,3], xlab = "Score", main= "Histogram of Score")
hist(chess_df[,4], xlab= "ELO Rating", main = "Histogram of ELO Rating")
We see the ELO ratings are left skewed with an outlier \(< 500\) which might explain the slight asymmetry to the left in the score. The top two bins in ELO may have been dominating the bottom 4 bins.
Now we will calculate the average opponent ELO rating. This requires more treatment than the other categories, so I opted to put it in its own section.
#First I extract all the 1 or 2 digit numbers ending with a "|"
op_id <- str_extract_all(chess_data$chess_data_raw.X, "\\d{1,2}[\\|]")
#Next I get rid of all the empty list elements. I adapted this line of code from: https://stackoverflow.com/questions/19023446/remove-empty-elements-from-list-with-character0
op_id <- op_id[lapply(op_id, length)>0]
#Next I get rid of the pipes. One line had only one element, so I had to add an alternative to the pattern for that case. Also, when I saved op_id as an integer data frame it was stored as character type instead of factor, which made the following code easier.
op_id <- as.data.frame.integer(gsub("\"(\\d{1,2})\\|\"|(\\d{1,2})[\\|]", "\\1 \\2" ,op_id))
#I am going to use nested for loops to walk through a list of lists. Once I have the opponent IDs, I'll look up their ELOs in chess_df, add them up, and divide by the total number of opponents.
op_ave <- integer(0) #Initialized here for scope.
for(i in 1:length(op_id[,1])){
numbers <- as.vector(str_extract_all(op_id[i,1], "\\d{1,2}")) #Removes spaces in the strings.
for(n in numbers){
tot = 0 #total opponent score
for(j in n){
#This gets the ELO from the op_id and totals them
tot = tot + chess_df[as.integer(j),4]
}
#This gets the average from number of opponents, and stores it in a vector.
ave = as.integer(tot/length(n))
op_ave[i] <- ave
}
}
#Now I update my chess_df
chess_df$Op_Ave <- op_ave
chess_df
## Names State Points Pre.ELO Op_Ave
## 1 GARY HUA ON 6.0 1794 1605
## 2 DAKSHESH DARURI MI 6.0 1553 1469
## 3 ADITYA BAJAJ MI 6.0 1384 1563
## 4 PATRICK H SCHILLING MI 5.5 1716 1573
## 5 HANSHI ZUO MI 5.5 1655 1500
## 6 HANSEN SONG OH 5.0 1686 1518
## 7 GARY DEE SWATHELL MI 5.0 1649 1372
## 8 EZEKIEL HOUGHTON MI 5.0 1641 1468
## 9 STEFANO LEE ON 5.0 1411 1523
## 10 ANVIT RAO MI 5.0 1365 1554
## 11 CAMERON WILLIAM MC LEMAN MI 4.5 1712 1467
## 12 KENNETH J TACK MI 4.5 1663 1506
## 13 TORRANCE HENRY JR MI 4.5 1666 1497
## 14 BRADLEY SHAW MI 4.5 1610 1515
## 15 ZACHARY JAMES HOUGHTON MI 4.5 1220 1483
## 16 MIKE NIKITIN MI 4.0 1604 1385
## 17 RONALD GRZEGORCZYK MI 4.0 1629 1498
## 18 DAVID SUNDEEN MI 4.0 1600 1480
## 19 DIPANKAR ROY MI 4.0 1564 1426
## 20 JASON ZHENG MI 4.0 1595 1410
## 21 DINH DANG BUI ON 4.0 1563 1470
## 22 EUGENE L MCCLURE MI 4.0 1555 1300
## 23 ALAN BUI ON 4.0 1363 1213
## 24 MICHAEL R ALDRICH MI 4.0 1229 1357
## 25 LOREN SCHWIEBERT MI 3.5 1745 1363
## 26 MAX ZHU ON 3.5 1579 1506
## 27 GAURAV GIDWANI MI 3.5 1552 1221
## 28 SOFIA ADINA STANESCU-BELLU MI 3.5 1507 1522
## 29 CHIEDOZIE OKORIE MI 3.5 1602 1313
## 30 GEORGE AVERY JONES ON 3.5 1522 1144
## 31 RISHI SHETTY MI 3.5 1494 1259
## 32 JOSHUA PHILIP MATHEWS ON 3.5 1441 1378
## 33 JADE GE MI 3.5 1449 1276
## 34 MICHAEL JEFFERY THOMAS MI 3.5 1399 1375
## 35 JOSHUA DAVID LEE MI 3.5 1438 1149
## 36 SIDDHARTH JHA MI 3.5 1355 1388
## 37 AMIYATOSH PWNANANDAM MI 3.5 980 1384
## 38 BRIAN LIU MI 3.0 1423 1539
## 39 JOEL R HENDON MI 3.0 1436 1429
## 40 FOREST ZHANG MI 3.0 1348 1390
## 41 KYLE WILLIAM MURPHY MI 3.0 1403 1248
## 42 JARED GE MI 3.0 1332 1149
## 43 ROBERT GLEN VASEY MI 3.0 1283 1106
## 44 JUSTIN D SCHILLING MI 3.0 1199 1327
## 45 DEREK YAN MI 3.0 1242 1152
## 46 JACOB ALEXANDER LAVALLEY MI 3.0 377 1357
## 47 ERIC WRIGHT MI 2.5 1362 1392
## 48 DANIEL KHAIN MI 2.5 1382 1355
## 49 MICHAEL J MARTIN MI 2.5 1291 1285
## 50 SHIVAM JHA MI 2.5 1056 1296
## 51 TEJAS AYYAGARI MI 2.5 1011 1356
## 52 ETHAN GUO MI 2.5 935 1494
## 53 JOSE C YBARRA MI 2.0 1393 1345
## 54 LARRY HODGE MI 2.0 1270 1206
## 55 ALEX KONG MI 2.0 1186 1406
## 56 MARISA RICCI MI 2.0 1153 1414
## 57 MICHAEL LU MI 2.0 1092 1363
## 58 VIRAJ MOHILE MI 2.0 917 1391
## 59 SEAN M MC CORMICK MI 2.0 853 1319
## 60 JULIA SHEN MI 1.5 967 1330
## 61 JEZZEL FARKAS ON 1.5 955 1327
## 62 ASHWIN BALAJI MI 1.0 1530 1186
## 63 THOMAS JOSEPH HOSMER MI 1.0 1175 1350
## 64 BEN LI MI 1.0 1163 1263
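As a sanity check on the loop, the first row can be reproduced by hand. The sketch below pulls the opponent numbers out of Gary Hua's line and averages their pre-tournament ratings, which are copied from the table above. It uses base R instead of stringr, and ratings is a hand-typed lookup holding only the seven opponents needed.

```r
# Gary Hua's row, abbreviated as it appears in the printed data frame.
player_row <- "1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"

# Same idea as the pattern in the loop: one or two digits right before a pipe.
hits <- regmatches(player_row, gregexpr("[0-9]{1,2}\\|", player_row))[[1]]
opponents <- as.integer(sub("|", "", hits, fixed = TRUE))

# Pre-tournament ratings for just these opponents, keyed by pair number.
ratings <- c("39" = 1436, "21" = 1563, "18" = 1600, "14" = 1610,
             "7" = 1649, "12" = 1663, "4" = 1716)

# Truncate to an integer, as the loop does.
op_average <- as.integer(mean(ratings[as.character(opponents)]))
print(opponents)
print(op_average)
```

The seven ratings sum to 11237, and 11237 / 7 truncates to 1605, which agrees with the Op_Ave value in the first row.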
I’m curious how fair the pairings were, so I want to look at the ratio of Player ELO to Opponent Average ELO.
ELO_ratio <- chess_df$Pre.ELO/chess_df$Op_Ave
hist(ELO_ratio, xlab = "ELO / (Ave Opponent ELO)")
plot(chess_df$Points, ELO_ratio)
fit <- lm(ELO_ratio ~ chess_df$Points)
summary(fit)
##
## Call:
## lm(formula = ELO_ratio ~ chess_df$Points)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.70202 -0.09233 0.02735 0.10655 0.41944
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.81600 0.06762 12.067 < 2e-16 ***
## chess_df$Points 0.05461 0.01853 2.947 0.00452 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1814 on 62 degrees of freedom
## Multiple R-squared: 0.1228, Adjusted R-squared: 0.1087
## F-statistic: 8.683 on 1 and 62 DF, p-value: 0.00452
The distribution is centered on 1, which seems pretty fair. The linear model shows a significant relationship between the ELO ratio and points scored (slope 0.055, p = 0.0045), although with an R-squared of 0.12 it explains only a small share of the variation. That is the direction you would expect: if a player's ELO is greater than the average opponent ELO, they should win more often, so relative ELO does correspond to win percentage. The ELO system seems to work pretty well, and it seems the organizers did a fairly good job matching players.
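To put the slope in concrete terms, the fitted line can be evaluated by hand. This sketch simply plugs in the estimates printed in the summary output; predict_ratio is a name made up for the illustration.

```r
# Coefficients copied from the summary(fit) output.
intercept <- 0.81600
slope     <- 0.05461

# Hypothetical helper: fitted ELO ratio for a given number of points.
predict_ratio <- function(points) intercept + slope * points

low  <- predict_ratio(1.0)  # lowest score in the table
high <- predict_ratio(6.0)  # highest score in the table
print(c(low, high))
```

So the fitted ratio runs from about 0.87 for a 1.0-point finish to about 1.14 for a 6.0-point finish.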
Finally, to make the .csv:
write.csv(chess_df, "tournamentinfo.csv")
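One optional tweak: by default write.csv() also writes the row numbers as an unnamed first column. If that is not wanted, row.names = FALSE drops it. Below is a small sketch on a two-row stand-in data frame, written to a temporary file so nothing is overwritten; demo_df and out_path are names used only here.

```r
# Two rows copied from the table, as a stand-in for chess_df.
demo_df <- data.frame(
  Names   = c("GARY HUA", "DAKSHESH DARURI"),
  State   = c("ON", "MI"),
  Points  = c(6.0, 6.0),
  Pre.ELO = c(1794L, 1553L),
  Op_Ave  = c(1605L, 1469L),
  stringsAsFactors = FALSE
)

out_path <- tempfile(fileext = ".csv")
write.csv(demo_df, out_path, row.names = FALSE)

# Read it back to confirm the columns survive without an extra index column.
round_trip <- read.csv(out_path, stringsAsFactors = FALSE)
print(names(round_trip))
```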