DATA 607 Project 1

Pre-treatment of the data.

First, we must load the data from a .txt file. The hyphen in Stanescu-Bellu's name caused trouble, so I replaced it with an underscore from the command line (Windows 10 PowerShell):

Get-Content .\tournamentinfo1.txt | % { $_ -replace "\b-\b", "_" } | Set-Content .\tournamentinfo2.txt

I'm more familiar with sed on Linux, so I had to install PowerShell and look up the syntax at https://stackoverflow.com/questions/15295958/get-content-multiple-replacements

The regular expression looks for a hyphen between word boundaries and replaces it with an underscore. This can be reversed later with a similar command, with the underscore and hyphen switched; a plain "_" pattern is enough there, since an underscore counts as a word character and so has no word boundary next to a letter. The input file name has a 1 added because I copied the data under a different name: if the replacement had not behaved as expected, I could re-copy the original. Since it worked, I'll use tournamentinfo2.txt from here on.
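For readers without PowerShell, the same substitution can be sketched in R. This is only an illustration, not the command that was run above: the sample lines and the names `sample_lines`, `cleaned`, and `restored` are made up for the example, and the pattern is the word-boundary expression described in the previous paragraph.

```r
# Hypothetical sample lines in the shape of the tournament file
sample_lines <- c(
  "   28 | SOFIA ADINA STANESCU-BELLU      |3.5  |W  24|",
  "   MI | 14882954 / R: 1507   ->1513     |N:3  |W    |",
  "-----------------------------------------------------"
)

# Replace only hyphens that sit between two word characters;
# the "->" marker and the dashed separator row are left alone
cleaned <- gsub("\\b-\\b", "_", sample_lines, perl = TRUE)

# Undo the change later: a literal "_" is enough, because "_" is itself
# a word character and "\\b_\\b" would not match inside a name
restored <- gsub("_", "-", cleaned, fixed = TRUE)

print(cleaned[1])
```

Restoring with a literal underscore assumes the file contains no other underscores, which appears to hold for the data printed in this report.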

Reading in the .txt file

The separator used by read.csv can be changed, and trial and error indicated that the hyphen, the character used to separate rows, was the best one to use. This led me to change the hyphenated name above, since the hyphen was pushing part of that person's data into a different column.
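To see why the hyphenated name had to change, the sketch below splits two abbreviated lines on "-", roughly as a hyphen separator would. It uses `strsplit` instead of `read.csv` purely to make the pieces visible; the strings and variable names are assumptions made up for the illustration.

```r
# Hypothetical, shortened lines in the shape of the tournament file
name_row   <- "   28 | SOFIA ADINA STANESCU-BELLU      |3.5  |W  24|"
rating_row <- "   MI | 14882954 / R: 1507   ->1513     |N:3  |W    |"

# Split on a literal hyphen, as sep = "-" would
pieces_name   <- strsplit(name_row, "-", fixed = TRUE)[[1]]
pieces_rating <- strsplit(rating_row, "-", fixed = TRUE)[[1]]

# After swapping the hyphen for an underscore the name row stays in one piece
name_row_fixed <- sub("-", "_", name_row, fixed = TRUE)
pieces_fixed   <- strsplit(name_row_fixed, "-", fixed = TRUE)[[1]]

print(length(pieces_name))    # 2: the name is cut in half
print(length(pieces_rating))  # 2: pre-rating part and post-rating part
print(length(pieces_fixed))   # 1: the whole name row stays together
```

With the original hyphen, the second half of the name and everything after it would land in the second column, which is where the post-tournament rating is expected.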

library(readtext)

## Warning: package 'readtext' was built under R version 3.4.1

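# Split each line on hyphens; this path is specific to the author's machine
chess_data_raw <- read.csv("C:\\Users\\Nate\\Documents\\DataSet\\tournamentinfo2.txt", sep = "-")
#chess_data_raw
# Keep only the first two columns, which hold the player data
chess_data <- data.frame(chess_data_raw$X, chess_data_raw$X.1)
chess_data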

##                                                                               chess_data_raw.X
## 1    Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 2                                                                    Num  | USCF ID / Rtg (Pre
## 3                                                                                             
## 4        1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 5                                                                   ON | 15445895 / R: 1794   
## 6                                                                                             
## 7        2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|
## 8                                                                   MI | 14598900 / R: 1553   
## 9                                                                                             
## 10       3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|
## 11                                                                  MI | 14959604 / R: 1384   
## 12                                                                                            
## 13       4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|
## 14                                                                  MI | 12616049 / R: 1716   
## 15                                                                                            
## 16       5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|
## 17                                                                  MI | 14601533 / R: 1655   
## 18                                                                                            
## 19       6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21|
## 20                                                                  OH | 15055204 / R: 1686   
## 21                                                                                            
## 22       7 | GARY DEE SWATHELL               |5.0  |W  57|W  46|W  13|W  11|L   1|W   9|L   2|
## 23                                                                  MI | 11146376 / R: 1649   
## 24                                                                                            
## 25       8 | EZEKIEL HOUGHTON                |5.0  |W   3|W  32|L  14|L   9|W  47|W  28|W  19|
## 26                                                                  MI | 15142253 / R: 1641P17
## 27                                                                                            
## 28       9 | STEFANO LEE                     |5.0  |W  25|L  18|W  59|W   8|W  26|L   7|W  20|
## 29                                                                  ON | 14954524 / R: 1411   
## 30                                                                                            
## 31      10 | ANVIT RAO                       |5.0  |D  16|L  19|W  55|W  31|D   6|W  25|W  18|
## 32                                                                  MI | 14150362 / R: 1365   
## 33                                                                                            
## 34      11 | CAMERON WILLIAM MC LEMAN        |4.5  |D  38|W  56|W   6|L   7|L   3|W  34|W  26|
## 35                                                                  MI | 12581589 / R: 1712   
## 36                                                                                            
## 37      12 | KENNETH J TACK                  |4.5  |W  42|W  33|D   5|W  38|H    |D   1|L   3|
## 38                                                                  MI | 12681257 / R: 1663   
## 39                                                                                            
## 40      13 | TORRANCE HENRY JR               |4.5  |W  36|W  27|L   7|D   5|W  33|L   3|W  32|
## 41                                                                  MI | 15082995 / R: 1666   
## 42                                                                                            
## 43      14 | BRADLEY SHAW                    |4.5  |W  54|W  44|W   8|L   1|D  27|L   5|W  31|
## 44                                                                  MI | 10131499 / R: 1610   
## 45                                                                                            
## 46      15 | ZACHARY JAMES HOUGHTON          |4.5  |D  19|L  16|W  30|L  22|W  54|W  33|W  38|
## 47                                                                  MI | 15619130 / R: 1220P13
## 48                                                                                            
## 49      16 | MIKE NIKITIN                    |4.0  |D  10|W  15|H    |W  39|L   2|W  36|U    |
## 50                                                                  MI | 10295068 / R: 1604   
## 51                                                                                            
## 52      17 | RONALD GRZEGORCZYK              |4.0  |W  48|W  41|L  26|L   2|W  23|W  22|L   5|
## 53                                                                  MI | 10297702 / R: 1629   
## 54                                                                                            
## 55      18 | DAVID SUNDEEN                   |4.0  |W  47|W   9|L   1|W  32|L  19|W  38|L  10|
## 56                                                                  MI | 11342094 / R: 1600   
## 57                                                                                            
## 58      19 | DIPANKAR ROY                    |4.0  |D  15|W  10|W  52|D  28|W  18|L   4|L   8|
## 59                                                                  MI | 14862333 / R: 1564   
## 60                                                                                            
## 61      20 | JASON ZHENG                     |4.0  |L  40|W  49|W  23|W  41|W  28|L   2|L   9|
## 62                                                                  MI | 14529060 / R: 1595   
## 63                                                                                            
## 64      21 | DINH DANG BUI                   |4.0  |W  43|L   1|W  47|L   3|W  40|W  39|L   6|
## 65                                                                  ON | 15495066 / R: 1563P22
## 66                                                                                            
## 67      22 | EUGENE L MCCLURE                |4.0  |W  64|D  52|L  28|W  15|H    |L  17|W  40|
## 68                                                                  MI | 12405534 / R: 1555   
## 69                                                                                            
## 70      23 | ALAN BUI                        |4.0  |L   4|W  43|L  20|W  58|L  17|W  37|W  46|
## 71                                                                  ON | 15030142 / R: 1363   
## 72                                                                                            
## 73      24 | MICHAEL R ALDRICH               |4.0  |L  28|L  47|W  43|L  25|W  60|W  44|W  39|
## 74                                                                  MI | 13469010 / R: 1229   
## 75                                                                                            
## 76      25 | LOREN SCHWIEBERT                |3.5  |L   9|W  53|L   3|W  24|D  34|L  10|W  47|
## 77                                                                  MI | 12486656 / R: 1745   
## 78                                                                                            
## 79      26 | MAX ZHU                         |3.5  |W  49|W  40|W  17|L   4|L   9|D  32|L  11|
## 80                                                                  ON | 15131520 / R: 1579   
## 81                                                                                            
## 82      27 | GAURAV GIDWANI                  |3.5  |W  51|L  13|W  46|W  37|D  14|L   6|U    |
## 83                                                                  MI | 14476567 / R: 1552   
## 84                                                                                            
## 85      28 | SOFIA ADINA STANESCU_BELLU      |3.5  |W  24|D   4|W  22|D  19|L  20|L   8|D  36|
## 86                                                                  MI | 14882954 / R: 1507   
## 87                                                                                            
## 88      29 | CHIEDOZIE OKORIE                |3.5  |W  50|D   6|L  38|L  34|W  52|W  48|U    |
## 89                                                                  MI | 15323285 / R: 1602P6 
## 90                                                                                            
## 91      30 | GEORGE AVERY JONES              |3.5  |L  52|D  64|L  15|W  55|L  31|W  61|W  50|
## 92                                                                  ON | 12577178 / R: 1522   
## 93                                                                                            
## 94      31 | RISHI SHETTY                    |3.5  |L  58|D  55|W  64|L  10|W  30|W  50|L  14|
## 95                                                                  MI | 15131618 / R: 1494   
## 96                                                                                            
## 97      32 | JOSHUA PHILIP MATHEWS           |3.5  |W  61|L   8|W  44|L  18|W  51|D  26|L  13|
## 98                                                                  ON | 14073750 / R: 1441   
## 99                                                                                            
## 100     33 | JADE GE                         |3.5  |W  60|L  12|W  50|D  36|L  13|L  15|W  51|
## 101                                                                 MI | 14691842 / R: 1449   
## 102                                                                                           
## 103     34 | MICHAEL JEFFERY THOMAS          |3.5  |L   6|W  60|L  37|W  29|D  25|L  11|W  52|
## 104                                                                 MI | 15051807 / R: 1399   
## 105                                                                                           
## 106     35 | JOSHUA DAVID LEE                |3.5  |L  46|L  38|W  56|L   6|W  57|D  52|W  48|
## 107                                                                 MI | 14601397 / R: 1438   
## 108                                                                                           
## 109     36 | SIDDHARTH JHA                   |3.5  |L  13|W  57|W  51|D  33|H    |L  16|D  28|
## 110                                                                 MI | 14773163 / R: 1355   
## 111                                                                                           
## 112     37 | AMIYATOSH PWNANANDAM            |3.5  |B    |L   5|W  34|L  27|H    |L  23|W  61|
## 113                                                                 MI | 15489571 / R:  980P12
## 114                                                                                           
## 115     38 | BRIAN LIU                       |3.0  |D  11|W  35|W  29|L  12|H    |L  18|L  15|
## 116                                                                 MI | 15108523 / R: 1423   
## 117                                                                                           
## 118     39 | JOEL R HENDON                   |3.0  |L   1|W  54|W  40|L  16|W  44|L  21|L  24|
## 119                                                                 MI | 12923035 / R: 1436P23
## 120                                                                                           
## 121     40 | FOREST ZHANG                    |3.0  |W  20|L  26|L  39|W  59|L  21|W  56|L  22|
## 122                                                                 MI | 14892710 / R: 1348   
## 123                                                                                           
## 124     41 | KYLE WILLIAM MURPHY             |3.0  |W  59|L  17|W  58|L  20|X    |U    |U    |
## 125                                                                 MI | 15761443 / R: 1403P5 
## 126                                                                                           
## 127     42 | JARED GE                        |3.0  |L  12|L  50|L  57|D  60|D  61|W  64|W  56|
## 128                                                                 MI | 14462326 / R: 1332   
## 129                                                                                           
## 130     43 | ROBERT GLEN VASEY               |3.0  |L  21|L  23|L  24|W  63|W  59|L  46|W  55|
## 131                                                                 MI | 14101068 / R: 1283   
## 132                                                                                           
## 133     44 | JUSTIN D SCHILLING              |3.0  |B    |L  14|L  32|W  53|L  39|L  24|W  59|
## 134                                                                 MI | 15323504 / R: 1199   
## 135                                                                                           
## 136     45 | DEREK YAN                       |3.0  |L   5|L  51|D  60|L  56|W  63|D  55|W  58|
## 137                                                                 MI | 15372807 / R: 1242   
## 138                                                                                           
## 139     46 | JACOB ALEXANDER LAVALLEY        |3.0  |W  35|L   7|L  27|L  50|W  64|W  43|L  23|
## 140                                                                 MI | 15490981 / R:  377P3 
## 141                                                                                           
## 142     47 | ERIC WRIGHT                     |2.5  |L  18|W  24|L  21|W  61|L   8|D  51|L  25|
## 143                                                                 MI | 12533115 / R: 1362   
## 144                                                                                           
## 145     48 | DANIEL KHAIN                    |2.5  |L  17|W  63|H    |D  52|H    |L  29|L  35|
## 146                                                                 MI | 14369165 / R: 1382   
## 147                                                                                           
## 148     49 | MICHAEL J MARTIN                |2.5  |L  26|L  20|D  63|D  64|W  58|H    |U    |
## 149                                                                 MI | 12531685 / R: 1291P12
## 150                                                                                           
## 151     50 | SHIVAM JHA                      |2.5  |L  29|W  42|L  33|W  46|H    |L  31|L  30|
## 152                                                                 MI | 14773178 / R: 1056   
## 153                                                                                           
## 154     51 | TEJAS AYYAGARI                  |2.5  |L  27|W  45|L  36|W  57|L  32|D  47|L  33|
## 155                                                                 MI | 15205474 / R: 1011   
## 156                                                                                           
## 157     52 | ETHAN GUO                       |2.5  |W  30|D  22|L  19|D  48|L  29|D  35|L  34|
## 158                                                                 MI | 14918803 / R:  935   
## 159                                                                                           
## 160     53 | JOSE C YBARRA                   |2.0  |H    |L  25|H    |L  44|U    |W  57|U    |
## 161                                                                 MI | 12578849 / R: 1393   
## 162                                                                                           
## 163     54 | LARRY HODGE                     |2.0  |L  14|L  39|L  61|B    |L  15|L  59|W  64|
## 164                                                                 MI | 12836773 / R: 1270   
## 165                                                                                           
## 166     55 | ALEX KONG                       |2.0  |L  62|D  31|L  10|L  30|B    |D  45|L  43|
## 167                                                                 MI | 15412571 / R: 1186   
## 168                                                                                           
## 169     56 | MARISA RICCI                    |2.0  |H    |L  11|L  35|W  45|H    |L  40|L  42|
## 170                                                                 MI | 14679887 / R: 1153   
## 171                                                                                           
## 172     57 | MICHAEL LU                      |2.0  |L   7|L  36|W  42|L  51|L  35|L  53|B    |
## 173                                                                 MI | 15113330 / R: 1092   
## 174                                                                                           
## 175     58 | VIRAJ MOHILE                    |2.0  |W  31|L   2|L  41|L  23|L  49|B    |L  45|
## 176                                                                 MI | 14700365 / R:  917   
## 177                                                                                           
## 178     59 | SEAN M MC CORMICK               |2.0  |L  41|B    |L   9|L  40|L  43|W  54|L  44|
## 179                                                                 MI | 12841036 / R:  853   
## 180                                                                                           
## 181     60 | JULIA SHEN                      |1.5  |L  33|L  34|D  45|D  42|L  24|H    |U    |
## 182                                                                 MI | 14579262 / R:  967   
## 183                                                                                           
## 184     61 | JEZZEL FARKAS                   |1.5  |L  32|L   3|W  54|L  47|D  42|L  30|L  37|
## 185                                                                 ON | 15771592 / R:  955P11
## 186                                                                                           
## 187     62 | ASHWIN BALAJI                   |1.0  |W  55|U    |U    |U    |U    |U    |U    |
## 188                                                                 MI | 15219542 / R: 1530   
## 189                                                                                           
## 190     63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |
## 191                                                                 MI | 15057092 / R: 1175   
## 192                                                                                           
## 193     64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|
## 194                                                                 MI | 15006561 / R: 1163   
## 195                                                                                           
##                                                  chess_data_raw.X.1
## 1                                                                  
## 2   >Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 3                                                                  
## 4                                                                  
## 5       >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 6                                                                  
## 7                                                                  
## 8       >1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |
## 9                                                                  
## 10                                                                 
## 11      >1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 12                                                                 
## 13                                                                 
## 14      >1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |
## 15                                                                 
## 16                                                                 
## 17      >1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |
## 18                                                                 
## 19                                                                 
## 20      >1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |
## 21                                                                 
## 22                                                                 
## 23      >1673     |N:3  |W    |B    |W    |B    |B    |W    |W    |
## 24                                                                 
## 25                                                                 
## 26      >1657P24  |N:3  |B    |W    |B    |W    |B    |W    |W    |
## 27                                                                 
## 28                                                                 
## 29      >1564     |N:2  |W    |B    |W    |B    |W    |B    |B    |
## 30                                                                 
## 31                                                                 
## 32      >1544     |N:3  |W    |W    |B    |B    |W    |B    |W    |
## 33                                                                 
## 34                                                                 
## 35      >1696     |N:3  |B    |W    |B    |W    |B    |W    |B    |
## 36                                                                 
## 37                                                                 
## 38      >1670     |N:3  |W    |B    |W    |B    |     |W    |B    |
## 39                                                                 
## 40                                                                 
## 41      >1662     |N:3  |B    |W    |B    |B    |W    |W    |B    |
## 42                                                                 
## 43                                                                 
## 44      >1618     |N:3  |W    |B    |W    |W    |B    |B    |W    |
## 45                                                                 
## 46                                                                 
## 47      >1416P20  |N:3  |B    |B    |W    |W    |B    |B    |W    |
## 48                                                                 
## 49                                                                 
## 50      >1613     |N:3  |B    |W    |     |B    |W    |B    |     |
## 51                                                                 
## 52                                                                 
## 53      >1610     |N:3  |W    |B    |W    |B    |W    |B    |W    |
## 54                                                                 
## 55                                                                 
## 56      >1600     |N:3  |B    |W    |B    |W    |B    |W    |B    |
## 57                                                                 
## 58                                                                 
## 59      >1570     |N:3  |W    |B    |W    |B    |W    |W    |B    |
## 60                                                                 
## 61                                                                 
## 62      >1569     |N:4  |W    |B    |W    |B    |W    |B    |W    |
## 63                                                                 
## 64                                                                 
## 65      >1562     |N:3  |B    |W    |B    |W    |W    |B    |W    |
## 66                                                                 
## 67                                                                 
## 68      >1529     |N:4  |W    |B    |W    |B    |     |W    |B    |
## 69                                                                 
## 70                                                                 
## 71      >1371     |     |B    |W    |B    |W    |B    |W    |B    |
## 72                                                                 
## 73                                                                 
## 74      >1300     |N:4  |B    |W    |B    |B    |W    |W    |B    |
## 75                                                                 
## 76                                                                 
## 77      >1681     |N:4  |B    |W    |B    |W    |B    |W    |B    |
## 78                                                                 
## 79                                                                 
## 80      >1564     |N:4  |B    |W    |B    |W    |B    |W    |W    |
## 81                                                                 
## 82                                                                 
## 83      >1539     |N:4  |W    |B    |W    |B    |W    |B    |     |
## 84                                                                 
## 85                                                                 
## 86      >1513     |N:3  |W    |W    |B    |W    |B    |B    |W    |
## 87                                                                 
## 88                                                                 
## 89      >1508P12  |N:4  |B    |W    |B    |W    |W    |B    |     |
## 90                                                                 
## 91                                                                 
## 92      >1444     |     |W    |B    |B    |W    |W    |B    |B    |
## 93                                                                 
## 94                                                                 
## 95      >1444     |     |B    |W    |B    |W    |B    |W    |B    |
## 96                                                                 
## 97                                                                 
## 98      >1433     |N:4  |W    |B    |W    |B    |W    |B    |W    |
## 99                                                                 
## 100                                                                
## 101     >1421     |     |B    |W    |B    |W    |B    |W    |B    |
## 102                                                                
## 103                                                                
## 104     >1400     |     |B    |W    |B    |B    |W    |B    |W    |
## 105                                                                
## 106                                                                
## 107     >1392     |     |W    |W    |B    |W    |B    |B    |W    |
## 108                                                                
## 109                                                                
## 110     >1367     |N:4  |W    |B    |W    |B    |     |W    |B    |
## 111                                                                
## 112                                                                
## 113     >1077P17  |     |     |B    |W    |W    |     |B    |W    |
## 114                                                                
## 115                                                                
## 116     >1439     |N:4  |W    |B    |W    |W    |     |B    |B    |
## 117                                                                
## 118                                                                
## 119     >1413     |N:4  |B    |W    |B    |W    |B    |W    |W    |
## 120                                                                
## 121                                                                
## 122     >1346     |     |B    |B    |W    |W    |B    |W    |W    |
## 123                                                                
## 124                                                                
## 125     >1341P9   |     |B    |W    |B    |W    |     |     |     |
## 126                                                                
## 127                                                                
## 128     >1256     |     |B    |W    |B    |B    |W    |W    |B    |
## 129                                                                
## 130                                                                
## 131     >1244     |     |W    |B    |W    |W    |B    |B    |W    |
## 132                                                                
## 133                                                                
## 134     >1199     |     |     |W    |B    |B    |W    |B    |W    |
## 135                                                                
## 136                                                                
## 137     >1191     |     |W    |B    |W    |B    |W    |B    |W    |
## 138                                                                
## 139                                                                
## 140     >1076P10  |     |B    |W    |B    |W    |B    |W    |W    |
## 141                                                                
## 142                                                                
## 143     >1341     |     |W    |B    |W    |B    |W    |B    |W    |
## 144                                                                
## 145                                                                
## 146     >1335     |     |B    |W    |     |B    |     |W    |B    |
## 147                                                                
## 148                                                                
## 149     >1259P17  |     |W    |W    |B    |W    |B    |     |     |
## 150                                                                
## 151                                                                
## 152     >1111     |     |W    |B    |W    |B    |     |B    |W    |
## 153                                                                
## 154                                                                
## 155     >1097     |     |B    |W    |B    |W    |B    |W    |W    |
## 156                                                                
## 157                                                                
## 158     >1092     |N:4  |B    |W    |B    |W    |B    |W    |B    |
## 159                                                                
## 160                                                                
## 161     >1359     |     |     |B    |     |W    |     |W    |     |
## 162                                                                
## 163                                                                
## 164     >1200     |     |B    |B    |W    |     |W    |B    |W    |
## 165                                                                
## 166                                                                
## 167     >1163     |     |W    |B    |W    |B    |     |W    |B    |
## 168                                                                
## 169                                                                
## 170     >1140     |     |     |B    |W    |W    |     |B    |W    |
## 171                                                                
## 172                                                                
## 173     >1079     |     |B    |W    |W    |B    |W    |B    |     |
## 174                                                                
## 175                                                                
## 176     > 941     |     |W    |B    |W    |B    |W    |     |B    |
## 177                                                                
## 178                                                                
## 179     > 878     |     |W    |     |B    |B    |W    |W    |B    |
## 180                                                                
## 181                                                                
## 182     > 984     |     |W    |B    |B    |W    |B    |     |     |
## 183                                                                
## 184                                                                
## 185     > 979P18  |     |B    |W    |B    |W    |B    |W    |B    |
## 186                                                                
## 187                                                                
## 188     >1535     |     |B    |     |     |     |     |     |     |
## 189                                                                
## 190                                                                
## 191     >1125     |     |W    |B    |W    |B    |B    |     |     |
## 192                                                                
## 193                                                                
## 194     >1112     |     |B    |W    |W    |B    |W    |B    |B    |
## 195

Reorganizing the data.

Now I use stringr to pull the desired data out of the data frame and put it into a better organized one. str_extract() returns NA for every row that does not match the regular expression, so it produces a lot of NAs; I use !is.na() to drop them. Note also that the regular expression that captures the names also captures text from the header row, so I subset that vector to omit the unwanted entry. The regular expressions (regex) used are explained in the code comments.

I adapted the NA-removal step from: https://stackoverflow.com/questions/8184483/how-to-remove-all-the-na-from-a-vector
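
As a small illustration of the NA behavior described above, here is a sketch on three made-up lines (the `lines` vector is hypothetical and only imitates the shape of the raw file). It assumes the stringr package is installed.

```r
library(stringr)

# Toy lines standing in for rows of the raw file (hypothetical data).
lines <- c("    1 | GARY HUA      |6.0  |", "   ON | 15445895 / R: 1794   ", "")

# str_extract() returns one result per input, with NA where nothing matches.
points <- str_extract(lines, "[0-9]\\.[0-9]")

# Logical indexing with !is.na() keeps only the real matches.
points_clean <- points[!is.na(points)]
```

Only the first line contains a digit, a period, and another digit, so `points` has two NA entries and `points_clean` keeps the single match.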

library(stringr)

## Warning: package 'stringr' was built under R version 3.4.1

#Names are upper case letters separated by spaces, ending in an upper case word that may contain an underscore.
chess_names <- str_extract(chess_data$chess_data_raw.X, "[[:upper:][:blank:]]{4,}[[:upper:]][_[:upper:]]+")
chess_names <- chess_names[!is.na(chess_names)]
chess_names <- chess_names[2:65]
#States were all Michigan or Ontario, plus one from Ohio. I felt comfortable being more specific with these strings since I did not want to accidentally match part of a name. The string is surrounded by whitespace, which is also reflected in the regex.
states <- str_extract(chess_data$chess_data_raw.X, " [MI]{2} | [ONH]{2} ")
states <- states[!is.na(states)]
#Points were decimal numbers so we look for digits with a period in between.
chess_points <- str_extract(chess_data$chess_data_raw.X, "([0-9]{1}\\.{1}[0-9]{1})")
chess_points <- chess_points[!is.na(chess_points)]
#Pre-tourney ELOs are preceded by a ":" and white space, followed by digits.
pre_rating <- str_extract(chess_data$chess_data_raw.X, ": +[0-9]+")
pre_rating <- pre_rating[!is.na(pre_rating)]
#We can now put together the data frame with column titles
chess_df <- data.frame("Names" = chess_names, "State" = states, "Points" = chess_points, "Pre-ELO" = pre_rating)
#Clean up the data by dropping the colons picked up in the str_extract
chess_df[,4] <- str_replace(chess_df[,4], pattern = ": ", replacement = "")
#Since the hyphen is no longer a problem, we can replace the underscore
chess_df[,1] <- str_replace(chess_df[,1], pattern = "_", replacement = "-")
# We also need to clean the data by converting from factors to the appropriate data type: numeric for Points and integer for ELO
chess_df[,3] <- as.numeric(as.character(chess_df[,3]))
chess_df[,4] <- as.integer(chess_df[,4])
chess_df

##                          Names State Points Pre.ELO
## 1                     GARY HUA   ON     6.0    1794
## 2              DAKSHESH DARURI   MI     6.0    1553
## 3                 ADITYA BAJAJ   MI     6.0    1384
## 4          PATRICK H SCHILLING   MI     5.5    1716
## 5                   HANSHI ZUO   MI     5.5    1655
## 6                  HANSEN SONG   OH     5.0    1686
## 7            GARY DEE SWATHELL   MI     5.0    1649
## 8             EZEKIEL HOUGHTON   MI     5.0    1641
## 9                  STEFANO LEE   ON     5.0    1411
## 10                   ANVIT RAO   MI     5.0    1365
## 11    CAMERON WILLIAM MC LEMAN   MI     4.5    1712
## 12              KENNETH J TACK   MI     4.5    1663
## 13           TORRANCE HENRY JR   MI     4.5    1666
## 14                BRADLEY SHAW   MI     4.5    1610
## 15      ZACHARY JAMES HOUGHTON   MI     4.5    1220
## 16                MIKE NIKITIN   MI     4.0    1604
## 17          RONALD GRZEGORCZYK   MI     4.0    1629
## 18               DAVID SUNDEEN   MI     4.0    1600
## 19                DIPANKAR ROY   MI     4.0    1564
## 20                 JASON ZHENG   MI     4.0    1595
## 21               DINH DANG BUI   ON     4.0    1563
## 22            EUGENE L MCCLURE   MI     4.0    1555
## 23                    ALAN BUI   ON     4.0    1363
## 24           MICHAEL R ALDRICH   MI     4.0    1229
## 25            LOREN SCHWIEBERT   MI     3.5    1745
## 26                     MAX ZHU   ON     3.5    1579
## 27              GAURAV GIDWANI   MI     3.5    1552
## 28  SOFIA ADINA STANESCU-BELLU   MI     3.5    1507
## 29            CHIEDOZIE OKORIE   MI     3.5    1602
## 30          GEORGE AVERY JONES   ON     3.5    1522
## 31                RISHI SHETTY   MI     3.5    1494
## 32       JOSHUA PHILIP MATHEWS   ON     3.5    1441
## 33                     JADE GE   MI     3.5    1449
## 34      MICHAEL JEFFERY THOMAS   MI     3.5    1399
## 35            JOSHUA DAVID LEE   MI     3.5    1438
## 36               SIDDHARTH JHA   MI     3.5    1355
## 37        AMIYATOSH PWNANANDAM   MI     3.5     980
## 38                   BRIAN LIU   MI     3.0    1423
## 39               JOEL R HENDON   MI     3.0    1436
## 40                FOREST ZHANG   MI     3.0    1348
## 41         KYLE WILLIAM MURPHY   MI     3.0    1403
## 42                    JARED GE   MI     3.0    1332
## 43           ROBERT GLEN VASEY   MI     3.0    1283
## 44          JUSTIN D SCHILLING   MI     3.0    1199
## 45                   DEREK YAN   MI     3.0    1242
## 46    JACOB ALEXANDER LAVALLEY   MI     3.0     377
## 47                 ERIC WRIGHT   MI     2.5    1362
## 48                DANIEL KHAIN   MI     2.5    1382
## 49            MICHAEL J MARTIN   MI     2.5    1291
## 50                  SHIVAM JHA   MI     2.5    1056
## 51              TEJAS AYYAGARI   MI     2.5    1011
## 52                   ETHAN GUO   MI     2.5     935
## 53               JOSE C YBARRA   MI     2.0    1393
## 54                 LARRY HODGE   MI     2.0    1270
## 55                   ALEX KONG   MI     2.0    1186
## 56                MARISA RICCI   MI     2.0    1153
## 57                  MICHAEL LU   MI     2.0    1092
## 58                VIRAJ MOHILE   MI     2.0     917
## 59           SEAN M MC CORMICK   MI     2.0     853
## 60                  JULIA SHEN   MI     1.5     967
## 61               JEZZEL FARKAS   ON     1.5     955
## 62               ASHWIN BALAJI   MI     1.0    1530
## 63        THOMAS JOSEPH HOSMER   MI     1.0    1175
## 64                      BEN LI   MI     1.0    1163
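
The reason the Points column goes through as.character() before as.numeric() can be seen in a short sketch. The `pts` factor below is a made-up stand-in for a column read in as a factor, which is an assumption about how data.frame() treated the strings in this R version.

```r
# A factor shaped like the Points column when strings become factors (toy values).
pts <- factor(c("6.0", "5.5", "3.5", "6.0"))

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(pts)

# Going through as.character() first recovers the printed values.
values <- as.numeric(as.character(pts))
```

Here `codes` holds the positions of each value among the sorted levels, while `values` holds the actual scores.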

Initial Visualizations

Just out of curiosity I want to know the distribution of the score and the ELO ratings.

hist(chess_df[,3], xlab = "Score", main= "Histogram of Score")

hist(chess_df[,4], xlab= "ELO Rating", main = "Histogram of ELO Rating")

Calculating the Average Opponent Score

We see the ELO ratings are left skewed, with one outlier below 500 (a pre-rating of 377), which might explain the slight asymmetry to the left in the score. The top two bins in ELO may have been dominating the bottom four bins.

Now we will calculate the average opponent ELO. This requires more treatment than the other categories, so I opted to put it in its own section.

#First I extract all the 1 or 2 digit numbers ending with a "|"
op_id <- str_extract_all(chess_data$chess_data_raw.X, "\\d{1,2}[\\|]")
#Next I get rid of all the empty lists. I adapted this line of code from: https://stackoverflow.com/questions/19023446/remove-empty-elements-from-list-with-character0
op_id <- op_id[lapply(op_id, length)>0]
#Next I get rid of the pipes. One line had only one element, so I had to add an alternation (or) for that case. Also, when I saved op_id as an integer data frame it was set to character type instead of factor, which made the following code easier.
op_id <- as.data.frame.integer(gsub("\"(\\d{1,2})\\|\"|(\\d{1,2})[\\|]", "\\1 \\2" ,op_id))
#I am going to use nested for loops to parse through a list of lists. Once I have the opponent IDs, I'll reference chess_df to get their ELOs, add them up, and divide by the total number of opponents.
op_ave <- integer(0) #Initialized here for scope.
for(i in 1:length(op_id[,1])){
  numbers <- as.vector(str_extract_all(op_id[i,1], "\\d{1,2}")) #Removes spaces in the strings.
  for(n in numbers){
    tot = 0 #total opponent score
    for(j in n){
      #This gets the ELO from the op_id and totals them
      tot = tot + chess_df[as.integer(j),4]
    }
    #This gets the average from number of opponents, and stores it in a vector.
    ave = as.integer(tot/length(n))
    op_ave[i] <- ave
  }
}
#Now I update my chess_df
chess_df$Op_Ave <- op_ave
chess_df

##                          Names State Points Pre.ELO Op_Ave
## 1                     GARY HUA   ON     6.0    1794   1605
## 2              DAKSHESH DARURI   MI     6.0    1553   1469
## 3                 ADITYA BAJAJ   MI     6.0    1384   1563
## 4          PATRICK H SCHILLING   MI     5.5    1716   1573
## 5                   HANSHI ZUO   MI     5.5    1655   1500
## 6                  HANSEN SONG   OH     5.0    1686   1518
## 7            GARY DEE SWATHELL   MI     5.0    1649   1372
## 8             EZEKIEL HOUGHTON   MI     5.0    1641   1468
## 9                  STEFANO LEE   ON     5.0    1411   1523
## 10                   ANVIT RAO   MI     5.0    1365   1554
## 11    CAMERON WILLIAM MC LEMAN   MI     4.5    1712   1467
## 12              KENNETH J TACK   MI     4.5    1663   1506
## 13           TORRANCE HENRY JR   MI     4.5    1666   1497
## 14                BRADLEY SHAW   MI     4.5    1610   1515
## 15      ZACHARY JAMES HOUGHTON   MI     4.5    1220   1483
## 16                MIKE NIKITIN   MI     4.0    1604   1385
## 17          RONALD GRZEGORCZYK   MI     4.0    1629   1498
## 18               DAVID SUNDEEN   MI     4.0    1600   1480
## 19                DIPANKAR ROY   MI     4.0    1564   1426
## 20                 JASON ZHENG   MI     4.0    1595   1410
## 21               DINH DANG BUI   ON     4.0    1563   1470
## 22            EUGENE L MCCLURE   MI     4.0    1555   1300
## 23                    ALAN BUI   ON     4.0    1363   1213
## 24           MICHAEL R ALDRICH   MI     4.0    1229   1357
## 25            LOREN SCHWIEBERT   MI     3.5    1745   1363
## 26                     MAX ZHU   ON     3.5    1579   1506
## 27              GAURAV GIDWANI   MI     3.5    1552   1221
## 28  SOFIA ADINA STANESCU-BELLU   MI     3.5    1507   1522
## 29            CHIEDOZIE OKORIE   MI     3.5    1602   1313
## 30          GEORGE AVERY JONES   ON     3.5    1522   1144
## 31                RISHI SHETTY   MI     3.5    1494   1259
## 32       JOSHUA PHILIP MATHEWS   ON     3.5    1441   1378
## 33                     JADE GE   MI     3.5    1449   1276
## 34      MICHAEL JEFFERY THOMAS   MI     3.5    1399   1375
## 35            JOSHUA DAVID LEE   MI     3.5    1438   1149
## 36               SIDDHARTH JHA   MI     3.5    1355   1388
## 37        AMIYATOSH PWNANANDAM   MI     3.5     980   1384
## 38                   BRIAN LIU   MI     3.0    1423   1539
## 39               JOEL R HENDON   MI     3.0    1436   1429
## 40                FOREST ZHANG   MI     3.0    1348   1390
## 41         KYLE WILLIAM MURPHY   MI     3.0    1403   1248
## 42                    JARED GE   MI     3.0    1332   1149
## 43           ROBERT GLEN VASEY   MI     3.0    1283   1106
## 44          JUSTIN D SCHILLING   MI     3.0    1199   1327
## 45                   DEREK YAN   MI     3.0    1242   1152
## 46    JACOB ALEXANDER LAVALLEY   MI     3.0     377   1357
## 47                 ERIC WRIGHT   MI     2.5    1362   1392
## 48                DANIEL KHAIN   MI     2.5    1382   1355
## 49            MICHAEL J MARTIN   MI     2.5    1291   1285
## 50                  SHIVAM JHA   MI     2.5    1056   1296
## 51              TEJAS AYYAGARI   MI     2.5    1011   1356
## 52                   ETHAN GUO   MI     2.5     935   1494
## 53               JOSE C YBARRA   MI     2.0    1393   1345
## 54                 LARRY HODGE   MI     2.0    1270   1206
## 55                   ALEX KONG   MI     2.0    1186   1406
## 56                MARISA RICCI   MI     2.0    1153   1414
## 57                  MICHAEL LU   MI     2.0    1092   1363
## 58                VIRAJ MOHILE   MI     2.0     917   1391
## 59           SEAN M MC CORMICK   MI     2.0     853   1319
## 60                  JULIA SHEN   MI     1.5     967   1330
## 61               JEZZEL FARKAS   ON     1.5     955   1327
## 62               ASHWIN BALAJI   MI     1.0    1530   1186
## 63        THOMAS JOSEPH HOSMER   MI     1.0    1175   1350
## 64                      BEN LI   MI     1.0    1163   1263
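
The same lookup-and-average idea can also be written without explicit loops. The following is only a sketch on made-up data in base R, not the code that produced the table above; `ratings` and `rows` are hypothetical, and the lookahead regex is a substitute for the gsub() cleanup step.

```r
# Toy ratings indexed by pair number (hypothetical values, not the tournament data).
ratings <- c(1500, 1600, 1400, 1700)

# Toy result rows: a letter code, then the opponent's pair number before each pipe.
rows <- c("W   2|D   3|W   4|", "L   1|W   3|", "L   1|L   2|D   4|", "L   1|")

# Pull every 1-2 digit number that sits directly before a pipe, without the pipe.
opponent_ids <- lapply(
  regmatches(rows, gregexpr("[0-9]{1,2}(?=\\|)", rows, perl = TRUE)),
  as.integer
)

# Average the looked-up ratings per player; as.integer() truncates toward zero.
op_ave_toy <- sapply(opponent_ids, function(ids) as.integer(mean(ratings[ids])))
```

Because `ratings[ids]` indexes by pair number, this relies on the ratings being stored in pair-number order, which is the same assumption the loop makes about chess_df.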

I’m curious how fair the pairings were, so I want to look at the ratio of player ELO to average opponent ELO.

ELO_ratio <- chess_df$Pre.ELO/chess_df$Op_Ave
hist(ELO_ratio, xlab = "ELO / (Ave Opponent ELO)")

plot(chess_df$Points, ELO_ratio)

fit <- lm(ELO_ratio ~ chess_df$Points)
summary(fit)

## 
## Call:
## lm(formula = ELO_ratio ~ chess_df$Points)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.70202 -0.09233  0.02735  0.10655  0.41944 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.81600    0.06762  12.067  < 2e-16 ***
## chess_df$Points  0.05461    0.01853   2.947  0.00452 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1814 on 62 degrees of freedom
## Multiple R-squared:  0.1228, Adjusted R-squared:  0.1087 
## F-statistic: 8.683 on 1 and 62 DF,  p-value: 0.00452

The distribution is centered on 1, which seems pretty fair. The linear model shows a significant relationship (p = 0.0045) between points scored and the ratio of player ELO to average opponent ELO, although with an R-squared of about 0.12 the fit accounts for only a small part of the variation. If a player’s ELO is greater than the average opponent ELO, you would expect a higher win percentage, and relative ELO does correspond to points scored here. The ELO system seems to work pretty well, and it seems that the organizers did a fairly good job matching players.
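
To make the link between relative ELO and expected results concrete, here is a sketch of the textbook Elo expected-score formula. This is an assumption added for illustration: the analysis above does not use it, USCF's own rating formulas differ in their details, and plugging in an average opponent rating is only an approximation. The function name `elo_expected` is hypothetical.

```r
# Textbook Elo expected score for a player rated `player` against `opponent`.
elo_expected <- function(player, opponent) {
  1 / (1 + 10^((opponent - player) / 400))
}

# Equal ratings give an even expectation.
even <- elo_expected(1500, 1500)

# Top seed from the table: pre-rating 1794 against an average opponent of 1605.
favored <- elo_expected(1794, 1605)
```

Under this formula a ratio above 1 always maps to an expected score above one half, which is consistent with the positive slope in the regression.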

Finally, to make the .csv:

write.csv(chess_df, "tournamentinfo.csv")
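
By default write.csv() also writes the row names as an unnamed first column. If that column is not wanted, one option is row.names = FALSE. The sketch below uses a made-up data frame and a temporary file rather than chess_df.

```r
# Toy frame standing in for chess_df (hypothetical values).
toy <- data.frame(Names = c("A PLAYER", "B PLAYER"), Points = c(6.0, 5.5))

# Write to a temporary file; row.names = FALSE omits the unnamed index column.
out <- tempfile(fileext = ".csv")
write.csv(toy, out, row.names = FALSE)

# Read it back to confirm the columns round-trip.
back <- read.csv(out)
```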