In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:

Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents

For the first player, the information would be:

Gary Hua, ON, 6.0, 1794, 1605

1605 was calculated by adding the pre-tournament ratings of his opponents (1436, 1563, 1600, 1610, 1649, 1663, and 1716), dividing by the total number of games played (7), and rounding to the nearest whole number.
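As a quick check of that arithmetic, the same average can be computed directly in R. The seven ratings are the ones listed above.

```r
# Pre-tournament ratings of Gary Hua's seven opponents
opp_ratings <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)

# Average opponent rating = sum of ratings / number of games played
avg <- sum(opp_ratings) / length(opp_ratings)
print(avg)         # 11237 / 7, about 1605.286
print(round(avg))  # 1605
```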

Answer:

Let's load the file and take a look at the data.

library(stringr)

dschess <- readLines("./tournamentinfo.txt")
head(dschess)
## [1] "-----------------------------------------------------------------------------------------" 
## [2] " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | "
## [4] "-----------------------------------------------------------------------------------------" 
## [5] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|" 
## [6] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
tail(dschess)
## [1] "   63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |"
## [2] "   MI | 15057092 / R: 1175   ->1125     |     |W    |B    |W    |B    |B    |     |     |"
## [3] "-----------------------------------------------------------------------------------------"
## [4] "   64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|"
## [5] "   MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |"
## [6] "-----------------------------------------------------------------------------------------"

This data needs cleaning: it begins with a four-line header, and each player's two-line record is followed by a row of dashes. Start by removing the four header lines.

ds_cp_chess <- dschess[-(1:4)]
head(ds_cp_chess, 20)
##  [1] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
##  [2] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
##  [3] "-----------------------------------------------------------------------------------------"
##  [4] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|"
##  [5] "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
##  [6] "-----------------------------------------------------------------------------------------"
##  [7] "    3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|"
##  [8] "   MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
##  [9] "-----------------------------------------------------------------------------------------"
## [10] "    4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|"
## [11] "   MI | 12616049 / R: 1716   ->1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |"
## [12] "-----------------------------------------------------------------------------------------"
## [13] "    5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|"
## [14] "   MI | 14601533 / R: 1655   ->1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
## [15] "-----------------------------------------------------------------------------------------"
## [16] "    6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21|"
## [17] "   OH | 15055204 / R: 1686   ->1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |"
## [18] "-----------------------------------------------------------------------------------------"
## [19] "    7 | GARY DEE SWATHELL               |5.0  |W  57|W  46|W  13|W  11|L   1|W   9|L   2|"
## [20] "   MI | 11146376 / R: 1649   ->1673     |N:3  |W    |B    |W    |B    |B    |W    |W    |"

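Drop any empty lines. The dashed separator lines are deliberately kept, so every player still occupies exactly three lines.

ds_cp_chess <- ds_cp_chess[nchar(ds_cp_chess) > 0]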
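The seq()-based indexing that follows assumes that, once the header is gone, every player occupies exactly three lines, the third being a row of dashes. A minimal sanity check of that assumption is sketched here on a few lines copied from the output above; in the real workflow the same two checks would be run on ds_cp_chess.

```r
# A few lines copied from the cleaned data shown above
sample_lines <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "-----------------------------------------------------------------------------------------",
  "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|",
  "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |",
  "-----------------------------------------------------------------------------------------"
)

# Check 1: the number of lines is a multiple of three
stopifnot(length(sample_lines) %% 3 == 0)

# Check 2: every third line consists only of dashes
separator_rows <- seq(3, length(sample_lines), by = 3)
stopifnot(all(grepl("^-+$", sample_lines[separator_rows])))
```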

Next, collect the first line of each record, which contains the player's name and total points. The seq() function generates the row numbers from 1 up to the total length (192 lines) in steps of 3. These are the rows we will get:

data_1 <- seq(1, length(ds_cp_chess), by = 3)
data_1
##  [1]   1   4   7  10  13  16  19  22  25  28  31  34  37  40  43  46  49
## [18]  52  55  58  61  64  67  70  73  76  79  82  85  88  91  94  97 100
## [35] 103 106 109 112 115 118 121 124 127 130 133 136 139 142 145 148 151
## [52] 154 157 160 163 166 169 172 175 178 181 184 187 190

Use these indices to select each player's first line:

data_r1 <- ds_cp_chess[data_1]
head(data_r1)
## [1] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
## [2] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|"
## [3] "    3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|"
## [4] "    4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|"
## [5] "    5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|"
## [6] "    6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21|"

Extract the names with a regular expression: a word of at least two letters followed by one or more blank-separated words.

name <- str_extract(data_r1, "[[:alpha:]]{2,}([[:blank:]][[:alpha:]]{1,}){1,}")
head(name)
## [1] "GARY HUA"            "DAKSHESH DARURI"     "ADITYA BAJAJ"       
## [4] "PATRICK H SCHILLING" "HANSHI ZUO"          "HANSEN SONG"
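Because the columns are separated by the pipe character, an alternative to the name regex (a base-R sketch, not the approach used in this report) is to split each line on "|" and trim the padding. This keeps characters such as hyphens or apostrophes that the letters-only pattern would cut off, and the same split also yields the points column.

```r
# Two name lines copied from the output above
name_lines <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "    4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|"
)

# Split on the literal "|" (fixed = TRUE avoids regex escaping)
fields <- strsplit(name_lines, "|", fixed = TRUE)

# Field 2 is the name and field 3 the total points; trimws() strips the padding
player_names <- trimws(sapply(fields, `[`, 2))
player_pts   <- as.numeric(trimws(sapply(fields, `[`, 3)))

print(player_names)  # "GARY HUA" "PATRICK H SCHILLING"
print(player_pts)    # 6.0 5.5
```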

Now select the second line of each record, which holds the state and the ratings, using the same technique starting at row 2:

data_2 <- seq(2, length(ds_cp_chess), by = 3)
data_2
##  [1]   2   5   8  11  14  17  20  23  26  29  32  35  38  41  44  47  50
## [18]  53  56  59  62  65  68  71  74  77  80  83  86  89  92  95  98 101
## [35] 104 107 110 113 116 119 122 125 128 131 134 137 140 143 146 149 152
## [52] 155 158 161 164 167 170 173 176 179 182 185 188 191

Apply these indices to select each player's second line:

data_r2 <- ds_cp_chess[data_2]
head(data_r2)
## [1] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
## [2] "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
## [3] "   MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
## [4] "   MI | 12616049 / R: 1716   ->1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |"
## [5] "   MI | 14601533 / R: 1655   ->1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
## [6] "   OH | 15055204 / R: 1686   ->1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |"
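The position-based seq() selection depends on the separator rows staying in place. A sketch of an alternative (not the method used above) removes the separators by pattern, then splits the remaining lines into alternating name and rating lines by relying on R's recycling of a length-2 logical index.

```r
raw_lines <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "-----------------------------------------------------------------------------------------",
  "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|",
  "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |",
  "-----------------------------------------------------------------------------------------"
)

# Drop lines made up only of dashes (the pattern also matches empty lines)
body <- raw_lines[!grepl("^-*$", raw_lines)]

# The logical index is recycled: odd lines are name lines, even lines are rating lines
name_rows   <- body[c(TRUE, FALSE)]
rating_rows <- body[c(FALSE, TRUE)]
```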

Extract the state, which is the first pair of letters on each of these lines:

state <- str_extract(data_r2, "[[:alpha:]]{2}")
state
##  [1] "ON" "MI" "MI" "MI" "MI" "OH" "MI" "MI" "ON" "MI" "MI" "MI" "MI" "MI"
## [15] "MI" "MI" "MI" "MI" "MI" "MI" "ON" "MI" "ON" "MI" "MI" "ON" "MI" "MI"
## [29] "MI" "ON" "MI" "ON" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI"
## [43] "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI"
## [57] "MI" "MI" "MI" "MI" "ON" "MI" "MI" "MI"
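The pattern [[:alpha:]]{2} works here because the state code comes first on each rating line. A slightly stricter base-R sketch (the pattern is an illustrative choice, not from the original) anchors the match to the start of the line and requires the column separator right after the code.

```r
rating_lines <- c(
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "   OH | 15055204 / R: 1686   ->1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |"
)

# ^\\s* skips the leading padding, ([[:upper:]]{2}) captures the code,
# and \\s*\\| requires the first column separator to follow it.
# Note: sub() returns the input unchanged when the pattern does not match.
state_codes <- sub("^\\s*([[:upper:]]{2})\\s*\\|.*$", "\\1", rating_lines)
print(state_codes)  # "ON" "OH"
```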

Extract the total points (digits, a decimal point, and one more digit) and convert them to numeric:

pts <- str_extract(data_r1, "[[:digit:]]+\\.[[:digit:]]")
pts <- as.numeric(pts)
pts
##  [1] 6.0 6.0 6.0 5.5 5.5 5.0 5.0 5.0 5.0 5.0 4.5 4.5 4.5 4.5 4.5 4.0 4.0
## [18] 4.0 4.0 4.0 4.0 4.0 4.0 4.0 3.5 3.5 3.5 3.5 3.5 3.5 3.5 3.5 3.5 3.5
## [35] 3.5 3.5 3.5 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 2.5 2.5 2.5 2.5 2.5
## [52] 2.5 2.0 2.0 2.0 2.0 2.0 2.0 2.0 1.5 1.5 1.0 1.0 1.0

Extract the pre-tournament rating, the number that follows "R:" on each rating line:

prertg <- str_extract(data_r2, "R:\\s+[[:digit:]]{3,4}")
prertg
##  [1] "R: 1794" "R: 1553" "R: 1384" "R: 1716" "R: 1655" "R: 1686" "R: 1649"
##  [8] "R: 1641" "R: 1411" "R: 1365" "R: 1712" "R: 1663" "R: 1666" "R: 1610"
## [15] "R: 1220" "R: 1604" "R: 1629" "R: 1600" "R: 1564" "R: 1595" "R: 1563"
## [22] "R: 1555" "R: 1363" "R: 1229" "R: 1745" "R: 1579" "R: 1552" "R: 1507"
## [29] "R: 1602" "R: 1522" "R: 1494" "R: 1441" "R: 1449" "R: 1399" "R: 1438"
## [36] "R: 1355" "R:  980" "R: 1423" "R: 1436" "R: 1348" "R: 1403" "R: 1332"
## [43] "R: 1283" "R: 1199" "R: 1242" "R:  377" "R: 1362" "R: 1382" "R: 1291"
## [50] "R: 1056" "R: 1011" "R:  935" "R: 1393" "R: 1270" "R: 1186" "R: 1153"
## [57] "R: 1092" "R:  917" "R:  853" "R:  967" "R:  955" "R: 1530" "R: 1175"
## [64] "R: 1163"

Keep only the digits and convert them to numeric:

prertg <- as.numeric(str_extract(prertg, "[[:digit:]]+"))
prertg
##  [1] 1794 1553 1384 1716 1655 1686 1649 1641 1411 1365 1712 1663 1666 1610
## [15] 1220 1604 1629 1600 1564 1595 1563 1555 1363 1229 1745 1579 1552 1507
## [29] 1602 1522 1494 1441 1449 1399 1438 1355  980 1423 1436 1348 1403 1332
## [43] 1283 1199 1242  377 1362 1382 1291 1056 1011  935 1393 1270 1186 1153
## [57] 1092  917  853  967  955 1530 1175 1163
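The two steps above (match "R: 1794", then keep the digits) can also be done in one pass with a capture group. This base-R sketch runs on two rating lines copied from the output above; the \\s* in the pattern tolerates any amount of padding, such as the extra space before three-digit ratings like "R:  980".

```r
rating_lines <- c(
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "   MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |"
)

# Capture the digits that follow "R:" and discard the rest of the line
pre_rating <- as.numeric(sub(".*R:\\s*([0-9]+).*", "\\1", rating_lines))
print(pre_rating)  # 1794 1163
```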

Extract each player's opponent numbers: the one- or two-digit numbers that sit directly before a "|" in the round columns. These are used below to look up the opponents' pre-ratings.

oppnum <- str_extract_all(data_r1, "[[:digit:]]{1,2}(?=\\|)")
oppnum <- lapply(oppnum, as.numeric)
head(oppnum)
## [[1]]
## [1] 39 21 18 14  7 12  4
## 
## [[2]]
## [1] 63 58  4 17 16 20  7
## 
## [[3]]
## [1]  8 61 25 21 11 13 12
## 
## [[4]]
## [1] 23 28  2 26  5 19  1
## 
## [[5]]
## [1] 45 37 12 13  4 14 17
## 
## [[6]]
## [1] 34 29 11 35 10 27 21
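Rounds without an opponent (such as the H and U entries) carry no number, so they drop out and the average is taken only over games actually played. A base-R sketch that makes this explicit by matching only W, L and D results, shown on Thomas Joseph Hosmer's line from the end of the file:

```r
hosmer <- "   63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |"

# Match "|W", "|L" or "|D" followed by spaces and an opponent number
games <- regmatches(hosmer, gregexpr("\\|[WLD] +[0-9]+", hosmer))[[1]]

# Keep just the opponent numbers
opponents <- as.numeric(sub("\\|[WLD] +", "", games))
print(opponents)  # 2 48 49 43 45 (the H and U rounds are skipped)
```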

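Calculate the average pre-rating of each player's opponents. Because players are listed in pair-number order, an opponent's pair number is also that opponent's position in prertg, so indexing prertg with the opponent numbers returns their pre-ratings. The averages are rounded to two decimals and combined with the other columns into the final data frame.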

# Mean pre-rating of each player's opponents, rounded to two decimals
opppreratingavg <- sapply(oppnum, function(opp) round(mean(prertg[opp]), 2))

df_final <- data.frame(Name = name, State = state, Points = pts,
                       Pre_Rating = prertg, Opp_Pre_Rating = opppreratingavg)
df_final
##                        Name State Points Pre_Rating Opp_Pre_Rating
## 1                  GARY HUA    ON    6.0       1794        1605.29
## 2           DAKSHESH DARURI    MI    6.0       1553        1469.29
## 3              ADITYA BAJAJ    MI    6.0       1384        1563.57
## 4       PATRICK H SCHILLING    MI    5.5       1716        1573.57
## 5                HANSHI ZUO    MI    5.5       1655        1500.86
## 6               HANSEN SONG    OH    5.0       1686        1518.71
## 7         GARY DEE SWATHELL    MI    5.0       1649        1372.14
## 8          EZEKIEL HOUGHTON    MI    5.0       1641        1468.43
## 9               STEFANO LEE    ON    5.0       1411        1523.14
## 10                ANVIT RAO    MI    5.0       1365        1554.14
## 11 CAMERON WILLIAM MC LEMAN    MI    4.5       1712        1467.57
## 12           KENNETH J TACK    MI    4.5       1663        1506.17
## 13        TORRANCE HENRY JR    MI    4.5       1666        1497.86
## 14             BRADLEY SHAW    MI    4.5       1610        1515.00
## 15   ZACHARY JAMES HOUGHTON    MI    4.5       1220        1483.86
## 16             MIKE NIKITIN    MI    4.0       1604        1385.80
## 17       RONALD GRZEGORCZYK    MI    4.0       1629        1498.57
## 18            DAVID SUNDEEN    MI    4.0       1600        1480.00
## 19             DIPANKAR ROY    MI    4.0       1564        1426.29
## 20              JASON ZHENG    MI    4.0       1595        1410.86
## 21            DINH DANG BUI    ON    4.0       1563        1470.43
## 22         EUGENE L MCCLURE    MI    4.0       1555        1300.33
## 23                 ALAN BUI    ON    4.0       1363        1213.86
## 24        MICHAEL R ALDRICH    MI    4.0       1229        1357.00
## 25         LOREN SCHWIEBERT    MI    3.5       1745        1363.29
## 26                  MAX ZHU    ON    3.5       1579        1506.86
## 27           GAURAV GIDWANI    MI    3.5       1552        1221.67
## 28     SOFIA ADINA STANESCU    MI    3.5       1507        1522.14
## 29         CHIEDOZIE OKORIE    MI    3.5       1602        1313.50
## 30       GEORGE AVERY JONES    ON    3.5       1522        1144.14
## 31             RISHI SHETTY    MI    3.5       1494        1259.86
## 32    JOSHUA PHILIP MATHEWS    ON    3.5       1441        1378.71
## 33                  JADE GE    MI    3.5       1449        1276.86
## 34   MICHAEL JEFFERY THOMAS    MI    3.5       1399        1375.29
## 35         JOSHUA DAVID LEE    MI    3.5       1438        1149.71
## 36            SIDDHARTH JHA    MI    3.5       1355        1388.17
## 37     AMIYATOSH PWNANANDAM    MI    3.5        980        1384.80
## 38                BRIAN LIU    MI    3.0       1423        1539.17
## 39            JOEL R HENDON    MI    3.0       1436        1429.57
## 40             FOREST ZHANG    MI    3.0       1348        1390.57
## 41      KYLE WILLIAM MURPHY    MI    3.0       1403        1248.50
## 42                 JARED GE    MI    3.0       1332        1149.86
## 43        ROBERT GLEN VASEY    MI    3.0       1283        1106.57
## 44       JUSTIN D SCHILLING    MI    3.0       1199        1327.00
## 45                DEREK YAN    MI    3.0       1242        1152.00
## 46 JACOB ALEXANDER LAVALLEY    MI    3.0        377        1357.71
## 47              ERIC WRIGHT    MI    2.5       1362        1392.00
## 48             DANIEL KHAIN    MI    2.5       1382        1355.80
## 49         MICHAEL J MARTIN    MI    2.5       1291        1285.80
## 50               SHIVAM JHA    MI    2.5       1056        1296.00
## 51           TEJAS AYYAGARI    MI    2.5       1011        1356.14
## 52                ETHAN GUO    MI    2.5        935        1494.57
## 53            JOSE C YBARRA    MI    2.0       1393        1345.33
## 54              LARRY HODGE    MI    2.0       1270        1206.17
## 55                ALEX KONG    MI    2.0       1186        1406.00
## 56             MARISA RICCI    MI    2.0       1153        1414.40
## 57               MICHAEL LU    MI    2.0       1092        1363.00
## 58             VIRAJ MOHILE    MI    2.0        917        1391.00
## 59        SEAN M MC CORMICK    MI    2.0        853        1319.00
## 60               JULIA SHEN    MI    1.5        967        1330.20
## 61            JEZZEL FARKAS    ON    1.5        955        1327.29
## 62            ASHWIN BALAJI    MI    1.0       1530        1186.00
## 63     THOMAS JOSEPH HOSMER    MI    1.0       1175        1350.20
## 64                   BEN LI    MI    1.0       1163        1263.00
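If the rows were ever reordered (for example, sorted by points), a positional lookup into prertg would silently pick the wrong ratings. A sketch of a lookup keyed by pair number instead uses a named vector; the five ratings below are copied from the prertg output above, and the opponents are those of Thomas Joseph Hosmer (pair 63).

```r
# Pre-ratings of a few players, named by pair number (values from the prertg output)
pair   <- c(2, 43, 45, 48, 49)
rating <- c(1553, 1283, 1242, 1382, 1291)
lookup <- setNames(rating, pair)

# Opponents of pair 63; rounds 6 and 7 (H and U) had no opponent
opp <- c(2, 48, 49, 43, 45)
opp_avg <- mean(lookup[as.character(opp)])
print(opp_avg)  # 1350.2, matching row 63 of df_final
```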

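Write the result to a CSV file. Setting row.names = FALSE leaves R's row numbers out of the file, so its columns match the five required fields:

write.csv(df_final, "./ChessResults.csv", row.names = FALSE)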
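To confirm that the file contains exactly the five expected columns, it can be written and read back. This sketch uses two rows from df_final and a temporary file, so nothing in the project directory is touched. With the default row.names = TRUE, read.csv() would instead show an extra column named X holding the row numbers.

```r
df_small <- data.frame(
  Name           = c("GARY HUA", "DAKSHESH DARURI"),
  State          = c("ON", "MI"),
  Points         = c(6.0, 6.0),
  Pre_Rating     = c(1794, 1553),
  Opp_Pre_Rating = c(1605.29, 1469.29)
)

# Write to a temporary file without row names
path <- tempfile(fileext = ".csv")
write.csv(df_small, path, row.names = FALSE)

# Read the file back and inspect its columns
back <- read.csv(path)
print(names(back))  # "Name" "State" "Points" "Pre_Rating" "Opp_Pre_Rating"
```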

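Visualize the results: histograms of the players' pre-ratings and of their opponents' average pre-ratings, and a scatter plot of the two with a least-squares line.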

library(ggplot2)

ggplot(df_final, aes(x=Pre_Rating)) + geom_histogram(binwidth = 50)

ggplot(df_final, aes(x=Opp_Pre_Rating)) + geom_histogram(binwidth = 50)

ggplot(data = df_final, aes(x = Pre_Rating, y = Opp_Pre_Rating)) + 
  geom_point(color='blue') +
  geom_smooth(method = "lm", se = FALSE)