In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:
Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents
For the first player, the information would be:
Gary Hua, ON, 6.0, 1794, 1605
1605 was calculated by summing the opponents’ pre-tournament ratings (1436, 1563, 1600, 1610, 1649, 1663, 1716) and dividing by the total number of games played.
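The arithmetic for the first player can be checked directly. This is a small sketch using only the seven ratings quoted above; the variable names `opp_ratings` and `avg` are illustrative.

```r
# Pre-tournament ratings of Gary Hua's seven opponents (from the text above)
opp_ratings <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)

# Sum of ratings divided by number of games played
avg <- mean(opp_ratings)

round(avg, 2)  # 1605.29
round(avg)     # 1605
```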
Answer:
Let's load the file and look at its first and last few lines
library(stringr)
dschess <- readLines("./tournamentinfo.txt")
head(dschess)
## [1] "-----------------------------------------------------------------------------------------"
## [2] " Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | "
## [4] "-----------------------------------------------------------------------------------------"
## [5] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [6] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
tail(dschess)
## [1] " 63 | THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U |"
## [2] " MI | 15057092 / R: 1175 ->1125 | |W |B |W |B |B | | |"
## [3] "-----------------------------------------------------------------------------------------"
## [4] " 64 | BEN LI |1.0 |L 22|D 30|L 31|D 49|L 46|L 42|L 54|"
## [5] " MI | 15006561 / R: 1163 ->1112 | |B |W |W |B |W |B |B |"
## [6] "-----------------------------------------------------------------------------------------"
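Drop the header and remove any empty lines. The step that creates ds_cp_chess is not shown; dropping the first four lines is an assumption that matches the 192 rows used below.
# Assumption: remove the 4-line header so that line 1 is the first player's row
ds_cp_chess <- dschess[-(1:4)]
# Keep only non-empty lines
ds_cp_chess <- ds_cp_chess[nchar(ds_cp_chess) > 0]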
Each player occupies three lines: a name row, a state/rating row, and a dashed separator. The name rows are therefore every third line starting at line 1. seq() generates these indices from 1 to the total length (192 rows) in steps of 3:
data_1 <- seq(1, length(ds_cp_chess), by = 3)
data_1
## [1] 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49
## [18] 52 55 58 61 64 67 70 73 76 79 82 85 88 91 94 97 100
## [35] 103 106 109 112 115 118 121 124 127 130 133 136 139 142 145 148 151
## [52] 154 157 160 163 166 169 172 175 178 181 184 187 190
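To see why a step of 3 separates the two kinds of rows, here is a small sketch on a hypothetical 12-line file (four players). The names `n_lines`, `name_idx`, `info_idx`, and `sep_idx` are illustrative and not part of the original code.

```r
# Hypothetical file with 4 players x 3 lines each (name row, info row, separator)
n_lines <- 12

name_idx <- seq(1, n_lines, by = 3)  # name rows:       1 4 7 10
info_idx <- seq(2, n_lines, by = 3)  # state/rating:    2 5 8 11
sep_idx  <- seq(3, n_lines, by = 3)  # dashed lines:    3 6 9 12

# Together the three index sets cover every line exactly once
all_idx <- sort(c(name_idx, info_idx, sep_idx))
```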
Apply it to the dataset
data_r1 <- ds_cp_chess[data_1]
head(data_r1)
## [1] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [2] " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
## [3] " 3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12|"
## [4] " 4 | PATRICK H SCHILLING |5.5 |W 23|D 28|W 2|W 26|D 5|W 19|D 1|"
## [5] " 5 | HANSHI ZUO |5.5 |W 45|W 37|D 12|D 13|D 4|W 14|W 17|"
## [6] " 6 | HANSEN SONG |5.0 |W 34|D 29|L 11|W 35|D 10|W 27|W 21|"
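The state and rating rows are every third line starting at line 2. Build that index and apply it to the dataset
data_2 <- seq(2, length(ds_cp_chess), by = 3)
data_r2 <- ds_cp_chess[data_2]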
head(data_r2)
## [1] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
## [2] " MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |"
## [3] " MI | 14959604 / R: 1384 ->1640 |N:2 |W |B |W |B |W |B |W |"
## [4] " MI | 12616049 / R: 1716 ->1744 |N:2 |W |B |W |B |W |B |B |"
## [5] " MI | 14601533 / R: 1655 ->1690 |N:2 |B |W |B |W |B |W |B |"
## [6] " OH | 15055204 / R: 1686 ->1687 |N:3 |W |B |W |B |B |W |B |"
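The final data frame uses name, state, and pts, but the code that creates them is not shown. A minimal sketch of one way to get them, assuming the fields are separated by "|" as in the output above; the splitting approach and the helper name `fields_r1` are assumptions, not the original code. Two sample records are included so the block runs on its own.

```r
library(stringr)

# Sample rows copied from the output above (stand-ins for the full vectors)
data_r1 <- c(
  " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|",
  " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
)
data_r2 <- c(
  " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |",
  " MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |"
)

# Split each name row into: pair number, name, points, and the rest
fields_r1 <- str_split_fixed(data_r1, fixed("|"), n = 4)

name  <- str_trim(fields_r1[, 2])              # second field is the player name
pts   <- as.numeric(str_trim(fields_r1[, 3]))  # third field is the total points

# The state is the first field of the second row
state <- str_trim(str_split_fixed(data_r2, fixed("|"), n = 2)[, 1])
```

Splitting on the delimiter avoids depending on exact column widths, which matters if the spacing in the file is not perfectly uniform.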
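Extract the pre-rating. First isolate the "R: nnnn" field from the second row, then pull out the digits with a regex and convert them to numeric. The first step is not shown in the original; the pattern used for it here is an assumption.
# Assumption: grab "R:" followed by the pre-rating digits, e.g. "R: 1794"
prertg <- str_extract(data_r2, "R:\\s*[0-9]+")
prertg <- as.numeric(str_extract(prertg, "\\(?[0-9,.]+\\)?"))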
prertg
## [1] 1794 1553 1384 1716 1655 1686 1649 1641 1411 1365 1712 1663 1666 1610
## [15] 1220 1604 1629 1600 1564 1595 1563 1555 1363 1229 1745 1579 1552 1507
## [29] 1602 1522 1494 1441 1449 1399 1438 1355 980 1423 1436 1348 1403 1332
## [43] 1283 1199 1242 377 1362 1382 1291 1056 1011 935 1393 1270 1186 1153
## [57] 1092 917 853 967 955 1530 1175 1163
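The averaging step below relies on oppnum, a list holding each player's opponent pair numbers, but its construction is not shown. A minimal sketch of one way to build it, assuming the round columns are everything after the third "|" on the name row; the helper name `rounds` and the regex approach are assumptions. Two sample rows are included so the block runs on its own.

```r
library(stringr)

# Sample name rows copied from the output above; the second has a bye (H)
# and an unplayed round (U)
data_r1 <- c(
  " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|",
  " 63 | THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U |"
)

# Keep only the round columns (the text after the third "|")
rounds <- str_split_fixed(data_r1, fixed("|"), n = 4)[, 4]

# Extract every run of digits; rounds without an opponent contain no digits
# and simply drop out, so only games actually played are counted
oppnum <- lapply(str_extract_all(rounds, "[0-9]+"), as.integer)

oppnum[[1]]  # 39 21 18 14 7 12 4
oppnum[[2]]  # 2 48 49 43 45
```

Because the pair number equals the player's position in prertg, these integers can be used directly as indices into the rating vector.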
Calculate the average pre-rating of each player's opponents, counting only the games actually played
# For each player, look up the opponents' pre-ratings by pair number and average them
opppreratingavg <- sapply(oppnum, function(opp) round(mean(prertg[unlist(opp)]), 2))
df_final <- cbind.data.frame(name, state, pts, prertg, opppreratingavg)
colnames(df_final) <- c("Name", "State", "Points", "Pre_Rating", "Opp_Pre_Rating")
df_final
## Name State Points Pre_Rating Opp_Pre_Rating
## 1 GARY HUA ON 6.0 1794 1605.29
## 2 DAKSHESH DARURI MI 6.0 1553 1469.29
## 3 ADITYA BAJAJ MI 6.0 1384 1563.57
## 4 PATRICK H SCHILLING MI 5.5 1716 1573.57
## 5 HANSHI ZUO MI 5.5 1655 1500.86
## 6 HANSEN SONG OH 5.0 1686 1518.71
## 7 GARY DEE SWATHELL MI 5.0 1649 1372.14
## 8 EZEKIEL HOUGHTON MI 5.0 1641 1468.43
## 9 STEFANO LEE ON 5.0 1411 1523.14
## 10 ANVIT RAO MI 5.0 1365 1554.14
## 11 CAMERON WILLIAM MC LEMAN MI 4.5 1712 1467.57
## 12 KENNETH J TACK MI 4.5 1663 1506.17
## 13 TORRANCE HENRY JR MI 4.5 1666 1497.86
## 14 BRADLEY SHAW MI 4.5 1610 1515.00
## 15 ZACHARY JAMES HOUGHTON MI 4.5 1220 1483.86
## 16 MIKE NIKITIN MI 4.0 1604 1385.80
## 17 RONALD GRZEGORCZYK MI 4.0 1629 1498.57
## 18 DAVID SUNDEEN MI 4.0 1600 1480.00
## 19 DIPANKAR ROY MI 4.0 1564 1426.29
## 20 JASON ZHENG MI 4.0 1595 1410.86
## 21 DINH DANG BUI ON 4.0 1563 1470.43
## 22 EUGENE L MCCLURE MI 4.0 1555 1300.33
## 23 ALAN BUI ON 4.0 1363 1213.86
## 24 MICHAEL R ALDRICH MI 4.0 1229 1357.00
## 25 LOREN SCHWIEBERT MI 3.5 1745 1363.29
## 26 MAX ZHU ON 3.5 1579 1506.86
## 27 GAURAV GIDWANI MI 3.5 1552 1221.67
## 28 SOFIA ADINA STANESCU MI 3.5 1507 1522.14
## 29 CHIEDOZIE OKORIE MI 3.5 1602 1313.50
## 30 GEORGE AVERY JONES ON 3.5 1522 1144.14
## 31 RISHI SHETTY MI 3.5 1494 1259.86
## 32 JOSHUA PHILIP MATHEWS ON 3.5 1441 1378.71
## 33 JADE GE MI 3.5 1449 1276.86
## 34 MICHAEL JEFFERY THOMAS MI 3.5 1399 1375.29
## 35 JOSHUA DAVID LEE MI 3.5 1438 1149.71
## 36 SIDDHARTH JHA MI 3.5 1355 1388.17
## 37 AMIYATOSH PWNANANDAM MI 3.5 980 1384.80
## 38 BRIAN LIU MI 3.0 1423 1539.17
## 39 JOEL R HENDON MI 3.0 1436 1429.57
## 40 FOREST ZHANG MI 3.0 1348 1390.57
## 41 KYLE WILLIAM MURPHY MI 3.0 1403 1248.50
## 42 JARED GE MI 3.0 1332 1149.86
## 43 ROBERT GLEN VASEY MI 3.0 1283 1106.57
## 44 JUSTIN D SCHILLING MI 3.0 1199 1327.00
## 45 DEREK YAN MI 3.0 1242 1152.00
## 46 JACOB ALEXANDER LAVALLEY MI 3.0 377 1357.71
## 47 ERIC WRIGHT MI 2.5 1362 1392.00
## 48 DANIEL KHAIN MI 2.5 1382 1355.80
## 49 MICHAEL J MARTIN MI 2.5 1291 1285.80
## 50 SHIVAM JHA MI 2.5 1056 1296.00
## 51 TEJAS AYYAGARI MI 2.5 1011 1356.14
## 52 ETHAN GUO MI 2.5 935 1494.57
## 53 JOSE C YBARRA MI 2.0 1393 1345.33
## 54 LARRY HODGE MI 2.0 1270 1206.17
## 55 ALEX KONG MI 2.0 1186 1406.00
## 56 MARISA RICCI MI 2.0 1153 1414.40
## 57 MICHAEL LU MI 2.0 1092 1363.00
## 58 VIRAJ MOHILE MI 2.0 917 1391.00
## 59 SEAN M MC CORMICK MI 2.0 853 1319.00
## 60 JULIA SHEN MI 1.5 967 1330.20
## 61 JEZZEL FARKAS ON 1.5 955 1327.29
## 62 ASHWIN BALAJI MI 1.0 1530 1186.00
## 63 THOMAS JOSEPH HOSMER MI 1.0 1175 1350.20
## 64 BEN LI MI 1.0 1163 1263.00
Write the output to a file
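# row.names = FALSE keeps R's row index out of the CSV, leaving only the five requested columns
write.csv(df_final, "./ChessResults.csv", row.names = FALSE)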
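A quick round trip can confirm what ends up in the file. This sketch uses a one-row stand-in for df_final (`df_demo`) and a temporary file rather than the real output path; both are illustrative.

```r
# One-row stand-in for df_final, using the first player's values from above
df_demo <- data.frame(
  Name = "GARY HUA", State = "ON", Points = 6.0,
  Pre_Rating = 1794, Opp_Pre_Rating = 1605.29,
  stringsAsFactors = FALSE
)

out <- tempfile(fileext = ".csv")

# Without row.names = FALSE, write.csv adds an extra unnamed index column
write.csv(df_demo, out, row.names = FALSE)

# Read the file back and inspect the columns
check <- read.csv(out, stringsAsFactors = FALSE)
names(check)
```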
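Plot the distribution of pre-ratings and of opponents' average pre-ratings, then the relationship between the two with a fitted linear trend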
library(ggplot2)
ggplot(df_final, aes(x=Pre_Rating)) + geom_histogram(binwidth = 50)

ggplot(df_final, aes(x=Opp_Pre_Rating)) + geom_histogram(binwidth = 50)

ggplot(data = df_final, aes(x = Pre_Rating, y = Opp_Pre_Rating)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = FALSE)
