In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:
Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents For the first player, the information would be:
Gary Hua, ON, 6.0, 1794, 1605
1605 was calculated by using the pre-tournament opponents’ ratings of 1436, 1563, 1600, 1610, 1649, 1663, 1716, and dividing by the total number of games played
library(RCurl)
library(knitr)
library(kableExtra)
library(tidyverse)
library(stringr)chessdata <- readLines("https://raw.githubusercontent.com/mandiemannz/Data-607--Fall-18/master/project1txt", warn = F)
head(chessdata, 22)## [1] "-----------------------------------------------------------------------------------------"
## [2] " Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | "
## [4] "-----------------------------------------------------------------------------------------"
## [5] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [6] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
## [7] "-----------------------------------------------------------------------------------------"
## [8] " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
## [9] " MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |"
## [10] "-----------------------------------------------------------------------------------------"
## [11] " 3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12|"
## [12] " MI | 14959604 / R: 1384 ->1640 |N:2 |W |B |W |B |W |B |W |"
## [13] "-----------------------------------------------------------------------------------------"
## [14] " 4 | PATRICK H SCHILLING |5.5 |W 23|D 28|W 2|W 26|D 5|W 19|D 1|"
## [15] " MI | 12616049 / R: 1716 ->1744 |N:2 |W |B |W |B |W |B |B |"
## [16] "-----------------------------------------------------------------------------------------"
## [17] " 5 | HANSHI ZUO |5.5 |W 45|W 37|D 12|D 13|D 4|W 14|W 17|"
## [18] " MI | 14601533 / R: 1655 ->1690 |N:2 |B |W |B |W |B |W |B |"
## [19] "-----------------------------------------------------------------------------------------"
## [20] " 6 | HANSEN SONG |5.0 |W 34|D 29|L 11|W 35|D 10|W 27|W 21|"
## [21] " OH | 15055204 / R: 1686 ->1687 |N:3 |W |B |W |B |B |W |B |"
## [22] "-----------------------------------------------------------------------------------------"
From looking at the data from the txt file provided, we can notice that a lot can be omitted from our task. We can correct/clean the data by the following:
If we were to run the following code: seq(from, to, by) this will let us pull only the rows that have data that is needed. Therefore, this will make it possible to manipulate with regular expressions and extract information from the data.
#begin at column 5, this will increase to the length of the data, by 3
chessdata1 <- c(seq(5, length(chessdata),3))
#begin at column 6, the will increase to the length of the data, by 3
chessdata2 <- c(seq(6, length(chessdata),3))The expressions used to pull out necessary data (name, state, total_points, pre_raiting):
#regex to pull out only the name from chessdata1
name <- str_replace_all(str_extract(chessdata[chessdata1],"([|]).+?\\1"),"[|]","")
#extracted state from chessdata2
state <- str_extract(chessdata[chessdata2], "[A-Z]{2}" )
#extracted total points from chessdata1
total_points <- str_extract(chessdata[chessdata1], "\\d.\\d")
#extracted pre raiting.
pre_raiting1 <- as.integer(str_replace_all(str_extract(chessdata[chessdata2], "R: \\s?\\d{3,4}"), "R:\\s", ""))
#extracted player number
player_num <- as.integer(str_extract(chessdata[chessdata1], "\\d+"))df <- data.frame(name, state, player_num, total_points, pre_raiting1)kable(head(df, 20), "html", escape = F) %>%
kable_styling("striped", full_width = F, font_size = 15) %>%
column_spec(1:2, bold = T)| name | state | player_num | total_points | pre_raiting1 |
|---|---|---|---|---|
| GARY HUA | ON | 1 | 6.0 | 1794 |
| DAKSHESH DARURI | MI | 2 | 6.0 | 1553 |
| ADITYA BAJAJ | MI | 3 | 6.0 | 1384 |
| PATRICK H SCHILLING | MI | 4 | 5.5 | 1716 |
| HANSHI ZUO | MI | 5 | 5.5 | 1655 |
| HANSEN SONG | OH | 6 | 5.0 | 1686 |
| GARY DEE SWATHELL | MI | 7 | 5.0 | 1649 |
| EZEKIEL HOUGHTON | MI | 8 | 5.0 | 1641 |
| STEFANO LEE | ON | 9 | 5.0 | 1411 |
| ANVIT RAO | MI | 10 | 5.0 | 1365 |
| CAMERON WILLIAM MC LEMAN | MI | 11 | 4.5 | 1712 |
| KENNETH J TACK | MI | 12 | 4.5 | 1663 |
| TORRANCE HENRY JR | MI | 13 | 4.5 | 1666 |
| BRADLEY SHAW | MI | 14 | 4.5 | 1610 |
| ZACHARY JAMES HOUGHTON | MI | 15 | 4.5 | 1220 |
| MIKE NIKITIN | MI | 16 | 4.0 | 1604 |
| RONALD GRZEGORCZYK | MI | 17 | 4.0 | 1629 |
| DAVID SUNDEEN | MI | 18 | 4.0 | 1600 |
| DIPANKAR ROY | MI | 19 | 4.0 | 1564 |
| JASON ZHENG | MI | 20 | 4.0 | 1595 |
challenger1 <- str_extract_all(chessdata[chessdata1], "\\d+\\|")
challenger <- str_extract_all(challenger1,"\\d+")
head(challenger)## [[1]]
## [1] "39" "21" "18" "14" "7" "12" "4"
##
## [[2]]
## [1] "63" "58" "4" "17" "16" "20" "7"
##
## [[3]]
## [1] "8" "61" "25" "21" "11" "13" "12"
##
## [[4]]
## [1] "23" "28" "2" "26" "5" "19" "1"
##
## [[5]]
## [1] "45" "37" "12" "13" "4" "14" "17"
##
## [[6]]
## [1] "34" "29" "11" "35" "10" "27" "21"
Created a loop in order to calculate the challengers pre raiting score
challenger_preraiting <- numeric(length(chessdata1))
for (i in 1:(length(chessdata1))){
challenger_preraiting[i] <- mean(pre_raiting1[as.numeric(unlist(challenger[i]))])
}
challenger_preraiting <- round(challenger_preraiting,0)df1 <- data.frame(name, state, player_num, total_points, pre_raiting1, challenger_preraiting)
kable(df1, "html", escape = F) %>%
kable_styling("striped", full_width = F, font_size = 15) %>%
column_spec(1:2, bold = T)| name | state | player_num | total_points | pre_raiting1 | challenger_preraiting |
|---|---|---|---|---|---|
| GARY HUA | ON | 1 | 6.0 | 1794 | 1605 |
| DAKSHESH DARURI | MI | 2 | 6.0 | 1553 | 1469 |
| ADITYA BAJAJ | MI | 3 | 6.0 | 1384 | 1564 |
| PATRICK H SCHILLING | MI | 4 | 5.5 | 1716 | 1574 |
| HANSHI ZUO | MI | 5 | 5.5 | 1655 | 1501 |
| HANSEN SONG | OH | 6 | 5.0 | 1686 | 1519 |
| GARY DEE SWATHELL | MI | 7 | 5.0 | 1649 | 1372 |
| EZEKIEL HOUGHTON | MI | 8 | 5.0 | 1641 | 1468 |
| STEFANO LEE | ON | 9 | 5.0 | 1411 | 1523 |
| ANVIT RAO | MI | 10 | 5.0 | 1365 | 1554 |
| CAMERON WILLIAM MC LEMAN | MI | 11 | 4.5 | 1712 | 1468 |
| KENNETH J TACK | MI | 12 | 4.5 | 1663 | 1506 |
| TORRANCE HENRY JR | MI | 13 | 4.5 | 1666 | 1498 |
| BRADLEY SHAW | MI | 14 | 4.5 | 1610 | 1515 |
| ZACHARY JAMES HOUGHTON | MI | 15 | 4.5 | 1220 | 1484 |
| MIKE NIKITIN | MI | 16 | 4.0 | 1604 | 1386 |
| RONALD GRZEGORCZYK | MI | 17 | 4.0 | 1629 | 1499 |
| DAVID SUNDEEN | MI | 18 | 4.0 | 1600 | 1480 |
| DIPANKAR ROY | MI | 19 | 4.0 | 1564 | 1426 |
| JASON ZHENG | MI | 20 | 4.0 | 1595 | 1411 |
| DINH DANG BUI | ON | 21 | 4.0 | 1563 | 1470 |
| EUGENE L MCCLURE | MI | 22 | 4.0 | 1555 | 1300 |
| ALAN BUI | ON | 23 | 4.0 | 1363 | 1214 |
| MICHAEL R ALDRICH | MI | 24 | 4.0 | 1229 | 1357 |
| LOREN SCHWIEBERT | MI | 25 | 3.5 | 1745 | 1363 |
| MAX ZHU | ON | 26 | 3.5 | 1579 | 1507 |
| GAURAV GIDWANI | MI | 27 | 3.5 | 1552 | 1222 |
| SOFIA ADINA STANESCU-BELLU | MI | 28 | 3.5 | 1507 | 1522 |
| CHIEDOZIE OKORIE | MI | 29 | 3.5 | 1602 | 1314 |
| GEORGE AVERY JONES | ON | 30 | 3.5 | 1522 | 1144 |
| RISHI SHETTY | MI | 31 | 3.5 | 1494 | 1260 |
| JOSHUA PHILIP MATHEWS | ON | 32 | 3.5 | 1441 | 1379 |
| JADE GE | MI | 33 | 3.5 | 1449 | 1277 |
| MICHAEL JEFFERY THOMAS | MI | 34 | 3.5 | 1399 | 1375 |
| JOSHUA DAVID LEE | MI | 35 | 3.5 | 1438 | 1150 |
| SIDDHARTH JHA | MI | 36 | 3.5 | 1355 | 1388 |
| AMIYATOSH PWNANANDAM | MI | 37 | 3.5 | 980 | 1385 |
| BRIAN LIU | MI | 38 | 3.0 | 1423 | 1539 |
| JOEL R HENDON | MI | 39 | 3.0 | 1436 | 1430 |
| FOREST ZHANG | MI | 40 | 3.0 | 1348 | 1391 |
| KYLE WILLIAM MURPHY | MI | 41 | 3.0 | 1403 | 1248 |
| JARED GE | MI | 42 | 3.0 | 1332 | 1150 |
| ROBERT GLEN VASEY | MI | 43 | 3.0 | 1283 | 1107 |
| JUSTIN D SCHILLING | MI | 44 | 3.0 | 1199 | 1327 |
| DEREK YAN | MI | 45 | 3.0 | 1242 | 1152 |
| JACOB ALEXANDER LAVALLEY | MI | 46 | 3.0 | 377 | 1358 |
| ERIC WRIGHT | MI | 47 | 2.5 | 1362 | 1392 |
| DANIEL KHAIN | MI | 48 | 2.5 | 1382 | 1356 |
| MICHAEL J MARTIN | MI | 49 | 2.5 | 1291 | 1286 |
| SHIVAM JHA | MI | 50 | 2.5 | 1056 | 1296 |
| TEJAS AYYAGARI | MI | 51 | 2.5 | 1011 | 1356 |
| ETHAN GUO | MI | 52 | 2.5 | 935 | 1495 |
| JOSE C YBARRA | MI | 53 | 2.0 | 1393 | 1345 |
| LARRY HODGE | MI | 54 | 2.0 | 1270 | 1206 |
| ALEX KONG | MI | 55 | 2.0 | 1186 | 1406 |
| MARISA RICCI | MI | 56 | 2.0 | 1153 | 1414 |
| MICHAEL LU | MI | 57 | 2.0 | 1092 | 1363 |
| VIRAJ MOHILE | MI | 58 | 2.0 | 917 | 1391 |
| SEAN M MC CORMICK | MI | 59 | 2.0 | 853 | 1319 |
| JULIA SHEN | MI | 60 | 1.5 | 967 | 1330 |
| JEZZEL FARKAS | ON | 61 | 1.5 | 955 | 1327 |
| ASHWIN BALAJI | MI | 62 | 1.0 | 1530 | 1186 |
| THOMAS JOSEPH HOSMER | MI | 63 | 1.0 | 1175 | 1350 |
| BEN LI | MI | 64 | 1.0 | 1163 | 1263 |
write.table(df1, file = "C:/Users/manda/Desktop/Data607/DATA_607_Project1.csv", sep = ",", col.names = T)