Project Summary

In this project, you are given a text file of chess tournament results in which the information follows a fixed layout. The task is to create an R Markdown file that generates a .CSV file (which could, for example, be imported into a SQL database) with the following information for every player: Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents. For the first player, the information would be: Gary Hua, ON, 6.0, 1794, 1605.

Data Import

The text file is imported from a local folder. Each line of the file is read into a single column, V1.
tdata <- read.csv("C:\\Users\\26291\\Documents\\Data_607\\tournamentinfo.txt", header = FALSE, stringsAsFactors = FALSE)
head(tdata)
##                                                                                           V1
## 1  -----------------------------------------------------------------------------------------
## 2  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 3  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 4  -----------------------------------------------------------------------------------------
## 5      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 6     ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
tail(tdata)
##                                                                                            V1
## 191    63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |
## 192    MI | 15057092 / R: 1175   ->1125     |     |W    |B    |W    |B    |B    |     |     |
## 193 -----------------------------------------------------------------------------------------
## 194    64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|
## 195    MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |
## 196 -----------------------------------------------------------------------------------------

Data Preprocessing and Wrangling

Data Extraction

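library(stringr)
# Each player occupies two rows followed by a separator line; the first player row is row 5.
pInfo <- as.character(tdata$V1)[seq(5, nrow(tdata), by = 3)]  # pair number, name, points, round results
rInfo <- as.character(tdata$V1)[seq(6, nrow(tdata), by = 3)]  # state, USCF ID, pre- and post-rating
pairNo <- as.integer(str_extract(pInfo, "\\d+"))
# Take everything between the first two pipes so hyphenated and four-word names stay whole.
Name <- str_trim(str_split_fixed(pInfo, "\\|", 3)[, 2])
Region <- str_extract(rInfo, "\\w+")
Points <- as.numeric(str_extract(pInfo, "\\d+\\.\\d+"))
# The pre-rating is the first standalone 3- or 4-digit number; the 8-digit USCF ID is skipped.
Rating <- as.integer(str_extract(str_extract(rInfo, "[^\\d]\\d{3,4}[^\\d]"), "\\d+"))
# Opponent pair numbers are the digits that sit directly before a pipe.
Opponents <- str_extract_all(pInfo, "\\d+(?=\\|)")
Won <- str_count(pInfo, "\\Q|W  \\E")
Lost <- str_count(pInfo, "\\Q|L  \\E")
Draw <- str_count(pInfo, "\\Q|D  \\E")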
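
As a quick check of the numeric patterns, the following sketch applies them to the two rows shown for the first player in the head(tdata) output. The variable names (p, r, pair_no and so on) are illustrative only, and the lookahead used for the opponents is one possible way to pick out the digits directly before a pipe.

```r
library(stringr)

# The two rows for pair number 1, copied from the head(tdata) output
p <- "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
r <- "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

pair_no <- as.integer(str_extract(p, "\\d+"))            # first run of digits
state   <- str_extract(r, "\\w+")                        # first word of the second row
points  <- as.numeric(str_extract(p, "\\d+\\.\\d+"))     # the only decimal number
rating  <- as.integer(str_extract(str_extract(r, "[^\\d]\\d{3,4}[^\\d]"), "\\d+"))

# Digits immediately followed by a pipe are opponent pair numbers
opponents <- as.integer(str_extract_all(p, "\\d+(?=\\|)")[[1]])

won  <- str_count(p, "\\Q|W  \\E")
draw <- str_count(p, "\\Q|D  \\E")

print(list(pair_no = pair_no, state = state, points = points,
           rating = rating, opponents = opponents, won = won, draw = draw))
```

For this row the extracted values agree with the project summary: state ON, 6.0 points, and a pre-rating of 1794.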

Average Opponent Calculation

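To calculate the average opponent rating, we add up the pre-tournament ratings of all of a player's opponents and divide by the number of games the player actually played. Rounds without an opponent (for example H or U in the results) are not counted.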
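avgRating <- numeric(length(pInfo))  # one slot per player
for (i in seq_along(pInfo)) {
  # Opponents are stored as pair numbers, which index directly into Rating
  opp <- as.numeric(unlist(Opponents[pairNo[i]]))
  avgRating[i] <- round(mean(Rating[opp]), digits = 0)
}
fdata <- data.frame(Name, Region, Points, Rating, avgRating)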
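
The same lookup can also be written without an explicit loop index. The sketch below uses a made-up four-player tournament (the ratings and pairings are hypothetical and are not taken from the tournament file) to show how pair numbers index into the rating vector and how the mean is taken only over the games each player actually played.

```r
# Hypothetical pre-ratings, ordered by pair number (illustrative values only)
rating <- c(1500L, 1600L, 1700L, 1800L)

# Hypothetical opponents of each player, stored as pair numbers
opponents <- list(c(2L, 3L), c(1L, 4L), 1L, 2L)

# For each player, look up the opponents' ratings and average them
avg_opp <- vapply(opponents,
                  function(opp) round(mean(rating[opp])),
                  numeric(1))
print(avg_opp)
```

Player 1 faced players 2 and 3, so the average is (1600 + 1700) / 2 = 1650; player 3 played only one game, so the average is simply that opponent's rating.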

Final DataSet

The final data set has the five required attributes.
colnames(fdata) <- c("Player's Name", "Player's State", "Total Number of Points", "Player's Pre-Rating", "Average Pre Chess Rating of Opponents")
library(DT)
datatable(fdata)

Create csv file of the final dataset

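# row.names = FALSE keeps R's row index out of the file, so it contains only the five attributes
write.csv(fdata, file = "chessdata.csv", row.names = FALSE)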
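
One way to confirm that the file is ready for import is to read it back and compare the columns. The sketch below is a minimal round trip on a one-row data frame holding the first player's values; the names first, out and back are illustrative, the file goes to a temporary path, and row.names = FALSE is used on the assumption that the row index is not wanted in the file.

```r
# One-row data frame with the first player's values from the project summary
first <- data.frame("GARY HUA", "ON", 6.0, 1794L, 1605L, stringsAsFactors = FALSE)
colnames(first) <- c("Player's Name", "Player's State", "Total Number of Points",
                     "Player's Pre-Rating", "Average Pre Chess Rating of Opponents")

out <- tempfile(fileext = ".csv")  # temporary path, so no existing file is overwritten
write.csv(first, file = out, row.names = FALSE)

# check.names = FALSE keeps the spaces and apostrophes in the column names
back <- read.csv(out, check.names = FALSE, stringsAsFactors = FALSE)
print(colnames(back))
```

Without check.names = FALSE, read.csv would rewrite the headers into syntactically valid R names, so the comparison with the original column names would fail.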