Project summary

In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could, for example, be imported into a SQL database) with the following information for all of the players: Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents. For the first player, the information would be: Gary Hua, ON, 6.0, 1794, 1605.

Data Import

Let's import the data from the text file and check the result with head() and tail().

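library(stringr)  # str_extract(), str_extract_all(), str_match(), str_count(), str_trim()
library(DT)       # datatable()
tournamentinfo <- read.csv("C:/Users/admin-server/Documents/tournamentinfo.txt", header = FALSE)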
head(tournamentinfo)
##                                                                                           V1
## 1  -----------------------------------------------------------------------------------------
## 2  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 3  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 4  -----------------------------------------------------------------------------------------
## 5      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 6     ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
tail(tournamentinfo)
##                                                                                            V1
## 191    63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |
## 192    MI | 15057092 / R: 1175   ->1125     |     |W    |B    |W    |B    |B    |     |     |
## 193 -----------------------------------------------------------------------------------------
## 194    64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|
## 195    MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |
## 196 -----------------------------------------------------------------------------------------
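The file contains no commas, so read.csv() is really acting as a line reader here. In R versions before 4.0 it also stores each line as a factor level, which is why a "Levels:" line appears in the printed output. A minimal alternative is base R's readLines(), which returns a plain character vector with one element per line. The sketch below assumes the file layout matches the output above. It reads an inline sample through textConnection() so it runs without the original file.

```r
# Inline sample standing in for tournamentinfo.txt (header plus the first player block)
sample_txt <- c(
  "-----------------------------------------------------------------------------------------",
  " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| ",
  " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | ",
  "-----------------------------------------------------------------------------------------",
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "-----------------------------------------------------------------------------------------"
)

# readLines() returns one character string per line: no columns, no factors
con <- textConnection(sample_txt)
lines <- readLines(con)
close(con)

str(lines)
```

With the real file this would be readLines("C:/Users/admin-server/Documents/tournamentinfo.txt"). Because the result is a vector rather than a data frame, the header would then be dropped with tournamentinfo[-c(1:4)] (no trailing comma).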

Data Wrangling

As seen above, the first four rows (two dashed lines and the two column-header lines) form the header and must be removed. Otherwise, the row positions used below to pick out the player-info and rating-info lines will be off.

tournamentinfo <- tournamentinfo[-c(1:4),]
head(tournamentinfo)
## [1]     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## [2]    ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## [3] -----------------------------------------------------------------------------------------
## [4]     2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|
## [5]    MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |
## [6] -----------------------------------------------------------------------------------------
## 131 Levels: ----------------------------------------------------------------------------------------- ...
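Dropping rows 1 to 4 by position works only because the header is always exactly four lines. A content-based alternative selects lines by how they start. Player lines begin with a pair number followed by |, and rating lines begin with a two-letter state code followed by |. The sample lines below are copied from the output above. The assumption that every state code is two capital letters comes from the rows shown, not from any file specification.

```r
# Sample lines mirroring the layout shown above (header + two players)
raw <- c(
  "-----------------------------------------------------------------------------------------",
  " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| ",
  " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | ",
  "-----------------------------------------------------------------------------------------",
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "-----------------------------------------------------------------------------------------",
  "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|",
  "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |",
  "-----------------------------------------------------------------------------------------"
)

# Player lines: optional spaces, a pair number, optional spaces, then "|"
player_lines <- raw[grepl("^\\s*\\d+\\s*\\|", raw)]

# Rating lines: optional spaces, two capital letters (assumed state code), then "|"
rating_lines <- raw[grepl("^\\s*[A-Z]{2}\\s*\\|", raw)]

player_lines
rating_lines
```

The header lines (" Pair |", " Num  |") and the dashed separators match neither pattern, so they drop out without relying on their position.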

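Looking closely, each player takes up three lines. Rows 1 + 3n (n = 0, 1, 2, ...) hold the pair number, name, total points and round results. Rows 2 + 3n hold the state and ratings. Rows 3 + 3n are dashed separators. So we take every third row starting at row 1 for the player info, and every third row starting at row 2 for the rating info.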

playerInfo <- tournamentinfo[seq(1, length(tournamentinfo), 3)]
ratingInfo <- tournamentinfo[seq(2, length(tournamentinfo), 3)]
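The 1 + 3n / 2 + 3n selection can be sanity-checked on a labelled stand-in before trusting it on the real data. The sketch below builds 64 synthetic three-line blocks, matching the 192 data rows left after removing the header. It then confirms that the two seq() calls pick player and rating lines that belong together. The labels are made up for illustration.

```r
n_players <- 64

# Synthetic rows: "player<k>", "rating<k>", "sep" for k = 1..64, in file order
demo <- as.vector(rbind(paste0("player", seq_len(n_players)),
                        paste0("rating", seq_len(n_players)),
                        "sep"))

stopifnot(length(demo) %% 3 == 0)  # each player must occupy exactly three rows

player_rows <- demo[seq(1, length(demo), 3)]  # rows 1 + 3n
rating_rows <- demo[seq(2, length(demo), 3)]  # rows 2 + 3n

# The k-th player line and the k-th rating line should refer to the same player
all(sub("player", "", player_rows) == sub("rating", "", rating_rows))
```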

Extract Data Using Regular Expressions

pairNo <- as.integer(str_extract(playerInfo, "\\d+"))             # pair number
# Name = text between the first and second "|", so hyphenated or 4+ word names are kept whole
Name <- str_trim(str_extract(playerInfo, "(?<=\\|)[^|]+"))
Region <- str_extract(ratingInfo, "\\w+")                          # two-letter state/province
Points <- as.numeric(str_extract(playerInfo, "\\d+\\.\\d+"))      # total points, e.g. 6.0
Rating <- as.integer(str_match(ratingInfo, "R:\\s*(\\d+)")[, 2])   # pre-tournament rating
Opponents <- str_extract_all(playerInfo, "\\d+(?=\\|)")            # opponent pair numbers per player
Won <- str_count(playerInfo, "\\Q|W  \\E")
Loose <- str_count(playerInfo, "\\Q|L  \\E")
Draw <- str_count(playerInfo, "\\Q|D  \\E")
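The patterns can be checked against the first player's two lines, copied from the output above. The sketch below uses base R's regmatches() with regexpr()/gregexpr() rather than stringr, so it runs without extra packages. The name is taken as the text between the first and second |, which does not depend on how many words the name has.

```r
player <- "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
rating <- "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

pair   <- as.integer(regmatches(player, regexpr("\\d+", player)))          # first number on the line
name   <- trimws(regmatches(player, regexpr("(?<=\\|)[^|]+", player, perl = TRUE)))
points <- as.numeric(regmatches(player, regexpr("\\d+\\.\\d+", player)))   # first decimal number
state  <- regmatches(rating, regexpr("[A-Z]{2}", rating))                  # first two capital letters
pre_rating <- as.integer(sub(".*R:\\s*(\\d+).*", "\\1", rating))           # digits right after "R:"

# Opponent pair numbers: digits immediately followed by "|"
opponents <- as.integer(regmatches(player, gregexpr("\\d+(?=\\|)", player, perl = TRUE))[[1]])

# Result codes: a W, L or D right after "|" and followed by whitespace
results <- regmatches(player, gregexpr("(?<=\\|)[WLD](?=\\s)", player, perl = TRUE))[[1]]

list(pair = pair, name = name, points = points, state = state,
     pre_rating = pre_rating, opponents = opponents, results = table(results))
```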

Calculate Opponents' Mean Rating

To calculate the mean rating, we add up the pre-tournament ratings of all of the player's opponents and divide by the number of games the player actually played against an opponent. Byes and unplayed rounds have no opponent and are left out. For example, Player 1, Gary Hua, played opponents 39, 21, 18, 14, 7, 12 and 4, so his mean opponent rating is (1436 + 1563 + 1600 + 1610 + 1649 + 1663 + 1716) / 7 = 11237 / 7 ≈ 1605.29, which rounds to 1605.
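The arithmetic for Gary Hua can be reproduced directly. The first rating below is 1436. That is the value that yields the stated result of 1605: with 1463 the sum would be 11264 and the average about 1609. This suggests the two digits were transposed somewhere. The ratings are otherwise those listed in the worked example.

```r
# Pre-ratings of Gary Hua's seven opponents (first value assumed to be 1436, see above)
opp_ratings <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)

total <- sum(opp_ratings)            # 11237
avg   <- total / length(opp_ratings) # about 1605.29
round(avg)                           # 1605
```

This matches what mean() followed by round() computes in the loop below, since mean() divides by the number of opponents found.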

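# Average pre-rating of each player's opponents, rounded to a whole number.
# Opponents[[i]] holds the pair numbers player i faced; Rating is stored in pair-number order.
mRating <- sapply(seq_along(playerInfo), function(i) {
  round(mean(Rating[as.integer(Opponents[[i]])]))
})
opData <- data.frame(Name, Region, Points, Rating, mRating, Won, Loose, Draw)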
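mean() of an empty vector is NaN. A player with no opponents at all (for example, only byes) would therefore get NaN in mRating. The sketch below shows one way to guard against that using a hypothetical helper, opponent_mean(), while keeping the same lookup by pair number. The four-player Rating vector and opponent lists are made-up toy data.

```r
# Toy data (hypothetical): pre-ratings indexed by pair number, and each player's opponents
Rating    <- c(1800L, 1600L, 1500L, 1400L)
Opponents <- list(c(2L, 3L), c(1L, 4L), c(1L), integer(0))

# Hypothetical helper: rounded mean pre-rating of the opponents, NA if no games were played
opponent_mean <- function(opps, ratings) {
  opps <- as.integer(opps)
  if (length(opps) == 0) return(NA_real_)
  round(mean(ratings[opps]))
}

# vapply() guarantees exactly one numeric value per player
mRating <- vapply(Opponents, opponent_mean, numeric(1), ratings = Rating)
mRating
```

Returning NA rather than NaN makes a player with no opponents explicit in the CSV and keeps later summaries predictable with na.rm = TRUE.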

Let's see the data

colnames(opData) <- c("Player's Name", "Player's State", "Total Number of Points", "Player's Pre-Rating", "Average Pre Chess Rating of Opponents", "Won", "Lost", "Draw")
datatable(opData)
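A quick consistency check on the extracted results is to recompute each player's total from the round codes and compare it with the Total column. The sketch assumes the usual crosstable scoring. W (win), X (forfeit win) and B (full-point bye) count 1. D (draw) and H (half-point bye) count 0.5. Everything else (L, F, U) counts 0. These codes are an assumption rather than something stated in the file. The three player lines are copied from the output above, and count_code() is a hypothetical helper.

```r
# Player lines copied from the head/tail output above
players <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |",
  "   64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|"
)

# Hypothetical helper: how many round fields start with the given code ("|W ", "|H ", ...)
count_code <- function(x, code) {
  lengths(regmatches(x, gregexpr(paste0("\\|", code, "(?=\\s)"), x, perl = TRUE)))
}

# Assumed scoring: W, X, B = 1 point; D, H = 0.5 point; L, F, U = 0
expected <- count_code(players, "W") + count_code(players, "X") + count_code(players, "B") +
  0.5 * (count_code(players, "D") + count_code(players, "H"))

reported <- as.numeric(regmatches(players, regexpr("\\d+\\.\\d+", players)))

which(abs(expected - reported) > 1e-9)  # rows whose totals disagree (none expected here)
```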

Create CSV output file

The code below writes opData to a CSV file named chessInfo.csv. Note that the file is written to the current working directory (check it with getwd()).

write.csv(opData, file = "chessInfo.csv", row.names = FALSE)  # omit R's unnamed row-number column
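Before importing the file elsewhere, a round-trip check can confirm that it reads back cleanly. The sketch below writes a small stand-in table to a temporary file with row.names = FALSE and reads it back. It then confirms that the column names survive and that no extra row-number column appears. The one-row data frame uses the Gary Hua values from the project summary, and tempfile() keeps the check out of the working directory.

```r
# One-row stand-in for opData (values from the project summary)
out <- data.frame(
  "Player's Name" = "Gary Hua",
  "Player's State" = "ON",
  "Total Number of Points" = 6.0,
  "Player's Pre-Rating" = 1794L,
  "Average Pre Chess Rating of Opponents" = 1605,
  check.names = FALSE  # keep spaces and apostrophes in the column names
)

path <- tempfile(fileext = ".csv")
write.csv(out, file = path, row.names = FALSE)

header_line <- readLines(path)[1]  # quoted column names, no leading "" column
back <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
unlink(path)

header_line
str(back)
```

write.csv() writes the numeric 6.0 as 6. If the CSV must show one decimal place, the points column can be formatted first, for example with sprintf("%.1f", ...).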