Project Description

In this project, I was given a text file containing the results of a chess tournament. I uploaded the file to my GitHub repository and created an .Rmd file that produces the following information for every player:
  
Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents

The output of the code has the following format:
Gary Hua, ON, 6.0, 1794, 1605


Here, 1605 is the average of Gary Hua’s opponents’ pre-tournament ratings: the sum of 1436, 1563, 1600, 1610, 1649, 1663, and 1716, divided by the total number of games played (7).
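
The arithmetic behind that figure can be checked directly. This is a minimal sketch using only the seven ratings quoted above; the variable names opp_ratings and avg_rating are illustrative and do not appear in the project code.

```r
# Gary Hua's seven opponents' pre-tournament ratings, as listed above
opp_ratings <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)

# Sum the ratings and divide by the number of games played
avg_rating <- sum(opp_ratings) / length(opp_ratings)

# 11237 / 7 is about 1605.29, which rounds to 1605
round(avg_rating)
```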


All of my code is in an R Markdown file (also published to rpubs.com), and the data is accessible to anyone running the script.

Calling Necessary Libraries and Importing Data

suppressWarnings(library(readr))
suppressWarnings(library(stringr))
suppressWarnings(library(dplyr))
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Next, I read in the chess data with readr’s read_lines function, which loads each line of the file as one element of a character vector.

lines <- read_lines("https://raw.githubusercontent.com/JHALAK-sps/Chess/master/tournamentinfo.txt")
lines[1:10]
##  [1] "-----------------------------------------------------------------------------------------" 
##  [2] " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| "
##  [3] " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | "
##  [4] "-----------------------------------------------------------------------------------------" 
##  [5] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|" 
##  [6] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |" 
##  [7] "-----------------------------------------------------------------------------------------" 
##  [8] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|" 
##  [9] "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |" 
## [10] "-----------------------------------------------------------------------------------------"
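
For the parsing steps below, I also load the tournament file into a data frame with read_csv; each line of the file becomes one row of the single column X1.

theUrl <- 'https://raw.githubusercontent.com/dcorrea614/MSDS/master/tournamentinfo.txt'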

dfTournament <- read_csv(file = theUrl,col_names = FALSE)
## Rows: 196 Columns: 1
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): X1
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

Interpreting the Data

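The first 4 lines are header rows rather than data. After them, each player occupies a block of 3 lines: one with the player’s name, total points, and round results, one with the state and ratings, and a dashed separator. I therefore split the data into two character vectors, m1 and m2, one for each kind of data line, so that each element corresponds to one player.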

mTournament <- matrix(unlist(dfTournament), byrow=TRUE)

m1 <- mTournament[seq(5,length(mTournament),3)]
head(m1)
## [1] "1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
## [2] "2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|"
## [3] "3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|"
## [4] "4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|"
## [5] "5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|"
## [6] "6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21|"
m2 <- mTournament[seq(6,length(mTournament),3)]
head(m2)
## [1] "ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
## [2] "MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
## [3] "MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
## [4] "MI | 12616049 / R: 1716   ->1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |"
## [5] "MI | 14601533 / R: 1655   ->1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
## [6] "OH | 15055204 / R: 1686   ->1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |"
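
The stride-of-3 selection can be illustrated on a small stand-in for the file. This is a sketch under stated assumptions: the vector toy and its shortened lines are hypothetical and only mimic the layout (4 header lines, then 3-line blocks per player).

```r
# Hypothetical stand-in for the file: 4 header lines, then 3 lines per player
toy <- c("-----", "header 1", "header 2", "-----",
         "1 | GARY HUA", "ON | 15445895", "-----",
         "2 | DAKSHESH DARURI", "MI | 14598900", "-----")

# Name lines start at position 5, state lines at position 6; both repeat every 3 lines
name_rows  <- seq(5, length(toy), by = 3)
state_rows <- seq(6, length(toy), by = 3)

toy[name_rows]   # the two name lines
toy[state_rows]  # the two state lines
```

Starting the two sequences one position apart keeps the name line and the state line of each player at the same index in the two results.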

Regular Expressions and String Manipulation

Here I extract each field into its own vector using regular expressions.

# matching first numbers
ID <- as.numeric(str_extract(m1, '\\d+'))

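# matching the first letter and up to 32 characters after it (the name field)
# note: [A-Za-z] is used because [A-z] also matches the characters [ \ ] ^ _ `
Name <- str_extract(m1, '[A-Za-z].{1,32}') 

# keeping the text up to the trailing run of spaces, then trimming it to get the name
Name <- str_trim(str_extract(Name, '.+\\s{2,}'))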

# matching the first two uppercase letters (state) in m2
State <- str_extract(m2, '[A-Z]{2}') 

# matching one or more digits, a period, and one digit
TotalNumberofPoints <- as.numeric(str_extract(m1, '\\d+\\.\\d'))

# matching "R:", at least 8 characters, and "-"
PreRating <- str_extract(m2, 'R:.{8,}-')

# matching the first run of up to 4 digits (the pre-tournament rating)
PreRating <- as.numeric(str_extract(PreRating, '\\d{1,4}'))

# matching every combination of 1 uppercase letter, at least 2 spaces, and one or more digits
Rounds <- str_extract_all(m1, '[A-Z]\\s{2,}\\d+')

# keeping only the opponent number from each match, one player at a time
# (lapply avoids the coercion warning raised by passing a list to str_extract_all)
Rounds <- lapply(Rounds, str_extract, pattern = '\\d+')
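
To see what each pattern pulls out, the extraction can be tried on a single player. The sketch below uses the two Gary Hua lines copied from the m1 and m2 output above. The variable names and the capture-group patterns (via str_match and str_match_all) are an alternative shown for illustration, not the patterns used in the project code.

```r
library(stringr)

# Sample lines copied from the m1 and m2 output above
line1 <- "1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
line2 <- "ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

id     <- as.numeric(str_extract(line1, "\\d+"))         # first run of digits: pair number
name   <- str_trim(str_extract(line1, "[A-Za-z][^|]*"))  # first letter up to the next "|"
state  <- str_extract(line2, "^[A-Z]{2}")                # two uppercase letters at line start
points <- as.numeric(str_extract(line1, "\\d+\\.\\d"))   # digits, a period, one digit

# Capture group 1 holds the digits that follow "R:"
pre <- as.numeric(str_match(line2, "R:\\s*(\\d+)")[, 2])

# One row per round played; column 2 holds the opponent's pair number
opponents <- as.numeric(str_match_all(line1, "[A-Z]\\s{2,}(\\d+)")[[1]][, 2])
opponents
```

A capture group returns the number directly, so no second extraction pass over the matches is needed.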

Calculating Average Pre Chess Rating of Opponents

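Using the PreRating and Rounds vectors from the previous step, I calculate the Average Pre Chess Rating of Opponents with a loop. For each player, the opponent numbers in Rounds index into PreRating, and the mean of those ratings is rounded to the nearest whole number.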

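# preallocate one slot per player
AvgOppPreChessRating <- numeric(length(Rounds))
for (i in seq_along(Rounds)) {
  # opponent pair numbers index directly into PreRating
  AvgOppPreChessRating[i] <- round(mean(PreRating[as.numeric(Rounds[[i]])]), 0)
}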
AvgOppPreChessRating
##  [1] 1605 1469 1564 1574 1501 1519 1372 1468 1523 1554 1468 1506 1498 1515 1484
## [16] 1386 1499 1480 1426 1411 1470 1300 1214 1357 1363 1507 1222 1522 1314 1144
## [31] 1260 1379 1277 1375 1150 1388 1385 1539 1430 1391 1248 1150 1107 1327 1152
## [46] 1358 1392 1356 1286 1296 1356 1495 1345 1206 1406 1414 1363 1391 1319 1330
## [61] 1327 1186 1350 1263
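
The same lookup-and-average step can also be written without an explicit loop. This is a sketch on hypothetical toy data (four players with made-up ratings and pairings), and the names pre_rating, opponents, and avg_opp are illustrative. It assumes, as the loop above does, that pair numbers run 1, 2, 3, ... in the same order as the rating vector.

```r
# Hypothetical toy data: 4 players, position in the vector = pair number
pre_rating <- c(1500, 1600, 1400, 1700)

# Hypothetical pairings: the pair numbers each player faced
opponents <- list(c(2, 3), c(1, 4), c(1, 4), c(2, 3))

# For each player, look up the opponents' ratings by position and average them
avg_opp <- vapply(opponents,
                  function(ids) round(mean(pre_rating[ids])),
                  numeric(1))
avg_opp
```

vapply declares the expected result type (numeric(1) per player), so a malformed entry fails loudly instead of silently changing the shape of the output.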

Constructing the Data Frame

Project1 <- data.frame(ID,Name,State,TotalNumberofPoints,PreRating,AvgOppPreChessRating)
head(Project1)
##   ID                Name State TotalNumberofPoints PreRating
## 1  1            GARY HUA    ON                 6.0      1794
## 2  2     DAKSHESH DARURI    MI                 6.0      1553
## 3  3        ADITYA BAJAJ    MI                 6.0      1384
## 4  4 PATRICK H SCHILLING    MI                 5.5      1716
## 5  5          HANSHI ZUO    MI                 5.5      1655
## 6  6         HANSEN SONG    OH                 5.0      1686
##   AvgOppPreChessRating
## 1                 1605
## 2                 1469
## 3                 1564
## 4                 1574
## 5                 1501
## 6                 1519

Writing the CSV File

write_csv(Project1, 'tournament.csv', append = FALSE)
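
One way to confirm that the file was written as intended is to read it back. The sketch below is an assumption-laden illustration rather than part of the project: it builds a one-row data frame (sample_df, with values taken from the first row of the output above) and writes it to a temporary path so that no existing file is overwritten.

```r
library(readr)

# One-row data frame in the same column layout as Project1
sample_df <- data.frame(ID = 1, Name = "GARY HUA", State = "ON",
                        TotalNumberofPoints = 6.0, PreRating = 1794,
                        AvgOppPreChessRating = 1605)

# Temporary path, used here instead of 'tournament.csv'
tmp <- tempfile(fileext = ".csv")
write_csv(sample_df, tmp)

# Read the file back and inspect it
check <- read_csv(tmp, show_col_types = FALSE)
check
```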