In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:
-Player’s Name -Player’s State, -Total Number of Points, -Player’s Pre-Rating, -Average Pre Chess Rating of Opponent
To deal with the data’s ETL process, I will create several r functions to avoid code repetition. These functions will be listed here.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------ tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.0
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts --------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
#returns a list of rows in a data frame
rows = function(tab) lapply(
seq_len(nrow(tab)),
function(i) unclass(tab[i,,drop=F])
)
# creates an empty data frame
create_empty_table <- function(num_rows, num_cols, type_vec) {
frame <- data.frame(matrix(NA, nrow = num_rows, ncol = num_cols))
for(i in 1:ncol(frame)) {
print(type_vec[i])
if(type_vec[i] == 'numeric') {frame[,i] <- as.numeric(frame[,i])}
if(type_vec[i] == 'character') {frame[,i] <- as.character(frame[,i])}
if(type_vec[i] == 'logical') {frame[,i] <- as.logical(frame[,i])}
if(type_vec[i] == 'factor') {frame[,i] <- as.factor(frame[,i])}
}
return(frame)
}
# removes the pipe character and trims the strings
clean_text <- function(text ) {
text <- str_trim(gsub("\\|", "", text))
return(text)
}
#gets the rating of an opponent based on the result
get_opp_rating <- function( df, result) {
#extract player's ID from result
opp_num <- as.numeric(str_extract_all(result, "\\d+"))
opp_rating <- df[opp_num,]$pre_rating
return(opp_rating)
}
# gets the average opponent rating by looking at each
# opponent's ratings
get_avg_opponent_rtg <- function( df, id) {
opp_ratings <- c( get_opp_rating(df, df[id, ]$round1),
get_opp_rating(df, df[id, ]$round2),
get_opp_rating(df, df[id, ]$round3),
get_opp_rating(df, df[id, ]$round4),
get_opp_rating(df, df[id, ]$round5),
get_opp_rating(df, df[id, ]$round6),
get_opp_rating(df, df[id, ]$round7))
return(mean(opp_ratings, na.rm = TRUE))
}
The data lives in my github, and follows this format: -----------------------------------------------------------------------------------------
Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round|
Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
-----------------------------------------------------------------------------------------
1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|
ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |
-----------------------------------------------------------------------------------------
We will first get the data into a dataframe called tournament. We will remove the lines with only dashes
#getting the data from the file
theURL <- "https://raw.githubusercontent.com/georg4re/DS607/master/project1/tournamentinfo.txt"
#using read.fwf will help me with the fixed width nature of the document
tournament <- read.fwf(theURL, width = c(7,34,6,6,6,6,6,6,6,6))
#removing the separating lines between player information
nEnd <- nrow(tournament)
if ((nEnd -1) %% 3 > 0 ) {
nEnd = nEnd - ((nEnd -1) %% 3) + 1
}
removeRows = seq(from=1, to= nEnd, by=3)
tournament <- tournament %>% slice(-removeRows)
# Remove the title values
clean_tournament <- tournament %>% slice(-c(1,2))
head(clean_tournament,10)
## V1 V2 V3 V4 V5 V6
## 1 1 | GARY HUA | 6.0 | W 39| W 21| W 18|
## 2 ON | 15445895 / R: 1794 ->1817 | N:2 | W | B | W |
## 3 2 | DAKSHESH DARURI | 6.0 | W 63| W 58| L 4|
## 4 MI | 14598900 / R: 1553 ->1663 | N:2 | B | W | B |
## 5 3 | ADITYA BAJAJ | 6.0 | L 8| W 61| W 25|
## 6 MI | 14959604 / R: 1384 ->1640 | N:2 | W | B | W |
## 7 4 | PATRICK H SCHILLING | 5.5 | W 23| D 28| W 2|
## 8 MI | 12616049 / R: 1716 ->1744 | N:2 | W | B | W |
## 9 5 | HANSHI ZUO | 5.5 | W 45| W 37| D 12|
## 10 MI | 14601533 / R: 1655 ->1690 | N:2 | B | W | B |
## V7 V8 V9 V10
## 1 W 14| W 7| D 12| D 4|
## 2 B | W | B | W |
## 3 W 17| W 16| W 20| W 7|
## 4 W | B | W | B |
## 5 W 21| W 11| W 13| W 12|
## 6 B | W | B | W |
## 7 W 26| D 5| W 19| D 1|
## 8 B | W | B | B |
## 9 D 13| D 4| W 14| W 17|
## 10 W | B | W | B |