Project Outline

In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:

-Player’s Name -Player’s State, -Total Number of Points, -Player’s Pre-Rating, -Average Pre Chess Rating of Opponent

Setting up supporting functions

To deal with the data’s ETL process, I will create several r functions to avoid code repetition. These functions will be listed here.

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------ tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.0
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts --------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
#returns a list of rows in a data frame
rows = function(tab) lapply(
  seq_len(nrow(tab)),
  function(i) unclass(tab[i,,drop=F])
)

# creates an empty data frame
create_empty_table <- function(num_rows, num_cols, type_vec) {
  frame <- data.frame(matrix(NA, nrow = num_rows, ncol = num_cols))
  for(i in 1:ncol(frame)) {
    print(type_vec[i])
    if(type_vec[i] == 'numeric') {frame[,i] <- as.numeric(frame[,i])}
    if(type_vec[i] == 'character') {frame[,i] <- as.character(frame[,i])}
    if(type_vec[i] == 'logical') {frame[,i] <- as.logical(frame[,i])}
    if(type_vec[i] == 'factor') {frame[,i] <- as.factor(frame[,i])}
  }
  return(frame)
}

# removes the pipe character and trims the strings
clean_text <- function(text ) { 
  text <- str_trim(gsub("\\|", "", text))
  return(text)
}

#gets the rating of an opponent based on the result
get_opp_rating <- function( df, result) { 
  #extract player's ID from result 
  opp_num <- as.numeric(str_extract_all(result, "\\d+"))
  opp_rating <- df[opp_num,]$pre_rating
  
  return(opp_rating)
}

# gets the average opponent rating by looking at each 
# opponent's ratings
get_avg_opponent_rtg <- function( df, id) { 
    opp_ratings <- c( get_opp_rating(df, df[id, ]$round1), 
                      get_opp_rating(df, df[id, ]$round2),
                      get_opp_rating(df, df[id, ]$round3),
                      get_opp_rating(df, df[id, ]$round4),
                      get_opp_rating(df, df[id, ]$round5),
                      get_opp_rating(df, df[id, ]$round6),
                      get_opp_rating(df, df[id, ]$round7))
    return(mean(opp_ratings, na.rm = TRUE))
}

Extract the data

The data lives in my github, and follows this format: ----------------------------------------------------------------------------------------- Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | ----------------------------------------------------------------------------------------- 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4| ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W | -----------------------------------------------------------------------------------------

Getting the data

We will first get the data into a dataframe called tournament. We will remove the lines with only dashes

#getting the data from the file
theURL <- "https://raw.githubusercontent.com/georg4re/DS607/master/project1/tournamentinfo.txt"

#using read.fwf will help me with the fixed width nature of the document
tournament <- read.fwf(theURL, width = c(7,34,6,6,6,6,6,6,6,6))

#removing the separating lines between player information
nEnd <- nrow(tournament)
if ((nEnd -1) %% 3 > 0 ) { 
  nEnd = nEnd - ((nEnd -1) %% 3) + 1
} 

removeRows = seq(from=1, to= nEnd, by=3)
tournament <- tournament %>% slice(-removeRows)

# Remove the title values
clean_tournament <- tournament %>% slice(-c(1,2))
head(clean_tournament,10)
##         V1                                 V2     V3     V4     V5     V6
## 1      1 |  GARY HUA                        | 6.0  | W  39| W  21| W  18|
## 2     ON |  15445895 / R: 1794   ->1817     | N:2  | W    | B    | W    |
## 3      2 |  DAKSHESH DARURI                 | 6.0  | W  63| W  58| L   4|
## 4     MI |  14598900 / R: 1553   ->1663     | N:2  | B    | W    | B    |
## 5      3 |  ADITYA BAJAJ                    | 6.0  | L   8| W  61| W  25|
## 6     MI |  14959604 / R: 1384   ->1640     | N:2  | W    | B    | W    |
## 7      4 |  PATRICK H SCHILLING             | 5.5  | W  23| D  28| W   2|
## 8     MI |  12616049 / R: 1716   ->1744     | N:2  | W    | B    | W    |
## 9      5 |  HANSHI ZUO                      | 5.5  | W  45| W  37| D  12|
## 10    MI |  14601533 / R: 1655   ->1690     | N:2  | B    | W    | B    |
##        V7     V8     V9    V10
## 1  W  14| W   7| D  12| D   4|
## 2  B    | W    | B    | W    |
## 3  W  17| W  16| W  20| W   7|
## 4  W    | B    | W    | B    |
## 5  W  21| W  11| W  13| W  12|
## 6  B    | W    | B    | W    |
## 7  W  26| D   5| W  19| D   1|
## 8  B    | W    | B    | B    |
## 9  D  13| D   4| W  14| W  17|
## 10 W    | B    | W    | B    |