Below is the assignment description.
In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:
* Player’s Name
* Player’s State
* Total Number of Points
* Player’s Pre-Rating, and
* Average Pre Chess Rating of Opponents
The chess tournament data was provided as a text file, which I saved to my GitHub account.
I can extract each player's pair number, name, state, total points, and pre-tournament Elo rating by reading the text file into a table. Because each game result lists the opponent's pair number, we can then look up every opponent's pre-tournament rating and average them.
We are operating in the tidyverse.
# Load packages --------------------------------------
library(tidyverse)
Here we extract the relevant data from the text file into a dataframe.
# Load data --------------------------------------
txt_data <- readLines("https://raw.githubusercontent.com/pkofy/DATA607/main/DATA607Project1.txt")
# Skip the 4 header lines; each player then occupies 3 lines (name line,
# state/rating line, dashed separator), so take every 3rd line starting at
# lines 5 and 6 and split each set on "|" into a table
row1 <- txt_data[seq(5, length(txt_data), 3)]
row2 <- txt_data[seq(6, length(txt_data), 3)]
t1 <- read.table(text = row1, sep = "|")
t2 <- read.table(text = row2, sep = "|")
# Create dataframe and rename columns
ctdf <- data.frame(t1$V1, t1$V2, t2$V1, t1$V3, t2$V2, t1$V4, t1$V5, t1$V6, t1$V7, t1$V8, t1$V9, t1$V10)
ctdf <- ctdf %>% rename(number = t1.V1, name = t1.V2, state = t2.V1, pointswon = t1.V3, elo = t2.V2, g1 = t1.V4, g2 = t1.V5, g3 = t1.V6, g4 = t1.V7, g5 = t1.V8, g6 = t1.V9, g7 = t1.V10)
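Since `data.frame()` accepts named arguments, the columns can be named as the dataframe is built, which removes the separate `rename()` step. This is a sketch on small hypothetical stand-ins for `t1` and `t2` with only one round column; with the real data, `g2` through `g7` would come from `t1$V5` through `t1$V10`.

```r
# Hypothetical stand-ins for the tables read.table() returns (only one round shown)
t1 <- data.frame(V1 = 1:2, V2 = c("PLAYER ONE", "PLAYER TWO"),
                 V3 = c(1.0, 0.0), V4 = c("W   2", "L   1"))
t2 <- data.frame(V1 = c("ON", "MI"),
                 V2 = c("11111111 / R: 1500   ->1510", "22222222 / R: 1400   ->1390"))

# Naming each column at construction time replaces the rename() call
ctdf <- data.frame(number = t1$V1, name = t1$V2, state = t2$V1,
                   pointswon = t1$V3, elo = t2$V2, g1 = t1$V4)
names(ctdf)
```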
# Extract elo begin rating
ctdf$elo_begin <- str_extract(ctdf$elo, "(R: ....)")
ctdf$elo_begin <- str_extract(ctdf$elo_begin, "....$")
# Extract elo end rating
ctdf$elo_end <- str_extract(ctdf$elo, "(->....)")
ctdf$elo_end <- str_extract(ctdf$elo_end, "....$")
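The two-step pattern above relies on each rating taking up four character positions after `R: ` or `->`. As an alternative, a capture group with `stringr::str_match()` can pull out just the digits and convert them to integers in one step. This is a sketch on invented rating strings; the second one assumes a provisional rating written with a `P` suffix.

```r
library(stringr)

# Hypothetical "USCF ID / R: pre->post" strings
elo <- c("11111111 / R: 1500   ->1510",
         "22222222 / R:  955P10->1012P16")

# str_match() returns a matrix: column 1 is the full match, column 2 the first group
elo_begin <- as.integer(str_match(elo, "R:\\s*(\\d+)")[, 2])
elo_end   <- as.integer(str_match(elo, "->\\s*(\\d+)")[, 2])

elo_begin
elo_end
```

Because `\\d+` stops at the first non-digit, three-digit ratings and suffixes such as `P10` need no special handling, and the results are already numeric rather than character.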
# Rearrange and Remove elo initial column
ctdf <- ctdf %>% relocate(elo_end, .after = pointswon)
ctdf <- ctdf %>% relocate(elo_begin, .after = pointswon)
ctdf <- subset(ctdf, select = -elo)
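`dplyr::relocate()` accepts several columns at once, so the reordering and the removal of the raw `elo` column can be written as a single pipe. This is a sketch on a hypothetical two-row miniature of `ctdf` with one round column.

```r
library(dplyr)

# Hypothetical miniature of ctdf after elo_begin and elo_end have been appended
ctdf <- data.frame(number = 1:2, name = c("PLAYER ONE", "PLAYER TWO"),
                   state = c("ON", "MI"), pointswon = c(1.0, 0.0),
                   elo = c("11111111 / R: 1500   ->1510", "22222222 / R: 1400   ->1390"),
                   g1 = c("W   2", "L   1"),
                   elo_begin = c(1500L, 1400L), elo_end = c(1510L, 1390L))

ctdf <- ctdf %>%
  relocate(elo_begin, elo_end, .after = pointswon) %>%  # move both rating columns together
  select(-elo)                                          # drop the raw rating string
names(ctdf)
```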
# Display initial dataframe
head(ctdf, n=5)
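Here we calculate the average pre-tournament Elo rating of each player's opponents by looking up each opponent's rating at the beginning of the tournament and averaging them. The lookup uses the opponent's pair number as a row index, which works because the rows are in pair-number order; rounds without an opponent (byes or unplayed games) produce NA and are skipped by `na.rm = TRUE`.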
# Create new columns with the opponent numbers
ctdf$o1 <- str_extract(ctdf$g1, "..$")
ctdf$o2 <- str_extract(ctdf$g2, "..$")
ctdf$o3 <- str_extract(ctdf$g3, "..$")
ctdf$o4 <- str_extract(ctdf$g4, "..$")
ctdf$o5 <- str_extract(ctdf$g5, "..$")
ctdf$o6 <- str_extract(ctdf$g6, "..$")
ctdf$o7 <- str_extract(ctdf$g7, "..$")
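The seven near-identical lines above can be collapsed with `dplyr::across()`, applying one extraction to every round column and then renaming `g*` to `o*`. This sketch swaps in the pattern `\\d+$` (trailing digits only), which returns NA directly for rounds with no opponent number. It uses a hypothetical two-round table and assumes no other column name starts with "g".

```r
library(dplyr)
library(stringr)

# Hypothetical round columns in the layout read.table produces
ctdf <- data.frame(g1 = c("W  39", "L   2"),
                   g2 = c("D   7", "B    "))

opp <- ctdf %>%
  select(starts_with("g")) %>%
  mutate(across(everything(), ~ as.integer(str_extract(.x, "\\d+$")))) %>%  # opponent number or NA
  rename_with(~ sub("^g", "o", .x))                                         # g1 -> o1, g2 -> o2, ...

ctdf <- bind_cols(ctdf, opp)
ctdf
```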
# Convert the new columns of player numbers from strings to integers
ctdf$o1 <- strtoi(ctdf$o1)
ctdf$o2 <- strtoi(ctdf$o2)
ctdf$o3 <- strtoi(ctdf$o3)
ctdf$o4 <- strtoi(ctdf$o4)
ctdf$o5 <- strtoi(ctdf$o5)
ctdf$o6 <- strtoi(ctdf$o6)
ctdf$o7 <- strtoi(ctdf$o7)
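The `"..$"` pattern keeps the last two characters of each round, so a single-digit opponent comes back with a leading space and a bye comes back as two spaces. A small check on invented round strings shows how `strtoi()` (base 10 by default) handles both cases: leading whitespace is skipped, and a string with no digits becomes NA.

```r
library(stringr)

rounds <- c("W  39", "D   7", "B    ")   # hypothetical round results
slices <- str_extract(rounds, "..$")     # last two characters: "39", " 7", "  "
strtoi(slices, base = 10L)
```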
# Replace the player numbers with the player elo_begin scores
ctdf$o1 <- ctdf$elo_begin[ctdf$o1]
ctdf$o2 <- ctdf$elo_begin[ctdf$o2]
ctdf$o3 <- ctdf$elo_begin[ctdf$o3]
ctdf$o4 <- ctdf$elo_begin[ctdf$o4]
ctdf$o5 <- ctdf$elo_begin[ctdf$o5]
ctdf$o6 <- ctdf$elo_begin[ctdf$o6]
ctdf$o7 <- ctdf$elo_begin[ctdf$o7]
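Indexing with `elo_begin[o1]` treats a pair number as a row position, which is correct only while row i holds player i. If the rows were ever sorted differently, `match()` would find the row whose pair number equals each opponent number instead. This sketch uses a hypothetical three-player table whose rows are deliberately out of order.

```r
# Hypothetical ratings table whose rows are NOT in pair-number order
number    <- c(2L, 1L, 3L)
elo_begin <- c(1400L, 1500L, 1600L)
o1        <- c(3L, NA, 2L)   # round-1 opponent of each row; NA marks a bye

positional <- elo_begin[o1]                  # treats pair numbers as row positions
matched    <- elo_begin[match(o1, number)]   # looks up the row holding each pair number

positional
matched
```

Here the positional lookup returns 1500 for player 3's opponent (player 2) because row 2 holds player 1, while `match()` returns player 2's actual rating of 1400. Both versions stay fully vectorized.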
# Convert the player elo_begin scores from strings to integers
ctdf$o1 <- strtoi(ctdf$o1)
ctdf$o2 <- strtoi(ctdf$o2)
ctdf$o3 <- strtoi(ctdf$o3)
ctdf$o4 <- strtoi(ctdf$o4)
ctdf$o5 <- strtoi(ctdf$o5)
ctdf$o6 <- strtoi(ctdf$o6)
ctdf$o7 <- strtoi(ctdf$o7)
# Take the row average of these and assign it to a new column
ctdf$begavgofopp <- rowMeans(ctdf[ , c(14:20)], na.rm=TRUE)
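With `na.rm = TRUE`, `rowMeans()` divides by the number of games actually played rather than by seven. Selecting the opponent columns by name instead of by position `14:20` also keeps the calculation correct if columns are later added or reordered. This sketch uses a hypothetical three-round table.

```r
# Hypothetical opponent ratings for two players; NA marks a bye or unplayed round
ctdf <- data.frame(o1 = c(1500L, 1400L),
                   o2 = c(1600L, NA),
                   o3 = c(1450L, 1550L))

opp_cols <- c("o1", "o2", "o3")   # with the real data: paste0("o", 1:7)
ctdf$begavgofopp <- rowMeans(ctdf[, opp_cols], na.rm = TRUE)
ctdf$begavgofopp
```

The second player's average is taken over the two games played, (1400 + 1550) / 2 = 1475.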
# Display the required subset of requested data
ctdf_final <- ctdf[ , c(2,3,4,5,21)]
ctdf_final
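The positions `c(2,3,4,5,21)` correspond to name, state, points, pre-rating, and opponents' average. Selecting those columns by name makes the final step easier to read and independent of column order. This is a sketch on a hypothetical two-row stand-in for `ctdf`.

```r
library(dplyr)

# Hypothetical stand-in for ctdf with the requested columns plus extras
ctdf <- data.frame(number = 1:2, name = c("PLAYER ONE", "PLAYER TWO"),
                   state = c("ON", "MI"), pointswon = c(1.0, 0.0),
                   elo_begin = c(1500L, 1400L), elo_end = c(1510L, 1390L),
                   begavgofopp = c(1400, 1500))

# Selecting by name keeps working if columns are added, removed, or reordered
ctdf_final <- ctdf %>% select(name, state, pointswon, elo_begin, begavgofopp)
names(ctdf_final)
```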
Here we write the final dataframe to a .csv file, which is saved to the GitHub repository.
# Writes to csv file
write.csv(ctdf_final, file = "DATA607Project1.csv")
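By default `write.csv()` also writes the row names as an unnamed first column, which shows up as an extra field when the file is imported into a SQL table. Passing `row.names = FALSE` leaves only the requested columns. This sketch writes a hypothetical two-row table to a temporary file rather than `DATA607Project1.csv` and reads it back.

```r
# Hypothetical two-row stand-in for ctdf_final
ctdf_final <- data.frame(name = c("PLAYER ONE", "PLAYER TWO"), state = c("ON", "MI"),
                         pointswon = c(1.0, 0.0), elo_begin = c(1500L, 1400L),
                         begavgofopp = c(1400, 1500))

path <- tempfile(fileext = ".csv")   # temporary file for the demonstration
# row.names = FALSE stops write.csv from adding a row-number column
write.csv(ctdf_final, file = path, row.names = FALSE)
names(read.csv(path))
```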
I think I can standardize my code better by doing more piping.
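As one possible direction for more piping, the opponent-average step could be written as a single pipeline. It reshapes the round columns into long form with `tidyr::pivot_longer()`, joins each opponent's pre-rating, and averages by player. This is a swapped-in technique rather than the approach used above, shown on a hypothetical three-player, two-round table.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical three-player table in the shape of ctdf (two rounds only)
ctdf <- data.frame(number = 1:3,
                   name = c("PLAYER ONE", "PLAYER TWO", "PLAYER THREE"),
                   elo_begin = c(1500L, 1400L, 1700L),
                   g1 = c("W   2", "L   1", "B    "),
                   g2 = c("W   3", "B    ", "L   1"))

opp_avg <- ctdf %>%
  select(number, starts_with("g")) %>%
  pivot_longer(-number, names_to = "round", values_to = "result") %>%   # one row per game
  mutate(opponent = as.integer(str_extract(result, "\\d+$"))) %>%
  filter(!is.na(opponent)) %>%                                          # drop byes / unplayed rounds
  left_join(select(ctdf, opponent = number, opp_elo = elo_begin), by = "opponent") %>%
  group_by(number) %>%
  summarise(begavgofopp = mean(opp_elo), .groups = "drop")

ctdf_final <- ctdf %>% left_join(opp_avg, by = "number")
ctdf_final
```

Because the join matches on pair number, this version does not depend on row order, and it needs no seven separate `o` columns.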
I probably could have been more sophisticated in my approach. I built up the approach piece by piece so I think I just need to do more projects to become fluent with these techniques.
I was happy with my vectorized approach to looking up the beginning Elo ratings; however, I could have done a better job of trimming white space and storing numerical values as numbers instead of characters when setting up the initial dataframe.
The referenced text and csv files and the R Markdown file for this document are saved here, github.com/pkofy/DATA607, with the name “DATA607Project1”.