Introduction
The objective of this project is to read the given text file, clean and process the data, and generate a CSV file of the processed data.
The given .txt file contains chess tournament results. Each player's result spans two rows: the first row has the pair number, player name, total points, and the rounds played; the second row has the state the player belongs to and the player's pre-rating.
For each player, we need to calculate the average pre-rating of the opponents they played.
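As a small worked example of that calculation, the sketch below averages the pre-ratings of one player's opponents. The ratings and the `opponent_pairs` vector are made-up illustration values, not figures from the tournament file.

```r
# Hypothetical pre-ratings, indexed by pair number (illustration only)
pre_rating <- c(1500, 1600, 1700, 1800)

# Assume the player with pair number 1 faced pair numbers 2, 3 and 4
opponent_pairs <- c(2, 3, 4)

# Average opponent pre-rating, rounded to the nearest whole number
avg_opponent_rating <- round(mean(pre_rating[opponent_pairs]))
avg_opponent_rating  # 1700
```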
Loading of the required packages
knitr::opts_chunk$set(eval = TRUE, results = FALSE,
fig.show = "hide", message = FALSE)
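#Install any missing package, then load it (require() alone does not load a freshly installed package)
if (!require("stringr")) { install.packages('stringr'); library(stringr) }
## Loading required package: stringr
if (!require("DT")) { install.packages('DT'); library(DT) }
## Loading required package: DT
if (!require("ggplot2")) { install.packages('ggplot2'); library(ggplot2) }
## Loading required package: ggplot2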
Reading of the text file
The text file with the chess results is read directly from GitHub.
#Read the txt file from GitHub
rawdata <- readLines("https://raw.githubusercontent.com/petferns/607-Project1/master/tournamentinfo.txt")
#Get the count of rows
rowlen <- length(rawdata)
Table - Playername
We create a table of the rows that start with the player name. In the text file, the needed data starts at row 5 and repeats every 3 rows; each player's two rows are followed by a row of dashes, which we don't need.
#Rows that start with player names
PlayerNameRows <- rawdata[seq(5, rowlen, 3)]
Table - Playerstate
We also create a table of the rows that start with the player state. These rows start at row 6 of the text file and also repeat every 3 rows, so the rows of dashes are skipped.
#Rows that start with player states
PlayerStateRows <- rawdata[seq(6, rowlen, 3)]
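To make the row selection concrete, here is a minimal sketch on a mock vector. The placeholder strings in `mock` are hypothetical and only mimic the layout described above (four leading lines, then a name row, a state row, and a row of dashes per player).

```r
# Mock file layout (placeholder strings, not real tournament data):
# 4 leading lines, then name row / state row / dashes for each player
mock <- c("dashes", "header 1", "header 2", "dashes",
          "name row A", "state row A", "dashes",
          "name row B", "state row B", "dashes")
n <- length(mock)

name_rows  <- mock[seq(5, n, 3)]  # picks rows 5 and 8
state_rows <- mock[seq(6, n, 3)]  # picks rows 6 and 9

name_rows   # "name row A" "name row B"
state_rows  # "state row A" "state row B"
```

Because both sequences step by 3, the rows of dashes (rows 7 and 10 here) are never selected.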
Player Name
#Get player name
PlayerName <- str_trim(str_extract(PlayerNameRows, "(\\w+\\s){2,3}"))
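One caveat with the pattern above is that `\\w` does not match a hyphen and the repetition stops after three words, so hyphenated or longer names would be cut short. The sketch below shows one possible alternative that takes the whole second pipe-delimited field instead. The rows in `sample_rows` are made up for illustration and only assume the same `|` layout.

```r
library(stringr)

# Made-up rows in the same pipe-delimited layout (illustration only)
sample_rows <- c(
  "    1 | ANNA MARIA SMITH-JONES          |6.0  |W  39|W  21|",
  "    2 | JOHN Q DE LA CRUZ               |5.5  |W  63|D  58|"
)

# The word-based pattern truncates both names
str_trim(str_extract(sample_rows, "(\\w+\\s){2,3}"))
# "ANNA MARIA" "JOHN Q DE"

# Alternative: split on "|" and keep the second field in full
fields <- str_split_fixed(sample_rows, fixed("|"), n = 3)
player_name <- str_trim(fields[, 2])
player_name
# "ANNA MARIA SMITH-JONES" "JOHN Q DE LA CRUZ"
```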
Player Total points
#Get player total points
TotalPoints <- as.numeric(str_extract(PlayerNameRows, "\\d+\\.\\d+"))
Player State
#Get player State
PlayerState <- str_extract(PlayerStateRows, "\\w+")
Player Chess Pre-rating
#Get player pre-rating
PlayerPreRating <- str_extract(PlayerStateRows, "[^\\d]\\d{3,4}[^\\d]")
PlayerPreRating <- as.integer(str_extract(PlayerPreRating, "\\d+"))
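The two-step extraction above relies on the rating being the first standalone run of 3 or 4 digits in the row. As a hedged alternative, the sketch below ties the match to the `R:` label instead. The rows in `sample_state_rows` are made up, and the `P11` suffix on the second rating is an assumed format for illustration.

```r
library(stringr)

# Made-up state rows (illustration only); the second rating carries an assumed suffix
sample_state_rows <- c(
  "   ON | 12345678 / R: 1794   ->1817     |N:2  |W    |B    |",
  "   MI | 87654321 / R:  955P11-> 979P18  |N:4  |B    |W    |"
)

# Capture the digits that follow "R:" and any spaces after it
pre_rating <- as.integer(str_match(sample_state_rows, "R:\\s*(\\d+)")[, 2])
pre_rating  # 1794 955
```

`str_match()` returns a matrix whose second column holds the first capture group, so a single call replaces the two `str_extract()` steps.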
Opponent Chess Pre-rating
#Get the opponents' pair numbers for each player (digits immediately before a "|")
GetOpponents <- str_extract_all(PlayerNameRows, "\\d+(?=\\|)")
GetOpponents <- lapply(GetOpponents, as.integer)
#Calculate Opponent avg pre rating
#Pair numbers are assumed to run 1..n in file order, so they index PlayerPreRating directly
Pair <- as.integer(str_extract(PlayerNameRows, "\\d+"))
AvgOpponentRating <- Pair
for (i in seq_along(Pair)) {
  AvgOpponentRating[i] <- round(mean(PlayerPreRating[GetOpponents[[Pair[i]]]]))
}
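The sketch below walks through the opponent lookup on three made-up rows, including a round with no opponent number. It matches digits immediately followed by `|` using a lookahead, which is a choice of this sketch; `sample_rows`, `hyp_pre_rating`, and the `H` code for an unplayed round are hypothetical.

```r
library(stringr)

# Made-up name rows (illustration only); player 3 has one round without an opponent
sample_rows <- c(
  "    1 | PLAYER ONE    |2.0  |W   2|W   3|",
  "    2 | PLAYER TWO    |1.0  |L   1|W   3|",
  "    3 | PLAYER THREE  |0.5  |H    |L   1|"
)
hyp_pre_rating <- c(1500, 1400, 1300)  # hypothetical ratings by pair number

# Opponent pair numbers per player; rounds without digits contribute nothing
opponents <- lapply(str_extract_all(sample_rows, "\\d+(?=\\|)"), as.integer)

# Mean pre-rating of each player's opponents, rounded
avg <- vapply(opponents, function(o) round(mean(hyp_pre_rating[o])), numeric(1))
avg  # 1350 1400 1500
```

Player 3 is averaged over a single opponent, because the round without an opponent number yields no match and is left out of the mean.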
Summarize the data chunks into df
#Summarize all the data into data frame
SummaryData <- data.frame(PlayerName, PlayerState, TotalPoints, PlayerPreRating, AvgOpponentRating)
Creating CSV of data
#Write the summary to CSV without the extra row-number column
write.csv(SummaryData, file = "c:/peter/tournament.csv", row.names = FALSE)
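A quick way to check the export is to read the file back. The sketch below does this with a small stand-in data frame and a temporary file, so it does not depend on any local folder. The `demo` data frame and its values are hypothetical.

```r
# Hypothetical stand-in for the summary data frame (illustration only)
demo <- data.frame(PlayerName = c("PLAYER ONE", "PLAYER TWO"),
                   PlayerState = c("ON", "MI"),
                   TotalPoints = c(2.0, 1.0))

# A temporary file keeps the example independent of any local folder
out_file <- tempfile(fileext = ".csv")
write.csv(demo, file = out_file, row.names = FALSE)

# Read the file back to confirm the columns and rows survived
check <- read.csv(out_file)
names(check)  # "PlayerName" "PlayerState" "TotalPoints"
nrow(check)   # 2
```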
Visualizing the player pre-ratings
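No plotting code appears in this section, so the following is only one possible sketch of such a visualization: a ggplot2 histogram of pre-ratings. The `demo` data frame holds hypothetical values standing in for `SummaryData$PlayerPreRating`, and the bin width is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical pre-ratings standing in for SummaryData$PlayerPreRating
demo <- data.frame(PlayerPreRating = c(1794, 1553, 1384, 1716, 1655, 1686, 955))

# Histogram of pre-ratings; the bin width of 100 is an arbitrary choice
p <- ggplot(demo, aes(x = PlayerPreRating)) +
  geom_histogram(binwidth = 100) +
  labs(x = "Pre-rating", y = "Number of players")
p
```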
