Introduction

The objective of this project is to read the given text file, clean and process the data, and generate a CSV file of the processed data.

The given .txt file contains the results of a chess tournament. Each player's result spans two rows. The first row holds the pair number, the player's name, the total points, and the results of the rounds played. The second row holds the state the player belongs to and the player's pre-rating.
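To make the two-row layout concrete, the sketch below splits a hypothetical record on the `|` column separator. The player, ID, and ratings are made up for illustration; only the pipe-delimited column layout is assumed to match the tournament file.

```r
library(stringr)

# Hypothetical two-line record in the same pipe-delimited layout as the tournament file
name_line  <- "    1 | JANE DOE                        |5.5  |W  12|D   3|L   7|"
state_line <- "   MI | 12345678 / R: 1600   ->1625     |N:2  |W    |B    |W    |"

# Split each line on "|" and trim the padding around every field
name_fields  <- str_trim(str_split(name_line, fixed("|"))[[1]])
state_fields <- str_trim(str_split(state_line, fixed("|"))[[1]])

name_fields[1:3]   # pair number, player name, total points
state_fields[1:2]  # state, then "USCF ID / R: pre-rating ->post-rating"
```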

For each player, we also need to calculate the average pre-rating of the player's opponents.
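Written as a formula, with $O_i$ the set of opponents player $i$ actually played and $R_j$ the pre-rating of player $j$, the quantity computed for each player (rounded to a whole rating point) is:

$$
\bar{R}_i = \operatorname{round}\left( \frac{1}{|O_i|} \sum_{j \in O_i} R_j \right)
$$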

Loading the required packages

knitr::opts_chunk$set(eval = TRUE, results = FALSE, 
                      fig.show = "hide", message = FALSE)
#Install each package if it is missing, then load it
if (!require("stringr")) { install.packages("stringr"); library(stringr) }
## Loading required package: stringr
if (!require("DT")) { install.packages("DT"); library(DT) }
## Loading required package: DT
if (!require("ggplot2")) { install.packages("ggplot2"); library(ggplot2) }
## Loading required package: ggplot2

Reading the text file

The text file with the chess results is read from GitHub.

#Read the txt file from GitHub
rawdata <- readLines("https://raw.githubusercontent.com/petferns/607-Project1/master/tournamentinfo.txt")

#Get the count of rows
rowlen <- length(rawdata)

Table - Playername

We create a vector of the rows that start with a player's name. In the text file, the first such row is row 5, and each player occupies three rows (the name row, the state row, and a row of dashes that we don't need), so we take every 3rd row starting from row 5.

#Rows that start with player names
PlayerNameRows <- rawdata[seq(5, rowlen, 3)]
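The indexing can be checked on a small stand-in. The sketch below builds a hypothetical 10-line vector with a four-line header followed by two three-line player blocks, and shows which lines `seq(5, n, 3)` and `seq(6, n, 3)` pick out.

```r
# Hypothetical stand-in for rawdata: 4 header lines, then 2 players x 3 lines each
lines <- c("dashes", "header 1", "header 2", "dashes",
           "player 1 name", "player 1 state", "dashes",
           "player 2 name", "player 2 state", "dashes")
n <- length(lines)

seq(5, n, 3)             # 5 8
lines[seq(5, n, 3)]      # the name lines
lines[seq(6, n, 3)]      # the state lines
```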

Table - Playerstate

We also create a vector of the rows that start with the player's state. These rows start at row 6 of the text file, and again we take every 3rd row so that the rows of dashes are skipped.

#Rows that start with player states
PlayerStateRows <- rawdata[seq(6, rowlen, 3)]

Player Name

#Get player name: the text between the first and second "|" (keeps hyphenated names whole)
PlayerName <- str_trim(str_split_fixed(PlayerNameRows, fixed("|"), 3)[, 2])
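A pattern such as `"(\\w+\\s){2,3}"` stops at the first character that is not a word character, so a hyphenated surname would be cut short. The sketch below compares it with taking the second `|`-delimited field, using a hypothetical player line with a hyphenated name.

```r
library(stringr)

# Hypothetical player line with a hyphenated surname
line <- "   20 | ANNA MARIA SMITH-JONES          |3.5  |W  12|"

# Word-character regex: stops at the hyphen
str_trim(str_extract(line, "(\\w+\\s){2,3}"))        # "ANNA MARIA"

# Second "|"-delimited field: keeps the full name
str_trim(str_split_fixed(line, fixed("|"), 3)[, 2])  # "ANNA MARIA SMITH-JONES"
```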

Player Total points

#Get player total points
TotalPoints <- as.numeric(str_extract(PlayerNameRows, "\\d+\\.\\d+"))

Player State

#Get player State
PlayerState <- str_extract(PlayerStateRows, "\\w+")

Player Chess Pre-rating

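#Get player pre-rating: the first 3-4 digit number bounded by non-digits
#(the 8-digit USCF ID is skipped because it is too long to match)

PlayerPreRating <- str_extract(PlayerStateRows, "[^\\d]\\d{3,4}[^\\d]")
PlayerPreRating <- as.integer(str_extract(PlayerPreRating, "\\d+"))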
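The pre-rating pattern relies on the rating being the first run of three or four digits surrounded by non-digit characters. An alternative that is perhaps more explicit anchors on the `R:` label with `str_match()`. The sketch below checks both on hypothetical state lines; the second uses a provisional-rating suffix such as `P11`, which is an assumed format here.

```r
library(stringr)

# Hypothetical state lines: a regular rating and one with an assumed provisional suffix
state_lines <- c("   MI | 12345678 / R: 1600   ->1625     |N:2  |",
                 "   ON | 87654321 / R:  955P11 ->1011P17  |     |")

# Original approach: first 3-4 digit number surrounded by non-digits
bounded <- str_extract(state_lines, "[^\\d]\\d{3,4}[^\\d]")
as.integer(str_extract(bounded, "\\d+"))                  # 1600 955

# Alternative: capture the digits that follow "R:"
as.integer(str_match(state_lines, "R:\\s*(\\d+)")[, 2])  # 1600 955
```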

Opponent Chess Pre-rating

#Get the opponents: the pair numbers that sit directly before a "|" in the round cells

GetOpponents <- str_extract_all(PlayerNameRows, "\\d+(?=\\|)")
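Each round cell holds a result letter followed by the opponent's pair number, while byes and unplayed rounds (assumed here to be marked `H` and `U`) carry no number. A lookahead pattern extracts, in one step, the digits that sit immediately before a `|` and naturally skips those cells. The sketch below uses a hypothetical player line.

```r
library(stringr)

# Hypothetical player line: three played rounds, one half-point bye, one unplayed round
line <- "    7 | JANE DOE                        |3.5  |W  12|D   3|H    |L  21|U    |"

# Digits immediately followed by "|" (the lookahead keeps the "|" out of the match)
opponents <- str_extract_all(line, "\\d+(?=\\|)")[[1]]
opponents                 # "12" "3" "21"
as.integer(opponents)     # 12 3 21
```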

#Calculate each player's average opponent pre-rating

#Pair number of each player; rows are in pair order, so it doubles as the row index
Pair <- as.integer(str_extract(PlayerNameRows, "\\d+"))
AvgOpponentRating <- sapply(Pair, function(p) {
  round(mean(PlayerPreRating[as.integer(GetOpponents[[p]])]))
})
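The lookup can be tried on a small hypothetical pool of three players. Pair numbers index straight into the pre-rating vector, and a player with no recorded opponents yields `NaN`, because `mean()` of an empty vector is `NaN`.

```r
# Hypothetical pre-ratings, indexed by pair number 1..3
pre_rating <- c(1800L, 1500L, 1200L)

# Hypothetical opponent lists (pair numbers as character, as str_extract_all returns them)
opponents <- list(c("2", "3"), c("1"), character(0))

avg_opp <- sapply(opponents, function(o) round(mean(pre_rating[as.integer(o)])))
avg_opp   # 1350 1800 NaN
```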

Summarizing the data into a data frame

#Combine all the extracted columns into a data frame

SummaryData <- data.frame(PlayerName, PlayerState, TotalPoints, PlayerPreRating, AvgOpponentRating)

Creating a CSV of the data

#Write to the working directory; row.names = FALSE drops the row-number column
write.csv(SummaryData, file = "tournament.csv", row.names = FALSE)
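By default `write.csv()` also writes the row names as an unnamed first column. The sketch below writes a hypothetical two-row data frame, with the same columns as the summary, to a temporary file using `row.names = FALSE`, then reads it back to confirm that only the intended columns are stored.

```r
# Hypothetical two-player summary in the same column layout as SummaryData
df <- data.frame(PlayerName = c("JANE DOE", "JOHN ROE"),
                 PlayerState = c("MI", "ON"),
                 TotalPoints = c(5.5, 3.0),
                 PlayerPreRating = c(1600L, 955L),
                 AvgOpponentRating = c(1400, 1250))

path <- tempfile(fileext = ".csv")
write.csv(df, file = path, row.names = FALSE)

back <- read.csv(path)
names(back)   # the five column names, with no extra "X" column
```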

Visualizing the player pre-ratings