Introduction: In this project, we are given a text file of chess
tournament results laid out in a fixed, pipe-delimited structure. The goal
is to create an R Markdown file that generates a .CSV file with the
following information for every player: “Player’s Name, Player’s State,
Total Number of Points, Player’s Pre-Rating, and Average Pre Chess
Rating of Opponents”. First, I will load the necessary libraries and
clean the data as far as I can.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
library(stringr)
# Read the data without headers
raw_data <- read.table("/Users/leslie/Downloads/tournamentinfo.txt",
                       sep = "|",
                       fill = TRUE,
                       stringsAsFactors = FALSE)
# Check the structure of the raw data
str(raw_data)
## 'data.frame': 196 obs. of 11 variables:
## $ V1 : chr "-----------------------------------------------------------------------------------------" " Pair " " Num " "-----------------------------------------------------------------------------------------" ...
## $ V2 : chr "" " Player Name " " USCF ID / Rtg (Pre->Post) " "" ...
## $ V3 : chr "" "Total" " Pts " "" ...
## $ V4 : chr "" "Round" " 1 " "" ...
## $ V5 : chr "" "Round" " 2 " "" ...
## $ V6 : chr "" "Round" " 3 " "" ...
## $ V7 : chr "" "Round" " 4 " "" ...
## $ V8 : chr "" "Round" " 5 " "" ...
## $ V9 : chr "" "Round" " 6 " "" ...
## $ V10: chr "" "Round" " 7 " "" ...
## $ V11: logi NA NA NA NA NA NA ...
# Drop the last column first if it contains only NA values,
# so that the ten names below match the ten remaining columns
if (all(is.na(raw_data[, ncol(raw_data)]))) {
  raw_data <- raw_data[, -ncol(raw_data)]
}
# Initial column assignment
colnames(raw_data) <- c("Pair_Num", "Player_Name", "Total", "Round_1", "Round_2", "Round_3", "Round_4", "Round_5", "Round_6", "Round_7")
# Remove the first four rows (the two header lines and their dashed borders) using indexing
raw_data <- raw_data[-c(1:4), ]
# Remove rows that contain only dashes
raw_data <- raw_data[!grepl("^\\s*-{2,}\\s*$", raw_data$Pair_Num), ]
# Preview the cleaned data (View() only opens a viewer in an interactive session)
head(raw_data)
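Judging by the two-line header shown by str() above, each player appears to occupy two consecutive rows after this cleaning: one with the pair number, name, total, and round results, and one with the state and the "USCF ID / Rtg (Pre->Post)" text. The sketch below shows one possible way to fold each pair of rows into a single record. It runs on a small made-up sample (the players, IDs, and ratings are hypothetical, not taken from the tournament file), and it assumes the rows alternate strictly and that the state sits in the first column of the second row.

```r
library(stringr)

# Hypothetical two-player sample in the same layout as the cleaned data
cleaned <- data.frame(
  Pair_Num    = c("    1 ", "   ON ", "    2 ", "   MI "),
  Player_Name = c(" PLAYER ONE ", " 11111111 / R: 1500   ->1520 ",
                  " PLAYER TWO ", " 22222222 / R: 1400P12 ->1410P19 "),
  Total       = c("1.0  ", "N:2  ", "0.0  ", "N:3  "),
  Round_1     = c("W   2", "W    ", "L   1", "B    "),
  stringsAsFactors = FALSE
)

# Assumption: odd rows hold name and results, even rows hold state and ratings
name_rows   <- cleaned[seq(1, nrow(cleaned), by = 2), ]
detail_rows <- cleaned[seq(2, nrow(cleaned), by = 2), ]

players <- data.frame(
  Pair_Num    = as.integer(str_trim(name_rows$Pair_Num)),
  Player_Name = str_trim(name_rows$Player_Name),
  State       = str_trim(detail_rows$Pair_Num),
  Total       = as.numeric(str_trim(name_rows$Total)),
  # Pre-rating: the digits right after "R:", ignoring a suffix such as "P12"
  Pre_Rating  = as.integer(str_match(detail_rows$Player_Name, "R:\\s*(\\d+)")[, 2]),
  Round_1     = name_rows$Round_1,
  stringsAsFactors = FALSE
)

print(players)
```

Because the rating is read from the second row before that row is discarded, no separate deletion of the "USCF ID / Rtg (Pre->Post)" text is needed, and no all-NA rows are left behind.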
Conclusion: I was able to load the text file, clean it up a little,
and create a CSV file. However, my results still contain data I do not
need. I’ve tried to remove the USCF ID / Rtg (Pre->Post) text, but
whenever I do, it breaks my pre-rating column. This leaves extra rows
of “NA” values that I do not need. I cannot simply drop every row that
contains “NA” or “NaN”, because those rows still hold important
information that must stay in my data table. I’ve tried many different
approaches and none of them has worked, so I am not sure what else to
do here.
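One possible direction for the remaining step, offered as a sketch and not a finished solution: once there is one row per player with a numeric pre-rating, the opponent numbers can be pulled out of the round columns and looked up by pair number. The table below is made up for illustration (three hypothetical players, two rounds), and the output file name is an assumption. Rounds without an opponent, such as byes, become NA and are skipped by the average instead of forcing the whole row to be dropped.

```r
library(stringr)

# Hypothetical per-player table; Round_* columns hold the raw result strings
players <- data.frame(
  Pair_Num    = 1:3,
  Player_Name = c("PLAYER ONE", "PLAYER TWO", "PLAYER THREE"),
  State       = c("ON", "MI", "MI"),
  Total       = c(2.0, 1.0, 0.0),
  Pre_Rating  = c(1500L, 1400L, 1300L),
  Round_1     = c("W   2", "L   1", "B    "),
  Round_2     = c("W   3", "H    ", "L   1"),
  stringsAsFactors = FALSE
)

round_cols <- grep("^Round_", names(players), value = TRUE)

# Opponent pair numbers per round; entries with no digits become NA
opponents <- sapply(players[round_cols],
                    function(x) as.integer(str_extract(x, "\\d+")))

# Look up each opponent's pre-rating by pair number (same shape as opponents)
opp_ratings <- matrix(players$Pre_Rating[match(opponents, players$Pair_Num)],
                      nrow = nrow(players))

# Average over the games actually played; NA rounds are ignored
players$Avg_Opp_Pre_Rating <- round(rowMeans(opp_ratings, na.rm = TRUE))

# Hypothetical output location; point this at the project folder as needed
out_file <- file.path(tempdir(), "tournament_results.csv")
write.csv(players[, c("Player_Name", "State", "Total",
                      "Pre_Rating", "Avg_Opp_Pre_Rating")],
          out_file, row.names = FALSE)
```

A player with no opponents at all would still get NaN from rowMeans(), so that case may need separate handling if it occurs in the real file.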