This project demonstrates how to convert a text file with specific formatting into a CSV file that contains a subset of the original fields plus one calculated field: the average rating of each chess player's opponents. The main challenge is that each player's information spans two rows, so I will need a solution that lets me work with both rows together. The project is also an opportunity to practice my regex skills: the initial file is entirely text, so I will have to extract the numeric values before I can average them.
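To make the two-row layout concrete, here is a minimal sketch built on a made-up record. The player name, ID, and ratings are hypothetical, and the column widths in the real file may differ. It only illustrates how each row breaks apart on the pipe character.

```r
# Hypothetical two-line record in the layout described above (not real data)
row_one <- "    1 | JANE DOE                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
row_two <- "   ON | 12345678 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

# Split each row on the literal pipe and trim the padding around every field
fields_one <- trimws(strsplit(row_one, "|", fixed = TRUE)[[1]])
fields_two <- trimws(strsplit(row_two, "|", fixed = TRUE)[[1]])

fields_one[1:3]  # player number, name, total points
fields_two[1:2]  # state, USCF ID with pre- and post-ratings
```

The first row carries the opponents' player numbers and the second carries the ratings, which is why both rows are needed to compute an average opponent rating.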
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr)
library(tidyr)
# Read the raw tournament file, one element per line
chess_url <- 'https://raw.githubusercontent.com/cocodono/Project-1-Data-607/main/Chess%20Players'
chess <- readLines(chess_url)
# Drop the separator lines, which consist entirely of dashes
chess <- chess[!grepl('^-+$', chess)]
# Odd lines hold name, total, and opponents; even lines hold state and ratings
row_odd <- seq_along(chess) %% 2
# The first element of each half is a header line, so drop it
odd_chess <- chess[row_odd == 1][-1]
even_chess <- chess[row_odd == 0][-1]
odd_cols <- c('Player_Number','Player_Name','Total','Round_1_opponent','Round_2_opponent','Round_3_opponent','Round_4_opponent','Round_5_opponent','Round_6_opponent','Round_7_opponent')
# Each line ends with a trailing pipe, which yields an extra empty piece;
# extra = "drop" discards it without a warning
separated_odd <- odd_chess %>%
  as.data.frame() %>%
  separate(1, into = odd_cols, sep = "\\|", extra = "drop")
# Number the rows so the two halves can be merged back together
separated_odd$row_id <- seq_len(nrow(separated_odd))
even_cols <- c('State','USCF_ID / Rtg (Pre->Post)','N','Round_1_result','Round_2_result','Round_3_result','Round_4_result','Round_5_result','Round_6_result','Round_7_result')
# Same split for the second line of each player; the trailing empty piece is dropped
separated_even <- even_chess %>%
  as.data.frame() %>%
  separate(1, into = even_cols, sep = "\\|", extra = "drop")
# Matching row numbers pair each second line with its first line
separated_even$row_id <- seq_len(nrow(separated_even))
# Join the two halves on row_id, then split the combined ID/rating field
full_chess <- merge(separated_odd, separated_even, by = 'row_id') %>%
  subset(select = -c(row_id)) %>%
  separate('USCF_ID / Rtg (Pre->Post)', c('USCF_ID', 'Rtg_Pre_Post'), '\\/') %>%
  separate('Rtg_Pre_Post', c('pre_rating', 'post_rating'), '->')
# Keep only the first run of digits in each rating field
full_chess$pre_rating <- as.integer(str_extract(full_chess$pre_rating, '[0-9]+'))
full_chess$post_rating <- as.integer(str_extract(full_chess$post_rating, '[0-9]+'))
# Creative solution: replace each opponent's player number with that opponent's pre_rating.
# This relies on player number i sitting in row i of full_chess; rounds without an opponent become NA.
for (item in colnames(full_chess)[grepl("opponent", colnames(full_chess))]) {
  full_chess[[item]] <- full_chess$pre_rating[as.integer(str_extract(full_chess[[item]], '[0-9]+'))]
}
# Average the opponent ratings across each row, ignoring rounds with no opponent (NA)
opponent_cols <- grep('opponent', colnames(full_chess))
full_chess$avg_opp_rate <- round(rowMeans(full_chess[, opponent_cols], na.rm = TRUE))
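# Keep the requested fields and strip the padding left over from the fixed-width layout
filtered_chess <- full_chess %>%
  select(Player_Name, State, Total, pre_rating, avg_opp_rate) %>%
  mutate(across(c(Player_Name, State, Total), str_trim))
# row.names = FALSE avoids writing an unnamed index column
write.csv(filtered_chess, 'chess_stats.csv', row.names = FALSE)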
I was correct that regex would be indispensable in this project, and that the two-lines-per-player format would call for a creative solution: I split the data set into odd and even rows, then merged the two halves back together so that each player occupies one row. To compute the average opponent rating, I replaced each opponent's player number with that opponent's pre-rating, which let me take the mean across each row. My method worked, but I suspect there is a cleaner way to handle the two-rows-per-player format, and I would be interested to see one.
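One possible alternative to the odd/even split, offered as a sketch rather than a definitive answer: glue each player's two lines into a single string before splitting, so no merge is needed. The input lines below are hypothetical stand-ins for the real file (shortened to one round), and the field positions 2, 5, and 6 are specific to this toy layout; with seven rounds the state and rating fields would sit further along.

```r
# Hypothetical input lines mimicking the file layout (not real data)
raw_lines <- c(
  "-----",
  " Pair | Player Name |Total|Round|",
  " Num  | USCF ID / Rtg (Pre->Post) | Pts |  1  |",
  "-----",
  "    1 | JANE DOE |1.0  |W   2|",
  "   ON | 11111111 / R: 1500   ->1510 |N:2  |W    |",
  "-----",
  "    2 | JOHN ROE |0.0  |L   1|",
  "   MI | 22222222 / R: 1400P5 ->1390 |N:2  |B    |",
  "-----"
)

# Drop separator lines (all dashes) and the two header lines
records <- raw_lines[!grepl("^-+$", raw_lines)][-(1:2)]

# Logical recycling selects alternating lines; paste0 glues each pair into one record
first_rows  <- records[c(TRUE, FALSE)]
second_rows <- records[c(FALSE, TRUE)]
combined <- paste0(first_rows, second_rows)

# A single split per player now exposes every field from both lines
fields <- lapply(strsplit(combined, "|", fixed = TRUE), trimws)
players <- data.frame(
  name = sapply(fields, `[`, 2),
  state = sapply(fields, `[`, 5),
  # Capture the first run of digits after "R:" as the pre-rating
  pre_rating = as.integer(sub(".*R:\\s*(\\d+).*", "\\1", sapply(fields, `[`, 6)))
)
players
```

Because the pairing happens on the raw character vector, this version avoids the intermediate data frames and the `row_id` bookkeeping, at the cost of having to count field positions across the combined line.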