This project transforms a semi-structured chess tournament text file, tournamentinfo.txt, into a clean, tabular CSV file using R Markdown.
To fulfill this assignment, I designed a step-by-step workflow in R to parse the file.
I first read the raw text and remove the decorative dashed lines that act as separators. I also strip out the initial header rows to isolate the player data.
I observe that each player’s data is split across two rows. I use indexing to separate these into two distinct vectors: one for primary data (Name and Points) and one for secondary data (State and Rating).
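A minimal sketch of this alternating-row split, using made-up rows rather than the contents of tournamentinfo.txt (the names `rows`, `first_rows`, and `second_rows` are illustrative only):

```r
# Hypothetical stand-in for the cleaned text: two rows per player
rows <- c("p1 primary", "p1 secondary", "p2 primary", "p2 secondary")

# Odd positions hold the first row of each record, even positions the second
first_rows  <- rows[seq(1, length(rows), by = 2)]
second_rows <- rows[seq(2, length(rows), by = 2)]

# Both vectors should end up with one entry per player
stopifnot(length(first_rows) == length(second_rows))
print(first_rows)
print(second_rows)
```

If the two lengths ever differ, a stray header or separator line has probably survived the cleaning step.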
I use regular expressions to target specific fields:
Name: Characters between the first and second pipe symbols.
State: The two uppercase letters starting the second row.
Total Points: The numeric value in the “Total” column.
Pre-Rating: The digits immediately following “R:”.
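The four patterns can be tried on a single record before running them on the whole file. The sketch below uses a made-up two-row record laid out like the tournament file, and it uses base R's `regexpr()` and `regmatches()` instead of stringr so that it runs without extra packages. The helper `first_match()` and the sample values are assumptions for illustration.

```r
# Hypothetical two-row record in the tournament layout (all values are made up)
row1 <- "    1 | JANE DOE                        |4.5  |W  12|L   3|D   7|"
row2 <- "   NY | 12345678 / R: 1500P10 ->1520     |N:2  |W    |B    |W    |"

# Illustrative helper: first match of a PCRE pattern, or NA when nothing matches
first_match <- function(x, pattern) {
  m <- regmatches(x, regexpr(pattern, x, perl = TRUE))
  if (length(m) == 0) NA_character_ else m
}

# Name: text between the first and second pipe symbols
name <- trimws(first_match(row1, "(?<=\\|)[^|]+"))
# State: the first two uppercase letters of the second row
state <- first_match(row2, "[A-Z]{2}")
# Total points: the first number with a decimal point in the first row
points <- as.numeric(first_match(row1, "\\d+\\.\\d"))
# Pre-rating: digits after "R:", ignoring a provisional suffix such as "P10"
rating <- as.numeric(first_match(first_match(row2, "R:\\s*\\d+"), "\\d+"))

print(list(name = name, state = state, points = points, rating = rating))
```

Extracting the rating in two steps keeps the "R:" prefix as an anchor, so the USCF ID earlier in the row is never mistaken for the rating.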
I extract the numeric IDs of every opponent a player faced. I then use these IDs to look up their corresponding pre-ratings and calculate the mean for each player.
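As a small worked sketch of the lookup, assume a four-player event with made-up ratings in which position i of the rating vector belongs to player i. The row, the ratings, and the variable names are hypothetical, and base R is used in place of stringr to keep the example dependency-free.

```r
# Hypothetical pre-ratings; position i holds the rating of player i
ratings <- c(1500, 1600, 1400, 1700)

# Hypothetical first row for player 1: win vs 2, loss vs 3, a bye, draw vs 4
row1 <- "    1 | JANE DOE                        |2.0  |W   2|L   3|H    |D   4|"

# Keep only results that are followed by an opponent number (W, L, or D)
hits <- regmatches(row1, gregexpr("[WLD]\\s+\\d+", row1, perl = TRUE))[[1]]

# Strip everything except the digits to get the opponent IDs
opp_ids <- as.numeric(gsub("[^0-9]", "", hits))

# Index the ratings by opponent ID, then average and round
avg_opp <- round(mean(ratings[opp_ids]))
print(opp_ids)
print(avg_opp)
```

The bye in round three has no opponent number, so it drops out of the average automatically.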
I read the file and remove the non-data elements like dashed lines and headers.
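# Load stringr for the str_* functions and purrr for map() and the %>% pipe
library(stringr)
library(purrr)
# Read the provided text file
raw_txt <- readLines("tournamentinfo.txt")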
# Remove dashed separator lines
clean_txt <- raw_txt[!str_detect(raw_txt, "^-+$")]
# Remove the header rows
clean_txt <- clean_txt[-(1:2)]
Since each player has two rows of data, I separate them into two vectors.
# Line 1 contains Name and Points
line1 <- clean_txt[seq(1, length(clean_txt), 2)]
# Line 2 contains State and Pre-Rating
line2 <- clean_txt[seq(2, length(clean_txt), 2)]
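I extract the four fields that come straight from the text (name, state, total points, and pre-rating), along with each player's opponent IDs, which feed the fifth column of my final CSV.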
# Extract Name, State, and Points
player_name <- str_trim(str_extract(line1, "(?<=\\d\\s\\|\\s)[^|]+"))
player_state <- str_extract(line2, "[A-Z]{2}")
total_pts <- as.numeric(str_extract(line1, "\\d+\\.\\d"))
# Extract Pre-Rating (handling the 'R:' prefix and potential provisional 'P' ratings)
pre_rating <- as.numeric(str_extract(str_extract(line2, "R:\\s*\\d+"), "\\d+"))
# Extract Opponent IDs for the average calculation
# I look for the ID numbers following the results W, L, or D
opp_ids <- str_extract_all(line1, "(W|L|D)\\s+(\\d+)") %>%
  map(~ as.numeric(str_extract(.x, "\\d+")))
I look up each opponent’s rating and calculate the mean for every player.
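# This assumes players are listed in pair-number order, so an opponent's ID indexes pre_rating directly
avg_opp_rating <- sapply(opp_ids, function(ids) {
  round(mean(pre_rating[ids], na.rm = TRUE))
})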
I combine my results and generate the required file.
chess_df <- data.frame(
  Name = player_name,
  State = player_state,
  Total_Points = total_pts,
  Pre_Rating = pre_rating,
  Avg_Opp_Rating = avg_opp_rating
)
# Export to CSV for submission
write.csv(chess_df, "chess_tournament_results.csv", row.names = FALSE)
# Display the first few rows for validation
knitr::kable(head(chess_df))
| Name | State | Total_Points | Pre_Rating | Avg_Opp_Rating |
|---|---|---|---|---|
| GARY HUA | ON | 6.0 | 1794 | 1605 |
| DAKSHESH DARURI | MI | 6.0 | 1553 | 1469 |
| ADITYA BAJAJ | MI | 6.0 | 1384 | 1564 |
| PATRICK H SCHILLING | MI | 5.5 | 1716 | 1574 |
| HANSHI ZUO | MI | 5.5 | 1655 | 1501 |
| HANSEN SONG | OH | 5.0 | 1686 | 1519 |