For this project we’re given a text file with chess tournament results where the information has some structure. The idea is to create an R Markdown file that generates a .CSV file with the following information for all of the players:
Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents.
For the first player, the information would be: Gary Hua, ON, 6.0, 1794, 1605
The value 1605 is the mean of the opponents’ pre-tournament ratings: the sum of 1436, 1563, 1600, 1610, 1649, 1663, and 1716, divided by the number of games played (7) and rounded to the nearest integer.
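As a quick check of that arithmetic, here is a minimal sketch in R. The variable names are illustrative only and are not used elsewhere in this document.

```r
# Pre-tournament ratings of Gary Hua's seven opponents, as listed above
opp_ratings <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)

# Sum of ratings divided by the number of games played
avg_exact <- sum(opp_ratings) / length(opp_ratings)  # 11237 / 7

# Round to the nearest integer for the final table
avg_rounded <- round(avg_exact)

print(avg_exact)    # roughly 1605.29
print(avg_rounded)  # 1605
```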
Most of this code is fully reproducible; only the final chunk needs to be modified to match your machine.
First we’ll load needed libraries.
library(stringr)
library(data.table)
library(knitr)
Second, we’ll load the data, in this case from GitHub.
chesstour <- readLines("https://raw.githubusercontent.com/Lfirenzeg/msds607labs/refs/heads/main/tournamentinfo.txt")
## Warning in
## readLines("https://raw.githubusercontent.com/Lfirenzeg/msds607labs/refs/heads/main/tournamentinfo.txt"):
## incomplete final line found on
## 'https://raw.githubusercontent.com/Lfirenzeg/msds607labs/refs/heads/main/tournamentinfo.txt'
# View the first few lines to inspect the data
head(chesstour)
## [1] "-----------------------------------------------------------------------------------------"
## [2] " Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | "
## [4] "-----------------------------------------------------------------------------------------"
## [5] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [6] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
The text file needs to be cleaned first, though. The first 4 lines are header rows, so we remove them to make the first player’s record start on the first line.
# Remove the header lines
chesstour <- chesstour[-c(1:4)]
# Inspect the data
head(chesstour)
## [1] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [2] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
## [3] "-----------------------------------------------------------------------------------------"
## [4] " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
## [5] " MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |"
## [6] "-----------------------------------------------------------------------------------------"
Next we’ll create the variables we’ll use.
# Initialize Variables
num_players <- length(chesstour) / 3
player_number <- vector()
player_name <- vector()
total_points <- vector()
num_games_played <- vector()
opponents <- vector("list", num_players) # stores opponents as a list
state <- vector()
pre_tour_rating <- vector()
Because the data is structured in lines, the idea is to work out where each piece of data sits so it can fill the corresponding vector. Each player has 3 lines of raw data: lines 1 and 2 hold the useful fields, and the third is just dashes and can be skipped.
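The 3-line record layout can be illustrated without a loop. The sketch below uses six sample lines copied from the output above (with whitespace and dashes shortened) and labels each line by its position within its record. The names `sample_lines`, `pos`, `result_lines`, `rating_lines`, and `separators` are hypothetical and only serve this example.

```r
# Six sample lines in the same layout as the cleaned file (whitespace shortened)
sample_lines <- c(
  " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|",
  " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |",
  "-----------------------------------------",
  " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|",
  " MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |",
  "-----------------------------------------"
)

# Position of each line within its 3-line record: 1, 2, 3, 1, 2, 3
pos <- (seq_along(sample_lines) - 1) %% 3 + 1

result_lines <- sample_lines[pos == 1]  # player number, name, points, round results
rating_lines <- sample_lines[pos == 2]  # state, USCF ID, ratings
separators   <- sample_lines[pos == 3]  # dashes only, safe to ignore

print(length(result_lines))  # one entry per player
```

Splitting the lines this way makes it easy to confirm that the file length is a multiple of 3 before any parsing starts.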
# Loop through records in chesstour 3 lines at a time
for (i in seq(1, length(chesstour), by = 3)) {
  rawlinedata <- chesstour[i:(i + 1)] # Get the first 2 lines for each player

  # Player number: look at line 1, between columns 3 and 7, and return the digits as a number.
  # Every vector below is filled the same way; only the location of the data changes.
  player_number <- c(player_number, as.numeric(str_extract(substr(rawlinedata[1], 3, 7), '\\d{1,2}')))
  player_name <- c(player_name, trimws(substr(rawlinedata[1], 9, 40)))

  # Total points and games played
  total_points <- c(total_points, as.numeric(substr(rawlinedata[1], 42, 44)))
  # str_extract_all() returns a list of every W, L or D between column 44 and the end of line 1,
  # unlist() flattens it into a vector, and length() counts the matches.
  # Counting only W/L/D gives the number of games actually played and works around missing data.
  num_games_played <- c(num_games_played, length(unlist(str_extract_all(substr(rawlinedata[1], 44, nchar(rawlinedata[1])), "[WLD]"))))

  # Opponents: extract the numbers from the round columns of line 1 and store them as a numeric vector.
  # (i + 2) / 3 maps line index 1, 4, 7, ... to player position 1, 2, 3, ...
  opponents[[(i + 2) / 3]] <- as.numeric(unlist(str_extract_all(substr(rawlinedata[1], 45, nchar(rawlinedata[1])), "\\d{1,2}")))

  # State and pre-tournament rating
  state <- c(state, trimws(substr(rawlinedata[2], 3, 6)))
  # The first match on line 2 comes from the USCF ID, so the second match is the pre-rating
  pre_tour_rating <- c(pre_tour_rating, as.numeric(unlist(str_extract_all(rawlinedata[2], "[:space:]\\d{3,4}"))[2]))
}
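The loop above reads each field from fixed column positions. As an alternative sketch (not the method used in this document), the same fields can be pulled out by splitting on the `|` delimiter with base R. The two input lines are copied from the output shown earlier with whitespace shortened, and all `rec_*` names are hypothetical.

```r
# One player's two data lines (whitespace shortened from the real file)
line1 <- " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
line2 <- " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"

# Split each line on the literal "|" and trim the padding around each field
f1 <- trimws(strsplit(line1, "|", fixed = TRUE)[[1]])
f2 <- trimws(strsplit(line2, "|", fixed = TRUE)[[1]])

rec_number <- as.numeric(f1[1])
rec_name   <- f1[2]
rec_points <- as.numeric(f1[3])
rec_state  <- f2[1]

# Pre-rating: the digits that directly follow "R:" in the second field of line 2
m <- regmatches(f2[2], regexec("R:[[:space:]]*([0-9]+)", f2[2]))
rec_pre <- as.numeric(m[[1]][2])

# Round fields: a result letter, optionally followed by an opponent number
rounds  <- f1[4:length(f1)]
rec_opp <- as.numeric(sub("^[A-Z][[:space:]]*", "", rounds))

print(rec_opp)  # 39 21 18 14 7 12 4
```

Anchoring the rating on `R:` avoids relying on which match in the line happens to be the rating, at the cost of assuming every record contains that label.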
Once the variables are created we’ll arrange them in a table.
# Create the players data table
players <- data.table(
player_number = player_number,
player_name = player_name,
total_points = total_points,
num_games_played = num_games_played,
opponents = opponents,
state = state,
pre_tour_rating = pre_tour_rating,
opp_pretour_average = 1
)
# Now we'll calculate the average rating of each player's opponents
for (i in seq_len(nrow(players))) {
  opponents_list <- as.numeric(players$opponents[[i]]) # Opponent numbers for the current player
  opponents_ratings <- players[player_number %in% opponents_list, pre_tour_rating] # Look up those opponents' pre-tournament ratings
  # Calculate the average rating; na.rm = TRUE drops any missing values from the average
  avg_rating <- mean(opponents_ratings, na.rm = TRUE)
  # Replace the placeholder value 1 with the average, rounded to the nearest integer
  players[i, opp_pretour_average := round(avg_rating, 0)]
}
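The lookup-and-average step can also be written without an explicit loop. The sketch below uses made-up toy data (four hypothetical players, not taken from the tournament file) and assumes player numbers run 1, 2, 3, ... in the same order as the rating vector, so an opponent's number can be used directly as an index.

```r
# Toy data (hypothetical): pre-ratings indexed by player number
toy_rating <- c(1500, 1600, 1700, 1800)

# Opponent numbers faced by each of the four players
toy_opponents <- list(c(2, 3), c(1, 3, 4), c(1, 2), c(2))

# For each player, index the ratings by opponent number, average, and round
toy_avg <- sapply(
  toy_opponents,
  function(ids) round(mean(toy_rating[ids], na.rm = TRUE))
)

print(toy_avg)  # 1650 1667 1550 1600
```

If player numbers were not consecutive, a lookup with `match()` against the player number column would be needed instead of direct indexing.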
Let’s check the head of the table so far to make sure the data is looking the way we want it.
# Display the players table
head(players)
## player_number player_name total_points num_games_played
## <num> <char> <num> <int>
## 1: 1 GARY HUA 6.0 7
## 2: 2 DAKSHESH DARURI 6.0 7
## 3: 3 ADITYA BAJAJ 6.0 7
## 4: 4 PATRICK H SCHILLING 5.5 7
## 5: 5 HANSHI ZUO 5.5 7
## 6: 6 HANSEN SONG 5.0 7
## opponents state pre_tour_rating opp_pretour_average
## <list> <char> <num> <num>
## 1: 39,21,18,14, 7,12,... ON 1794 1605
## 2: 63,58, 4,17,16,20,... MI 1553 1469
## 3: 8,61,25,21,11,13,... MI 1384 1564
## 4: 23,28, 2,26, 5,19,... MI 1716 1574
## 5: 45,37,12,13, 4,14,... MI 1655 1501
## 6: 34,29,11,35,10,27,... OH 1686 1519
Before we create the CSV file, we’ll rename the columns so they are easier to read.
## Update the columns' names.
colnames(players) <- c("Player Number", "Player Name", "Total Number of Points", "Games Played", "Opponents", "State", "Player Pre-Rating", "Avg Pre Chess Rating of Opponents")
Let’s check the head of the table once more to see the new column names.
## Display few rows of the summary results.
head(players)
## Player Number Player Name Total Number of Points Games Played
## <num> <char> <num> <int>
## 1: 1 GARY HUA 6.0 7
## 2: 2 DAKSHESH DARURI 6.0 7
## 3: 3 ADITYA BAJAJ 6.0 7
## 4: 4 PATRICK H SCHILLING 5.5 7
## 5: 5 HANSHI ZUO 5.5 7
## 6: 6 HANSEN SONG 5.0 7
## Opponents State Player Pre-Rating
## <list> <char> <num>
## 1: 39,21,18,14, 7,12,... ON 1794
## 2: 63,58, 4,17,16,20,... MI 1553
## 3: 8,61,25,21,11,13,... MI 1384
## 4: 23,28, 2,26, 5,19,... MI 1716
## 5: 45,37,12,13, 4,14,... MI 1655
## 6: 34,29,11,35,10,27,... OH 1686
## Avg Pre Chess Rating of Opponents
## <num>
## 1: 1605
## 2: 1469
## 3: 1564
## 4: 1574
## 5: 1501
## 6: 1519
If you replicate this code on your machine, replace the output path with your desired location. On Windows, write each backslash in the path twice (\\), as in the code below. In place of C:\\Users\\lucho\\OneDrive\\Documents\\ChessSummary.csv you can use any location and any file name you like.
# Before the CSV file can be created, the Opponents list column needs to be flattened:
# convert each player's list of opponents to a comma-separated string.
# The column was renamed above, so it is referenced as Opponents here.
players[, Opponents := sapply(Opponents, function(x) paste(x, collapse = ", "))]
# Export the summary to a CSV file.
write.csv(players,"C:\\Users\\lucho\\OneDrive\\Documents\\ChessSummary.csv", row.names = FALSE)
If you run this code more than once, close the CSV file first if it is open in another program; otherwise the file may not be written properly.
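The project description asks for five columns, while the export above writes every column in the table. As a minimal sketch under that reading, the example below builds a two-row table from values shown earlier in this document, writes only the five requested columns, and reads the file back. It uses `file.path()` with a temporary directory, which sidesteps the backslash issue because R accepts forward slashes on Windows as well. The names `summary_tbl`, `out_path`, and `check` are hypothetical.

```r
# Two rows taken from the results table above, limited to the five requested columns
summary_tbl <- data.frame(
  "Player Name" = c("GARY HUA", "DAKSHESH DARURI"),
  "State" = c("ON", "MI"),
  "Total Number of Points" = c(6.0, 6.0),
  "Player Pre-Rating" = c(1794, 1553),
  "Avg Pre Chess Rating of Opponents" = c(1605, 1469),
  check.names = FALSE  # keep the spaces in the column names
)

# Build a portable output path (a temporary directory is assumed for this example)
out_path <- file.path(tempdir(), "ChessSummary.csv")

write.csv(summary_tbl, out_path, row.names = FALSE)

# Read the file back to confirm what was written
check <- read.csv(out_path, check.names = FALSE)
print(names(check))
```

For a real run, `tempdir()` would be swapped for the folder where the file should be kept.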