This project will use the following packages:
library(readr)
library(stringr)
library(dplyr)
library(tidyr)
library(tidyverse)
This file will be available on my GitHub Page.
This project focuses on manipulating a dataset stored in a
.txt file named tournamentinfo.txt. This
rmd file analyzes the data in that file and then writes the results to a
.csv file.
First, we load the raw lines of the
.txt file so they can be parsed before the results are exported
to a .csv file.
# Read the raw lines of the file with readLines()
tournamentData <- suppressWarnings(readLines("https://raw.githubusercontent.com/spacerome/Data607_Project1/refs/heads/main/tournamentinfo.txt"))
# Preview the first six lines, which include the four-line header
head(tournamentData)
## [1] "-----------------------------------------------------------------------------------------"
## [2] " Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | "
## [4] "-----------------------------------------------------------------------------------------"
## [5] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [6] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
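The following code blocks split tournamentData into two character
vectors for manipulation. Each player occupies two consecutive lines
followed by a separator line, so the records repeat every three lines
starting at line 5. d1 holds the first line of each record (pair number,
name, total points, and round results) and d2 holds the second line
(state, USCF ID, and ratings), which we will use later on.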
tdm <- matrix(unlist(tournamentData), byrow = TRUE)
d1 <- tdm[seq(5, length(tdm),3)]
head(d1)
## [1] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [2] " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
## [3] " 3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12|"
## [4] " 4 | PATRICK H SCHILLING |5.5 |W 23|D 28|W 2|W 26|D 5|W 19|D 1|"
## [5] " 5 | HANSHI ZUO |5.5 |W 45|W 37|D 12|D 13|D 4|W 14|W 17|"
## [6] " 6 | HANSEN SONG |5.0 |W 34|D 29|L 11|W 35|D 10|W 27|W 21|"
d2 <- tdm[seq(6, length(tdm),3)]
head(d2)
## [1] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
## [2] " MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |"
## [3] " MI | 14959604 / R: 1384 ->1640 |N:2 |W |B |W |B |W |B |W |"
## [4] " MI | 12616049 / R: 1716 ->1744 |N:2 |W |B |W |B |W |B |B |"
## [5] " MI | 14601533 / R: 1655 ->1690 |N:2 |B |W |B |W |B |W |B |"
## [6] " OH | 15055204 / R: 1686 ->1687 |N:3 |W |B |W |B |B |W |B |"
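The every-third-line indexing used for d1 and d2 can be checked on a small vector. The sketch below is only an illustration: toyLines, nameRows, and detailRows are made-up names, and the toy layout (four header lines, then name line, detail line, separator) is an assumption modeled on the preview above.

```r
# Hypothetical stand-in for tournamentData: 4 header lines, then
# repeating (name line, detail line, separator) triples.
toyLines <- c("sep", "header 1", "header 2", "sep",
              "name A", "detail A", "sep",
              "name B", "detail B", "sep")

# Every third line starting at line 5 is a name line
nameRows <- toyLines[seq(5, length(toyLines), by = 3)]

# Every third line starting at line 6 is the matching detail line
detailRows <- toyLines[seq(6, length(toyLines), by = 3)]

print(nameRows)
print(detailRows)

# Both vectors should describe the same number of players
stopifnot(length(nameRows) == length(detailRows))
```

Because both sequences use the same step, element i of one vector and element i of the other always belong to the same player.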
The following code block uses regular expressions to pull these
variables out of d1 and d2: playerName for the player name,
playerState for the player's state,
totalPoints for the player's total points,
preRating for the pre-tournament rating, and numRounds
for the pair numbers of the opponents faced in each round.
playerNum, the player's own pair number, is extracted as well. These
values are what we need to calculate the average pre-rating of each
player's opponents.
playerNum <- as.numeric(str_extract(d1, '\\d+'))
# [A-Za-z] rather than [A-z], which also matches the symbols between Z and a
pName <- str_extract(d1, '[A-Za-z].{1,32}')
playerName <- str_trim(str_extract(pName, '.+\\s{2,}'))
playerState <- str_extract(d2, '[A-Z]{2}')
totalPoints <- as.numeric(str_extract(d1, '\\d+\\.\\d'))
preRatingraw <- str_extract(d2,'R:.{8,}-')
preRating <- as.numeric(str_extract(preRatingraw, '\\d{1,4}'))
# Result letter plus opponent pair number for each round that was played
numRoundsraw <- str_extract_all(d1, '[A-Z]\\s{2,}\\d+')
# Keep only the pair numbers, one character vector per player
numRounds <- lapply(numRoundsraw, str_extract, '\\d+')
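To make the extraction easier to follow, here is a minimal sketch that applies similar patterns to a single player record. The two sample strings are typed by hand with the column padding assumed for the raw file, and nameLine, detailLine, and the rec* variables are hypothetical names. The patterns are slightly simplified variants, not the exact ones used above.

```r
library(stringr)

# One hand-typed record; the padding between fields is an assumption
nameLine   <- "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
detailLine <- "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

# The first run of digits on the name line is the pair number
recNum <- as.numeric(str_extract(nameLine, "\\d+"))

# The name runs from the first letter up to the next pipe
recName <- str_trim(str_extract(nameLine, "[A-Za-z][^|]*"))

# Total points are the only decimal number on the name line
recPoints <- as.numeric(str_extract(nameLine, "\\d+\\.\\d"))

# The state is the first pair of capital letters on the detail line
recState <- str_extract(detailLine, "[A-Z]{2}")

# The pre-rating is the number right after "R:" (capture group 1)
recPre <- as.numeric(str_match(detailLine, "R:\\s*(\\d+)")[, 2])

# Opponent pair numbers: a W, L, or D followed by spaces and digits
recRounds <- str_extract_all(nameLine, "[WLD]\\s+\\d+")[[1]]
recOpponents <- as.numeric(str_extract(recRounds, "\\d+"))

print(list(recNum, recName, recPoints, recState, recPre, recOpponents))
```

Rounds without an opponent number never match the last pattern, so they are simply left out of recOpponents.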
The following code block calculates the average opponent rating for
each player with a for loop. For each player i, it looks up the
pre-ratings of the opponents listed in numRounds[[i]], takes their mean,
rounds it to the nearest whole number, and stores the result in
avg_chess_opp_rating[i].
avg_chess_opp_rating <- numeric(length(numRounds))
for (i in seq_along(numRounds)) {
  avg_chess_opp_rating[i] <- round(mean(preRating[as.numeric(numRounds[[i]])]), 0)
}
head(avg_chess_opp_rating)
## [1] 1605 1469 1564 1574 1501 1519
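The same lookup-and-average step can also be written without an explicit loop. The sketch below uses vapply as an alternative to the for loop, on made-up data: toyRating, toyOpponents, and toyAvg are hypothetical names, and the ratings are invented for illustration.

```r
# Hypothetical pre-ratings for four players, indexed by pair number
toyRating <- c(1500, 1600, 1700, 1800)

# Opponent pair numbers faced by each player (invented for illustration)
toyOpponents <- list(c(2, 3, 4), c(1, 3), c(1, 2), c(1))

# For each player, index the ratings by opponent number and average them
toyAvg <- vapply(
  toyOpponents,
  function(opp) round(mean(toyRating[opp]), 0),
  numeric(1)
)

print(toyAvg)
```

Player 1 faced players 2, 3, and 4, so the first value is the mean of 1600, 1700, and 1800, which is 1700.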
After computing the ratings, the values are combined into the data
frame modifiedtournamentData in preparation for export
to a csv file. The data frame stores the following values:
playerName, playerState,
totalPoints, preRating, and
avg_chess_opp_rating.
modifiedtournamentData <- data.frame(playerName, playerState, totalPoints, preRating, avg_chess_opp_rating)
head(modifiedtournamentData)
##            playerName playerState totalPoints preRating avg_chess_opp_rating
## 1            GARY HUA          ON         6.0      1794                 1605
## 2     DAKSHESH DARURI          MI         6.0      1553                 1469
## 3        ADITYA BAJAJ          MI         6.0      1384                 1564
## 4 PATRICK H SCHILLING          MI         5.5      1716                 1574
## 5          HANSHI ZUO          MI         5.5      1655                 1501
## 6         HANSEN SONG          OH         5.0      1686                 1519
Lastly, the following code block exports the data frame to a
.csv file named new_tournament_info.csv.
write.csv(modifiedtournamentData, file = "new_tournament_info.csv")
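By default, write.csv also writes the row names as an extra, unnamed first column. If only the five data columns are wanted, row.names = FALSE leaves that column out. The sketch below shows the idea on a made-up data frame written to a temporary file. toyResults, outFile, and readBack are hypothetical names.

```r
# Hypothetical two-row data frame standing in for modifiedtournamentData
toyResults <- data.frame(
  playerName = c("PLAYER A", "PLAYER B"),
  preRating = c(1500, 1600)
)

# Write to a temporary file so nothing in the project folder is touched
outFile <- tempfile(fileext = ".csv")
write.csv(toyResults, file = outFile, row.names = FALSE)

# Read the file back and confirm that only the data columns were written
readBack <- read.csv(outFile)
stopifnot(identical(names(readBack), names(toyResults)))
print(readBack)
```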
Overall, this project was not that bad. I attempted to use a function, but it kept failing and returned incorrect values after the first part, so I simplified the approach. I will continue to troubleshoot the function so that it works correctly, and it will be included on my GitHub Page.