This project will use the following packages:
library(readr)
library(stringr)
library(dplyr)
library(tidyr)
library(tidyverse)
This file will be available on my GitHub Page.
This project focuses on manipulating a dataset stored in a .txt file named tournamentinfo.txt. This rmd file analyzes the data in that file and then writes the results to a .csv file.
We first extract the required fields from the .txt file so they can be analyzed before being exported to a .csv file.
# Now use this function to load the data
tournamentData <- suppressWarnings(readLines("https://raw.githubusercontent.com/spacerome/Data607_Project1/refs/heads/main/tournamentinfo.txt"))
# Preview the first six lines of the raw data (the four header lines are still present)
head(tournamentData)
## [1] "-----------------------------------------------------------------------------------------"
## [2] " Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | "
## [4] "-----------------------------------------------------------------------------------------"
## [5] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [6] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
The following code blocks take the lines in tournamentData and split them into two character vectors for manipulation. d1 holds the first row of each player’s record (pair number, name, total points, and round results), and d2 holds the second row (state, USCF ID, and ratings), which we will use later on.
tdm <- matrix(unlist(tournamentData), byrow = TRUE)
d1 <- tdm[seq(5, length(tdm),3)]
head(d1)
## [1] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [2] " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
## [3] " 3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12|"
## [4] " 4 | PATRICK H SCHILLING |5.5 |W 23|D 28|W 2|W 26|D 5|W 19|D 1|"
## [5] " 5 | HANSHI ZUO |5.5 |W 45|W 37|D 12|D 13|D 4|W 14|W 17|"
## [6] " 6 | HANSEN SONG |5.0 |W 34|D 29|L 11|W 35|D 10|W 27|W 21|"
d2 <- tdm[seq(6, length(tdm),3)]
head(d2)
## [1] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
## [2] " MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |"
## [3] " MI | 14959604 / R: 1384 ->1640 |N:2 |W |B |W |B |W |B |W |"
## [4] " MI | 12616049 / R: 1716 ->1744 |N:2 |W |B |W |B |W |B |B |"
## [5] " MI | 14601533 / R: 1655 ->1690 |N:2 |B |W |B |W |B |W |B |"
## [6] " OH | 15055204 / R: 1686 ->1687 |N:3 |W |B |W |B |B |W |B |"
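To see why the seq() calls start at 5 and 6 with a step of 3, here is a minimal sketch on a made-up vector. The toy_lines values, name_rows, and detail_rows are hypothetical names used only for illustration; the layout (four header lines, then repeating blocks of name row, detail row, and separator) is assumed from the preview above.

```r
# Hypothetical stand-in for the file layout: four header lines, then
# repeating blocks of name row, detail row, separator.
toy_lines <- c(
  "-----", "header 1", "header 2", "-----",
  "name row A", "detail row A", "-----",
  "name row B", "detail row B", "-----"
)

# Every third line starting at line 5 is a name row;
# every third line starting at line 6 is a detail row.
name_rows   <- toy_lines[seq(5, length(toy_lines), by = 3)]
detail_rows <- toy_lines[seq(6, length(toy_lines), by = 3)]

print(name_rows)
print(detail_rows)
```

If the file ever gained or lost a header line, both starting offsets would need to shift by the same amount.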
The following code block uses regular expressions to pull the variables we need out of d1 and d2: playerName for the player name, playerState for the state of the player, totalPoints for the total points of the player, preRating for the pre-rating, and numRounds for the pair numbers of the opponents faced in each round. These values are what we need to calculate the average rating of each player’s opponents.
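# Pair number: the first run of digits on each name row
playerNum <- as.numeric(str_extract(d1, '\\d+'))
# Name field: the first letter plus up to 32 following characters
# ([A-Za-z] avoids the punctuation that [A-z] also matches)
pName <- str_extract(d1, '[A-Za-z].{1,32}')
playerName <- str_trim(str_extract(pName, '.+\\s{2,}'))
# State: the first two capital letters on each detail row
playerState <- str_extract(d2, '[A-Z]{2}')
totalPoints <- as.numeric(str_extract(d1, '\\d+\\.\\d'))
# Pre-rating: the first run of up to four digits after "R:"
preRatingraw <- str_extract(d2,'R:.{8,}-')
preRating <- as.numeric(str_extract(preRatingraw, '\\d{1,4}'))
# Opponent pair numbers: one list element per player, taken from the round results
numRoundsraw <- str_extract_all(d1,'[A-Z]\\s{2,}\\d+')
numRounds <- lapply(numRoundsraw, str_extract, pattern = '\\d+')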
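As a rough check of what each pattern captures, the sketch below applies patterns modeled on the ones above to a single pair of rows. The two sample strings are assumptions: they copy the first player's values from the preview, but the exact spacing is a guess, since the preview does not show the file's original column widths. The toy_ and sample_ variable names are hypothetical.

```r
library(stringr)

# Assumed sample rows; the spacing is a guess at the fixed-width layout.
toy_name_row   <- "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
toy_detail_row <- "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

# Name: grab the name field, keep the part followed by padding, then trim.
sample_field <- str_extract(toy_name_row, "[A-Za-z].{1,32}")
sample_name  <- str_trim(str_extract(sample_field, ".+\\s{2,}"))

# State, total points, and pre-rating.
sample_state  <- str_extract(toy_detail_row, "[A-Z]{2}")
sample_points <- as.numeric(str_extract(toy_name_row, "\\d+\\.\\d"))
sample_raw    <- str_extract(toy_detail_row, "R:.{8,}-")
sample_rating <- as.numeric(str_extract(sample_raw, "\\d{1,4}"))

# Opponents: result codes such as "W  39", reduced to their pair numbers.
sample_results   <- str_extract_all(toy_name_row, "[A-Z]\\s{2,}\\d+")[[1]]
sample_opponents <- as.numeric(str_extract(sample_results, "\\d+"))

print(sample_name)
print(sample_state)
print(sample_points)
print(sample_rating)
print(sample_opponents)
```

Running the patterns on one row at a time like this makes it easier to spot which pattern is at fault when a value comes back as NA.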
The following code block calculates the average opponent rating for each player with a for loop. For player i, it looks up the pre-ratings of that player’s opponents and takes their mean, rounded to the nearest whole number. The result is stored in avg_chess_opp_rating at position i.
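# Preallocate one slot per player
avg_chess_opp_rating <- numeric(length(numRounds))
for (i in seq_along(numRounds)) {
  # Opponent pair numbers double as positions in preRating
  avg_chess_opp_rating[i] <- round(mean(preRating[as.numeric(numRounds[[i]])]), 0)
}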
head(avg_chess_opp_rating)
## [1] 1605 1469 1564 1574 1501 1519
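The lookup works because each opponent's pair number is also that opponent's position in preRating, so indexing preRating by a vector of pair numbers returns the opponents' ratings directly. The same calculation can be sketched without an explicit loop using vapply(). The toy data below is hypothetical and much smaller than the real tournament; toy_pre_rating, toy_opponents, and toy_avg are illustrative names only.

```r
# Hypothetical toy data: pre-ratings for four players and, for each
# player, the pair numbers of the opponents they faced.
toy_pre_rating <- c(1500, 1610, 1700, 1800)
toy_opponents  <- list(c(2, 3), c(1), c(1, 4), c(3))

# For each player, index the ratings by opponent pair number,
# take the mean, and round to a whole number.
toy_avg <- vapply(
  toy_opponents,
  function(opp) round(mean(toy_pre_rating[opp])),
  numeric(1)
)

print(toy_avg)
```

This is an alternative formulation, not the approach used in this project; the for loop above gives the same kind of result.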
After computing the ratings, the values are combined into a data frame named modifiedtournamentData so they can be exported to a csv file. The data frame contains playerName, playerState, totalPoints, preRating, and avg_chess_opp_rating.
modifiedtournamentData <- data.frame(playerName, playerState, totalPoints, preRating, avg_chess_opp_rating)
head(modifiedtournamentData)
##            playerName playerState totalPoints preRating avg_chess_opp_rating
## 1            GARY HUA          ON         6.0      1794                 1605
## 2     DAKSHESH DARURI          MI         6.0      1553                 1469
## 3        ADITYA BAJAJ          MI         6.0      1384                 1564
## 4 PATRICK H SCHILLING          MI         5.5      1716                 1574
## 5          HANSHI ZUO          MI         5.5      1655                 1501
## 6         HANSEN SONG          OH         5.0      1686                 1519
Lastly, the code block exports the values into a .csv file named new_tournament_info.csv.
write.csv(modifiedtournamentData, file = "new_tournament_info.csv")
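One thing to be aware of is that write.csv() writes the data frame's row names as an extra unnamed first column by default. If that column is not wanted, row.names = FALSE leaves it out. The sketch below shows this on a small made-up data frame written to a temporary file; toy_df, out_path, and round_trip are hypothetical names, and the player values are placeholders.

```r
# Hypothetical two-row data frame standing in for modifiedtournamentData.
toy_df <- data.frame(
  playerName  = c("PLAYER A", "PLAYER B"),
  totalPoints = c(6.0, 5.5)
)

# Write to a temporary file so nothing in the project folder is touched.
out_path <- tempfile(fileext = ".csv")
write.csv(toy_df, file = out_path, row.names = FALSE)

# Read the file back to confirm the columns survived the round trip.
round_trip <- read.csv(out_path)
print(names(round_trip))
print(nrow(round_trip))
```

Reading the file back in like this is a quick way to confirm that the export contains exactly the expected columns.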
Overall, this project was not that bad. I did attempt to use a function, but it kept failing and producing incorrect values after the first part, so I made the approach simpler. I will continue to troubleshoot the function so that it works, and it will be included in my GitHub Page.