This project will use the following packages:

library(readr)
library(stringr)
library(dplyr)
library(tidyr)
library(tidyverse)

General Overview

This file will be available on my GitHub Page.

This project reads chess tournament results from a text file named tournamentinfo.txt, extracts the fields of interest for each player, and writes them to a .csv file. All of the processing is done in this R Markdown file.

Chess Tournament Results Extraction

We start by reading the raw lines of the .txt file so that the player data can be extracted, analyzed, and later exported to a .csv file.

# Load the raw lines of the tournament file
tournamentData <- suppressWarnings(readLines("https://raw.githubusercontent.com/spacerome/Data607_Project1/refs/heads/main/tournamentinfo.txt"))

# Preview the first six lines; the first four are the table header and are skipped later
head(tournamentData)
## [1] "-----------------------------------------------------------------------------------------" 
## [2] " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | "
## [4] "-----------------------------------------------------------------------------------------" 
## [5] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|" 
## [6] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

Initializations

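The following code blocks split tournamentData into two character vectors. After the four header lines, each player occupies three lines: a line with the pair number, name, total points, and round results; a line with the state, USCF ID, and ratings; and a separator line. d1 takes every third line starting at line 5 (the name lines), and d2 takes every third line starting at line 6 (the state and rating lines).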

tdm <- matrix(unlist(tournamentData), byrow = TRUE)

d1 <- tdm[seq(5, length(tdm),3)]

head(d1)
## [1] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
## [2] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|"
## [3] "    3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|"
## [4] "    4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|"
## [5] "    5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|"
## [6] "    6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21|"
d2 <- tdm[seq(6, length(tdm),3)]

head(d2)
## [1] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
## [2] "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
## [3] "   MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
## [4] "   MI | 12616049 / R: 1716   ->1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |"
## [5] "   MI | 14601533 / R: 1655   ->1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
## [6] "   OH | 15055204 / R: 1686   ->1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |"
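
The stride-3 indexing can be checked on a small vector before trusting it on the real file. The sketch below uses made-up placeholder lines (the names toy, toy_d1, and toy_d2 are hypothetical and not part of the project) laid out the same way: four header lines, then repeating groups of a name line, a detail line, and a separator.

```r
# Toy stand-in for the file: 4 header lines, then 3-line records
# (name line, detail line, separator). All values here are made up.
toy <- c("----", "header 1", "header 2", "----",
         "name A", "detail A", "----",
         "name B", "detail B", "----")

# Every third line starting at 5 is a name line; starting at 6, a detail line
toy_d1 <- toy[seq(5, length(toy), 3)]
toy_d2 <- toy[seq(6, length(toy), 3)]

print(toy_d1)  # "name A" "name B"
print(toy_d2)  # "detail A" "detail B"
```

If the file layout ever changed (for example, an extra header line), the starting offsets 5 and 6 would need to change with it.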

String Manipulation

The following code block uses regular expressions to pull the fields we need out of d1 and d2: playerNum for the pair number, playerName for the player's name, playerState for the player's state, totalPoints for the total points, preRating for the pre-tournament rating, and numRounds for the pair numbers of the opponents each player faced. These values are what we need to calculate the average pre-rating of each player's opponents.

playerNum <- as.numeric(str_extract(d1, '\\d+'))
# [A-Za-z] instead of [A-z]: the A-z range also matches the symbols [ \ ] ^ _ `
pName <- str_extract(d1, '[A-Za-z].{1,32}')
playerName <- str_trim(str_extract(pName, '.+\\s{2,}'))
playerState <- str_extract(d2, '[A-Z]{2}')
totalPoints <- as.numeric(str_extract(d1, '\\d+\\.\\d'))
preRatingraw <- str_extract(d2,'R:.{8,}-')
preRating <- as.numeric(str_extract(preRatingraw, '\\d{1,4}'))
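# Played rounds look like "W  39"; cells without an opponent number are skipped
numRoundsraw <- str_extract_all(d1, '[A-Z]\\s{2,}\\d+')
# Pull the opponent pair number out of each cell, one character vector per player
numRounds <- lapply(numRoundsraw, str_extract, '\\d+')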
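
To see what the patterns pick up, it can help to run them on a single player. The sketch below applies the same regular expressions to the first player's two lines, copied from the preview output above; the variable names line1, line2, name, pre, and opponents are introduced here only for illustration.

```r
library(stringr)

# First player's two lines, copied from the preview output above
line1 <- "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
line2 <- "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

# Name: start at the first letter, keep the text up to the padding spaces
name <- str_trim(str_extract(str_extract(line1, '[A-Za-z].{1,32}'), '.+\\s{2,}'))

# Pre-rating: the first run of digits after "R:"
pre <- as.numeric(str_extract(str_extract(line2, 'R:.{8,}-'), '\\d{1,4}'))

# Opponent pair numbers: digits from each "letter, spaces, number" round cell
cells <- str_extract_all(line1, '[A-Z]\\s{2,}\\d+')[[1]]
opponents <- as.numeric(str_extract(cells, '\\d+'))

print(name)       # "GARY HUA"
print(pre)        # 1794
print(opponents)  # 39 21 18 14 7 12 4
```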

Average Chess Opponent Ratings

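The following code block calculates the average pre-rating of each player's opponents. For player i, the loop uses the opponent pair numbers in numRounds[[i]] to index preRating, takes the mean, rounds it to the nearest whole number, and stores the result in avg_chess_opp_rating[i]. This works because the pair numbers start at 1 and follow the file order, so they double as positions in preRating.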

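# Preallocate one slot per player
avg_chess_opp_rating <- numeric(length(numRounds))

for (i in seq_along(numRounds)) {
  # Look up the opponents' pre-ratings by pair number and average them
  avg_chess_opp_rating[i] <- round(mean(preRating[as.numeric(numRounds[[i]])]), 0)
}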

head(avg_chess_opp_rating)
## [1] 1605 1469 1564 1574 1501 1519
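
The same lookup-and-average step can also be written without an explicit loop. The sketch below is one possible alternative using sapply, shown on made-up data: toy_rating, toy_opponents, and toy_avg are hypothetical names, and the ratings are invented rather than taken from the tournament.

```r
# Hypothetical pre-ratings for four players, indexed by pair number
toy_rating <- c(1500, 1600, 1700, 1800)

# Opponent pair numbers faced by each player (also made up),
# stored as character vectors like the extracted values
toy_opponents <- list(c("2", "3"), c("1", "4"), c("1"), c("2", "3", "1"))

# For each player, look up the opponents' pre-ratings and average them
toy_avg <- sapply(toy_opponents, function(opp) {
  round(mean(toy_rating[as.numeric(opp)]), 0)
})

print(toy_avg)  # 1650 1650 1500 1600
```

Whether the loop or sapply reads better is a matter of taste; both index the rating vector by pair number in the same way.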

Modified Tournament Data

With the averages calculated, the values are combined into a data frame named modifiedtournamentData, ready to be exported to a .csv file. The data frame has five columns: playerName, playerState, totalPoints, preRating, and avg_chess_opp_rating.

modifiedtournamentData <- data.frame(playerName, playerState, totalPoints, preRating, avg_chess_opp_rating)

head(modifiedtournamentData)
##            playerName playerState totalPoints preRating avg_chess_opp_rating
## 1            GARY HUA          ON         6.0      1794                 1605
## 2     DAKSHESH DARURI          MI         6.0      1553                 1469
## 3        ADITYA BAJAJ          MI         6.0      1384                 1564
## 4 PATRICK H SCHILLING          MI         5.5      1716                 1574
## 5          HANSHI ZUO          MI         5.5      1655                 1501
## 6         HANSEN SONG          OH         5.0      1686                 1519

Export Data to CSV

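Lastly, the code block exports the data frame to a .csv file named new_tournament_info.csv in the working directory.

# row.names = FALSE keeps the row index out of the file, so only the five columns are written
write.csv(modifiedtournamentData, file = "new_tournament_info.csv", row.names = FALSE)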
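
One way to sanity-check an export is to read the file back and compare it with what was written. The sketch below does this with a small made-up data frame and a temporary file so that the real output is not touched; toy_df, tmp, and check are hypothetical names, and row.names = FALSE is a choice made in this sketch to keep the row index out of the file.

```r
# Small made-up data frame standing in for modifiedtournamentData
toy_df <- data.frame(playerName = c("PLAYER A", "PLAYER B"),
                     preRating = c(1500, 1600))

# Write to a temporary file so the real output is not overwritten
tmp <- tempfile(fileext = ".csv")
write.csv(toy_df, file = tmp, row.names = FALSE)

# Read it back and confirm the columns and rows survived the round trip
check <- read.csv(tmp)
print(names(check))
print(nrow(check))
```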

Conclusion

Overall, this project went fairly smoothly. I first attempted to wrap the extraction in a function, but it kept failing and returned incorrect values after the first part, so I used the simpler approach shown here. I will keep troubleshooting the function, and it will be included on my GitHub Page.