Project-1.knit

Introduction

A text file containing the results of a chess tournament was provided; its contents follow a loose, fixed-width structure. This R Markdown file transforms that text into a workable data frame and writes it to a .CSV file (which could, for example, be imported into a SQL database) with the following information for every player: Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents.

Importing Text File

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(readr)
setwd("~/Desktop/Data Acquisition & Management")
data <- readLines("tournamentinfo.txt")

## Warning in readLines("tournamentinfo.txt"): incomplete final line found on
## 'tournamentinfo.txt'

head(data)

## [1] "-----------------------------------------------------------------------------------------" 
## [2] " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | "
## [4] "-----------------------------------------------------------------------------------------" 
## [5] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|" 
## [6] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

Extracting Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents

Each vector created below holds one of the variables required for this project. The first four lines of the file (the two column-header rows and the dashed lines around them) were dropped so that only observations remain. Regular expressions were used to extract the targeted values, which were then unlisted and converted to the appropriate data type (the default being character).

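# Drop the four header lines (dashed rule, two column-header rows, dashed rule)
data <- data[-(1:4)]
# Two or three whitespace-separated words; the match keeps a trailing space
player_name <- unlist(str_extract_all(data, "(\\w+\\s){2,3}"))
# Pair number at the start of each results line
player_num <- as.numeric(unlist(str_extract_all(data, "\\s{3,5}\\d{1,2}\\s")))
# Two-letter state code; the match keeps its surrounding whitespace
player_state <- unlist(str_extract_all(data, "\\s{2,3}[A-Z]{2}\\s{1}"))
# Note: as.integer() truncates any half points (5.5 becomes 5); as.numeric() would keep them
total_points <- as.integer(unlist(str_extract_all(data, "\\d{1}\\.\\d{1}")))
# Digits after "R:", or digits directly before a provisional "P" marker
pre_rating <- as.numeric(unlist(str_extract_all(data, "(?<=R:\\s{1,2})(\\d{3,4}\\s)|(\\d{3,4}(?=P\\d{1,2}\\s*-))")))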

#preview
head(player_name)

## [1] "GARY HUA "            "DAKSHESH DARURI "     "ADITYA BAJAJ "       
## [4] "PATRICK H SCHILLING " "HANSHI ZUO "          "HANSEN SONG "

head(player_num)

## [1] 1 2 3 4 5 6

head(player_state)

## [1] "   ON " "   MI " "   MI " "   MI " "   MI " "   OH "

head(total_points)

## [1] 6 6 6 5 5 5

head(pre_rating)

## [1] 1794 1553 1384 1716 1655 1686
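
As a cross-check on the regular expressions above, the same fields can also be pulled out by splitting each line on its `|` delimiters. This is a minimal sketch of that alternative, not the method used in this project. It runs on the two lines for player 1 shown in the `head(data)` output; the names `fields1`, `fields2`, and `provisional` are introduced here for illustration, and the provisional-rating string is a made-up example of the `P` format that the second half of the `pre_rating` pattern targets.

```r
library(stringr)

# The two lines describing player 1, copied from the head(data) output
line1 <- "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
line2 <- "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

# Split on the pipe delimiter and trim the padding around each field
fields1 <- str_trim(str_split(line1, fixed("|"))[[1]])
fields2 <- str_trim(str_split(line2, fixed("|"))[[1]])

player_name_1  <- fields1[2]              # name field, already trimmed
player_state_1 <- fields2[1]              # state code, already trimmed
total_points_1 <- as.numeric(fields1[3])  # as.numeric() keeps half points

# Capture the digits that follow "R:"; column 2 of str_match() is the first group
pre_rating_1 <- as.numeric(str_match(fields2[2], "R:\\s*(\\d+)")[, 2])

# Hypothetical provisional rating: the capture stops at the "P"
provisional  <- "R: 1641P17->1657P24"
pre_rating_p <- as.numeric(str_match(provisional, "R:\\s*(\\d+)")[, 2])
```

Because each field is trimmed after the split, the name and state come out without the padding that the regex matches retain.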

Creating the DataFrame

The data frame was built from the vectors created so far, before the Avg Pre Rating of Opponents was computed, so that each opponent number can later be looked up in a data frame where all the variables are already matched by player.

chess_data <- data.frame(player_name, player_num, player_state, total_points, pre_rating)
head(chess_data)

##            player_name player_num player_state total_points pre_rating
## 1            GARY HUA           1          ON             6       1794
## 2     DAKSHESH DARURI           2          MI             6       1553
## 3        ADITYA BAJAJ           3          MI             6       1384
## 4 PATRICK H SCHILLING           4          MI             5       1716
## 5          HANSHI ZUO           5          MI             5       1655
## 6         HANSEN SONG           6          OH             5       1686

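# Opponent pair numbers found on each line; lines without results
# (state/rating lines and dashed separators) return character(0)
players_opponents <- str_extract_all(data, "(?<=(W|L|D)\\s{2,3})(\\d{1,2})")
head(players_opponents)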

## [[1]]
## [1] "39" "21" "18" "14" "7"  "12" "4" 
## 
## [[2]]
## character(0)
## 
## [[3]]
## character(0)
## 
## [[4]]
## [1] "63" "58" "4"  "17" "16" "20" "7" 
## 
## [[5]]
## character(0)
## 
## [[6]]
## character(0)
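
The `character(0)` entries above come from the state/rating lines and the dashed separators, which carry no opponent numbers. One way to avoid them, sketched here under the assumption that every player occupies exactly three consecutive lines once the header is dropped, is to keep only every third line before extracting. The vector `toy` and the second player's line are made up for illustration.

```r
library(stringr)

# Toy stand-in for `data` after the header is dropped.
# The first player's lines come from the file preview; the second is hypothetical.
toy <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "-----------------------------------------------------------------------------------------",
  "    2 | EXAMPLE PLAYER                  |3.0  |L   1|W   5|H    |D  12|W   7|U    |L   3|",
  "   MI | 12345678 / R: 1500   ->1510     |N:3  |B    |W    |     |B    |W    |     |B    |",
  "-----------------------------------------------------------------------------------------"
)

# Keep lines 1, 4, 7, ...: the results line of each three-line player block
results_lines <- toy[seq(1, length(toy), by = 3)]

# Digits that follow a W, L, or D result code and one to three spaces
opponents <- str_extract_all(results_lines, "(?<=[WLD]\\s{1,3})\\d{1,2}")
opponents <- lapply(opponents, as.integer)
```

With this subsetting, `opponents` has one element per player, so it lines up row for row with the data frame.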

Retrieving the Avg Pre Rating of Opponents

The following code connects the vector of each player’s opponents to the chess data frame in order to retrieve the average pre-rating of those opponents (up to seven per player). The opponents’ pair numbers are extracted first; a for loop then matches each number to the player number in the chess data frame, looks up that opponent’s pre-rating, and averages it with the pre-ratings of the other opponents the player faced.

players_opponents <- str_extract_all(data, "(?<=(W|L|D)\\s{2,3})(\\d{1,2})")

# One slot per line of `data`; lines with no opponents keep the initial 0
opponent_avg_pre_rating <- numeric(length(players_opponents))
for (i in seq_along(players_opponents)) {
  # Row positions in chess_data of this player's opponents
  match_indices <- match(players_opponents[[i]], chess_data$player_num)
  total_pre_rating <- 0
  count <- 0
  for (j in match_indices) {
    if (!is.na(j)) {
      # Named opponent_rating so the pre_rating vector is not overwritten
      opponent_rating <- chess_data$pre_rating[j]
      total_pre_rating <- total_pre_rating + opponent_rating
      count <- count + 1
    }
  }
  if (count > 0) {
    opponent_avg_pre_rating[i] <- total_pre_rating / count
  }
}

as.numeric(unlist(opponent_avg_pre_rating))

##   [1] 1605.286    0.000    0.000 1469.286    0.000    0.000 1563.571    0.000
##   [9]    0.000 1573.571    0.000    0.000 1500.857    0.000    0.000 1518.714
##  [17]    0.000    0.000 1372.143    0.000    0.000 1468.429    0.000    0.000
##  [25] 1523.143    0.000    0.000 1554.143    0.000    0.000 1467.571    0.000
##  [33]    0.000 1506.167    0.000    0.000 1497.857    0.000    0.000 1515.000
##  [41]    0.000    0.000 1483.857    0.000    0.000 1385.800    0.000    0.000
##  [49] 1498.571    0.000    0.000 1480.000    0.000    0.000 1426.286    0.000
##  [57]    0.000 1410.857    0.000    0.000 1470.429    0.000    0.000 1300.333
##  [65]    0.000    0.000 1213.857    0.000    0.000 1357.000    0.000    0.000
##  [73] 1363.286    0.000    0.000 1506.857    0.000    0.000 1221.667    0.000
##  [81]    0.000 1522.143    0.000    0.000 1313.500    0.000    0.000 1144.143
##  [89]    0.000    0.000 1259.857    0.000    0.000 1378.714    0.000    0.000
##  [97] 1276.857    0.000    0.000 1375.286    0.000    0.000 1149.714    0.000
## [105]    0.000 1388.167    0.000    0.000 1384.800    0.000    0.000 1539.167
## [113]    0.000    0.000 1429.571    0.000    0.000 1390.571    0.000    0.000
## [121] 1248.500    0.000    0.000 1149.857    0.000    0.000 1106.571    0.000
## [129]    0.000 1327.000    0.000    0.000 1152.000    0.000    0.000 1357.714
## [137]    0.000    0.000 1392.000    0.000    0.000 1355.800    0.000    0.000
## [145] 1285.800    0.000    0.000 1296.000    0.000    0.000 1356.143    0.000
## [153]    0.000 1494.571    0.000    0.000 1345.333    0.000    0.000 1206.167
## [161]    0.000    0.000 1406.000    0.000    0.000 1414.400    0.000    0.000
## [169] 1363.000    0.000    0.000 1391.000    0.000    0.000 1319.000    0.000
## [177]    0.000 1330.200    0.000    0.000 1327.286    0.000    0.000 1186.000
## [185]    0.000    0.000 1350.200    0.000    0.000 1263.000    0.000    0.000

# Keep only the player lines; every other line was left at the initial 0
opponent_avg_pre_rating <- opponent_avg_pre_rating[opponent_avg_pre_rating != 0.00]

# Add the average opponent pre-rating to the data frame
chess_data$opponent_avg_pre_rating <- opponent_avg_pre_rating
head(chess_data)

##            player_name player_num player_state total_points pre_rating
## 1            GARY HUA           1          ON             6       1794
## 2     DAKSHESH DARURI           2          MI             6       1553
## 3        ADITYA BAJAJ           3          MI             6       1384
## 4 PATRICK H SCHILLING           4          MI             5       1716
## 5          HANSHI ZUO           5          MI             5       1655
## 6         HANSEN SONG           6          OH             5       1686
##   opponent_avg_pre_rating
## 1                1605.286
## 2                1469.286
## 3                1563.571
## 4                1573.571
## 5                1500.857
## 6                1518.714
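
The nested loop above can also be written as a single `vapply()` call. The sketch below is one possible vectorised form, not the code used in this project. It runs on a toy data frame with made-up ratings, and `toy_players`, `toy_opponents`, and `avg_opponent_rating` are hypothetical names. Returning `NA` for an entry with no opponents avoids relying on 0 as a placeholder.

```r
# Toy players with made-up ratings; column names mirror chess_data
toy_players <- data.frame(
  player_num = 1:4,
  pre_rating = c(1500, 1600, 1700, 1800)
)

# Opponent numbers per player, as character vectors like str_extract_all() returns
toy_opponents <- list(c("2", "3"), c("1", "4"), c("1"), character(0))

avg_opponent_rating <- vapply(toy_opponents, function(ids) {
  # Look up each opponent's row, then take the mean of their pre-ratings
  idx <- match(as.integer(ids), toy_players$player_num)
  ratings <- toy_players$pre_rating[idx]
  if (length(ratings) == 0) NA_real_ else mean(ratings, na.rm = TRUE)
}, numeric(1))

avg_opponent_rating
```

Here the first two players each average 1650, the third averages 1500, and the fourth, who has no opponents, is `NA`.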

Converting to CSV

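path <- getwd()
# row.names = FALSE leaves out R's row-index column, which a SQL import would not need
write.csv(chess_data, file.path(path, "chess_data.csv"), row.names = FALSE)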
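
A quick way to confirm that an export like this is usable is to read the file back in and inspect it. The sketch below does this on a toy data frame with made-up values, written to a temporary directory. `toy`, `csv_path`, and `round_trip` are hypothetical names, and `row.names = FALSE` is an assumption about the desired file layout.

```r
# Toy stand-in for chess_data with made-up values
toy <- data.frame(
  player_name  = c("PLAYER ONE", "PLAYER TWO"),
  total_points = c(6, 5.5)
)

# Write to a temporary location so the working directory is left untouched
csv_path <- file.path(tempdir(), "toy_chess_data.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read the file back and check the column names and types
round_trip <- read.csv(csv_path)
str(round_trip)
```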