Introduction

Given a text file with the results of a chess tournament, I had to create an R Markdown file that generates a .CSV file containing each player’s name, state, total number of points, pre-rating, and the average pre-rating of their opponents. I imported the text file into R Markdown and transformed it into a data frame, using regex and other functions to remove items that were unnecessary or that would create nulls in my analysis.

Data: https://raw.githubusercontent.com/Andreina-A/Project1_DAta607/refs/heads/main/tournamentinfo.txt

Loaded needed packages

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

##Loading data

The data set, provided as a text file, was uploaded to GitHub so that the raw data could be read into R. I used read.delim because the fields are separated by the delimiter “|”. I also displayed the head of the data set.

##                                                                                          V1
## 1 -----------------------------------------------------------------------------------------
## 2                                                                                     Pair 
## 3                                                                                     Num  
## 4 -----------------------------------------------------------------------------------------
## 5                                                                                        1 
## 6                                                                                       ON 
##                                  V2    V3    V4    V5    V6    V7    V8    V9
## 1                                                                            
## 2  Player Name                      Total Round Round Round Round Round Round
## 3  USCF ID / Rtg (Pre->Post)         Pts    1     2     3     4     5     6  
## 4                                                                            
## 5  GARY HUA                         6.0   W  39 W  21 W  18 W  14 W   7 D  12
## 6  15445895 / R: 1794   ->1817      N:2   W     B     W     B     W     B    
##     V10 V11
## 1        NA
## 2 Round  NA
## 3   7    NA
## 4        NA
## 5 D   4  NA
## 6 W      NA
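
The read.delim call itself is not shown above. A minimal sketch of that step, assuming the file has no header row, might look like the following. The name sample_lines and the inline lines are only for illustration so the sketch runs offline; for the real file, the first argument would be the raw GitHub URL.

```r
# A few lines in the same layout as tournamentinfo.txt (inline so the sketch runs offline).
sample_lines <- c(
  "-----------------------------------------------------------------------------------------",
  " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| ",
  " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | ",
  "-----------------------------------------------------------------------------------------",
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "-----------------------------------------------------------------------------------------"
)

# header = FALSE keeps the first line as data; sep = "|" splits each line on the pipe.
# Lines made only of dashes contain no pipe, so they land entirely in column V1.
DF <- read.delim(text = sample_lines, header = FALSE, sep = "|")
head(DF)
```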

Created a data frame for the data set and removed the rows of dashes that separate the players. Every third row starting from row 1 is a dashed line, so I dropped those rows with a negative seq() index.

DF_corrected<-data.frame(DF[-seq(1,nrow(DF),3),], "") 
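
As a small illustration of the negative seq() index, assuming a toy vector in place of the data frame rows (the names rows, separator_idx, and kept are hypothetical):

```r
# Toy stand-in for the rows of the raw file: a separator, then two rows per player.
rows <- c("-----", "Pair", "Num", "-----", "1", "ON", "-----", "2", "MI")

# seq(1, length(rows), 3) gives 1, 4, 7: the positions of the separator rows.
separator_idx <- seq(1, length(rows), 3)

# A negative index drops those positions and keeps everything else.
kept <- rows[-separator_idx]
print(kept)
```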

Each player’s record spans two rows, so I combined them by selecting the odd rows and the even rows and binding them side by side. Afterwards I removed the empty columns created by combining the rows.

Chess_DF<-data.frame(cbind(DF_corrected[(seq(nrow(DF_corrected))%%2)==1,],#Odd rows
                                DF_corrected[(seq(nrow(DF_corrected))%% 2)==0,]))#Even row
Chess_DF$X..<-NULL
Chess_DF$X...1<-NULL

head(Chess_DF)
##        V1                                V2    V3    V4    V5    V6    V7    V8
## 2   Pair   Player Name                      Total Round Round Round Round Round
## 5      1   GARY HUA                         6.0   W  39 W  21 W  18 W  14 W   7
## 8      2   DAKSHESH DARURI                  6.0   W  63 W  58 L   4 W  17 W  16
## 11     3   ADITYA BAJAJ                     6.0   L   8 W  61 W  25 W  21 W  11
## 14     4   PATRICK H SCHILLING              5.5   W  23 D  28 W   2 W  26 D   5
## 17     5   HANSHI ZUO                       5.5   W  45 W  37 D  12 D  13 D   4
##       V9   V10 V11   V1.1                              V2.1  V3.1  V4.1  V5.1
## 2  Round Round  NA  Num    USCF ID / Rtg (Pre->Post)         Pts    1     2  
## 5  D  12 D   4  NA    ON   15445895 / R: 1794   ->1817      N:2   W     B    
## 8  W  20 W   7  NA    MI   14598900 / R: 1553   ->1663      N:2   B     W    
## 11 W  13 W  12  NA    MI   14959604 / R: 1384   ->1640      N:2   W     B    
## 14 W  19 D   1  NA    MI   12616049 / R: 1716   ->1744      N:2   W     B    
## 17 W  14 W  17  NA    MI   14601533 / R: 1655   ->1690      N:2   B     W    
##     V6.1  V7.1  V8.1  V9.1 V10.1 V11.1
## 2    3     4     5     6     7      NA
## 5  W     B     W     B     W        NA
## 8  B     W     B     W     B        NA
## 11 W     B     W     B     W        NA
## 14 W     B     W     B     B        NA
## 17 B     W     B     W     B        NA
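
The odd/even pairing can be sketched on a tiny data frame. This is only an illustration under assumed toy values; the names long, is_odd, and wide are hypothetical.

```r
# Two players, two rows each: the odd rows hold number and name, the even rows state and rating.
long <- data.frame(
  V1 = c("1", "ON", "2", "MI"),
  V2 = c("GARY HUA", "R: 1794", "DAKSHESH DARURI", "R: 1553"),
  stringsAsFactors = FALSE
)

# TRUE for rows 1, 3, ... and FALSE for rows 2, 4, ...
is_odd <- seq_len(nrow(long)) %% 2 == 1

# Bind each odd row to the even row that follows it.
wide <- cbind(long[is_odd, ], long[!is_odd, ])

# cbind keeps the duplicated names, so make them unique (V1, V2, V1.1, V2.1).
names(wide) <- make.unique(names(wide))
print(wide)
```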

#Created a data frame of the players. I started by removing some columns that were unnecessary, then renamed only the columns I needed: the player’s number, the player’s name, total score, rounds, the player’s state, and the player’s pre-rating.

Chess_players<-Chess_DF[,-c(11,14:22)]

Chess_players<-Chess_players%>%rename("Player_number"=V1, "Player_name"=V2,"Total_score"=V3, "Round1"=V4, "Round2"=V5,"Round3"=V6,"Round4"=V7,"Round5"=V8,"Round6"=V9,"Round7"=V10,"Players_State"=V1.1,"Pre_rating"=V2.1)#Renamed the columns

#Removed the row with the old column titles that came with the data.

Chess_players<-Chess_players[-c(1),]

#Fixing player ratings

I fixed the players’ pre-ratings so that the column holds only the numeric pre-rating. I used the str_remove_all function with regular expressions (regex) to remove everything up to the colon (which comes just before the pre-rating value) and everything from the dash onward, so that only the pre-rating values are left behind. Some values have a P followed by a number after the rating, which I removed as well since it is not needed for our calculations. Lastly, I converted the values to numeric to use them in calculations.

Chess_players$Pre_rating<-str_remove_all(Chess_players$Pre_rating, ".*:")#remove all items before the colon.
Chess_players$Pre_rating<-str_remove_all(Chess_players$Pre_rating, "-.*")#remove all items after the dash.
Chess_players$Pre_rating<-str_remove_all(Chess_players$Pre_rating, "P.*")#removed all items after the P
Chess_players$Pre_rating<-as.numeric(Chess_players$Pre_rating) #turned the values numerical for calculations.
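
As an alternative sketch, the same value can be pulled out in one step with base R's sub() and a capture group, instead of the three str_remove_all calls. The input strings are assumptions in the same layout as the rating column; the second one is a hypothetical entry with a P suffix.

```r
raw <- c(
  " 15445895 / R: 1794   ->1817     ",
  " 14150362 / R: 1641P17->1657P24  "  # hypothetical entry with a P suffix
)

# ".*R:\\s*" skips everything up to the colon and any spaces after it,
# "(\\d+)" captures the run of digits that forms the pre-rating,
# ".*" swallows the rest (P suffix, arrow, post-rating).
pre_rating <- as.numeric(sub(".*R:\\s*(\\d+).*", "\\1", raw, perl = TRUE))
print(pre_rating)
```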

#Fixing rounds

Using regex again, I removed the letters and spaces that come before the numbers in the round columns and converted the values to integers for use in the loop I create later. Rounds with no opponent number become NA. I also converted the player number to an integer for the same loop.

Chess_players$Round1<-as.integer(gsub("^\\D+","",Chess_players$Round1))
Chess_players$Round2<-as.integer(gsub("^\\D+","",Chess_players$Round2))
Chess_players$Round3<-as.integer(gsub("^\\D+","",Chess_players$Round3))
Chess_players$Round4<-as.integer(gsub("^\\D+","",Chess_players$Round4))
Chess_players$Round5<-as.integer(gsub("^\\D+","",Chess_players$Round5))
Chess_players$Round6<-as.integer(gsub("^\\D+","",Chess_players$Round6))
Chess_players$Round7<-as.integer(gsub("^\\D+","",Chess_players$Round7))
Chess_players$Player_number<-as.integer(as.character(Chess_players$Player_number))
head(Chess_players)
##    Player_number                       Player_name Total_score Round1 Round2
## 5              1  GARY HUA                               6.0       39     21
## 8              2  DAKSHESH DARURI                        6.0       63     58
## 11             3  ADITYA BAJAJ                           6.0        8     61
## 14             4  PATRICK H SCHILLING                    5.5       23     28
## 17             5  HANSHI ZUO                             5.5       45     37
## 20             6  HANSEN SONG                            5.0       34     29
##    Round3 Round4 Round5 Round6 Round7 Players_State Pre_rating
## 5      18     14      7     12      4           ON        1794
## 8       4     17     16     20      7           MI        1553
## 11     25     21     11     13     12           MI        1384
## 14      2     26      5     19      1           MI        1716
## 17     12     13      4     14     17           MI        1655
## 20     11     35     10     27     21           OH        1686
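
The seven near-identical lines could also be written once and applied to every round column. A minimal sketch, assuming a toy data frame with two round columns (the names rounds and round_cols are hypothetical):

```r
rounds <- data.frame(
  Round1 = c("W  39", "W  63", "H    "),
  Round2 = c("W  21", "B    ", "L   4"),
  stringsAsFactors = FALSE
)

round_cols <- c("Round1", "Round2")

# Strip the leading non-digits from each column, then convert to integer.
# Entries with no opponent number become "" and then NA.
rounds[round_cols] <- lapply(
  rounds[round_cols],
  function(x) as.integer(gsub("^\\D+", "", x))
)
print(rounds)
```

With the real data, round_cols would be paste0("Round", 1:7).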

#Created column for average Pre Chess Rating of Opponents

Created an empty column to hold the average pre-rating of each player’s opponents.

Chess_players$Average_PR_Opp=NA

Created a loop to calculate each player’s average opponent pre-rating. For each round, the value in the round column is the opponent’s player number, which is used to look up that opponent’s pre-rating. The pre-ratings found are added together and divided by the number of rounds played, which gives the average of the opponents’ pre-ratings.

for (i in 1:nrow(Chess_players)) { 
  OppR=c()#empty vector to collect the opponents' pre-ratings for player i
  #Only rounds with an opponent number (not NA) count toward the average.
  for (s in c(4:10)){ #columns 4 to 10 hold Round1 to Round7
    if (!is.na(Chess_players[i,s])) { #check player i's own row, not row 1
      OppR=c(OppR, Chess_players$Pre_rating[Chess_players$Player_number==Chess_players[i,s]])
    }
  }
  Chess_players[i,"Average_PR_Opp"]<-round(mean(OppR), digits = 0)#store the mean of the collected pre-ratings, rounded to zero decimal places
}
head(Chess_players)
##    Player_number                       Player_name Total_score Round1 Round2
## 5              1  GARY HUA                               6.0       39     21
## 8              2  DAKSHESH DARURI                        6.0       63     58
## 11             3  ADITYA BAJAJ                           6.0        8     61
## 14             4  PATRICK H SCHILLING                    5.5       23     28
## 17             5  HANSHI ZUO                             5.5       45     37
## 20             6  HANSEN SONG                            5.0       34     29
##    Round3 Round4 Round5 Round6 Round7 Players_State Pre_rating Average_PR_Opp
## 5      18     14      7     12      4           ON        1794           1605
## 8       4     17     16     20      7           MI        1553           1469
## 11     25     21     11     13     12           MI        1384           1564
## 14      2     26      5     19      1           MI        1716           1574
## 17     12     13      4     14     17           MI        1655           1501
## 20     11     35     10     27     21           OH        1686           1519
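
The same lookup can be sketched without a loop by using match() and rowMeans(). This is an alternative technique, not the loop above, and the four-player data frame below is made up for illustration.

```r
players <- data.frame(
  Player_number = 1:4,
  Pre_rating    = c(1500, 1600, 1700, 1800),
  Round1        = c(2L, 1L, 4L, 3L),
  Round2        = c(3L, 4L, 1L, 2L),
  Round3        = c(NA, 3L, 2L, NA)  # NA marks a round with no opponent
)

round_cols <- c("Round1", "Round2", "Round3")
opp_ids <- as.matrix(players[round_cols])

# match() finds the row of each opponent number; indexing Pre_rating by that
# row gives the opponent's pre-rating (NA stays NA).
opp_ratings <- players$Pre_rating[match(opp_ids, players$Player_number)]

# match() returns a plain vector, so restore the players-by-rounds shape.
dim(opp_ratings) <- dim(opp_ids)

# Average across rounds, ignoring rounds with no opponent.
players$Average_PR_Opp <- round(rowMeans(opp_ratings, na.rm = TRUE))
print(players)
```

For player 1 the opponents are players 2 and 3, so the average is (1600 + 1700) / 2 = 1650.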

#Final step for the data frame

Removed columns no longer needed.

Chess_players[,c(1,4:10)]<-NULL
head(Chess_players)
##                          Player_name Total_score Players_State Pre_rating
## 5   GARY HUA                               6.0             ON        1794
## 8   DAKSHESH DARURI                        6.0             MI        1553
## 11  ADITYA BAJAJ                           6.0             MI        1384
## 14  PATRICK H SCHILLING                    5.5             MI        1716
## 17  HANSHI ZUO                             5.5             MI        1655
## 20  HANSEN SONG                            5.0             OH        1686
##    Average_PR_Opp
## 5            1605
## 8            1469
## 11           1564
## 14           1574
## 17           1501
## 20           1519

#Reorganized columns to display the name of the player, state, total score, their pre-rating, and their opponents’ average pre-rating.

Chess_players<-Chess_players[,c(1,3,2,4,5)]
head(Chess_players)
##                          Player_name Players_State Total_score Pre_rating
## 5   GARY HUA                                   ON        6.0         1794
## 8   DAKSHESH DARURI                            MI        6.0         1553
## 11  ADITYA BAJAJ                               MI        6.0         1384
## 14  PATRICK H SCHILLING                        MI        5.5         1716
## 17  HANSHI ZUO                                 MI        5.5         1655
## 20  HANSEN SONG                                OH        5.0         1686
##    Average_PR_Opp
## 5            1605
## 8            1469
## 11           1564
## 14           1574
## 17           1501
## 20           1519
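
Selecting columns by name rather than by position is one way to make the reordering less sensitive to earlier changes. A minimal sketch on a one-row data frame (the names results and wanted are hypothetical):

```r
results <- data.frame(
  Player_name    = "GARY HUA",
  Total_score    = 6.0,
  Players_State  = "ON",
  Pre_rating     = 1794,
  Average_PR_Opp = 1605,
  stringsAsFactors = FALSE
)

# The desired order, spelled out by column name.
wanted <- c("Player_name", "Players_State", "Total_score",
            "Pre_rating", "Average_PR_Opp")

results <- results[, wanted]
print(results)
```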

#Wrote the data frame to a CSV file and tested it by reading the file back in

write_csv(Chess_players,file="Chess_Players_Pre_ratings.csv")
Test<-read_csv("Chess_Players_Pre_ratings.csv", show_col_types = FALSE)
head(Test)
## # A tibble: 6 × 5
##   Player_name         Players_State Total_score Pre_rating Average_PR_Opp
##   <chr>               <chr>               <dbl>      <dbl>          <dbl>
## 1 GARY HUA            ON                    6         1794           1605
## 2 DAKSHESH DARURI     MI                    6         1553           1469
## 3 ADITYA BAJAJ        MI                    6         1384           1564
## 4 PATRICK H SCHILLING MI                    5.5       1716           1574
## 5 HANSHI ZUO          MI                    5.5       1655           1501
## 6 HANSEN SONG         OH                    5         1686           1519
Test<-Test[order(Test$Average_PR_Opp, decreasing= TRUE),]
head(Test)
## # A tibble: 6 × 5
##   Player_name                Players_State Total_score Pre_rating Average_PR_Opp
##   <chr>                      <chr>               <dbl>      <dbl>          <dbl>
## 1 GARY HUA                   ON                    6         1794           1605
## 2 PATRICK H SCHILLING        MI                    5.5       1716           1574
## 3 ADITYA BAJAJ               MI                    6         1384           1564
## 4 ANVIT RAO                  MI                    5         1365           1554
## 5 STEFANO LEE                ON                    5         1411           1523
## 6 SOFIA ADINA STANESCU-BELLU MI                    3.5       1507           1522
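
The head(Chess_players) output earlier shows the name and state columns still padded with spaces, while the values read back with read_csv appear trimmed. If the padding should not end up in the file itself, a sketch like the following could be applied before writing. It uses base R's write.csv and a temporary file so it is self-contained; the names players, out_file, and check are hypothetical.

```r
players <- data.frame(
  Player_name   = c("  GARY HUA                        ", "  HANSHI ZUO                      "),
  Players_State = c("   ON ", "   MI "),
  stringsAsFactors = FALSE
)

# Remove leading and trailing whitespace from the text columns.
players$Player_name   <- trimws(players$Player_name)
players$Players_State <- trimws(players$Players_State)

# Write to a temporary file and read it back to confirm the stored values are clean.
out_file <- tempfile(fileext = ".csv")
write.csv(players, out_file, row.names = FALSE)
check <- read.csv(out_file, stringsAsFactors = FALSE)
print(check)
```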

Conclusion

The text file was successfully converted into a CSV file by transforming and tidying the data. Using R functions to remove anomalies allowed me to analyze the average pre-rating of each player’s opponents in the chess tournament results. Gary Hua had the highest average opponent pre-rating (1605) and still finished with 6.0 points, which means he scored well against highly rated players and suggests he is a tough chess player to beat.