Introduction

For our project we are given a text file that contains a chart of chess player’s stats. Unlike the other files where this information is separated by commas, the charts is separated by two symbols “|” & “_“.

Reading the text file

First, I will use the read.delim2() function to read the text file. The file does not have the structure to say the argument that the file has headers, so it is false for now.

library(tidyr)
library(dplyr)
chess.file <- read.delim2("Chess_stats.txt", header = FALSE, sep = "\n")
head(chess.file)
##                                                                                           V1
## 1  -----------------------------------------------------------------------------------------
## 2  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 3  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 4  -----------------------------------------------------------------------------------------
## 5      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 6     ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |

Setting up the table structure

Now, lets remove all these special characters from the upload and create our data table. First lets remove all the rows with row separator of “—…”. For this step, I can see all the separator rows are by threes. So, lets remove all rows in this sequence.

chess.file<-data.frame(A=chess.file[-seq(1,196,by=3),])
head(chess.file)
##                                                                                            A
## 1  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 2  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 3      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 4     ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 5      2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|
## 6     MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |

We can see that the player’s stats are placed in two rows. For simplicity, I can separate the rows into their own tables. Now, we can perform the dplyr separate function to divide the rows back into their columns.

chess.A<-data.frame(A=chess.file[-seq(2,129,by=2),])
chess.B<-data.frame(B=chess.file[-seq(1,129,by=2),])
chess.A<-separate(chess.A,col=A,sep = "[|]",into = c("state","player","Total_Points" ,"R1","R2","R3","R4","R5","R6","R7"))
chess.B<-separate(chess.B,col=B,sep = "[|]",into = c("state","player","R1","R2","R3","R4","R5","R6","R7"))
head(chess.A)
##    state                            player Total_Points    R1    R2    R3    R4
## 1  Pair   Player Name                             Total Round Round Round Round
## 2     1   GARY HUA                                6.0   W  39 W  21 W  18 W  14
## 3     2   DAKSHESH DARURI                         6.0   W  63 W  58 L   4 W  17
## 4     3   ADITYA BAJAJ                            6.0   L   8 W  61 W  25 W  21
## 5     4   PATRICK H SCHILLING                     5.5   W  23 D  28 W   2 W  26
## 6     5   HANSHI ZUO                              5.5   W  45 W  37 D  12 D  13
##      R5    R6    R7
## 1 Round Round Round
## 2 W   7 D  12 D   4
## 3 W  16 W  20 W   7
## 4 W  11 W  13 W  12
## 5 D   5 W  19 D   1
## 6 D   4 W  14 W  17
head(chess.B)
##    state                            player    R1    R2    R3    R4    R5    R6
## 1  Num    USCF ID / Rtg (Pre->Post)         Pts    1     2     3     4     5  
## 2    ON   15445895 / R: 1794   ->1817      N:2   W     B     W     B     W    
## 3    MI   14598900 / R: 1553   ->1663      N:2   B     W     B     W     B    
## 4    MI   14959604 / R: 1384   ->1640      N:2   W     B     W     B     W    
## 5    MI   12616049 / R: 1716   ->1744      N:2   W     B     W     B     W    
## 6    MI   14601533 / R: 1655   ->1690      N:2   B     W     B     W     B    
##      R7
## 1   6  
## 2 B    
## 3 W    
## 4 B    
## 5 B    
## 6 W

Our tables are close to normal! Now, I will get rid of the first row in both tables which is just the column names. Then, I am only keeping information needed for our .csv file.

chess.A<-chess.A[-1,]
chess.A<-chess.A[-65,]
chess.B<-chess.B[-1,1:2]

Tidying our tables and values

For this section, I will examined the information needed for both tables. In table A, I will need the player’s opponent number for a later function. So, I erased the Letters from the rounds’ information and change the column types from string to integers. For table b, there another separation for the player’s tag, pre ranking, and post ranking.

chess.A$R1 <- sub("[A-Z]", "", chess.A$R1)
chess.A$R2 <- sub("[A-Z]", "", chess.A$R2)
chess.A$R3 <- sub("[A-Z]", "", chess.A$R3)
chess.A$R4 <- sub("[A-Z]", "", chess.A$R4)
chess.A$R5 <- sub("[A-Z]", "", chess.A$R5)
chess.A$R6 <- sub("[A-Z]", "", chess.A$R6)
chess.A$R7 <- sub("[A-Z]", "", chess.A$R7)
chess.A$R1 <- as.numeric(chess.A$R1)
chess.A$R2 <- as.numeric(chess.A$R2)
chess.A$R3 <- as.numeric(chess.A$R3)
chess.A$R4 <- as.numeric(chess.A$R4)
chess.A$R5 <- as.numeric(chess.A$R5)
chess.A$R6 <- as.numeric(chess.A$R6)
chess.A$R7 <- as.numeric(chess.A$R7)
chess.B <-
  separate(
    chess.B,
    col = player,
    sep = "[/]",
    into = c("USCD_ID", "Ratings")
  )
chess.B <-
  separate(chess.B,
           col = Ratings,
           sep = "[->]",
           into = c("Pre", "Post"))
chess.B$Pre <- sub("R[:]", "", chess.B$Pre)

In order to get rid of the provisional ratings, There is a for loop that iterates through the column. If a PR is found, it finds the index of the PR and trims the strong down to the player’s Pre rating. Then, the pre rating is reclassify as a integer for the compare function below.

for(a in 1:length(chess.B$Pre)) {
  if (grepl("P", chess.B$Pre[a], fixed = TRUE) == TRUE) {
    c <- unlist(gregexpr("P", chess.B$Pre[a]))[1]
    c <- as.numeric(c)
    chess.B$Pre[a] <- strtrim(chess.B$Pre[a], c - 1)
  }
}
chess.B$Pre <- as.numeric(chess.B$Pre)
chess.B <- chess.B[, 1:3]

New calculations

For our last column in the outputted .csv file is the average of the opponents’ pre chess ratings. For the calculation, we must gather all the players’ ore rating scores and take the average. Let’s create a function that takes the round columns and return a list of the calculated averages.

preAverage <- function() {
  t <- 0
  r <- c(0)
  for (b in 1:64) {
    opps <-
      c(
        chess.A$R1[b],
        chess.A$R2[b],
        chess.A$R3[b],
        chess.A$R4[b],
        chess.A$R5[b],
        chess.A$R6[b],
        chess.A$R7[b]
      )
    opps<-na.omit(opps)
    for (c in 1:length(opps)) {
      t <- chess.B$Pre[opps[c]] + t
    }
    r <- append(r, floor(t / length(opps)))
    t <- 0
    
  }
  return(r)
}

pre.Avg<-preAverage()
pre.Avg<-pre.Avg[pre.Avg!=0]

Finishing touches

We have all the needed information to create a the .csv file. I will create a data frame to form all the data to one place.

result<-data.frame(Player_name=chess.A$player,State=chess.B$state,Total_points=chess.A$Total_Points,Pre_Rating=chess.B$Pre,Opp_Pre_Rating_Avg=pre.Avg)
head(result)
##                         Player_name  State Total_points Pre_Rating
## 1  GARY HUA                            ON         6.0         1794
## 2  DAKSHESH DARURI                     MI         6.0         1553
## 3  ADITYA BAJAJ                        MI         6.0         1384
## 4  PATRICK H SCHILLING                 MI         5.5         1716
## 5  HANSHI ZUO                          MI         5.5         1655
## 6  HANSEN SONG                         OH         5.0         1686
##   Opp_Pre_Rating_Avg
## 1               1605
## 2               1469
## 3               1563
## 4               1573
## 5               1500
## 6               1518

Final

All I have to do now is write our finalized data frame into its .csv!

write.csv(result,"Chess_Results.csv")