For our project we are given a text file that contains a chart of chess player’s stats. Unlike the other files where this information is separated by commas, the charts is separated by two symbols “|” & “_“.
First, I will use the read.delim2() function to read the text file. The file does not have the structure to say the argument that the file has headers, so it is false for now.
library(tidyr)
library(dplyr)
chess.file <- read.delim2("Chess_stats.txt", header = FALSE, sep = "\n")
head(chess.file)
## V1
## 1 -----------------------------------------------------------------------------------------
## 2 Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round|
## 3 Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
## 4 -----------------------------------------------------------------------------------------
## 5 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|
## 6 ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |
Now, lets remove all these special characters from the upload and create our data table. First lets remove all the rows with row separator of “—…”. For this step, I can see all the separator rows are by threes. So, lets remove all rows in this sequence.
chess.file<-data.frame(A=chess.file[-seq(1,196,by=3),])
head(chess.file)
## A
## 1 Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round|
## 2 Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
## 3 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|
## 4 ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |
## 5 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|
## 6 MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |
We can see that the player’s stats are placed in two rows. For simplicity, I can separate the rows into their own tables. Now, we can perform the dplyr separate function to divide the rows back into their columns.
chess.A<-data.frame(A=chess.file[-seq(2,129,by=2),])
chess.B<-data.frame(B=chess.file[-seq(1,129,by=2),])
chess.A<-separate(chess.A,col=A,sep = "[|]",into = c("state","player","Total_Points" ,"R1","R2","R3","R4","R5","R6","R7"))
chess.B<-separate(chess.B,col=B,sep = "[|]",into = c("state","player","R1","R2","R3","R4","R5","R6","R7"))
head(chess.A)
## state player Total_Points R1 R2 R3 R4
## 1 Pair Player Name Total Round Round Round Round
## 2 1 GARY HUA 6.0 W 39 W 21 W 18 W 14
## 3 2 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17
## 4 3 ADITYA BAJAJ 6.0 L 8 W 61 W 25 W 21
## 5 4 PATRICK H SCHILLING 5.5 W 23 D 28 W 2 W 26
## 6 5 HANSHI ZUO 5.5 W 45 W 37 D 12 D 13
## R5 R6 R7
## 1 Round Round Round
## 2 W 7 D 12 D 4
## 3 W 16 W 20 W 7
## 4 W 11 W 13 W 12
## 5 D 5 W 19 D 1
## 6 D 4 W 14 W 17
head(chess.B)
## state player R1 R2 R3 R4 R5 R6
## 1 Num USCF ID / Rtg (Pre->Post) Pts 1 2 3 4 5
## 2 ON 15445895 / R: 1794 ->1817 N:2 W B W B W
## 3 MI 14598900 / R: 1553 ->1663 N:2 B W B W B
## 4 MI 14959604 / R: 1384 ->1640 N:2 W B W B W
## 5 MI 12616049 / R: 1716 ->1744 N:2 W B W B W
## 6 MI 14601533 / R: 1655 ->1690 N:2 B W B W B
## R7
## 1 6
## 2 B
## 3 W
## 4 B
## 5 B
## 6 W
Our tables are close to normal! Now, I will get rid of the first row in both tables which is just the column names. Then, I am only keeping information needed for our .csv file.
chess.A<-chess.A[-1,]
chess.A<-chess.A[-65,]
chess.B<-chess.B[-1,1:2]
For this section, I will examined the information needed for both tables. In table A, I will need the player’s opponent number for a later function. So, I erased the Letters from the rounds’ information and change the column types from string to integers. For table b, there another separation for the player’s tag, pre ranking, and post ranking.
chess.A$R1 <- sub("[A-Z]", "", chess.A$R1)
chess.A$R2 <- sub("[A-Z]", "", chess.A$R2)
chess.A$R3 <- sub("[A-Z]", "", chess.A$R3)
chess.A$R4 <- sub("[A-Z]", "", chess.A$R4)
chess.A$R5 <- sub("[A-Z]", "", chess.A$R5)
chess.A$R6 <- sub("[A-Z]", "", chess.A$R6)
chess.A$R7 <- sub("[A-Z]", "", chess.A$R7)
chess.A$R1 <- as.numeric(chess.A$R1)
chess.A$R2 <- as.numeric(chess.A$R2)
chess.A$R3 <- as.numeric(chess.A$R3)
chess.A$R4 <- as.numeric(chess.A$R4)
chess.A$R5 <- as.numeric(chess.A$R5)
chess.A$R6 <- as.numeric(chess.A$R6)
chess.A$R7 <- as.numeric(chess.A$R7)
chess.B <-
separate(
chess.B,
col = player,
sep = "[/]",
into = c("USCD_ID", "Ratings")
)
chess.B <-
separate(chess.B,
col = Ratings,
sep = "[->]",
into = c("Pre", "Post"))
chess.B$Pre <- sub("R[:]", "", chess.B$Pre)
In order to get rid of the provisional ratings, There is a for loop that iterates through the column. If a PR is found, it finds the index of the PR and trims the strong down to the player’s Pre rating. Then, the pre rating is reclassify as a integer for the compare function below.
for(a in 1:length(chess.B$Pre)) {
if (grepl("P", chess.B$Pre[a], fixed = TRUE) == TRUE) {
c <- unlist(gregexpr("P", chess.B$Pre[a]))[1]
c <- as.numeric(c)
chess.B$Pre[a] <- strtrim(chess.B$Pre[a], c - 1)
}
}
chess.B$Pre <- as.numeric(chess.B$Pre)
chess.B <- chess.B[, 1:3]
For our last column in the outputted .csv file is the average of the opponents’ pre chess ratings. For the calculation, we must gather all the players’ ore rating scores and take the average. Let’s create a function that takes the round columns and return a list of the calculated averages.
preAverage <- function() {
t <- 0
r <- c(0)
for (b in 1:64) {
opps <-
c(
chess.A$R1[b],
chess.A$R2[b],
chess.A$R3[b],
chess.A$R4[b],
chess.A$R5[b],
chess.A$R6[b],
chess.A$R7[b]
)
opps<-na.omit(opps)
for (c in 1:length(opps)) {
t <- chess.B$Pre[opps[c]] + t
}
r <- append(r, floor(t / length(opps)))
t <- 0
}
return(r)
}
pre.Avg<-preAverage()
pre.Avg<-pre.Avg[pre.Avg!=0]
We have all the needed information to create a the .csv file. I will create a data frame to form all the data to one place.
result<-data.frame(Player_name=chess.A$player,State=chess.B$state,Total_points=chess.A$Total_Points,Pre_Rating=chess.B$Pre,Opp_Pre_Rating_Avg=pre.Avg)
head(result)
## Player_name State Total_points Pre_Rating
## 1 GARY HUA ON 6.0 1794
## 2 DAKSHESH DARURI MI 6.0 1553
## 3 ADITYA BAJAJ MI 6.0 1384
## 4 PATRICK H SCHILLING MI 5.5 1716
## 5 HANSHI ZUO MI 5.5 1655
## 6 HANSEN SONG OH 5.0 1686
## Opp_Pre_Rating_Avg
## 1 1605
## 2 1469
## 3 1563
## 4 1573
## 5 1500
## 6 1518
All I have to do now is write our finalized data frame into its .csv!
write.csv(result,"Chess_Results.csv")