For this project I saved the original .txt file as an .xlsx file and used the built-in “Text to Columns” feature to quickly separate data into its respective columns. All of the Excel transformations only took about 20 minutes start to finish.
Then I saved the transformed data as a tab delimited text file called tournamentinfo2.txt and placed it in my GitHub repository here. For comparison purposes both datasets were imported into R and saved respectively as chess and chess2. The latter was used for the pursuant transformations.
fileURL <- "https://raw.githubusercontent.com/douglasbarley/DATA607/master/tournamentinfo.txt"
chess <- read.delim(fileURL, header = TRUE)
fileURL2 <- "https://raw.githubusercontent.com/douglasbarley/DATA607/master/tournamentinfo2.txt"
chess2 <- read.delim(fileURL2, header = TRUE, sep = "\t")
Most of the basic data about each player is relatively easy to obtain from the transformed data, as shown in the sample below.
players <- subset(chess2, select = c(as.integer(PairNum), PlayerName, State, TotalPts, PreRtg))
head(players)
## PairNum PlayerName State TotalPts PreRtg
## 1 1 GARY HUA ON 6.0 1794
## 2 2 DAKSHESH DARURI MI 6.0 1553
## 3 3 ADITYA BAJAJ MI 6.0 1384
## 4 4 PATRICK H SCHILLING MI 5.5 1716
## 5 5 HANSHI ZUO MI 5.5 1655
## 6 6 HANSEN SONG OH 5.0 1686
But calculating the average rating of each player’s opponents is a multi-step process. First we create a data frame displaying each player’s opponents’ Pair Number for each round of the tournament.
R1opp <- as.integer(trimws(substring(chess2$Round1,4,5))) # each player's opponent for round 1
R2opp <- as.integer(trimws(substring(chess2$Round2,4,5))) # each player's opponent for round 2
R3opp <- as.integer(trimws(substring(chess2$Round3,4,5))) # each player's opponent for round 3
R4opp <- as.integer(trimws(substring(chess2$Round4,4,5))) # each player's opponent for round 4
R5opp <- as.integer(trimws(substring(chess2$Round5,4,5))) # each player's opponent for round 5
R6opp <- as.integer(trimws(substring(chess2$Round6,4,5))) # each player's opponent for round 6
R7opp <- as.integer(trimws(substring(chess2$Round7,4,5))) # each player's opponent for round 7
Opponents <- data.frame(chess2$PairNum, R1opp, R2opp, R3opp, R4opp, R5opp, R6opp, R7opp)
colnames(Opponents) <- c('PairNum','1','2','3','4','5','6','7')
head(Opponents)
## PairNum 1 2 3 4 5 6 7
## 1 1 39 21 18 14 7 12 4
## 2 2 63 58 4 17 16 20 7
## 3 3 8 61 25 21 11 13 12
## 4 4 23 28 2 26 5 19 1
## 5 5 45 37 12 13 4 14 17
## 6 6 34 29 11 35 10 27 21
Next we pivot the data frame so that each player’s opponents are listed for each played round as a single vector, dropping all NA values (i.e. blanks where there was no match for a given round). Then we look up each opponent’s preliminary rating for each round.
# pivot the data frame "long-wise"
OppNum <- Opponents %>%
pivot_longer(`1`:`7`, names_to = "Round", values_to = "Opp", values_drop_na = TRUE)
# look up the opponent's pre-rating using their PairNum as an index
for (i in 1:nrow(OppNum)){
OppNum$OppPreRtg[i] <- players$PreRtg[which(OppNum$Opp[i] == players$PairNum)]
}
head(OppNum)
## # A tibble: 6 x 4
## PairNum Round Opp OppPreRtg
## <int> <chr> <int> <int>
## 1 1 1 39 1436
## 2 1 2 21 1563
## 3 1 3 18 1600
## 4 1 4 14 1610
## 5 1 5 7 1649
## 6 1 6 12 1663
Then we average each player’s opponents’ pre-Rating scores across all rounds that were played. The easiest way to do this appears to be using aggregation in a data table.
# aggregate the averages in a data table
AvgOppRtg <- data.table(round(aggregate(OppPreRtg ~ PairNum, OppNum, mean), digits = 0))
colnames(AvgOppRtg) <- c('PairNum','AvgOppPreRating')
head(AvgOppRtg)
## PairNum AvgOppPreRating
## 1: 1 1605
## 2: 2 1469
## 3: 3 1564
## 4: 4 1574
## 5: 5 1501
## 6: 6 1519
Finally, with the average of each player’s opponents in hand, we merge that data into the players data frame as the last column.
# merge the two data sets
playersall <- merge(players,AvgOppRtg)
head(playersall)
## PairNum PlayerName State TotalPts PreRtg AvgOppPreRating
## 1 1 GARY HUA ON 6.0 1794 1605
## 2 2 DAKSHESH DARURI MI 6.0 1553 1469
## 3 3 ADITYA BAJAJ MI 6.0 1384 1564
## 4 4 PATRICK H SCHILLING MI 5.5 1716 1574
## 5 5 HANSHI ZUO MI 5.5 1655 1501
## 6 6 HANSEN SONG OH 5.0 1686 1519
All that is left to do is export the results to a .csv file, but since we do not need to export the PairNum column we will first set it to null.
# remove the PairNum column
playersall$PairNum = NULL
# export the file
write.csv(playersall, file = "players.csv", sep = ",", row.names = FALSE)
The file called “players.csv” should now be in your local Documents folder.