DATA 607 - Project 1

Chess Tournament Stats

For this project I saved the original .txt file as an .xlsx file and used the built-in “Text to Columns” feature to quickly separate data into its respective columns. All of the Excel transformations only took about 20 minutes start to finish.

Then I saved the transformed data as a tab delimited text file called tournamentinfo2.txt and placed it in my GitHub repository here. For comparison purposes both datasets were imported into R and saved respectively as chess and chess2. The latter was used for the pursuant transformations.

fileURL <- "https://raw.githubusercontent.com/douglasbarley/DATA607/master/tournamentinfo.txt"
chess <- read.delim(fileURL, header = TRUE)

fileURL2 <- "https://raw.githubusercontent.com/douglasbarley/DATA607/master/tournamentinfo2.txt"
chess2 <- read.delim(fileURL2, header = TRUE, sep = "\t")

Most of the basic data about each player is relatively easy to obtain from the transformed data, as shown in the sample below.

players <- subset(chess2, select = c(as.integer(PairNum), PlayerName, State, TotalPts, PreRtg))
head(players)

##   PairNum          PlayerName State TotalPts PreRtg
## 1       1            GARY HUA    ON      6.0   1794
## 2       2     DAKSHESH DARURI    MI      6.0   1553
## 3       3        ADITYA BAJAJ    MI      6.0   1384
## 4       4 PATRICK H SCHILLING    MI      5.5   1716
## 5       5          HANSHI ZUO    MI      5.5   1655
## 6       6         HANSEN SONG    OH      5.0   1686

But calculating the average rating of each player’s opponents is a multi-step process. First we create a data frame displaying each player’s opponents’ Pair Number for each round of the tournament.

R1opp <- as.integer(trimws(substring(chess2$Round1,4,5))) # each player's opponent for round 1
R2opp <- as.integer(trimws(substring(chess2$Round2,4,5))) # each player's opponent for round 2
R3opp <- as.integer(trimws(substring(chess2$Round3,4,5))) # each player's opponent for round 3
R4opp <- as.integer(trimws(substring(chess2$Round4,4,5))) # each player's opponent for round 4
R5opp <- as.integer(trimws(substring(chess2$Round5,4,5))) # each player's opponent for round 5
R6opp <- as.integer(trimws(substring(chess2$Round6,4,5))) # each player's opponent for round 6
R7opp <- as.integer(trimws(substring(chess2$Round7,4,5))) # each player's opponent for round 7

Opponents <- data.frame(chess2$PairNum, R1opp, R2opp, R3opp, R4opp, R5opp, R6opp, R7opp)

colnames(Opponents) <- c('PairNum','1','2','3','4','5','6','7')

head(Opponents)

##   PairNum  1  2  3  4  5  6  7
## 1       1 39 21 18 14  7 12  4
## 2       2 63 58  4 17 16 20  7
## 3       3  8 61 25 21 11 13 12
## 4       4 23 28  2 26  5 19  1
## 5       5 45 37 12 13  4 14 17
## 6       6 34 29 11 35 10 27 21

Next we pivot the data frame so that each player’s opponents are listed for each played round as a single vector, dropping all NA values (i.e. blanks where there was no match for a given round). Then we look up each opponent’s preliminary rating for each round.

# pivot the data frame "long-wise"
OppNum <- Opponents %>%
  pivot_longer(`1`:`7`, names_to = "Round", values_to = "Opp", values_drop_na = TRUE)

# look up the opponent's pre-rating using their PairNum as an index
for (i in 1:nrow(OppNum)){
  OppNum$OppPreRtg[i] <- players$PreRtg[which(OppNum$Opp[i] == players$PairNum)]
}

head(OppNum)

## # A tibble: 6 x 4
##   PairNum Round   Opp OppPreRtg
##     <int> <chr> <int>     <int>
## 1       1 1        39      1436
## 2       1 2        21      1563
## 3       1 3        18      1600
## 4       1 4        14      1610
## 5       1 5         7      1649
## 6       1 6        12      1663

Then we average each player’s opponents’ pre-Rating scores across all rounds that were played. The easiest way to do this appears to be using aggregation in a data table.

# aggregate the averages in a data table
AvgOppRtg <- data.table(round(aggregate(OppPreRtg ~ PairNum, OppNum, mean), digits = 0))

colnames(AvgOppRtg) <- c('PairNum','AvgOppPreRating')

head(AvgOppRtg)

##    PairNum AvgOppPreRating
## 1:       1            1605
## 2:       2            1469
## 3:       3            1564
## 4:       4            1574
## 5:       5            1501
## 6:       6            1519

Finally, with the average of each player’s opponents in hand, we merge that data into the players data frame as the last column.

# merge the two data sets
playersall <- merge(players,AvgOppRtg)

head(playersall)

##   PairNum          PlayerName State TotalPts PreRtg AvgOppPreRating
## 1       1            GARY HUA    ON      6.0   1794            1605
## 2       2     DAKSHESH DARURI    MI      6.0   1553            1469
## 3       3        ADITYA BAJAJ    MI      6.0   1384            1564
## 4       4 PATRICK H SCHILLING    MI      5.5   1716            1574
## 5       5          HANSHI ZUO    MI      5.5   1655            1501
## 6       6         HANSEN SONG    OH      5.0   1686            1519

All that is left to do is export the results to a .csv file, but since we do not need to export the PairNum column we will first set it to null.

# remove the PairNum column
playersall$PairNum = NULL

# export the file
write.csv(playersall, file = "players.csv", sep = ",", row.names = FALSE)

The file called “players.csv” should now be in your local Documents folder.

DATA 607 - Project 1

Douglas Barley

9/19/2020

Chess Tournament Stats