Project 1 - Chess Tournament

Loading the Data

First, load the stringr library and dataset TournamentInfo. To do this, we import the csv file (columns seperated by |) and will have to remove the empty lines filled with -’s.

library("stringr")
library("knitr")
TournamentInfo <- read.csv(url("https://raw.githubusercontent.com/chrisgmartin/DATA607/master/tournamentinfo.txt"),
                           sep = "|", stringsAsFactors=FALSE,
                           col.names = c("Pair", "Name", "Total", "R1", "R2", "R3", "R4", "R5", "R6", "R7", ""), 
                           skip = 1)
TournamentInfo <- TournamentInfo[seq(-2,-194,-3),]
kable(head(TournamentInfo))
Pair Name Total R1 R2 R3 R4 R5 R6 R7 X
1 Num USCF ID / Rtg (Pre->Post) Pts 1 2 3 4 5 6 7 NA
3 1 GARY HUA 6.0 W 39 W 21 W 18 W 14 W 7 D 12 D 4 NA
4 ON 15445895 / R: 1794 ->1817 N:2 W B W B W B W NA
6 2 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17 W 16 W 20 W 7 NA
7 MI 14598900 / R: 1553 ->1663 N:2 B W B W B W B NA
9 3 ADITYA BAJAJ 6.0 L 8 W 61 W 25 W 21 W 11 W 13 W 12 NA

Seperating Rows for Each Player

Since we have the tournament info imported, the table shows that each player is displayed in two rows: the first includes the player name, result (win, loss, draw, etc.), and opponent number; the second includes the player’s state, player ID, rating (pre and post tournament), and some other various information. Since we’re only looking for specific information it’ll be important to merge and display the information we need. Here we can seperate these two rows for each player into two tables with one row for each player.

JoinedRows <- data.frame(TournamentInfo[seq(2,2*194/3,2),],TournamentInfo[seq(3,2*194/3,2),])
kable(head(JoinedRows))
Pair Name Total R1 R2 R3 R4 R5 R6 R7 X Pair.1 Name.1 Total.1 R1.1 R2.1 R3.1 R4.1 R5.1 R6.1 R7.1 X.1
3 1 GARY HUA 6.0 W 39 W 21 W 18 W 14 W 7 D 12 D 4 NA ON 15445895 / R: 1794 ->1817 N:2 W B W B W B W NA
6 2 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17 W 16 W 20 W 7 NA MI 14598900 / R: 1553 ->1663 N:2 B W B W B W B NA
9 3 ADITYA BAJAJ 6.0 L 8 W 61 W 25 W 21 W 11 W 13 W 12 NA MI 14959604 / R: 1384 ->1640 N:2 W B W B W B W NA
12 4 PATRICK H SCHILLING 5.5 W 23 D 28 W 2 W 26 D 5 W 19 D 1 NA MI 12616049 / R: 1716 ->1744 N:2 W B W B W B B NA
15 5 HANSHI ZUO 5.5 W 45 W 37 D 12 D 13 D 4 W 14 W 17 NA MI 14601533 / R: 1655 ->1690 N:2 B W B W B W B NA
18 6 HANSEN SONG 5.0 W 34 D 29 L 11 W 35 D 10 W 27 W 21 NA OH 15055204 / R: 1686 ->1687 N:3 W B W B B W B NA

Adding Pre-Tournament Ratings

Gathering Pre-Tournament Ratings

Next we’ll pull out the Pre-Tournament Rating from the NoNamesRows table. From the TournamentInfo table, you can see that the Pre-Tournament Rating is a 3-4 digit number in the Player.Name column preceeding the characters: “R:”. Using the str_locate function we can locate the point for each row where the rating appears and then use the str_sub function to extract the string in the that location. Finally, since these strings are integers (which will help us in further calculations) we will want to extract them as integers.

PreRatingLoc <- c(str_locate(JoinedRows[,13],"R: ")[1,2]+1, str_locate(JoinedRows[,13], "R: ")[1,2]+4)
PreRatings <- as.integer(str_sub(JoinedRows[,13],PreRatingLoc[1],PreRatingLoc[2]))
kable(head(PreRatings))
1794
1553
1384
1716
1655
1686

Merging the Named and Non-Named tables with Pre-Tournament Ratings

Now that we have our tables, we can start to combine them. We’ll call this new table ChessTable and pull in the Player Name, State, Total Number of Points, and Pre-Rating.

ChessTable <- data.frame(c(1:64), sapply(JoinedRows[2],str_trim),JoinedRows[12],JoinedRows[3], PreRatings)
colnames(ChessTable) <- c("Number", "Name", "State", "Points", "PreRating")
kable(head(ChessTable))
Number Name State Points PreRating
3 1 GARY HUA ON 6.0 1794
6 2 DAKSHESH DARURI MI 6.0 1553
9 3 ADITYA BAJAJ MI 6.0 1384
12 4 PATRICK H SCHILLING MI 5.5 1716
15 5 HANSHI ZUO MI 5.5 1655
18 6 HANSEN SONG OH 5.0 1686

Adding Average-Opponent Ratings

Gathering Opponent List

The last piece we’ll need to add is the average opponent’s Pre-Tournament Rating. To get this we’ll create another table called OppRatings, which will list each opponent the players will face in the tournament.

Opponents <- data.frame(1, as.list(as.numeric(str_extract(JoinedRows[1,4:10],"[0-9]{1,2}"))))
colnames(Opponents) <- c("Number", "Opp1","Opp2","Opp3","Opp4","Opp5","Opp6","Opp7")
for (i in 2:64) {
  Opponents <- rbind(Opponents, as.numeric(str_extract(JoinedRows[i,4:10],"[0-9]{1,2}")))
}
Opponents$Number <- c(1:64)
kable(head(Opponents))
Number Opp1 Opp2 Opp3 Opp4 Opp5 Opp6 Opp7
1 39 21 18 14 7 12 4
2 58 4 17 16 20 7 63
3 61 25 21 11 13 12 8
4 28 2 26 5 19 1 23
5 37 12 13 4 14 17 45
6 29 11 35 10 27 21 34

Gathering Opponent Pre-Tournament Ratings

From the list of each players’ opponents, we’ll pull in the opponents’ score, number of games played, and get an average opponent Pre-Tournament Rating.

Opponents <- data.frame(c(1:64),
                        ChessTable$PreRating[Opponents$Opp1[1:64]],
                        ChessTable$PreRating[Opponents$Opp2[1:64]],
                        ChessTable$PreRating[Opponents$Opp3[1:64]],
                        ChessTable$PreRating[Opponents$Opp4[1:64]],
                        ChessTable$PreRating[Opponents$Opp5[1:64]],
                        ChessTable$PreRating[Opponents$Opp6[1:64]],
                        ChessTable$PreRating[Opponents$Opp7[1:64]])
Opponents$Average <- round(rowMeans(Opponents[2:8],na.rm=TRUE), digits=0)
colnames(Opponents) <- c("Number", "Opp1", "Opp2", "Opp3", "Opp4", "Opp5", "Opp6", "Opp7", "Average")
kable(head(Opponents))
Number Opp1 Opp2 Opp3 Opp4 Opp5 Opp6 Opp7 Average
1 1436 1563 1600 1610 1649 1663 1716 1605
2 917 1716 1629 1604 1595 1649 1175 1469
3 955 1745 1563 1712 1666 1663 1641 1564
4 1507 1553 1579 1655 1564 1794 1363 1574
5 980 1663 1666 1716 1610 1629 1242 1501
6 1602 1712 1438 1365 1552 1563 1399 1519

Merging all of the components into one table

ChessTable$OpponentAverage <- Opponents$Average
kable(head(ChessTable))
Number Name State Points PreRating OpponentAverage
3 1 GARY HUA ON 6.0 1794 1605
6 2 DAKSHESH DARURI MI 6.0 1553 1469
9 3 ADITYA BAJAJ MI 6.0 1384 1564
12 4 PATRICK H SCHILLING MI 5.5 1716 1574
15 5 HANSHI ZUO MI 5.5 1655 1501
18 6 HANSEN SONG OH 5.0 1686 1519

Plotting the table

require(ggplot2)
## Loading required package: ggplot2
ggplot(ChessTable, aes(x=PreRating, y=OpponentAverage, label = Name, colour = Points)) + geom_point() + geom_text(aes(label=ifelse(PreRating>1700, as.character(Name),'')), size = 2, hjust=0) + xlim(0,2000) + ggtitle("Player to Opponent Scatterplot")


Predicting an eigth match win

This tournament just got out of hand: against the rules of the tournament, an eigth round against a random player in the tournament has been set! Let’s find the chances that a player will face-off and win against their random opponent:

Selecting the random opponent

We’ll start by selecting (at random, without replacement) which opponent our player will face, and then predict the chances of a win.

ChessTable$Opp8 <- sample(1:length(ChessTable[,1]), 64, replace=F)
ChessTable$Opp8Rating <- ChessTable$PreRating[ChessTable$Opp8[1:64]]
kable(head(ChessTable))
Number Name State Points PreRating OpponentAverage Opp8 Opp8Rating
3 1 GARY HUA ON 6.0 1794 1605 62 1530
6 2 DAKSHESH DARURI MI 6.0 1553 1469 4 1716
9 3 ADITYA BAJAJ MI 6.0 1384 1564 25 1745
12 4 PATRICK H SCHILLING MI 5.5 1716 1574 35 1438
15 5 HANSHI ZUO MI 5.5 1655 1501 19 1564
18 6 HANSEN SONG OH 5.0 1686 1519 32 1441

Calculating the chance of winning

Using the calculation WP = 1 / [10^ (dR/400) + 1 ] where dR is the difference in ratings, we can come up with a Winning Probability[1]

ChessTable$ChanceOfWin <- paste(round((1 / (10^ ((ChessTable$Opp8Rating - ChessTable$PreRating) / 400) + 1)) * 100, 2), "%", sep = "")
kable(head(ChessTable))
Number Name State Points PreRating OpponentAverage Opp8 Opp8Rating ChanceOfWin
3 1 GARY HUA ON 6.0 1794 1605 62 1530 82.05%
6 2 DAKSHESH DARURI MI 6.0 1553 1469 4 1716 28.12%
9 3 ADITYA BAJAJ MI 6.0 1384 1564 25 1745 11.12%
12 4 PATRICK H SCHILLING MI 5.5 1716 1574 35 1438 83.21%
15 5 HANSHI ZUO MI 5.5 1655 1501 19 1564 62.8%
18 6 HANSEN SONG OH 5.0 1686 1519 32 1441 80.38%

[1] Reference: http://www.caissa.com/support/chess-ratings.php