First, load the stringr library and dataset TournamentInfo. To do this, we import the csv file (columns seperated by |) and will have to remove the empty lines filled with -’s.
library("stringr")
library("knitr")
TournamentInfo <- read.csv(url("https://raw.githubusercontent.com/chrisgmartin/DATA607/master/tournamentinfo.txt"),
sep = "|", stringsAsFactors=FALSE,
col.names = c("Pair", "Name", "Total", "R1", "R2", "R3", "R4", "R5", "R6", "R7", ""),
skip = 1)
TournamentInfo <- TournamentInfo[seq(-2,-194,-3),]
kable(head(TournamentInfo))
Pair | Name | Total | R1 | R2 | R3 | R4 | R5 | R6 | R7 | X | |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | NA |
3 | 1 | GARY HUA | 6.0 | W 39 | W 21 | W 18 | W 14 | W 7 | D 12 | D 4 | NA |
4 | ON | 15445895 / R: 1794 ->1817 | N:2 | W | B | W | B | W | B | W | NA |
6 | 2 | DAKSHESH DARURI | 6.0 | W 63 | W 58 | L 4 | W 17 | W 16 | W 20 | W 7 | NA |
7 | MI | 14598900 / R: 1553 ->1663 | N:2 | B | W | B | W | B | W | B | NA |
9 | 3 | ADITYA BAJAJ | 6.0 | L 8 | W 61 | W 25 | W 21 | W 11 | W 13 | W 12 | NA |
Since we have the tournament info imported, the table shows that each player is displayed in two rows: the first includes the player name, result (win, loss, draw, etc.), and opponent number; the second includes the player’s state, player ID, rating (pre and post tournament), and some other various information. Since we’re only looking for specific information it’ll be important to merge and display the information we need. Here we can seperate these two rows for each player into two tables with one row for each player.
JoinedRows <- data.frame(TournamentInfo[seq(2,2*194/3,2),],TournamentInfo[seq(3,2*194/3,2),])
kable(head(JoinedRows))
Pair | Name | Total | R1 | R2 | R3 | R4 | R5 | R6 | R7 | X | Pair.1 | Name.1 | Total.1 | R1.1 | R2.1 | R3.1 | R4.1 | R5.1 | R6.1 | R7.1 | X.1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 1 | GARY HUA | 6.0 | W 39 | W 21 | W 18 | W 14 | W 7 | D 12 | D 4 | NA | ON | 15445895 / R: 1794 ->1817 | N:2 | W | B | W | B | W | B | W | NA |
6 | 2 | DAKSHESH DARURI | 6.0 | W 63 | W 58 | L 4 | W 17 | W 16 | W 20 | W 7 | NA | MI | 14598900 / R: 1553 ->1663 | N:2 | B | W | B | W | B | W | B | NA |
9 | 3 | ADITYA BAJAJ | 6.0 | L 8 | W 61 | W 25 | W 21 | W 11 | W 13 | W 12 | NA | MI | 14959604 / R: 1384 ->1640 | N:2 | W | B | W | B | W | B | W | NA |
12 | 4 | PATRICK H SCHILLING | 5.5 | W 23 | D 28 | W 2 | W 26 | D 5 | W 19 | D 1 | NA | MI | 12616049 / R: 1716 ->1744 | N:2 | W | B | W | B | W | B | B | NA |
15 | 5 | HANSHI ZUO | 5.5 | W 45 | W 37 | D 12 | D 13 | D 4 | W 14 | W 17 | NA | MI | 14601533 / R: 1655 ->1690 | N:2 | B | W | B | W | B | W | B | NA |
18 | 6 | HANSEN SONG | 5.0 | W 34 | D 29 | L 11 | W 35 | D 10 | W 27 | W 21 | NA | OH | 15055204 / R: 1686 ->1687 | N:3 | W | B | W | B | B | W | B | NA |
Next we’ll pull out the Pre-Tournament Rating from the NoNamesRows table. From the TournamentInfo table, you can see that the Pre-Tournament Rating is a 3-4 digit number in the Player.Name column preceeding the characters: “R:”. Using the str_locate function we can locate the point for each row where the rating appears and then use the str_sub function to extract the string in the that location. Finally, since these strings are integers (which will help us in further calculations) we will want to extract them as integers.
PreRatingLoc <- c(str_locate(JoinedRows[,13],"R: ")[1,2]+1, str_locate(JoinedRows[,13], "R: ")[1,2]+4)
PreRatings <- as.integer(str_sub(JoinedRows[,13],PreRatingLoc[1],PreRatingLoc[2]))
kable(head(PreRatings))
1794 |
1553 |
1384 |
1716 |
1655 |
1686 |
Now that we have our tables, we can start to combine them. We’ll call this new table ChessTable and pull in the Player Name, State, Total Number of Points, and Pre-Rating.
ChessTable <- data.frame(c(1:64), sapply(JoinedRows[2],str_trim),JoinedRows[12],JoinedRows[3], PreRatings)
colnames(ChessTable) <- c("Number", "Name", "State", "Points", "PreRating")
kable(head(ChessTable))
Number | Name | State | Points | PreRating | |
---|---|---|---|---|---|
3 | 1 | GARY HUA | ON | 6.0 | 1794 |
6 | 2 | DAKSHESH DARURI | MI | 6.0 | 1553 |
9 | 3 | ADITYA BAJAJ | MI | 6.0 | 1384 |
12 | 4 | PATRICK H SCHILLING | MI | 5.5 | 1716 |
15 | 5 | HANSHI ZUO | MI | 5.5 | 1655 |
18 | 6 | HANSEN SONG | OH | 5.0 | 1686 |
The last piece we’ll need to add is the average opponent’s Pre-Tournament Rating. To get this we’ll create another table called OppRatings, which will list each opponent the players will face in the tournament.
Opponents <- data.frame(1, as.list(as.numeric(str_extract(JoinedRows[1,4:10],"[0-9]{1,2}"))))
colnames(Opponents) <- c("Number", "Opp1","Opp2","Opp3","Opp4","Opp5","Opp6","Opp7")
for (i in 2:64) {
Opponents <- rbind(Opponents, as.numeric(str_extract(JoinedRows[i,4:10],"[0-9]{1,2}")))
}
Opponents$Number <- c(1:64)
kable(head(Opponents))
Number | Opp1 | Opp2 | Opp3 | Opp4 | Opp5 | Opp6 | Opp7 |
---|---|---|---|---|---|---|---|
1 | 39 | 21 | 18 | 14 | 7 | 12 | 4 |
2 | 58 | 4 | 17 | 16 | 20 | 7 | 63 |
3 | 61 | 25 | 21 | 11 | 13 | 12 | 8 |
4 | 28 | 2 | 26 | 5 | 19 | 1 | 23 |
5 | 37 | 12 | 13 | 4 | 14 | 17 | 45 |
6 | 29 | 11 | 35 | 10 | 27 | 21 | 34 |
From the list of each players’ opponents, we’ll pull in the opponents’ score, number of games played, and get an average opponent Pre-Tournament Rating.
Opponents <- data.frame(c(1:64),
ChessTable$PreRating[Opponents$Opp1[1:64]],
ChessTable$PreRating[Opponents$Opp2[1:64]],
ChessTable$PreRating[Opponents$Opp3[1:64]],
ChessTable$PreRating[Opponents$Opp4[1:64]],
ChessTable$PreRating[Opponents$Opp5[1:64]],
ChessTable$PreRating[Opponents$Opp6[1:64]],
ChessTable$PreRating[Opponents$Opp7[1:64]])
Opponents$Average <- round(rowMeans(Opponents[2:8],na.rm=TRUE), digits=0)
colnames(Opponents) <- c("Number", "Opp1", "Opp2", "Opp3", "Opp4", "Opp5", "Opp6", "Opp7", "Average")
kable(head(Opponents))
Number | Opp1 | Opp2 | Opp3 | Opp4 | Opp5 | Opp6 | Opp7 | Average |
---|---|---|---|---|---|---|---|---|
1 | 1436 | 1563 | 1600 | 1610 | 1649 | 1663 | 1716 | 1605 |
2 | 917 | 1716 | 1629 | 1604 | 1595 | 1649 | 1175 | 1469 |
3 | 955 | 1745 | 1563 | 1712 | 1666 | 1663 | 1641 | 1564 |
4 | 1507 | 1553 | 1579 | 1655 | 1564 | 1794 | 1363 | 1574 |
5 | 980 | 1663 | 1666 | 1716 | 1610 | 1629 | 1242 | 1501 |
6 | 1602 | 1712 | 1438 | 1365 | 1552 | 1563 | 1399 | 1519 |
ChessTable$OpponentAverage <- Opponents$Average
kable(head(ChessTable))
Number | Name | State | Points | PreRating | OpponentAverage | |
---|---|---|---|---|---|---|
3 | 1 | GARY HUA | ON | 6.0 | 1794 | 1605 |
6 | 2 | DAKSHESH DARURI | MI | 6.0 | 1553 | 1469 |
9 | 3 | ADITYA BAJAJ | MI | 6.0 | 1384 | 1564 |
12 | 4 | PATRICK H SCHILLING | MI | 5.5 | 1716 | 1574 |
15 | 5 | HANSHI ZUO | MI | 5.5 | 1655 | 1501 |
18 | 6 | HANSEN SONG | OH | 5.0 | 1686 | 1519 |
require(ggplot2)
## Loading required package: ggplot2
ggplot(ChessTable, aes(x=PreRating, y=OpponentAverage, label = Name, colour = Points)) + geom_point() + geom_text(aes(label=ifelse(PreRating>1700, as.character(Name),'')), size = 2, hjust=0) + xlim(0,2000) + ggtitle("Player to Opponent Scatterplot")
This tournament just got out of hand: against the rules of the tournament, an eigth round against a random player in the tournament has been set! Let’s find the chances that a player will face-off and win against their random opponent:
We’ll start by selecting (at random, without replacement) which opponent our player will face, and then predict the chances of a win.
ChessTable$Opp8 <- sample(1:length(ChessTable[,1]), 64, replace=F)
ChessTable$Opp8Rating <- ChessTable$PreRating[ChessTable$Opp8[1:64]]
kable(head(ChessTable))
Number | Name | State | Points | PreRating | OpponentAverage | Opp8 | Opp8Rating | |
---|---|---|---|---|---|---|---|---|
3 | 1 | GARY HUA | ON | 6.0 | 1794 | 1605 | 62 | 1530 |
6 | 2 | DAKSHESH DARURI | MI | 6.0 | 1553 | 1469 | 4 | 1716 |
9 | 3 | ADITYA BAJAJ | MI | 6.0 | 1384 | 1564 | 25 | 1745 |
12 | 4 | PATRICK H SCHILLING | MI | 5.5 | 1716 | 1574 | 35 | 1438 |
15 | 5 | HANSHI ZUO | MI | 5.5 | 1655 | 1501 | 19 | 1564 |
18 | 6 | HANSEN SONG | OH | 5.0 | 1686 | 1519 | 32 | 1441 |
Using the calculation WP = 1 / [10^ (dR/400) + 1 ] where dR is the difference in ratings, we can come up with a Winning Probability[1]
ChessTable$ChanceOfWin <- paste(round((1 / (10^ ((ChessTable$Opp8Rating - ChessTable$PreRating) / 400) + 1)) * 100, 2), "%", sep = "")
kable(head(ChessTable))
Number | Name | State | Points | PreRating | OpponentAverage | Opp8 | Opp8Rating | ChanceOfWin | |
---|---|---|---|---|---|---|---|---|---|
3 | 1 | GARY HUA | ON | 6.0 | 1794 | 1605 | 62 | 1530 | 82.05% |
6 | 2 | DAKSHESH DARURI | MI | 6.0 | 1553 | 1469 | 4 | 1716 | 28.12% |
9 | 3 | ADITYA BAJAJ | MI | 6.0 | 1384 | 1564 | 25 | 1745 | 11.12% |
12 | 4 | PATRICK H SCHILLING | MI | 5.5 | 1716 | 1574 | 35 | 1438 | 83.21% |
15 | 5 | HANSHI ZUO | MI | 5.5 | 1655 | 1501 | 19 | 1564 | 62.8% |
18 | 6 | HANSEN SONG | OH | 5.0 | 1686 | 1519 | 32 | 1441 | 80.38% |
[1] Reference: http://www.caissa.com/support/chess-ratings.php