DATA607 - Project 1 - Chess Tournament

Loading the Data

First, load the stringr library and dataset TournamentInfo. To do this, we import the csv file (columns seperated by |) and will have to remove the empty lines filled with -’s.

library("stringr")
library("knitr")
TournamentInfo <- read.csv(url("https://raw.githubusercontent.com/chrisgmartin/DATA607/master/tournamentinfo.txt"),
                           sep = "|", stringsAsFactors=FALSE,
                           col.names = c("Pair", "Name", "Total", "R1", "R2", "R3", "R4", "R5", "R6", "R7", ""), 
                           skip = 1)
TournamentInfo <- TournamentInfo[seq(-2,-194,-3),]
kable(head(TournamentInfo))

	Pair	Name	Total	R1	R2	R3	R4	R5	R6	R7	X
1	Num	USCF ID / Rtg (Pre->Post)	Pts	1	2	3	4	5	6	7	NA
3	1	GARY HUA	6.0	W 39	W 21	W 18	W 14	W 7	D 12	D 4	NA
4	ON	15445895 / R: 1794 ->1817	N:2	W	B	W	B	W	B	W	NA
6	2	DAKSHESH DARURI	6.0	W 63	W 58	L 4	W 17	W 16	W 20	W 7	NA
7	MI	14598900 / R: 1553 ->1663	N:2	B	W	B	W	B	W	B	NA
9	3	ADITYA BAJAJ	6.0	L 8	W 61	W 25	W 21	W 11	W 13	W 12	NA

Seperating Rows for Each Player

Since we have the tournament info imported, the table shows that each player is displayed in two rows: the first includes the player name, result (win, loss, draw, etc.), and opponent number; the second includes the player’s state, player ID, rating (pre and post tournament), and some other various information. Since we’re only looking for specific information it’ll be important to merge and display the information we need. Here we can seperate these two rows for each player into two tables with one row for each player.

JoinedRows <- data.frame(TournamentInfo[seq(2,2*194/3,2),],TournamentInfo[seq(3,2*194/3,2),])
kable(head(JoinedRows))

	Pair	Name	Total	R1	R2	R3	R4	R5	R6	R7	X	Pair.1	Name.1	Total.1	R1.1	R2.1	R3.1	R4.1	R5.1	R6.1	R7.1	X.1
3	1	GARY HUA	6.0	W 39	W 21	W 18	W 14	W 7	D 12	D 4	NA	ON	15445895 / R: 1794 ->1817	N:2	W	B	W	B	W	B	W	NA
6	2	DAKSHESH DARURI	6.0	W 63	W 58	L 4	W 17	W 16	W 20	W 7	NA	MI	14598900 / R: 1553 ->1663	N:2	B	W	B	W	B	W	B	NA
9	3	ADITYA BAJAJ	6.0	L 8	W 61	W 25	W 21	W 11	W 13	W 12	NA	MI	14959604 / R: 1384 ->1640	N:2	W	B	W	B	W	B	W	NA
12	4	PATRICK H SCHILLING	5.5	W 23	D 28	W 2	W 26	D 5	W 19	D 1	NA	MI	12616049 / R: 1716 ->1744	N:2	W	B	W	B	W	B	B	NA
15	5	HANSHI ZUO	5.5	W 45	W 37	D 12	D 13	D 4	W 14	W 17	NA	MI	14601533 / R: 1655 ->1690	N:2	B	W	B	W	B	W	B	NA
18	6	HANSEN SONG	5.0	W 34	D 29	L 11	W 35	D 10	W 27	W 21	NA	OH	15055204 / R: 1686 ->1687	N:3	W	B	W	B	B	W	B	NA

Adding Pre-Tournament Ratings

Gathering Pre-Tournament Ratings

Next we’ll pull out the Pre-Tournament Rating from the NoNamesRows table. From the TournamentInfo table, you can see that the Pre-Tournament Rating is a 3-4 digit number in the Player.Name column preceeding the characters: “R:”. Using the str_locate function we can locate the point for each row where the rating appears and then use the str_sub function to extract the string in the that location. Finally, since these strings are integers (which will help us in further calculations) we will want to extract them as integers.

PreRatingLoc <- c(str_locate(JoinedRows[,13],"R: ")[1,2]+1, str_locate(JoinedRows[,13], "R: ")[1,2]+4)
PreRatings <- as.integer(str_sub(JoinedRows[,13],PreRatingLoc[1],PreRatingLoc[2]))
kable(head(PreRatings))

1794

1553

1384

1716

1655

1686

Merging the Named and Non-Named tables with Pre-Tournament Ratings

Now that we have our tables, we can start to combine them. We’ll call this new table ChessTable and pull in the Player Name, State, Total Number of Points, and Pre-Rating.

ChessTable <- data.frame(c(1:64), sapply(JoinedRows[2],str_trim),JoinedRows[12],JoinedRows[3], PreRatings)
colnames(ChessTable) <- c("Number", "Name", "State", "Points", "PreRating")
kable(head(ChessTable))

	Number	Name	State	Points	PreRating
3	1	GARY HUA	ON	6.0	1794
6	2	DAKSHESH DARURI	MI	6.0	1553
9	3	ADITYA BAJAJ	MI	6.0	1384
12	4	PATRICK H SCHILLING	MI	5.5	1716
15	5	HANSHI ZUO	MI	5.5	1655
18	6	HANSEN SONG	OH	5.0	1686

Adding Average-Opponent Ratings

Gathering Opponent List

The last piece we’ll need to add is the average opponent’s Pre-Tournament Rating. To get this we’ll create another table called OppRatings, which will list each opponent the players will face in the tournament.

Opponents <- data.frame(1, as.list(as.numeric(str_extract(JoinedRows[1,4:10],"[0-9]{1,2}"))))
colnames(Opponents) <- c("Number", "Opp1","Opp2","Opp3","Opp4","Opp5","Opp6","Opp7")
for (i in 2:64) {
  Opponents <- rbind(Opponents, as.numeric(str_extract(JoinedRows[i,4:10],"[0-9]{1,2}")))
}
Opponents$Number <- c(1:64)
kable(head(Opponents))

Number	Opp1	Opp2	Opp3	Opp4	Opp5	Opp6	Opp7
1	39	21	18	14	7	12	4
2	58	4	17	16	20	7	63
3	61	25	21	11	13	12	8
4	28	2	26	5	19	1	23
5	37	12	13	4	14	17	45
6	29	11	35	10	27	21	34

Gathering Opponent Pre-Tournament Ratings

From the list of each players’ opponents, we’ll pull in the opponents’ score, number of games played, and get an average opponent Pre-Tournament Rating.

Opponents <- data.frame(c(1:64),
                        ChessTable$PreRating[Opponents$Opp1[1:64]],
                        ChessTable$PreRating[Opponents$Opp2[1:64]],
                        ChessTable$PreRating[Opponents$Opp3[1:64]],
                        ChessTable$PreRating[Opponents$Opp4[1:64]],
                        ChessTable$PreRating[Opponents$Opp5[1:64]],
                        ChessTable$PreRating[Opponents$Opp6[1:64]],
                        ChessTable$PreRating[Opponents$Opp7[1:64]])
Opponents$Average <- round(rowMeans(Opponents[2:8],na.rm=TRUE), digits=0)
colnames(Opponents) <- c("Number", "Opp1", "Opp2", "Opp3", "Opp4", "Opp5", "Opp6", "Opp7", "Average")
kable(head(Opponents))

Number	Opp1	Opp2	Opp3	Opp4	Opp5	Opp6	Opp7	Average
1	1436	1563	1600	1610	1649	1663	1716	1605
2	917	1716	1629	1604	1595	1649	1175	1469
3	955	1745	1563	1712	1666	1663	1641	1564
4	1507	1553	1579	1655	1564	1794	1363	1574
5	980	1663	1666	1716	1610	1629	1242	1501
6	1602	1712	1438	1365	1552	1563	1399	1519

Merging all of the components into one table

ChessTable$OpponentAverage <- Opponents$Average
kable(head(ChessTable))

	Number	Name	State	Points	PreRating	OpponentAverage
3	1	GARY HUA	ON	6.0	1794	1605
6	2	DAKSHESH DARURI	MI	6.0	1553	1469
9	3	ADITYA BAJAJ	MI	6.0	1384	1564
12	4	PATRICK H SCHILLING	MI	5.5	1716	1574
15	5	HANSHI ZUO	MI	5.5	1655	1501
18	6	HANSEN SONG	OH	5.0	1686	1519

Plotting the table

require(ggplot2)

## Loading required package: ggplot2

ggplot(ChessTable, aes(x=PreRating, y=OpponentAverage, label = Name, colour = Points)) + geom_point() + geom_text(aes(label=ifelse(PreRating>1700, as.character(Name),'')), size = 2, hjust=0) + xlim(0,2000) + ggtitle("Player to Opponent Scatterplot")

Predicting an eigth match win

This tournament just got out of hand: against the rules of the tournament, an eigth round against a random player in the tournament has been set! Let’s find the chances that a player will face-off and win against their random opponent:

Selecting the random opponent

We’ll start by selecting (at random, without replacement) which opponent our player will face, and then predict the chances of a win.

ChessTable$Opp8 <- sample(1:length(ChessTable[,1]), 64, replace=F)
ChessTable$Opp8Rating <- ChessTable$PreRating[ChessTable$Opp8[1:64]]
kable(head(ChessTable))

	Number	Name	State	Points	PreRating	OpponentAverage	Opp8	Opp8Rating
3	1	GARY HUA	ON	6.0	1794	1605	62	1530
6	2	DAKSHESH DARURI	MI	6.0	1553	1469	4	1716
9	3	ADITYA BAJAJ	MI	6.0	1384	1564	25	1745
12	4	PATRICK H SCHILLING	MI	5.5	1716	1574	35	1438
15	5	HANSHI ZUO	MI	5.5	1655	1501	19	1564
18	6	HANSEN SONG	OH	5.0	1686	1519	32	1441

Calculating the chance of winning

Using the calculation WP = 1 / [10^ (dR/400) + 1 ] where dR is the difference in ratings, we can come up with a Winning Probability[1]

ChessTable$ChanceOfWin <- paste(round((1 / (10^ ((ChessTable$Opp8Rating - ChessTable$PreRating) / 400) + 1)) * 100, 2), "%", sep = "")
kable(head(ChessTable))

	Number	Name	State	Points	PreRating	OpponentAverage	Opp8	Opp8Rating	ChanceOfWin
3	1	GARY HUA	ON	6.0	1794	1605	62	1530	82.05%
6	2	DAKSHESH DARURI	MI	6.0	1553	1469	4	1716	28.12%
9	3	ADITYA BAJAJ	MI	6.0	1384	1564	25	1745	11.12%
12	4	PATRICK H SCHILLING	MI	5.5	1716	1574	35	1438	83.21%
15	5	HANSHI ZUO	MI	5.5	1655	1501	19	1564	62.8%
18	6	HANSEN SONG	OH	5.0	1686	1519	32	1441	80.38%

[1] Reference: http://www.caissa.com/support/chess-ratings.php