Project 1 Data

First we must read the data. For the moment we’ll gather it into a single string and remove the newlines.

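library(knitr)            # kable()
library(tools)            # toTitleCase()
library(splitstackshape)  # cSplit()
library(ggplot2)          # ggplot()
tournament <- readLines("./tournamentinfo.txt")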
## Warning in readLines("./tournamentinfo.txt"): incomplete final line found
## on './tournamentinfo.txt'
tournament <- paste(tournament,collapse="\\n")
tournament <- gsub("\\\\n"," ", tournament)
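
As a small aside, the two-step join above (paste with a literal backslash-n, then gsub to swap it for a space) appears to be equivalent to joining with a space directly. The sketch below checks this on two invented lines standing in for the readLines() result; the variable names are illustrative only.

```r
# Toy stand-in for the lines returned by readLines(); not the real file.
lines <- c("row one |", "row two |")

# Two-step approach from the text: join with a literal backslash-n,
# then replace that two-character sequence with a space.
two_step <- gsub("\\\\n", " ", paste(lines, collapse = "\\n"))

# One-step alternative: join directly with a space.
one_step <- paste(lines, collapse = " ")

identical(two_step, one_step)
```

If the warning about the incomplete final line is unwanted, readLines() also accepts warn = FALSE.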

Then we just split on the record boundary, a long run of hyphens. As the author has used overly complicated regex previously (see Week 1), in this case we will just reprocess the pieces using the built-in functions, which gets us a rough but mostly parsed table.

tournamentResults <- strsplit(tournament, "-----------------------------------------------------------------------------------------")
tournamentTable<-read.table(text = paste(tournamentResults[[1]][2:length(tournamentResults[[1]])],collapse="\n"), sep="|", header = TRUE)
kable(head(tournamentTable[c(1,2,3,12,6,7,8)]))
Pair  Player.Name          Total  USCF.ID...Rtg..Pre..Post.  Round.2  Round.3  Round.4
1     GARY HUA             6.0    15445895 / R: 1794 ->1817  W 18     W 14     W 7
2     DAKSHESH DARURI      6.0    14598900 / R: 1553 ->1663  L 4      W 17     W 16
3     ADITYA BAJAJ         6.0    14959604 / R: 1384 ->1640  W 25     W 21     W 11
4     PATRICK H SCHILLING  5.5    12616049 / R: 1716 ->1744  W 2      W 26     D 5
5     HANSHI ZUO           5.5    14601533 / R: 1655 ->1690  D 12     D 13     D 4
6     HANSEN SONG          5.0    15055204 / R: 1686 ->1687  L 11     W 35     D 10
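
As a minimal sketch of the split-and-parse step, the toy text below is cut on its separator and handed to read.table with sep = "|". The records and the 20-hyphen separator are invented for illustration, and strip.white = TRUE is an addition that the original call does not use.

```r
# Invented miniature of the file layout: records separated by hyphen runs.
sep_line <- strrep("-", 20)
raw <- paste(
  sep_line,
  " Pair | Name  | Total | Num | Rating |",
  sep_line,
  " 1 | ANN A | 2.0 | ON | 1500 |",
  sep_line,
  " 2 | BOB B | 1.5 | MI | 1400 |",
  sep_line,
  sep = " "
)

# Split on the separator; the first piece is empty because the text
# starts with a separator, so it is dropped.
pieces <- strsplit(raw, sep_line, fixed = TRUE)[[1]]

# Re-join one record per line and let read.table split on "|".
tbl <- read.table(text = paste(pieces[-1], collapse = "\n"),
                  sep = "|", header = TRUE, strip.white = TRUE)
tbl
```

The trailing "|" on each record produces an extra, empty final column, which is one reason the parsed table is only "mostly" clean.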

With the basics done, one column (technically the second line of each two-line record) still contains three different bits of data that we need: the USCF ID, the pre-tournament rating and the post-tournament rating. Further complicating things, the column also carries some data that we don’t need (provisional ranking?), so we have to split it up. We then title-case the player names and tidy the column names.

Then we do a column split on the Round columns to get the sub-column data (the result and the opponent’s pair number).

tournamentTable <- cbind(tournamentTable[-12] ,lapply(strcapture("(\\d+) / R:\\s*(\\d+)P?\\d*\\s*->\\s*(\\d+)P?\\d*",as.character(tournamentTable$USCF.ID...Rtg..Pre..Post.),data.frame(uscf.id="",preScore="",postScore="")),function(x) as.numeric(as.character(x))))
tournamentTable$Player.Name <- toTitleCase(tolower(trimws(as.character(tournamentTable$Player.Name))))
names(tournamentTable)[11] <- "State"
names(tournamentTable)[3] <- "Points"
names(tournamentTable)[grep("^Round",colnames(tournamentTable))]<- paste(rep("Round",7),seq(1,7), sep = "")
tournamentTable <- cSplit(tournamentTable, grep("^Round",colnames(tournamentTable)), " ")
kable(head(tournamentTable[,c(1,2,17,18,19,20)]))
Pair  Player.Name          Round1_1  Round1_2  Round2_1  Round2_2
1     Gary Hua             W         39        W         21
2     Dakshesh Daruri      W         63        W         58
3     Aditya Bajaj         L         8         W         61
4     Patrick h Schilling  W         23        D         28
5     Hanshi Zuo           W         45        W         37
6     Hansen Song          W         34        D         29
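
To make the rating extraction easier to follow, here is a small sketch of strcapture() with the same pattern. The first string is Gary Hua's entry from the table above; the second is a made-up entry with "P" suffixes, assumed to resemble the extra data the pattern is meant to skip. The prototype here uses numeric columns directly instead of converting afterwards.

```r
# First entry is taken from the table above; the second is hypothetical.
entries <- c("15445895 / R: 1794   ->1817",
             "11111111 / R: 1200P5 ->1250P9")

# Same pattern as in the text: ID, pre rating, post rating, with an
# optional "P" plus digits after each rating left outside the groups.
pattern <- "(\\d+) / R:\\s*(\\d+)P?\\d*\\s*->\\s*(\\d+)P?\\d*"

# The prototype's column types decide how the captured text is converted.
parsed <- strcapture(pattern, entries,
                     proto = data.frame(uscf.id = numeric(),
                                        preScore = numeric(),
                                        postScore = numeric()))
parsed
```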

Next we (somewhat messily) store each opponent’s pre-tournament rating, using a match lookup on the opponent’s pair number. We then take the mean of these ratings across each row and round it (for output purposes).

To top it all off, we write out the CSV file. We then read it back and display it as an end-to-end check of the output.

tournamentTable$oppScore1 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round1_2,tournamentTable$Pair)])
tournamentTable$oppScore2 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round2_2,tournamentTable$Pair)])
tournamentTable$oppScore3 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round3_2,tournamentTable$Pair)])
tournamentTable$oppScore4 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round4_2,tournamentTable$Pair)])
tournamentTable$oppScore5 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round5_2,tournamentTable$Pair)])
tournamentTable$oppScore6 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round6_2,tournamentTable$Pair)])
tournamentTable$oppScore7 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round7_2,tournamentTable$Pair)])
tournamentTable$oppMean <- round( rowMeans(tournamentTable[,31:37], na.rm = TRUE))
write.csv(tournamentTable[,c(2,4,3,15,38)],"chess.csv", row.names = FALSE )
dogFood <- read.csv("chess.csv")
kable(head(dogFood))
Player.Name          State  Points  preScore  oppMean
Gary Hua             ON     6.0     1794      1605
Dakshesh Daruri      MI     6.0     1553      1469
Aditya Bajaj         MI     6.0     1384      1564
Patrick h Schilling  MI     5.5     1716      1574
Hanshi Zuo           MI     5.5     1655      1501
Hansen Song          OH     5.0     1686      1519
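
The seven near-identical lookup lines could probably be collapsed into one loop over the opponent columns. The sketch below shows the idea on an invented four-player table (the ratings and pairings are made up); it follows the column naming used above, and NA marks a round with no opponent.

```r
# Invented four-player table; NA means no opponent in that round.
toy <- data.frame(
  Pair     = 1:4,
  preScore = c(1800, 1600, 1500, 1400),
  Round1_2 = c(4, 3, 2, 1),
  Round2_2 = c(2, 1, 4, 3),
  Round3_2 = c(3, NA, 1, NA)
)

# Find every opponent-number column, then look up each opponent's
# pre-tournament rating by matching on Pair.
round_cols <- grep("^Round\\d+_2$", names(toy), value = TRUE)
opp <- sapply(toy[round_cols],
              function(ids) toy$preScore[match(ids, toy$Pair)])

# Row-wise mean over the rounds actually played, rounded for output.
toy$oppMean <- round(rowMeans(opp, na.rm = TRUE))
toy
```

Selecting columns by name pattern rather than by position (31:37 above) would also keep the calculation working if the column order changes.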

Analysis

USCF ID effect on play

One expects that the longer one has been playing (approximated here by a lower USCF ID), the more consistent the player’s rating.

# Rating change over the tournament, using column names rather than positions
tournamentTable$changedScore <- tournamentTable$postScore - tournamentTable$preScore
ggplot(tournamentTable, aes(x = uscf.id, y = abs(changedScore))) + geom_point(size = 3) + geom_smooth(method = "lm")

We can show this, but on looking at the rating method more closely, players who have fewer than 30 games move more, and there are a number of confounding factors. Still, it is broadly confirmed that more stability is expected for older ID numbers. Another thing to consider is the outcomes of those players who’ve been around for a while and are still moving a fair bit. These players might be returning to play.

ggplot() + geom_point(data = tournamentTable, size = 3, aes(x = uscf.id, y = changedScore)) + geom_point(data = tournamentTable[tournamentTable$uscf.id < 1.4e+07 & abs(tournamentTable$changedScore) > 50, ], aes(uscf.id, changedScore), color = "red")

We do see a cluster of players, shown in red, who are moving more than expected. Unfortunately many are moving down. If this were to hold up on a larger data set, these would be players the organizers might try to keep from getting discouraged. If these are returners, the ones giving up their previous rating are unlikely to become frequent players without support, absent other factors.
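
The highlighting rule (USCF ID below 1.4e+07 and an absolute rating change above 50) can be tried out on the six players shown in the earlier tables. This is only a sketch on those six rows, rebuilt by hand here; the name head_rows is illustrative.

```r
# The six head rows from the tables above, re-entered by hand.
head_rows <- data.frame(
  Player.Name = c("Gary Hua", "Dakshesh Daruri", "Aditya Bajaj",
                  "Patrick h Schilling", "Hanshi Zuo", "Hansen Song"),
  uscf.id   = c(15445895, 14598900, 14959604, 12616049, 14601533, 15055204),
  preScore  = c(1794, 1553, 1384, 1716, 1655, 1686),
  postScore = c(1817, 1663, 1640, 1744, 1690, 1687)
)
head_rows$changedScore <- head_rows$postScore - head_rows$preScore

# Same rule as the red points: older ID and a move of more than 50 points.
# The trailing comma selects rows, which works for plain data frames too.
flagged <- head_rows[head_rows$uscf.id < 1.4e+07 &
                       abs(head_rows$changedScore) > 50, ]
nrow(flagged)
```

None of these six players meets both conditions: only Patrick h Schilling has an ID below the cutoff, and his rating moved by 28 points.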

How does the tournament work?

The author doesn’t know how chess tournaments in general are set up, and certainly not this one in particular. Does a player get a set opponent list at the start of the tournament, or does their performance change their opponents?

To look into this we can compare the rating of the second opponent with that of the first, after a win and after a loss.

# boxplot() takes its plot title through main=, not title=
boxplot(tournamentTable$oppScore2[tournamentTable$Round1_1=="W"] - tournamentTable$oppScore1[tournamentTable$Round1_1=="W"],tournamentTable$oppScore2[tournamentTable$Round1_1=="L"] - tournamentTable$oppScore1[tournamentTable$Round1_1=="L"], main="Round 2 Strength vs Round 1", names = c("After Win", "After Loss"))

boxplot(tournamentTable$oppScore3[tournamentTable$Round2_1=="W" & tournamentTable$Round1_1=="W"] - tournamentTable$oppScore1[tournamentTable$Round2_1=="W" & tournamentTable$Round1_1=="W"],tournamentTable$oppScore3[tournamentTable$Round2_1=="L" & tournamentTable$Round1_1=="L" ] - tournamentTable$oppScore1[tournamentTable$Round2_1=="L" & tournamentTable$Round1_1=="L"], main="Round 3 Strength vs Round 1", names = c("After Double Win", "After Double Loss"))

It seems very clear that winners are matched with harder opponents.
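
A numeric companion to the boxplots might look like the sketch below, which takes the median change in opponent rating by Round 1 result. The numbers are invented purely to show the mechanics and say nothing about the real tournament; the column names follow those used above.

```r
# Hypothetical values, invented only to show the mechanics.
toy <- data.frame(
  Round1_1  = c("W", "W", "L", "L", "D"),
  oppScore1 = c(1400, 1500, 1700, 1650, 1550),
  oppScore2 = c(1600, 1650, 1450, 1500, 1560)
)

# Positive delta: the Round 2 opponent was rated higher than the Round 1 one.
toy$delta <- toy$oppScore2 - toy$oppScore1

# Median change for each Round 1 result (D, L, W).
medians <- tapply(toy$delta, toy$Round1_1, median)
medians
```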