Project 1

Project 1 Data

First we must read the data. For the moment we’ll gather it into a single string and replace the line breaks with spaces.

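library(knitr)           # kable()
library(tools)           # toTitleCase()
library(splitstackshape) # cSplit()
library(ggplot2)         # ggplot()
tournament <- readLines("./tournamentinfo.txt")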

## Warning in readLines("./tournamentinfo.txt"): incomplete final line found
## on './tournamentinfo.txt'

# Join all lines into one string, separated by spaces
tournament <- paste(tournament, collapse = " ")

Then we split on the row of dashes that separates the records. The author has used overly complicated regex previously (see Week 1), so in this case we simply reprocess the pieces with the built-in read.table function, which gets us a rough but mostly parsed table.

tournamentResults <- strsplit(tournament, "-----------------------------------------------------------------------------------------")
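# Drop the first piece (whatever precedes the first separator), then parse the rest as "|"-delimited rows
records <- tournamentResults[[1]][-1]
tournamentTable <- read.table(text = paste(records, collapse = "\n"), sep = "|", header = TRUE)
kable(head(tournamentTable[c(1, 2, 3, 12, 6, 7, 8)]))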

Pair	Player.Name	Total	USCF.ID...Rtg..Pre..Post.	Round.2	Round.3	Round.4
1	GARY HUA	6.0	15445895 / R: 1794 ->1817	W 18	W 14	W 7
2	DAKSHESH DARURI	6.0	14598900 / R: 1553 ->1663	L 4	W 17	W 16
3	ADITYA BAJAJ	6.0	14959604 / R: 1384 ->1640	W 25	W 21	W 11
4	PATRICK H SCHILLING	5.5	12616049 / R: 1716 ->1744	W 2	W 26	D 5
5	HANSHI ZUO	5.5	14601533 / R: 1655 ->1690	D 12	D 13	D 4
6	HANSEN SONG	5.0	15055204 / R: 1686 ->1687	L 11	W 35	D 10
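The split-then-read.table idea can be tried on a tiny made-up input. The following is a minimal sketch, assuming a shortened separator and two invented records; `raw`, `pieces`, and `toy` are hypothetical names and not part of the project code.

```r
# Toy input: records separated by a short row of dashes, fields by "|".
# The separator length and both players are invented for illustration.
raw <- paste(
  "-----",
  "Pair | Name | Total",
  "-----",
  "1 | ALICE A | 2.0",
  "-----",
  "2 | BOB B | 1.5",
  "-----"
)

# fixed = TRUE treats the separator as a literal string, not a regex.
pieces <- strsplit(raw, "-----", fixed = TRUE)[[1]]

# The first piece is the empty string before the leading separator, so drop it.
toy <- read.table(text = paste(pieces[-1], collapse = "\n"),
                  sep = "|", header = TRUE, strip.white = TRUE)
print(toy)
```

Here `strip.white = TRUE` trims the padding around each field; the project code leaves that to a later trimws() call instead.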

With the basics done, one column (technically one line of two) contains three different pieces of data that we need: the USCF ID, the pre-tournament rating, and the post-tournament rating. Further complicating things, some ratings carry a suffix that we don’t need (provisional ranking?), so we have to split the column up and discard it. We then title-case the player names and tidy up a few column names.

Then we do a column split on the Round columns to get the sub-column data (result and opponent).

# Pull the USCF ID, pre-rating and post-rating out of the combined column (12), ignoring any "P<digits>" suffix
ratings <- strcapture("(\\d+) / R:\\s*(\\d+)P?\\d*\\s*->\\s*(\\d+)P?\\d*",
                      as.character(tournamentTable$USCF.ID...Rtg..Pre..Post.),
                      data.frame(uscf.id = "", preScore = "", postScore = ""))
tournamentTable <- cbind(tournamentTable[-12], lapply(ratings, function(x) as.numeric(as.character(x))))
tournamentTable$Player.Name <- toTitleCase(tolower(trimws(as.character(tournamentTable$Player.Name))))
names(tournamentTable)[11] <- "State"
names(tournamentTable)[3] <- "Points"
names(tournamentTable)[grep("^Round", colnames(tournamentTable))] <- paste0("Round", 1:7)
# Split each "W 39"-style cell into result (RoundN_1) and opponent pair number (RoundN_2)
tournamentTable <- cSplit(tournamentTable, grep("^Round", colnames(tournamentTable)), " ")
kable(head(tournamentTable[, c(1, 2, 17, 18, 19, 20)]))

Pair	Player.Name	Round1_1	Round1_2	Round2_1	Round2_2
1	Gary Hua	W	39	W	21
2	Dakshesh Daruri	W	63	W	58
3	Aditya Bajaj	L	8	W	61
4	Patrick h Schilling	W	23	D	28
5	Hanshi Zuo	W	45	W	37
6	Hansen Song	W	34	D	29
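To see what the rating pattern captures, it can be run on a couple of sample strings. This is a minimal sketch: the first string is Gary Hua’s entry as displayed in the table above, while the second is a hypothetical entry with "P" suffixes invented for illustration. It also uses a numeric prototype, which is a small variation on the project code, so that strcapture() does the conversion itself.

```r
# First value as shown in the table above; second value is hypothetical.
x <- c("15445895 / R: 1794 ->1817",
       "12345678 / R: 1200P5 ->1300P12")

# Same pattern as the project code: ID, pre-rating, post-rating,
# with an optional "P<digits>" suffix skipped after each rating.
pattern <- "(\\d+) / R:\\s*(\\d+)P?\\d*\\s*->\\s*(\\d+)P?\\d*"

# Numeric prototype columns make strcapture() return numbers directly.
proto <- data.frame(uscf.id = numeric(), preScore = numeric(), postScore = numeric())

parsed <- strcapture(pattern, x, proto)
print(parsed)
```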

Next we look up each opponent’s pre-tournament rating with a match() lookup on the pair number (a little messily, one column per round). We then take the mean across each row and round it (for output purposes).

Finally, to top it all off, we write out the CSV file. We then read it back in and display it as an end-to-end check of the output.

tournamentTable$oppScore1 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round1_2,tournamentTable$Pair)])
tournamentTable$oppScore2 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round2_2,tournamentTable$Pair)])
tournamentTable$oppScore3 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round3_2,tournamentTable$Pair)])
tournamentTable$oppScore4 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round4_2,tournamentTable$Pair)])
tournamentTable$oppScore5 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round5_2,tournamentTable$Pair)])
tournamentTable$oppScore6 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round6_2,tournamentTable$Pair)])
tournamentTable$oppScore7 <- as.numeric(tournamentTable$preScore[match(tournamentTable$Round7_2,tournamentTable$Pair)])
tournamentTable$oppMean <- round( rowMeans(tournamentTable[,31:37], na.rm = TRUE))
write.csv(tournamentTable[,c(2,4,3,15,38)],"chess.csv", row.names = FALSE )
dogFood <- read.csv("chess.csv")
kable(head(dogFood))

Player.Name	State	Points	preScore	oppMean
Gary Hua	ON	6.0	1794	1605
Dakshesh Daruri	MI	6.0	1553	1469
Aditya Bajaj	MI	6.0	1384	1564
Patrick h Schilling	MI	5.5	1716	1574
Hanshi Zuo	MI	5.5	1655	1501
Hansen Song	OH	5.0	1686	1519
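The seven near-identical lookup lines could also be written as a loop. The sketch below shows the idea on a hypothetical three-player, two-round table stored as a plain data.frame; the data and the names `toy`, `rounds`, and `opp_cols` are invented for illustration, and the real table (a data.table after cSplit) may need a different assignment style.

```r
# Hypothetical table: Pair is the player's number, preScore the pre-tournament
# rating, and RoundN_2 the opponent's pair number (NA = no game that round).
toy <- data.frame(
  Pair     = 1:3,
  preScore = c(1500, 1600, 1700),
  Round1_2 = c(2, 1, NA),
  Round2_2 = c(3, NA, 1)
)

rounds <- 1:2
for (i in rounds) {
  opp <- toy[[paste0("Round", i, "_2")]]
  # match() gives the row of each opponent; NA opponents stay NA.
  toy[[paste0("oppScore", i)]] <- toy$preScore[match(opp, toy$Pair)]
}

# Average the opponent ratings by name rather than by column position.
opp_cols <- paste0("oppScore", rounds)
toy$oppMean <- round(rowMeans(toy[opp_cols], na.rm = TRUE))
print(toy)
```

Selecting the oppScore columns by name avoids depending on positions such as 31:37, which shift whenever a column is added or removed.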

Analysis

USCF ID effect on play

One expects that the longer one has been playing (the lower the USCF ID), the more consistent the player, i.e. the smaller the rating change over the tournament.

# Rating change over the tournament, selected by name rather than by column position
tournamentTable$changedScore <- tournamentTable$postScore - tournamentTable$preScore
ggplot(tournamentTable, aes(x = uscf.id, y = abs(changedScore))) + geom_point(size = 3) + geom_smooth(method = "lm")

We can show this, but on a closer look at the rating method, players who have fewer than 30 games move more, and there are a number of confounding factors. Still, it is broadly confirmed that more stability is expected for older ID numbers. Another thing to consider is the outcomes of those players who’ve been around for a while and are still moving a fair bit. These players might be returning to play.

ggplot() + geom_point(data = tournamentTable, size = 3, aes(x = uscf.id, y = changedScore)) + geom_point(data = tournamentTable[tournamentTable$uscf.id < 1.4e+07 & abs(tournamentTable$changedScore) > 50, ], aes(uscf.id, changedScore), color = "red")

We do see a cluster of players, in red, who are moving more than expected. Unfortunately, many are moving down. If this were to hold up on a larger data set, these would be players whom the organizers might want to support, to prevent discouragement in this group. If these are returners, the ones giving up their previous rating are unlikely to become frequent players, absent other factors and without support.
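The red subset can also be pulled out as a table, which makes it easy to count how many of those players moved down. This is a minimal sketch on invented data: `flag_movers` is a hypothetical helper, and its defaults simply reuse the thresholds from the plot (ID below 1.4e7, change above 50 points).

```r
# Hypothetical helper: keep long-standing IDs with a large rating change.
flag_movers <- function(df, id_cutoff = 1.4e7, change_cutoff = 50) {
  df$changedScore <- df$postScore - df$preScore
  keep <- df$uscf.id < id_cutoff & abs(df$changedScore) > change_cutoff
  df[keep, , drop = FALSE]
}

# Invented players, for illustration only.
toy <- data.frame(
  uscf.id   = c(12000000, 12500000, 15000000),
  preScore  = c(1500, 1600, 1400),
  postScore = c(1420, 1610, 1550)
)

movers <- flag_movers(toy)
n_down <- sum(movers$changedScore < 0)  # flagged players who lost rating
print(movers)
```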

How does the tournament work?

The author doesn’t know how chess tournaments in general are set up, and certainly not this one in particular. Does a player get a set opponent list at the start of the tournament, or does their performance change their opponents?

To look into this, we can compare the rating of the second opponent with that of the first, after a win and after a loss.

boxplot(tournamentTable$oppScore2[tournamentTable$Round1_1 == "W"] - tournamentTable$oppScore1[tournamentTable$Round1_1 == "W"], tournamentTable$oppScore2[tournamentTable$Round1_1 == "L"] - tournamentTable$oppScore1[tournamentTable$Round1_1 == "L"], main = "Round 2 Strength vs Round 1", names = c("After Win", "After Loss"))

boxplot(tournamentTable$oppScore3[tournamentTable$Round2_1 == "W" & tournamentTable$Round1_1 == "W"] - tournamentTable$oppScore1[tournamentTable$Round2_1 == "W" & tournamentTable$Round1_1 == "W"], tournamentTable$oppScore3[tournamentTable$Round2_1 == "L" & tournamentTable$Round1_1 == "L"] - tournamentTable$oppScore1[tournamentTable$Round2_1 == "L" & tournamentTable$Round1_1 == "L"], main = "Round 3 Strength vs Round 1", names = c("After Double Win", "After Double Loss"))

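It seems very clear that winners are matched with harder opponents, so a player’s opponents appear to depend on their performance rather than being fixed at the start.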
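The same comparison can be summarised numerically instead of read off a boxplot. The sketch below is only an illustration on invented ratings, not the tournament data; it computes the median change in opponent rating from round 1 to round 2, grouped by the round 1 result.

```r
# Invented results and opponent ratings, for illustration only.
toy <- data.frame(
  Round1_1  = c("W", "W", "L", "L", "D"),
  oppScore1 = c(1400, 1500, 1700, 1600, 1550),
  oppScore2 = c(1600, 1650, 1500, 1450, 1500)
)

# Positive values mean the round 2 opponent was rated higher than the round 1 opponent.
toy$diff <- toy$oppScore2 - toy$oppScore1

# Median difference per round 1 result (D, L, W).
by_result <- tapply(toy$diff, toy$Round1_1, median, na.rm = TRUE)
print(by_result)
```

A positive median after a win and a negative one after a loss would point the same way as the boxplots.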

Scott Reed

9/17/2019
