Overview

The objective of this project is to import and clean a data set on chess tournament player performance. The output is a CSV file that contains the following information for each player:
Players.Name  Players.State  Total.Number.of.Points  Players.Pre_Rating  Average.Pre.Chess.Rating.of.Opponents
Gary Hua      ON             6                       1794                1605

Retrieve Data

The data is stored on GitHub as a text file and is imported into R with the read_table() function.

## Parsed with column specification:
## cols(
##   `-----------------------------------------------------------------------------------------` = col_character()
## )

Clean Data

Below we clean the chess tournament data that was imported from the text file. Rows made up entirely of dashes separate the records, and each record spans two rows. We need to drop the dashed rows and merge the two rows of each record into a single line. We also rename columns and remove the ones that are not needed.

##  -----------------------------------------------------------------------------------------1 
##  "Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round|" 
##  -----------------------------------------------------------------------------------------2 
##  "Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  |" 
##  -----------------------------------------------------------------------------------------3 
## "-----------------------------------------------------------------------------------------" 
##  -----------------------------------------------------------------------------------------4 
##     "1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|" 
##  -----------------------------------------------------------------------------------------5 
##    "ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |" 
##  -----------------------------------------------------------------------------------------6 
## "-----------------------------------------------------------------------------------------"
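As a rough illustration of the clean-up described above, the sketch below uses base R only. That is an assumption: the original code is not shown and may well have used other functions. The sample lines are copied from the output above; the names raw_lines, data_lines, merged, records and player are hypothetical, and the separator width of 89 is arbitrary.

```r
# Sample input: the two header rows and the first player record,
# each followed by a row made up entirely of dashes.
dashes <- strrep("-", 89)
raw_lines <- c(
  dashes,
  "Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round|",
  "Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  |",
  dashes,
  "1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  dashes
)

# Step 1: drop the rows that contain nothing but dashes.
data_lines <- raw_lines[!grepl("^-+$", raw_lines)]

# Step 2: each record spans two consecutive rows, so paste
# every odd row together with the even row that follows it.
is_first <- seq_along(data_lines) %% 2 == 1
merged <- paste0(data_lines[is_first], data_lines[!is_first])

# Step 3: split each merged record on "|" and trim the padding.
records <- lapply(strsplit(merged, "|", fixed = TRUE), trimws)

# The first record is the (two-row) header; the second is player 1.
player <- records[[2]]
print(player[c(1, 2, 3, 11, 12)])
```

Each merged record has 20 fields here: the 10 fields of the first row followed by the 10 fields of the second row, so the state and the rating string sit at positions 11 and 12.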

Player’s Pre & Post Tournament Ratings

In the raw data, each player has a pre-tournament rating and a post-tournament rating. The first number after the “R:” is the player’s pre-tournament rating, and the number after the “->” is the post-tournament rating.
   Player_ID  USCF ID / Rtg (Pre->Post)
2  1          15445895 / R: 1794 ->1817
3  2          14598900 / R: 1553 ->1663
4  3          14959604 / R: 1384 ->1640
5  4          12616049 / R: 1716 ->1744
6  5          14601533 / R: 1655 ->1690
7  6          15055204 / R: 1686 ->1687

The information is all contained in a single string and must be parsed out using regular expressions.
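One way this parsing could look is sketched below with base R's sub(). The patterns and the variable names (rating_strings, pre_rating, post_rating) are assumptions made for illustration, not the regular expressions used in the original analysis. The two sample strings come from the table above.

```r
# Rating strings as they appear in the raw data.
rating_strings <- c(
  "15445895 / R: 1794   ->1817",
  "14598900 / R: 1553   ->1663"
)

# Pre-tournament rating: the first run of digits after "R:".
pre_rating <- as.integer(
  sub(".*R:[[:space:]]*([0-9]+).*", "\\1", rating_strings)
)

# Post-tournament rating: the first run of digits after "->".
post_rating <- as.integer(
  sub(".*->[[:space:]]*([0-9]+).*", "\\1", rating_strings)
)

print(data.frame(pre_rating, post_rating))
```

Capturing only the digits means any characters that trail the number are ignored rather than breaking the integer conversion.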

Clean Opponent Match Information

We must remove information that does not pertain to our analysis. The goal is to tidy the data so that, for each player, we can see which opponent they faced in each round. We pivot the data into a long format, with one row per player and round, so that joining with the player data later is easier.

Player_ID  State  Opponent_no  Opponent_ID
1          ON     Opponent1    39
1          ON     Opponent2    21
1          ON     Opponent3    18
1          ON     Opponent4    14
1          ON     Opponent5    7
1          ON     Opponent6    12
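The reshaping described above can be sketched as follows. This version builds the long table by hand in base R, which is an assumption; a helper such as tidyr's pivot_longer() would be a common alternative. The wide data frame and its column names are hypothetical, and the round results are those of player 1 from the raw data.

```r
# Hypothetical wide layout: one row per player, one column per round.
wide <- data.frame(
  Player_ID = 1L, State = "ON",
  Opponent1 = "W  39", Opponent2 = "W  21", Opponent3 = "W  18",
  Opponent4 = "W  14", Opponent5 = "W   7", Opponent6 = "D  12",
  Opponent7 = "D   4",
  stringsAsFactors = FALSE
)

# Columns that hold round results.
round_cols <- grep("^Opponent", names(wide), value = TRUE)

# Flatten the round columns column by column.
results_raw <- unlist(wide[round_cols], use.names = FALSE)

# Long layout: one row per player and round. The leading result
# letter (W, D, ...) is stripped so only the opponent ID remains.
long <- data.frame(
  Player_ID   = rep(wide$Player_ID, times = length(round_cols)),
  State       = rep(wide$State, times = length(round_cols)),
  Opponent_no = rep(round_cols, each = nrow(wide)),
  Opponent_ID = as.integer(sub("^[A-Z][[:space:]]*", "", results_raw)),
  stringsAsFactors = FALSE
)
print(long)
```

A round with no opponent number would yield NA in Opponent_ID under this sketch; how such rounds should be treated is left open here.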

Average Opponent Ratings

We then look up the pre-tournament rating of each opponent, group the ratings by player, and average them.

Player_ID  Avg_Opponent_Pre_Rating
1          1605.286
2          1469.286
3          1563.571
4          1573.571
5          1500.857
6          1518.714
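A minimal sketch of this step, using base R's merge() and aggregate(), is given below. The three ratings are taken from the data above, but the pairings in matches are made up for illustration, so the averages do not match the real results. All object names are hypothetical.

```r
# Pre-tournament ratings of players 1 to 3 (from the data above).
ratings <- data.frame(
  Player_ID  = 1:3,
  Rating_Pre = c(1794, 1553, 1384)
)

# HYPOTHETICAL pairings: every player meets the other two.
matches <- data.frame(
  Player_ID   = c(1L, 1L, 2L, 2L, 3L, 3L),
  Opponent_ID = c(2L, 3L, 1L, 3L, 1L, 2L)
)

# Look up each opponent's pre-tournament rating.
with_ratings <- merge(
  matches, ratings,
  by.x = "Opponent_ID", by.y = "Player_ID"
)
names(with_ratings)[names(with_ratings) == "Rating_Pre"] <-
  "Opponent_Pre_Rating"

# Average the opponent ratings within each player.
avg_opponent <- aggregate(
  Opponent_Pre_Rating ~ Player_ID,
  data = with_ratings, FUN = mean
)
names(avg_opponent)[2] <- "Avg_Opponent_Pre_Rating"
print(avg_opponent)
```

With these made-up pairings, player 1 faces ratings 1553 and 1384, so the average is 1468.5.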

Combining Everything Together

We now have the individual player information:
Player_ID  PlayerName           Total Points  State  Rating_Pre
1          GARY HUA             6.0           ON     1794
2          DAKSHESH DARURI      6.0           MI     1553
3          ADITYA BAJAJ         6.0           MI     1384
4          PATRICK H SCHILLING  5.5           MI     1716
5          HANSHI ZUO           5.5           MI     1655
6          HANSEN SONG          5.0           OH     1686
We also have the average pre-tournament rating of each player’s opponents:
Player_ID  Avg_Opponent_Pre_Rating
1          1605.286
2          1469.286
3          1563.571
4          1573.571
5          1500.857
6          1518.714
We can combine the two tables by joining on Player_ID to get our final results, and then export the data to a CSV file named “Chess_Tournament_Results.csv”.
PlayerName           State  Total Points  Rating_Pre  Avg_Opponent_Pre_Rating
GARY HUA             ON     6.0           1794        1605.286
DAKSHESH DARURI      MI     6.0           1553        1469.286
ADITYA BAJAJ         MI     6.0           1384        1563.571
PATRICK H SCHILLING  MI     5.5           1716        1573.571
HANSHI ZUO           MI     5.5           1655        1500.857
HANSEN SONG          OH     5.0           1686        1518.714
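The join and export could be sketched as below, assuming base R's merge() and write.csv() rather than whatever the original code used. The first three players from the tables above serve as sample data. The column name Total_Points (with an underscore) and the choice to write into a temporary directory are assumptions made to keep the example self-contained.

```r
# Player information (first three rows of the table above).
players <- data.frame(
  Player_ID    = 1:3,
  PlayerName   = c("GARY HUA", "DAKSHESH DARURI", "ADITYA BAJAJ"),
  Total_Points = c(6.0, 6.0, 6.0),
  State        = c("ON", "MI", "MI"),
  Rating_Pre   = c(1794, 1553, 1384),
  stringsAsFactors = FALSE
)

# Average opponent pre-tournament ratings for the same players.
avg_opponent <- data.frame(
  Player_ID               = 1:3,
  Avg_Opponent_Pre_Rating = c(1605.286, 1469.286, 1563.571)
)

# Join on Player_ID, then keep the columns in the output order.
results <- merge(players, avg_opponent, by = "Player_ID")
results <- results[, c("PlayerName", "State", "Total_Points",
                       "Rating_Pre", "Avg_Opponent_Pre_Rating")]

# Export without row names. A temporary directory is used here;
# replace out_file with the desired path in a real run.
out_file <- file.path(tempdir(), "Chess_Tournament_Results.csv")
write.csv(results, out_file, row.names = FALSE)
print(results)
```

Setting row.names = FALSE keeps R's row numbers out of the exported file, so the CSV contains only the five result columns.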
