Chess Tournament

We are looking at chess tournament data. The goal is to process the raw text and extract the following variables into a formatted .csv file:
  • Player’s Name
  • Player’s State
  • Total Number of Points
  • Player’s Pre-Rating
  • Average Pre Chess Rating of Opponents
Output format:

Gary Hua, ON, 6.0, 1794, 1605

Data Evaluation

The structure of the text is the following:
- The first three lines are the column headers and a dashed separator, which we skip when processing
- A first line per player with attributes such as pair number, player name, total points, and the result and opponent for each round
- A second line per player with attributes such as state and pre-tournament rating
- A dashed line, which we ignore

##   X.........................................................................................
## 1  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 2  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 3  -----------------------------------------------------------------------------------------
## 4      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 5     ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 6  -----------------------------------------------------------------------------------------
## 7      2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|
## 8     MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |
## 9  -----------------------------------------------------------------------------------------

Data Transformation

We reformat the text file for easier extraction. Each player occupies two lines of text, from which we extract information in two loops. We skip the first three lines and then step through the file in increments of 3, so that the dashed lines are ignored.
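The extraction described above can be sketched as follows. This is a minimal illustration rather than the original code: the sample rows are copied from the listing above (spacing approximated), and the names `lines_in`, `split_fields`, and `players` are assumptions made for the example. It uses a single pass over the player blocks instead of two separate loops.

```r
# Rows as they appear in the listing above; the first three are headers.
lines_in <- c(
  " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| ",
  " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | ",
  "-----------------------------------------------------------------------------------------",
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "-----------------------------------------------------------------------------------------",
  "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|",
  "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |",
  "-----------------------------------------------------------------------------------------"
)

# Skip the three header rows, then step by 3 to land on each player's first line.
first  <- seq(4, length(lines_in), by = 3)  # name, total points, rounds
second <- first + 1                         # state, ratings

# Split a row on "|" and trim the padding around each field.
split_fields <- function(line) trimws(strsplit(line, "|", fixed = TRUE)[[1]])

rows <- lapply(seq_along(first), function(i) {
  a <- split_fields(lines_in[first[i]])
  b <- split_fields(lines_in[second[i]])
  # Keep only the digits of each round field; a round with no opponent becomes NA.
  opp <- suppressWarnings(as.integer(gsub("[^0-9]", "", a[4:10])))
  data.frame(
    name = a[2],
    state = b[1],
    total_pts = as.numeric(a[3]),
    # The pre-tournament rating is the first run of digits after "R:".
    pre_tournament_rating = as.numeric(sub("^.*R:[[:space:]]*([0-9]+).*$", "\\1", b[2])),
    opponents = paste(opp, collapse = ","),
    stringsAsFactors = FALSE
  )
})
players <- do.call(rbind, rows)
print(players)
```

Storing the opponents as a comma-separated string mirrors the `opponents` column shown in the extracted data; a list column would work as well.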

We combine this extracted information into a dataframe and take a look at it.

## 'data.frame':    64 obs. of  5 variables:
##  $ name                 : Factor w/ 64 levels "ADITYA BAJAJ",..: 24 12 1 51 28 27 23 21 59 5 ...
##  $ state                : Factor w/ 3 levels "MI","OH","ON": 3 1 1 1 1 2 1 1 3 1 ...
##  $ total_pts            : num  6 6 6 5.5 5.5 5 5 5 5 5 ...
##  $ pre_tournament_rating: num  1794 1553 1384 1716 1655 ...
##  $ opponents            : Factor w/ 64 levels "1,54,40,16,44,21,24",..: 35 60 63 18 41 31 53 26 20 10 ...

Extracted Data

name               state  total_pts  pre_tournament_rating  opponents
GARY HUA           ON     6.0        1794                   39,21,18,14,7,12,4
DAKSHESH DARURI    MI     6.0        1553                   63,58,4,17,16,20,7
ADITYA BAJAJ       MI     6.0        1384                   8,61,25,21,11,13,12
PATRICK SCHILLING  MI     5.5        1716                   23,28,2,26,5,19,1
HANSHI ZUO         MI     5.5        1655                   45,37,12,13,4,14,17
HANSEN SONG        OH     5.0        1686                   34,29,11,35,10,27,21

Summary of Extracted Data

name                     state  total_pts      pre_tournament_rating  opponents
ADITYA BAJAJ        : 1  MI:55  Min.   :1.000  Min.   :1011           1,54,40,16,44,21,24 : 1
ALAN BUI            : 1  OH: 1  1st Qu.:2.500  1st Qu.:1280           10,15,39,2,36,NA,NA : 1
ALEX KONG           : 1  ON: 8  Median :3.500  Median :1430           11,35,29,12,18,15,NA: 1
AMIYATOSH PWNANANDAM: 1  NA     Mean   :3.438  Mean   :1425           11,35,45,40,42,NA,NA: 1
ANVIT RAO           : 1  NA     3rd Qu.:4.000  3rd Qu.:1596           12,50,57,60,61,64,56: 1
ASHWIN BALAJI       : 1  NA     Max.   :6.000  Max.   :1794           13,57,51,33,16,28,NA: 1
(Other)             :58  NA     NA             NA’s   :4              (Other)             :58

Data Processing

From the summary above, we learn that there are missing values in both pre_tournament_rating and opponents. We need to keep this in mind when calculating the apcro (Average Pre Chess Rating of Opponents) for each player. The opponent numbers are pair numbers, and each player's pair number equals that player's row index, so an opponent's rating can be looked up by row. NA values are dropped from the apcro calculation.
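The lookup-and-average step can be sketched as below. The four-player data frame is invented purely for illustration, and `avg_opponent_rating` is a hypothetical helper name. Rounding to a whole number is an assumption based on the integer apcro values shown in the processed data.

```r
# Hypothetical toy data: row index doubles as the pair number.
players <- data.frame(
  name = c("A", "B", "C", "D"),
  pre_tournament_rating = c(1500, 1400, NA, 1300),
  opponents = c("2,3,4", "1,NA,4", "1,2,NA", "1,2,3"),
  stringsAsFactors = FALSE
)

# Average the pre-tournament ratings of the listed opponents.
# Missing opponents and missing ratings are both dropped.
avg_opponent_rating <- function(opponents, ratings) {
  idx <- suppressWarnings(as.integer(strsplit(opponents, ",", fixed = TRUE)[[1]]))
  idx <- idx[!is.na(idx)]                  # drop rounds without an opponent
  round(mean(ratings[idx], na.rm = TRUE))  # drop opponents without a rating
}

players$apcro <- vapply(
  players$opponents, avg_opponent_rating, numeric(1),
  ratings = players$pre_tournament_rating, USE.NAMES = FALSE
)
print(players)
```

Player A, for instance, faced pairs 2, 3, and 4; pair 3 has no rating, so the average is taken over 1400 and 1300 only.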

Finally, we prepare the output format for data export.

Processed Data

name               state  total_pts  pre_tournament_rating  apcro
GARY HUA           ON     6.0        1794                   1605
DAKSHESH DARURI    MI     6.0        1553                   1561
ADITYA BAJAJ       MI     6.0        1384                   1665
PATRICK SCHILLING  MI     5.5        1716                   1574
HANSHI ZUO         MI     5.5        1655                   1515
HANSEN SONG        OH     5.0        1686                   1519
GARY DEE SWATHELL  MI     5.0        1649                   1472
EZEKIEL HOUGHTON   MI     5.0        1641                   1468
STEFANO LEE        ON     5.0        1411                   1635
ANVIT RAO          MI     5.0        1365                   1554
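
One possible way to write such a table to a .csv file is sketched here with two rows taken from the processed data. The temporary output path and the variable names are assumptions for the example. Formatting `total_pts` with `sprintf` is a choice made so that "6.0" is not written as "6".

```r
# Two rows copied from the processed data above.
processed <- data.frame(
  name = c("GARY HUA", "DAKSHESH DARURI"),
  state = c("ON", "MI"),
  total_pts = c(6.0, 6.0),
  pre_tournament_rating = c(1794, 1553),
  apcro = c(1605, 1561),
  stringsAsFactors = FALSE
)

export <- processed
# Keep one decimal place so whole-number scores are written as "6.0".
export$total_pts <- sprintf("%.1f", export$total_pts)

out_file <- tempfile(fileext = ".csv")  # hypothetical output location
write.csv(export, out_file, row.names = FALSE, quote = FALSE)

# Inspect the file as written.
print(readLines(out_file))
```

This writes upper-case names separated by plain commas; the title case and the space after each comma in the sample output row would need an extra formatting step that is not shown here.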

Data Analysis & Visualization

Let’s start with the summary statistics, shown below for total points, pre-tournament rating, and average opponent pre-rating, in that order. Interestingly, the mean pre-tournament rating of the players and the mean of their opponents' average ratings are very close: 1425 and 1424, respectively.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.500   3.500   3.438   4.000   6.000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1011    1280    1430    1425    1596    1794       4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1186    1356    1418    1424    1496    1665

Here we take a look at the top pre-tournament ratings and the top average opponent ratings.

Pre Tournament Rating Descending

    name                      state  total_pts  pre_tournament_rating  apcro
1   GARY HUA                  ON     6.0        1794                   1605
25  LOREN SCHWIEBERT          MI     3.5        1745                   1363
4   PATRICK SCHILLING         MI     5.5        1716                   1574
11  CAMERON WILLIAM MC LEMAN  MI     4.5        1712                   1468
6   HANSEN SONG               OH     5.0        1686                   1519
13  TORRANCE HENRY JR         MI     4.5        1666                   1498

Average Pre Chess Rating of Opponent Descending

    name                      state  total_pts  pre_tournament_rating  apcro
3   ADITYA BAJAJ              MI     6.0        1384                   1665
9   STEFANO LEE               ON     5.0        1411                   1635
41  KYLE WILLIAM MURPHY       MI     3.0        1403                   1612
1   GARY HUA                  ON     6.0        1794                   1605
4   PATRICK SCHILLING         MI     5.5        1716                   1574
2   DAKSHESH DARURI           MI     6.0        1553                   1561
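
Rankings like the two above can be produced with `order()`, as in the sketch below. It uses only the first six rows of the processed data, so the result differs from the full-tournament tables. The variable names are assumptions. Because subsetting a data frame keeps its row names, the leading number in each row remains the player's pair number.

```r
# First six rows of the processed data.
processed <- data.frame(
  name = c("GARY HUA", "DAKSHESH DARURI", "ADITYA BAJAJ",
           "PATRICK SCHILLING", "HANSHI ZUO", "HANSEN SONG"),
  pre_tournament_rating = c(1794, 1553, 1384, 1716, 1655, 1686),
  apcro = c(1605, 1561, 1665, 1574, 1515, 1519),
  stringsAsFactors = FALSE
)

# Sort descending; order() places NA ratings last by default.
top_pre   <- processed[order(processed$pre_tournament_rating, decreasing = TRUE), ]
top_apcro <- processed[order(processed$apcro, decreasing = TRUE), ]

print(head(top_pre, 3))
print(head(top_apcro, 3))
```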

The histogram of total points shows a mean (3.438) and a median (3.5) close to the center of the distribution, as described above.

Here we plot the Average Pre Chess Rating of Opponents against the Pre Tournament Rating. An x = y line is added to identify which players faced opponents who were, on average, rated higher than themselves (upper left of the line) or lower than themselves (lower right of the line). A color dimension represents the number of points obtained. It appears that weaker players tended to face opponents stronger than themselves, and stronger players tended to face opponents weaker than themselves, as can be expected.
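
A plot of this kind could be drawn with base R graphics as sketched below. The original plotting code is not shown, so the function name `plot_strength`, the color ramp, and the axis labels are all assumptions. The data are the first six rows of the processed data. The logical column `stronger_opposition` expresses the same comparison as the x = y line: TRUE means the point lies above the line.

```r
# First six rows of the processed data.
processed <- data.frame(
  name = c("GARY HUA", "DAKSHESH DARURI", "ADITYA BAJAJ",
           "PATRICK SCHILLING", "HANSHI ZUO", "HANSEN SONG"),
  total_pts = c(6.0, 6.0, 6.0, 5.5, 5.5, 5.0),
  pre_tournament_rating = c(1794, 1553, 1384, 1716, 1655, 1686),
  apcro = c(1605, 1561, 1665, 1574, 1515, 1519),
  stringsAsFactors = FALSE
)

# TRUE when opponents were, on average, rated higher than the player.
processed$stronger_opposition <- processed$apcro > processed$pre_tournament_rating

plot_strength <- function(df) {
  # Shared axis limits so the x = y line runs corner to corner.
  lim  <- range(c(df$pre_tournament_rating, df$apcro), na.rm = TRUE)
  pts  <- factor(df$total_pts)
  cols <- colorRampPalette(c("grey70", "red"))(nlevels(pts))
  plot(df$pre_tournament_rating, df$apcro,
       xlim = lim, ylim = lim, pch = 19, col = cols[as.integer(pts)],
       xlab = "Pre-tournament rating",
       ylab = "Average pre-rating of opponents")
  abline(a = 0, b = 1, lty = 2)  # x = y reference line
  legend("bottomright", legend = levels(pts), col = cols, pch = 19,
         title = "Total points")
}

print(processed[, c("name", "stronger_opposition")])
```

Calling `plot_strength(processed)` draws the figure on the active graphics device.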