DATA 607 Project One 9-19-21

Project One Requirements

In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:

Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents. For the first player, the information would be:

Gary Hua, ON, 6.0, 1794, 1605. The 1605 was calculated by using the pre-tournament opponents’ ratings of 1436, 1563, 1600, 1610, 1649, 1663, 1716, and dividing by the total number of games played.

The chess rating system (invented by a Minnesota statistician named Arpad Elo) has been used in many other contexts, including assessing relative strength of employment candidates by human resource departments.

All of your code should be in an R markdown file (and published to rpubs.com); with your data accessible for the person running the script.

Background

In order to understand the problem and draw ideas on how to best work through the problem, I reviewed other similar works in RPubs as well as assigned readings and many tutorials on dplyr and stringr to understand how to apply concepts. I created my own version of explanatory notes and how to solve the problem.

First, we load appropriate libraries and import the dataset. We renamed the file “tournamentinfo” into an object labeled “Chess”. The first four rows of the dataset begins with a hyphenated line, the word “Pair”, the word “Num” and another a hyphenated line.

Global variables for CSV source and destination. To make it easier for future use, we’ll set global variables for the CSV source URL and full local path for the CSV file.

csv_url <- "https://raw.githubusercontent.com/professorfoy/DATA-607/main/tournamentinfo.txt"
csv_output_file <- "/Users/clee/Documents/chess_ratings.csv"

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.3     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(dplyr)
library(stringr)
library(ggplot2)
Chess <- read.csv(csv_url, header = FALSE)

View the variable names with the “str” command. Confirm that all 196 rows of data was imported.

str(Chess)

## 'data.frame':    196 obs. of  1 variable:
##  $ V1: chr  "-----------------------------------------------------------------------------------------" " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| " " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | " "-----------------------------------------------------------------------------------------" ...

Once importation of the full dataset has been verified, the first four rows of the dataset must be removed as they will not be part of the analysis. To accomplish this, a new object is created labeled “Chess_2”.

Chess_2 <- Chess[-c(1:4),]

Once the first four rows of data have been removed, the data containing the Player’s Name, Player’s State, Total Number of Points, and Player’s Pre-Rating have to be isolated and unwanted symbols removed. We could further remove all the lines that are hyphenated, but it is more efficient to isolate the rows of desired data. To do this, we need to know the total length of the Chess_2 dataset. Since we removed the first four rows of data, the total length should be reduced to 192 rows, which will start with the player name. we ran the “str” function to verify that Chess_2 in fact has 192 rows of data remaining.

str(Chess_2)

##  chr [1:192] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|" ...

Now that C verified the length of the dataset as having 192 rows, we can extract the rows with the data we am seeking. However, the dataset contains rows that have hyphenated lines across every 3rd row. To bypass the hyphenated lines, we created two objects to represent each row of data that is to be retained and pass them through a sequence command that will look for the row with the Player’s name, starting with the first row, skipping the 2nd and 3rd row, all the way through 192 rows. Another sequence command is given to look for the row with the Player’s state, starting with the second row, skipping the 1st and 3rd row, all the way through 192 rows.

Player_name_row <- Chess_2[seq(1, 192, 3)]
Player_state_row <- Chess_2[seq(2, 192, 3)]

To verify that these rows have been isolated, we passed each object through a head.matrix command to display all remaining rows.

head.matrix(Player_name_row)

##  [1] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
##  [2] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|"
##  [3] "    3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|"
##  [4] "    4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|"
##  [5] "    5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|"
##  [6] "    6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21|"
##  [7] "    7 | GARY DEE SWATHELL               |5.0  |W  57|W  46|W  13|W  11|L   1|W   9|L   2|"
##  [8] "    8 | EZEKIEL HOUGHTON                |5.0  |W   3|W  32|L  14|L   9|W  47|W  28|W  19|"
##  [9] "    9 | STEFANO LEE                     |5.0  |W  25|L  18|W  59|W   8|W  26|L   7|W  20|"
## [10] "   10 | ANVIT RAO                       |5.0  |D  16|L  19|W  55|W  31|D   6|W  25|W  18|"
## [11] "   11 | CAMERON WILLIAM MC LEMAN        |4.5  |D  38|W  56|W   6|L   7|L   3|W  34|W  26|"
## [12] "   12 | KENNETH J TACK                  |4.5  |W  42|W  33|D   5|W  38|H    |D   1|L   3|"
## [13] "   13 | TORRANCE HENRY JR               |4.5  |W  36|W  27|L   7|D   5|W  33|L   3|W  32|"
## [14] "   14 | BRADLEY SHAW                    |4.5  |W  54|W  44|W   8|L   1|D  27|L   5|W  31|"
## [15] "   15 | ZACHARY JAMES HOUGHTON          |4.5  |D  19|L  16|W  30|L  22|W  54|W  33|W  38|"
## [16] "   16 | MIKE NIKITIN                    |4.0  |D  10|W  15|H    |W  39|L   2|W  36|U    |"
## [17] "   17 | RONALD GRZEGORCZYK              |4.0  |W  48|W  41|L  26|L   2|W  23|W  22|L   5|"
## [18] "   18 | DAVID SUNDEEN                   |4.0  |W  47|W   9|L   1|W  32|L  19|W  38|L  10|"
## [19] "   19 | DIPANKAR ROY                    |4.0  |D  15|W  10|W  52|D  28|W  18|L   4|L   8|"
## [20] "   20 | JASON ZHENG                     |4.0  |L  40|W  49|W  23|W  41|W  28|L   2|L   9|"
## [21] "   21 | DINH DANG BUI                   |4.0  |W  43|L   1|W  47|L   3|W  40|W  39|L   6|"
## [22] "   22 | EUGENE L MCCLURE                |4.0  |W  64|D  52|L  28|W  15|H    |L  17|W  40|"
## [23] "   23 | ALAN BUI                        |4.0  |L   4|W  43|L  20|W  58|L  17|W  37|W  46|"
## [24] "   24 | MICHAEL R ALDRICH               |4.0  |L  28|L  47|W  43|L  25|W  60|W  44|W  39|"
## [25] "   25 | LOREN SCHWIEBERT                |3.5  |L   9|W  53|L   3|W  24|D  34|L  10|W  47|"
## [26] "   26 | MAX ZHU                         |3.5  |W  49|W  40|W  17|L   4|L   9|D  32|L  11|"
## [27] "   27 | GAURAV GIDWANI                  |3.5  |W  51|L  13|W  46|W  37|D  14|L   6|U    |"
## [28] "   28 | SOFIA ADINA STANESCU-BELLU      |3.5  |W  24|D   4|W  22|D  19|L  20|L   8|D  36|"
## [29] "   29 | CHIEDOZIE OKORIE                |3.5  |W  50|D   6|L  38|L  34|W  52|W  48|U    |"
## [30] "   30 | GEORGE AVERY JONES              |3.5  |L  52|D  64|L  15|W  55|L  31|W  61|W  50|"
## [31] "   31 | RISHI SHETTY                    |3.5  |L  58|D  55|W  64|L  10|W  30|W  50|L  14|"
## [32] "   32 | JOSHUA PHILIP MATHEWS           |3.5  |W  61|L   8|W  44|L  18|W  51|D  26|L  13|"
## [33] "   33 | JADE GE                         |3.5  |W  60|L  12|W  50|D  36|L  13|L  15|W  51|"
## [34] "   34 | MICHAEL JEFFERY THOMAS          |3.5  |L   6|W  60|L  37|W  29|D  25|L  11|W  52|"
## [35] "   35 | JOSHUA DAVID LEE                |3.5  |L  46|L  38|W  56|L   6|W  57|D  52|W  48|"
## [36] "   36 | SIDDHARTH JHA                   |3.5  |L  13|W  57|W  51|D  33|H    |L  16|D  28|"
## [37] "   37 | AMIYATOSH PWNANANDAM            |3.5  |B    |L   5|W  34|L  27|H    |L  23|W  61|"
## [38] "   38 | BRIAN LIU                       |3.0  |D  11|W  35|W  29|L  12|H    |L  18|L  15|"
## [39] "   39 | JOEL R HENDON                   |3.0  |L   1|W  54|W  40|L  16|W  44|L  21|L  24|"
## [40] "   40 | FOREST ZHANG                    |3.0  |W  20|L  26|L  39|W  59|L  21|W  56|L  22|"
## [41] "   41 | KYLE WILLIAM MURPHY             |3.0  |W  59|L  17|W  58|L  20|X    |U    |U    |"
## [42] "   42 | JARED GE                        |3.0  |L  12|L  50|L  57|D  60|D  61|W  64|W  56|"
## [43] "   43 | ROBERT GLEN VASEY               |3.0  |L  21|L  23|L  24|W  63|W  59|L  46|W  55|"
## [44] "   44 | JUSTIN D SCHILLING              |3.0  |B    |L  14|L  32|W  53|L  39|L  24|W  59|"
## [45] "   45 | DEREK YAN                       |3.0  |L   5|L  51|D  60|L  56|W  63|D  55|W  58|"
## [46] "   46 | JACOB ALEXANDER LAVALLEY        |3.0  |W  35|L   7|L  27|L  50|W  64|W  43|L  23|"
## [47] "   47 | ERIC WRIGHT                     |2.5  |L  18|W  24|L  21|W  61|L   8|D  51|L  25|"
## [48] "   48 | DANIEL KHAIN                    |2.5  |L  17|W  63|H    |D  52|H    |L  29|L  35|"
## [49] "   49 | MICHAEL J MARTIN                |2.5  |L  26|L  20|D  63|D  64|W  58|H    |U    |"
## [50] "   50 | SHIVAM JHA                      |2.5  |L  29|W  42|L  33|W  46|H    |L  31|L  30|"
## [51] "   51 | TEJAS AYYAGARI                  |2.5  |L  27|W  45|L  36|W  57|L  32|D  47|L  33|"
## [52] "   52 | ETHAN GUO                       |2.5  |W  30|D  22|L  19|D  48|L  29|D  35|L  34|"
## [53] "   53 | JOSE C YBARRA                   |2.0  |H    |L  25|H    |L  44|U    |W  57|U    |"
## [54] "   54 | LARRY HODGE                     |2.0  |L  14|L  39|L  61|B    |L  15|L  59|W  64|"
## [55] "   55 | ALEX KONG                       |2.0  |L  62|D  31|L  10|L  30|B    |D  45|L  43|"
## [56] "   56 | MARISA RICCI                    |2.0  |H    |L  11|L  35|W  45|H    |L  40|L  42|"
## [57] "   57 | MICHAEL LU                      |2.0  |L   7|L  36|W  42|L  51|L  35|L  53|B    |"
## [58] "   58 | VIRAJ MOHILE                    |2.0  |W  31|L   2|L  41|L  23|L  49|B    |L  45|"
## [59] "   59 | SEAN M MC CORMICK               |2.0  |L  41|B    |L   9|L  40|L  43|W  54|L  44|"
## [60] "   60 | JULIA SHEN                      |1.5  |L  33|L  34|D  45|D  42|L  24|H    |U    |"
## [61] "   61 | JEZZEL FARKAS                   |1.5  |L  32|L   3|W  54|L  47|D  42|L  30|L  37|"
## [62] "   62 | ASHWIN BALAJI                   |1.0  |W  55|U    |U    |U    |U    |U    |U    |"
## [63] "   63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |"
## [64] "   64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|"

head.matrix(Player_state_row)

##  [1] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
##  [2] "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
##  [3] "   MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
##  [4] "   MI | 12616049 / R: 1716   ->1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |"
##  [5] "   MI | 14601533 / R: 1655   ->1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
##  [6] "   OH | 15055204 / R: 1686   ->1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |"
##  [7] "   MI | 11146376 / R: 1649   ->1673     |N:3  |W    |B    |W    |B    |B    |W    |W    |"
##  [8] "   MI | 15142253 / R: 1641P17->1657P24  |N:3  |B    |W    |B    |W    |B    |W    |W    |"
##  [9] "   ON | 14954524 / R: 1411   ->1564     |N:2  |W    |B    |W    |B    |W    |B    |B    |"
## [10] "   MI | 14150362 / R: 1365   ->1544     |N:3  |W    |W    |B    |B    |W    |B    |W    |"
## [11] "   MI | 12581589 / R: 1712   ->1696     |N:3  |B    |W    |B    |W    |B    |W    |B    |"
## [12] "   MI | 12681257 / R: 1663   ->1670     |N:3  |W    |B    |W    |B    |     |W    |B    |"
## [13] "   MI | 15082995 / R: 1666   ->1662     |N:3  |B    |W    |B    |B    |W    |W    |B    |"
## [14] "   MI | 10131499 / R: 1610   ->1618     |N:3  |W    |B    |W    |W    |B    |B    |W    |"
## [15] "   MI | 15619130 / R: 1220P13->1416P20  |N:3  |B    |B    |W    |W    |B    |B    |W    |"
## [16] "   MI | 10295068 / R: 1604   ->1613     |N:3  |B    |W    |     |B    |W    |B    |     |"
## [17] "   MI | 10297702 / R: 1629   ->1610     |N:3  |W    |B    |W    |B    |W    |B    |W    |"
## [18] "   MI | 11342094 / R: 1600   ->1600     |N:3  |B    |W    |B    |W    |B    |W    |B    |"
## [19] "   MI | 14862333 / R: 1564   ->1570     |N:3  |W    |B    |W    |B    |W    |W    |B    |"
## [20] "   MI | 14529060 / R: 1595   ->1569     |N:4  |W    |B    |W    |B    |W    |B    |W    |"
## [21] "   ON | 15495066 / R: 1563P22->1562     |N:3  |B    |W    |B    |W    |W    |B    |W    |"
## [22] "   MI | 12405534 / R: 1555   ->1529     |N:4  |W    |B    |W    |B    |     |W    |B    |"
## [23] "   ON | 15030142 / R: 1363   ->1371     |     |B    |W    |B    |W    |B    |W    |B    |"
## [24] "   MI | 13469010 / R: 1229   ->1300     |N:4  |B    |W    |B    |B    |W    |W    |B    |"
## [25] "   MI | 12486656 / R: 1745   ->1681     |N:4  |B    |W    |B    |W    |B    |W    |B    |"
## [26] "   ON | 15131520 / R: 1579   ->1564     |N:4  |B    |W    |B    |W    |B    |W    |W    |"
## [27] "   MI | 14476567 / R: 1552   ->1539     |N:4  |W    |B    |W    |B    |W    |B    |     |"
## [28] "   MI | 14882954 / R: 1507   ->1513     |N:3  |W    |W    |B    |W    |B    |B    |W    |"
## [29] "   MI | 15323285 / R: 1602P6 ->1508P12  |N:4  |B    |W    |B    |W    |W    |B    |     |"
## [30] "   ON | 12577178 / R: 1522   ->1444     |     |W    |B    |B    |W    |W    |B    |B    |"
## [31] "   MI | 15131618 / R: 1494   ->1444     |     |B    |W    |B    |W    |B    |W    |B    |"
## [32] "   ON | 14073750 / R: 1441   ->1433     |N:4  |W    |B    |W    |B    |W    |B    |W    |"
## [33] "   MI | 14691842 / R: 1449   ->1421     |     |B    |W    |B    |W    |B    |W    |B    |"
## [34] "   MI | 15051807 / R: 1399   ->1400     |     |B    |W    |B    |B    |W    |B    |W    |"
## [35] "   MI | 14601397 / R: 1438   ->1392     |     |W    |W    |B    |W    |B    |B    |W    |"
## [36] "   MI | 14773163 / R: 1355   ->1367     |N:4  |W    |B    |W    |B    |     |W    |B    |"
## [37] "   MI | 15489571 / R:  980P12->1077P17  |     |     |B    |W    |W    |     |B    |W    |"
## [38] "   MI | 15108523 / R: 1423   ->1439     |N:4  |W    |B    |W    |W    |     |B    |B    |"
## [39] "   MI | 12923035 / R: 1436P23->1413     |N:4  |B    |W    |B    |W    |B    |W    |W    |"
## [40] "   MI | 14892710 / R: 1348   ->1346     |     |B    |B    |W    |W    |B    |W    |W    |"
## [41] "   MI | 15761443 / R: 1403P5 ->1341P9   |     |B    |W    |B    |W    |     |     |     |"
## [42] "   MI | 14462326 / R: 1332   ->1256     |     |B    |W    |B    |B    |W    |W    |B    |"
## [43] "   MI | 14101068 / R: 1283   ->1244     |     |W    |B    |W    |W    |B    |B    |W    |"
## [44] "   MI | 15323504 / R: 1199   ->1199     |     |     |W    |B    |B    |W    |B    |W    |"
## [45] "   MI | 15372807 / R: 1242   ->1191     |     |W    |B    |W    |B    |W    |B    |W    |"
## [46] "   MI | 15490981 / R:  377P3 ->1076P10  |     |B    |W    |B    |W    |B    |W    |W    |"
## [47] "   MI | 12533115 / R: 1362   ->1341     |     |W    |B    |W    |B    |W    |B    |W    |"
## [48] "   MI | 14369165 / R: 1382   ->1335     |     |B    |W    |     |B    |     |W    |B    |"
## [49] "   MI | 12531685 / R: 1291P12->1259P17  |     |W    |W    |B    |W    |B    |     |     |"
## [50] "   MI | 14773178 / R: 1056   ->1111     |     |W    |B    |W    |B    |     |B    |W    |"
## [51] "   MI | 15205474 / R: 1011   ->1097     |     |B    |W    |B    |W    |B    |W    |W    |"
## [52] "   MI | 14918803 / R:  935   ->1092     |N:4  |B    |W    |B    |W    |B    |W    |B    |"
## [53] "   MI | 12578849 / R: 1393   ->1359     |     |     |B    |     |W    |     |W    |     |"
## [54] "   MI | 12836773 / R: 1270   ->1200     |     |B    |B    |W    |     |W    |B    |W    |"
## [55] "   MI | 15412571 / R: 1186   ->1163     |     |W    |B    |W    |B    |     |W    |B    |"
## [56] "   MI | 14679887 / R: 1153   ->1140     |     |     |B    |W    |W    |     |B    |W    |"
## [57] "   MI | 15113330 / R: 1092   ->1079     |     |B    |W    |W    |B    |W    |B    |     |"
## [58] "   MI | 14700365 / R:  917   -> 941     |     |W    |B    |W    |B    |W    |     |B    |"
## [59] "   MI | 12841036 / R:  853   -> 878     |     |W    |     |B    |B    |W    |W    |B    |"
## [60] "   MI | 14579262 / R:  967   -> 984     |     |W    |B    |B    |W    |B    |     |     |"
## [61] "   ON | 15771592 / R:  955P11-> 979P18  |     |B    |W    |B    |W    |B    |W    |B    |"
## [62] "   MI | 15219542 / R: 1530   ->1535     |     |B    |     |     |     |     |     |     |"
## [63] "   MI | 15057092 / R: 1175   ->1125     |     |W    |B    |W    |B    |B    |     |     |"
## [64] "   MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |"

The task is to extract the necessary data and avoid unwanted symbols across each row of data in the two objects. To extract the necessary data, we assigned new objects to each of the data fields that is required (Player’s Name, Player’s State, Total Number of Points, and Player’s Pre-Rating). We applied regular expression (regex) commands. We specifically used the str_extract(string, pattern) to create vectors for each object.

Extract Player’s Names and create the Player_Name object:

Player_Name <- str_extract(Player_name_row, "([a-zA-Z]+)\\s*([a-zA-Z]+)\\s*+([a-zA-Z]+)\\s*+([a-zA-Z]+)")
head(Player_Name)

## [1] "GARY HUA"            "DAKSHESH DARURI"     "ADITYA BAJAJ"       
## [4] "PATRICK H SCHILLING" "HANSHI ZUO"          "HANSEN SONG"

Extract Player’s States and create the Player_State object:

Player_State <- str_extract(Player_state_row, "([a-zA-Z]+)")
head(Player_State)

## [1] "ON" "MI" "MI" "MI" "MI" "OH"

Extract Player’s Total Number of Points and create the Player_Points object:

Player_Points <- str_extract(Player_name_row, "\\d.\\d")
Player_Points <- as.numeric(Player_Points)

# This next statement chains multiple functions together to extract each player's post rating.
# We have to account for the right-arrow with an optional space afterwards (e.g. -> 1672P234).
# Reading the statement from the inside out:
#   1. Extract all of the digits to the right of the right-arrow (str_extract)
#   2. Remove the white space and right-arrow (str_replace)
#   3. Convert the characters to integers (as.integer)
Player_PostRating <- as.integer(
                                  str_replace(
                                    str_extract(Player_state_row,"-> *(\\d+)"), 
                                    "-> *", "")
                                  )

Extract Player’s Player’s Pre-Rating - Part I. To extract the Player’s Pre-Rating, it required a two-step process. First, we created an object to isolate the Player_Pre-Rating ranking. To capture the data and not confound it with the Post-Rating, we extracted the R: with the Pre-Rating.

Player_PreRate <- str_extract(Player_state_row, "R:\\s*[[:digit:]]{3,}")
head(Player_PreRate)

## [1] "R: 1794" "R: 1553" "R: 1384" "R: 1716" "R: 1655" "R: 1686"

Extract Player’s Player’s Pre-Rating - Part II. After extracting the R: with the Player’s Pre-Rating, we created another object, Player_PreRating2 to remove the R: and whitespace between ranking number. This procedure isolates the Pre-Rating.

Player_PreRating <- str_replace_all(Player_PreRate, "R:\\s", "")
Player_PreRating <- as.integer(Player_PreRating)
head(Player_PreRating)

## [1] 1794 1553 1384 1716 1655 1686

Average Rating

In the Player_name_row dataset, each row records each win(W), lose(L) or draw(D), U or B against each opponent. The opponent has a corresponding list number in each row. Each W, L D U or B and opponent list number are separated by vertical pipes. For example, the format appears like this for a win against opponent 39 in the list: |W 39|. In the Player_state_row dataset, each opponent list number has an associated pre-player rating, which as been extracted as a list in the object Player_PreRating. To calculate the average rating for each opponent, we will need to isolate the opponent numbers in the Player_name_row dataset and match them with the list number in the Player_PreRating dataset. Then, create a new object that calculates the average for each row of data.

Since the format appears like this |(W)(L)(D)(U)(B) #| for each opponent in the list, an easy way to extract the opponent number is isolate the W, L D U or B, plus whitespace and player list number for each opponent and then remove the letters, plus whitespace.

opponent_list <- str_extract_all(Player_name_row, "\\|+\\w+\\s+.*")
head(opponent_list)

## [[1]]
## [1] "|W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
## 
## [[2]]
## [1] "|W  63|W  58|L   4|W  17|W  16|W  20|W   7|"
## 
## [[3]]
## [1] "|L   8|W  61|W  25|W  21|W  11|W  13|W  12|"
## 
## [[4]]
## [1] "|W  23|D  28|W   2|W  26|D   5|W  19|D   1|"
## 
## [[5]]
## [1] "|W  45|W  37|D  12|D  13|D   4|W  14|W  17|"
## 
## [[6]]
## [1] "|W  34|D  29|L  11|W  35|D  10|W  27|W  21|"

After the zeros have been inserted, we removed the letters, whitespace and vertical pipes to isolate just the opponent list number. We labeled a new object, “opponent_list2” to represent the opponents list numbers. Each round now has a number associated with it to show who are the opponents or zero if there were no opponents (players may have dropped out of the tournament).

opponent_list2 <- str_extract_all(opponent_list, "\\d+")
head(opponent_list2)

## [[1]]
## [1] "39" "21" "18" "14" "7"  "12" "4" 
## 
## [[2]]
## [1] "63" "58" "4"  "17" "16" "20" "7" 
## 
## [[3]]
## [1] "8"  "61" "25" "21" "11" "13" "12"
## 
## [[4]]
## [1] "23" "28" "2"  "26" "5"  "19" "1" 
## 
## [[5]]
## [1] "45" "37" "12" "13" "4"  "14" "17"
## 
## [[6]]
## [1] "34" "29" "11" "35" "10" "27" "21"

With the opponent list numbers isolated in new object labeled opponent_list2, the goal is to calculate the prerating rating of each player’s opponents.

Specifically, since opponent_list2 is a list of character vectors (see below):

[[1]] “39” “21” “18” “14” “7” “12” “4”

[[2]] “63” “58” “4” “17” “16” “20” “7” . . .

For index [[1]], iterate through each element and find the opponent_rating: “39” “21” “18” “14” “7” “12” “4” -> 1436 1563 1600 1610 1649 1663 1716

And then calculate the mean(1436, 1563, 1600, 1610, 1649, 1663, 1716) = 1605.

Repeat the above steps through the rest of the opponent_list2.

# Initialize a vector of opponent ratings and player expected ratings
Average_PreChess_Rating_Opponent <- c()

# We'll also calculate a player's expected rating
Player_ExpPoints <- c()

# Iterate through the opponent_list2 by index
for (player_index in 1:length(opponent_list2)) {
  # Initialize temporary vector to hold the opponent ratings
  opponent_ratings <- c()

  # Iterate through every opponent_list2 vector
  # We also convert the character vector to integers
  for (s in as.integer(opponent_list2[[player_index]])) {
    opponent_ratings <- append(opponent_ratings, as.integer(Player_PreRating[[s]]))
  }
  # Remember to round the mean to zero decimal places
  average_opponent_prechess_rating = round(mean(opponent_ratings))[1]
  
  # Expected rating for the player to one significant digit
  # Reference:
  #   - https://chess.stackexchange.com/questions/18209/how-do-you-calculate-your-tournament-performance-rating
  #   - https://docs.google.com/spreadsheets/d/19yWu1PxCDwx0phTqeoKzdyP21FiRUgNfp6B6vktSND0/copy
  expected_points = 1/(10^((mean(opponent_ratings) - Player_PreRating[player_index])/400)+1) * length(opponent_ratings)
  expected_points = round( expected_points, 1 )
  
  # Save the results
  Average_PreChess_Rating_Opponent <- append(Average_PreChess_Rating_Opponent, average_opponent_prechess_rating)
  Player_ExpPoints <- append(Player_ExpPoints, expected_points)
  
}
head(Average_PreChess_Rating_Opponent)

## [1] 1605 1469 1564 1574 1501 1519

head(Player_ExpPoints)

## [1] 5.2 4.3 1.8 4.9 5.0 5.1

Finally, we create the CSV dataframe:

csv <- data.frame(Player_Name, Player_State, Player_Points, Player_ExpPoints, Player_PreRating, Player_PostRating,Average_PreChess_Rating_Opponent)
csv <- csv %>%
  mutate (PointDifferential = Player_Points - Player_ExpPoints)
csv <- csv %>%
  mutate (RatingDifferential = Player_PostRating - Player_PreRating)

# Write the csv to a file
write_csv(csv, csv_output_file)

head(csv)

##           Player_Name Player_State Player_Points Player_ExpPoints
## 1            GARY HUA           ON           6.0              5.2
## 2     DAKSHESH DARURI           MI           6.0              4.3
## 3        ADITYA BAJAJ           MI           6.0              1.8
## 4 PATRICK H SCHILLING           MI           5.5              4.9
## 5          HANSHI ZUO           MI           5.5              5.0
## 6         HANSEN SONG           OH           5.0              5.1
##   Player_PreRating Player_PostRating Average_PreChess_Rating_Opponent
## 1             1794              1817                             1605
## 2             1553              1663                             1469
## 3             1384              1640                             1564
## 4             1716              1744                             1574
## 5             1655              1690                             1501
## 6             1686              1687                             1519
##   PointDifferential RatingDifferential
## 1               0.8                 23
## 2               1.7                110
## 3               4.2                256
## 4               0.6                 28
## 5               0.5                 35
## 6              -0.1                  1

Plots

Histogram

ggplot(data = csv, aes(x=Player_PreRating)) + geom_histogram(binwidth=25)

ggplot(data = csv, aes(x=Player_PostRating)) + geom_histogram(binwidth=25)

Boxplot

ggplot(data = csv, aes(y=Player_PreRating)) + geom_boxplot()

Scatter plot, color by state

ggplot(data = csv, aes(x=Player_PostRating, y=Player_Points, color=Player_State)) + geom_point()

DATA 607 Project One 9-19-21

9/19/2021

Project One Requirements

Background

Plots