Shana Green

DATA 607 - Project 1

Due Date: 9/19/2020

Data Analysis: Chess Tournament

In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:

Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents

For the first player, the information would be:

Gary Hua, ON, 6.0, 1794, 1605

1605 was calculated by summing the pre-tournament ratings of his opponents (1436, 1563, 1600, 1610, 1649, 1663, 1716) and dividing by the total number of games played (7).
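The arithmetic behind the 1605 figure can be checked directly in R. This is only a small sketch; the variable names `gary_opponents` and `avg_opponent` are illustrative and are not part of the assignment.

```r
# Pre-tournament ratings of Gary Hua's seven opponents, as listed above
gary_opponents <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)

# Sum of the ratings divided by the number of games played
avg_opponent <- sum(gary_opponents) / length(gary_opponents)

round(avg_opponent)  # 1605
```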

If you have questions about the meaning of the data or the results, please post them on the discussion forum. Data science, like chess, is a game of back and forth.

The chess rating system (invented by Arpad Elo, a Hungarian-American physics professor at Marquette University in Milwaukee) has been used in many other contexts, including assessing relative strength of employment candidates by human resource departments.

You may substitute another text file (or set of text files, or data scraped from web pages) of similar or greater complexity, and create your own assignment and solution. You may work in a small team. All of your code should be in an R Markdown file (and published to rpubs.com), with your data accessible to the person running the script.

Loading the data

chess <- read.csv("C:/Users/SAGreen/Documents/SPS/Fall 2020/DATA 607/Project/Project 1/tournamentinfo.txt", header = FALSE)

head(chess)
##                                                                                           V1
## 1  -----------------------------------------------------------------------------------------
## 2  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 3  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 4  -----------------------------------------------------------------------------------------
## 5      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 6     ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |

After loading the tournament data, a pattern emerges in where the hyphen separator rows fall. The output of head() shows rows made up entirely of hyphens at positions 1, 4, 7, 10, and so on: every (3n+1)th row, for n = 0, 1, 2, and so forth.
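The separator pattern can be confirmed programmatically rather than by eye. The sketch below uses a shortened, hand-typed stand-in for the first rows of the file (`sample_lines` is an assumption for illustration, not the real data) and flags the rows that consist only of hyphens.

```r
# Abbreviated stand-in for the first seven rows of the tournament file
sample_lines <- c(
  "-----------",
  " Pair | Player Name |Total|",
  " Num  | USCF ID / Rtg (Pre->Post) | Pts |",
  "-----------",
  "    1 | GARY HUA |6.0  |",
  "   ON | 15445895 / R: 1794   ->1817 |N:2  |",
  "-----------"
)

# A separator row consists only of hyphens
is_separator <- grepl("^-+$", sample_lines)

# Positions of the separator rows
sep_positions <- which(is_separator)
sep_positions            # 1 4 7

# Each position leaves a remainder of 1 when divided by 3, i.e. 3n + 1
all(sep_positions %% 3 == 1)
```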

Removing the heading rows of the chess tournament

chess_info <- chess[-c(1:3),]
head(chess_info)
## [1] "-----------------------------------------------------------------------------------------"
## [2] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
## [3] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
## [4] "-----------------------------------------------------------------------------------------"
## [5] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|"
## [6] "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |"

With the heading removed, the remaining rows also follow a pattern. Each player's pair number, name, total points, and round results are in rows 2, 5, 8, and so on (the (3n+2)th rows). Each player's state and ratings are in rows 3, 6, 9, and so on (the (3n+3)th rows).

Separating the player rows from the state rows

n <- length(chess_info)
first_row <- chess_info[seq(2, n, 3)]
second_row <- chess_info[seq(3, n, 3)]
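To see what the `seq()` calls select, here is a minimal sketch on a toy vector laid out like `chess_info` (separator, player row, state row, repeated). The vector `toy` and its abbreviated contents are assumptions for illustration only.

```r
# Toy vector in the same three-row layout as chess_info (contents abbreviated)
toy <- c("-----", "1 | GARY HUA", "ON | 15445895",
         "-----", "2 | DAKSHESH DARURI", "MI | 14598900",
         "-----")

n_toy <- length(toy)

# Every third index starting at 2 (player rows) and at 3 (state rows)
player_idx <- seq(2, n_toy, 3)   # 2 5
state_idx  <- seq(3, n_toy, 3)   # 3 6

toy[player_idx]   # the two player rows
toy[state_idx]    # the two state rows
```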

Extracting data for each column

library(stringr)

# Pair number: the first run of digits on the player row
Player <- as.integer(str_extract(first_row, "\\d+"))

# Name: two or three words; longer names are cut short (e.g. "CAMERON WILLIAM MC")
Name <- str_trim(str_extract(first_row, "(\\w+\\s){2,3}"))

# State: the first word on the state row
State <- str_extract(second_row, "\\w+")

# Total points: the first decimal number on the player row
Points <- as.numeric(str_extract(first_row, "\\d+\\.\\d+"))

# Pre-rating: the first standalone 3- or 4-digit number (the 8-digit USCF ID is skipped)
PreRating <- as.integer(str_extract(str_extract(second_row, "[^\\d]\\d{3,4}[^\\d]"), "\\d+"))

# Opponent pair numbers: digits immediately followed by "|"
Opponents <- str_extract_all(first_row, "\\d+(?=\\|)")
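Because every field is delimited by `|`, splitting on that character is an alternative to regular expressions, and it keeps names of any length intact. The sketch below is a swapped-in base R approach, not the method used above; it works on the first player's two rows as printed earlier, and all variable names (`line1`, `fields1`, and so on) are illustrative.

```r
# The first player's two rows, copied from the output above
line1 <- "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
line2 <- "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

# Split each row on the literal "|" and trim surrounding whitespace
fields1 <- trimws(strsplit(line1, "|", fixed = TRUE)[[1]])
fields2 <- trimws(strsplit(line2, "|", fixed = TRUE)[[1]])

name   <- fields1[2]                 # whole name field, however many words
points <- as.numeric(fields1[3])
state  <- fields2[1]

# Pre-rating: the digits that follow "R:" in the second field of the state row
pre_rating <- as.integer(sub(".*R: *([0-9]+).*", "\\1", fields2[2]))

# Opponents: strip non-digits from the seven round fields;
# rounds with no opponent number become NA and are dropped
opponents <- as.integer(gsub("[^0-9]", "", fields1[4:10]))
opponents <- opponents[!is.na(opponents)]
```

For this row the fields come out as "GARY HUA", "ON", 6, 1794, and the opponent numbers 39, 21, 18, 14, 7, 12, 4.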

Calculating the Average Opponent Pre-Rating

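n_players <- length(first_row)
Opponent_PreRating <- numeric(n_players)

for (i in seq_len(n_players)) {
  # Pair numbers of the opponents player i faced
  opp <- as.integer(Opponents[[i]])
  # Look up each opponent's pre-rating by pair number and average them
  Opponent_PreRating[i] <- mean(PreRating[match(opp, Player)])
}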
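The lookup-and-average step may be easier to follow on a tiny example. The sketch below uses three hypothetical players with made-up ratings (none of these values come from the tournament file) and `sapply()` in place of the loop.

```r
# Hypothetical three-player example (values chosen for illustration only)
toy_player    <- c(1L, 2L, 3L)
toy_prerating <- c(1500L, 1600L, 1700L)

# Opponent pair numbers per player, stored as character like the extracted data
toy_opponents <- list(c("2", "3"), c("1"), c("1", "2"))

# For each player, map opponent pair numbers to pre-ratings and take the mean
toy_avg <- sapply(toy_opponents, function(opp) {
  mean(toy_prerating[match(as.integer(opp), toy_player)])
})

toy_avg  # 1650 1500 1550
```

Using `match()` against the pair numbers means the lookup does not depend on players being stored in pair-number order.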

Creating the Data Frame here

csv <- data.frame(Name, State, Points, PreRating, Opponent_PreRating); csv
##                        Name State Points PreRating Opponent_PreRating
## 1                  GARY HUA    ON    6.0      1794           1605.286
## 2           DAKSHESH DARURI    MI    6.0      1553           1469.286
## 3              ADITYA BAJAJ    MI    6.0      1384           1563.571
## 4       PATRICK H SCHILLING    MI    5.5      1716           1573.571
## 5                HANSHI ZUO    MI    5.5      1655           1500.857
## 6               HANSEN SONG    OH    5.0      1686           1518.714
## 7         GARY DEE SWATHELL    MI    5.0      1649           1372.143
## 8          EZEKIEL HOUGHTON    MI    5.0      1641           1468.429
## 9               STEFANO LEE    ON    5.0      1411           1523.143
## 10                ANVIT RAO    MI    5.0      1365           1554.143
## 11       CAMERON WILLIAM MC    MI    4.5      1712           1467.571
## 12           KENNETH J TACK    MI    4.5      1663           1506.167
## 13        TORRANCE HENRY JR    MI    4.5      1666           1497.857
## 14             BRADLEY SHAW    MI    4.5      1610           1515.000
## 15   ZACHARY JAMES HOUGHTON    MI    4.5      1220           1483.857
## 16             MIKE NIKITIN    MI    4.0      1604           1385.800
## 17       RONALD GRZEGORCZYK    MI    4.0      1629           1498.571
## 18            DAVID SUNDEEN    MI    4.0      1600           1480.000
## 19             DIPANKAR ROY    MI    4.0      1564           1426.286
## 20              JASON ZHENG    MI    4.0      1595           1410.857
## 21            DINH DANG BUI    ON    4.0      1563           1470.429
## 22         EUGENE L MCCLURE    MI    4.0      1555           1300.333
## 23                 ALAN BUI    ON    4.0      1363           1213.857
## 24        MICHAEL R ALDRICH    MI    4.0      1229           1357.000
## 25         LOREN SCHWIEBERT    MI    3.5      1745           1363.286
## 26                  MAX ZHU    ON    3.5      1579           1506.857
## 27           GAURAV GIDWANI    MI    3.5      1552           1221.667
## 28              SOFIA ADINA    MI    3.5      1507           1522.143
## 29         CHIEDOZIE OKORIE    MI    3.5      1602           1313.500
## 30       GEORGE AVERY JONES    ON    3.5      1522           1144.143
## 31             RISHI SHETTY    MI    3.5      1494           1259.857
## 32    JOSHUA PHILIP MATHEWS    ON    3.5      1441           1378.714
## 33                  JADE GE    MI    3.5      1449           1276.857
## 34   MICHAEL JEFFERY THOMAS    MI    3.5      1399           1375.286
## 35         JOSHUA DAVID LEE    MI    3.5      1438           1149.714
## 36            SIDDHARTH JHA    MI    3.5      1355           1388.167
## 37     AMIYATOSH PWNANANDAM    MI    3.5       980           1384.800
## 38                BRIAN LIU    MI    3.0      1423           1539.167
## 39            JOEL R HENDON    MI    3.0      1436           1429.571
## 40             FOREST ZHANG    MI    3.0      1348           1390.571
## 41      KYLE WILLIAM MURPHY    MI    3.0      1403           1248.500
## 42                 JARED GE    MI    3.0      1332           1149.857
## 43        ROBERT GLEN VASEY    MI    3.0      1283           1106.571
## 44       JUSTIN D SCHILLING    MI    3.0      1199           1327.000
## 45                DEREK YAN    MI    3.0      1242           1152.000
## 46 JACOB ALEXANDER LAVALLEY    MI    3.0       377           1357.714
## 47              ERIC WRIGHT    MI    2.5      1362           1392.000
## 48             DANIEL KHAIN    MI    2.5      1382           1355.800
## 49         MICHAEL J MARTIN    MI    2.5      1291           1285.800
## 50               SHIVAM JHA    MI    2.5      1056           1296.000
## 51           TEJAS AYYAGARI    MI    2.5      1011           1356.143
## 52                ETHAN GUO    MI    2.5       935           1494.571
## 53            JOSE C YBARRA    MI    2.0      1393           1345.333
## 54              LARRY HODGE    MI    2.0      1270           1206.167
## 55                ALEX KONG    MI    2.0      1186           1406.000
## 56             MARISA RICCI    MI    2.0      1153           1414.400
## 57               MICHAEL LU    MI    2.0      1092           1363.000
## 58             VIRAJ MOHILE    MI    2.0       917           1391.000
## 59                SEAN M MC    MI    2.0       853           1319.000
## 60               JULIA SHEN    MI    1.5       967           1330.200
## 61            JEZZEL FARKAS    ON    1.5       955           1327.286
## 62            ASHWIN BALAJI    MI    1.0      1530           1186.000
## 63     THOMAS JOSEPH HOSMER    MI    1.0      1175           1350.200
## 64                   BEN LI    MI    1.0      1163           1263.000

Exporting Data into a CSV file

# row.names = FALSE keeps the header aligned with the data columns
write.csv(csv, file = "DATA 607 - Project1.csv", row.names = FALSE)
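One way to sanity-check an export is to write to a temporary file and read it back. The sketch below does this with a two-row data frame built from the first two rows of the table above; `toy_csv`, `tmp`, and `check` are illustrative names. It also rounds the opponent average to a whole number, which is an assumption based on the assignment's example value of 1605.

```r
# Two-row data frame in the same shape as `csv` (first two rows of the table above)
toy_csv <- data.frame(
  Name = c("GARY HUA", "DAKSHESH DARURI"),
  State = c("ON", "MI"),
  Points = c(6.0, 6.0),
  PreRating = c(1794L, 1553L),
  Opponent_PreRating = c(1605.286, 1469.286)
)

# Round the average to a whole number, as in the assignment's example
toy_csv$Opponent_PreRating <- round(toy_csv$Opponent_PreRating)

# Write to a temporary file without row names, then read it back
tmp <- tempfile(fileext = ".csv")
write.csv(toy_csv, file = tmp, row.names = FALSE)
check <- read.csv(tmp)

names(check)   # the five column names, in the original order
nrow(check)    # 2
```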

GitHub link here

RPubs link here