Collaborators: Dan Rosenfeld, Magnus Skonberg, and Rick Sughrue.

Overview

In this project, we’re given a text file with chess tournament results where the information has some structure. Our job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:

Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents

For the first player, the information would be: Gary Hua, ON, 6.0, 1794, 1605

1605 was calculated by summing the opponents’ pre-tournament ratings (1436, 1563, 1600, 1610, 1649, 1663, 1716) and dividing by the total number of games played.
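As a quick check of that arithmetic, the short R snippet below averages the seven opponent ratings listed above. The variable names `gary_opponents` and `gary_avg` are introduced here purely for illustration and are not used elsewhere in the report.

```r
# Pre-tournament ratings of Gary Hua's seven opponents, as listed above
gary_opponents <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)

# Sum the ratings and divide by the number of games played
gary_avg <- sum(gary_opponents) / length(gary_opponents)

round(gary_avg)  # 1605
```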

Libraries

library(stringr)
library(tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v forcats 0.5.0
## v readr   1.3.1
## -- Conflicts ----------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(downloader)
library(readr)
library(knitr)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows

Download text file from GitHub to local machine

# Download text file to the local machine

url <- "https://raw.githubusercontent.com/jnataky/DATA606project1/master/tournamentinfo.txt" 
chess_txt <- "tournamentinfo.txt"

downloader::download(url, chess_txt)

getwd()
## [1] "C:/Users/ataky/OneDrive/Documents/DATA607"

Read the text file

chess <- readLines(chess_txt)
## Warning in readLines(chess_txt): incomplete final line found on
## 'tournamentinfo.txt'
# Preview the first ten lines
chess[1:10]
##  [1] "-----------------------------------------------------------------------------------------" 
##  [2] " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| "
##  [3] " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | "
##  [4] "-----------------------------------------------------------------------------------------" 
##  [5] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|" 
##  [6] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |" 
##  [7] "-----------------------------------------------------------------------------------------" 
##  [8] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|" 
##  [9] "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |" 
## [10] "-----------------------------------------------------------------------------------------"

Variables extraction & patterns

This section extracts the string variables from the chess tournament file, using a separate regular-expression pattern for each one, and then transforms them and performs calculations as needed. str_extract_all() and unlist() are used throughout.

# Patterns

pn <- "\\w+[^USCF|a-z] ?\\w+ \\w+"  # Player's name pattern
ps <- "(?:^|\\W)ON | MI | OH(?:$|\\W)" # Player's state pattern
pp <- "\\d\\.\\d"   # Player's total points pattern


# Extract players' names, states, and total points using the patterns above

PlayerName <- unlist(str_extract_all(chess, pn))

PlayerState <- unlist(str_extract_all(chess, ps))

PlayerPoints <- unlist(str_extract_all(chess, pp))
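The patterns above depend on the exact wording of each field, and a few names in the results table further down (for example CAMERON WILLIAM MC) look as though the three-word name pattern may have cut them short. As a hedged alternative, not the approach used in this report, the sketch below splits a player's two rows on the `|` delimiter and picks fields by position. It uses base R only. The helper `split_fields` and the other variable names are hypothetical, and the two rows are copied from the file preview shown earlier.

```r
# Two rows for the first player, copied from the file preview above
row1 <- "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
row2 <- "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

# Hypothetical helper: split a row on the literal "|" and trim the padding
split_fields <- function(row) trimws(strsplit(row, "|", fixed = TRUE)[[1]])

f1 <- split_fields(row1)
f2 <- split_fields(row2)

player_name   <- f1[2]              # second field of the first row
player_points <- as.numeric(f1[3])  # third field of the first row
player_state  <- f2[1]              # first field of the second row
```

Because the fields are taken by position, the name is returned whole regardless of how many words it contains, and the state carries no surrounding whitespace.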
# Pre-rating extraction 
# Patterns

p1 <- "(R:\\s*)(\\d+)"  # Player's pre-rating pattern 1
p2 <- "(\\d+)" # Player's pre-rating pattern 2

# Extraction: apply the two patterns in sequence (p1 first, then p2 on its result)

PreRating <- unlist(str_extract_all(chess, p1))

PreRating <- unlist(str_extract_all(PreRating, p2))

PreRating <- as.numeric(PreRating) # String to numeric

PreRating
##  [1] 1794 1553 1384 1716 1655 1686 1649 1641 1411 1365 1712 1663 1666 1610 1220
## [16] 1604 1629 1600 1564 1595 1563 1555 1363 1229 1745 1579 1552 1507 1602 1522
## [31] 1494 1441 1449 1399 1438 1355  980 1423 1436 1348 1403 1332 1283 1199 1242
## [46]  377 1362 1382 1291 1056 1011  935 1393 1270 1186 1153 1092  917  853  967
## [61]  955 1530 1175 1163
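To make the two-step pre-rating extraction concrete, here is a minimal sketch that traces it on a single rating row from the file preview. It uses base R (`regmatches` and `regexpr`) rather than stringr so that it runs without extra packages; the variable names are illustrative only.

```r
# Rating row for the first player, copied from the file preview above
rating_row <- "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

# Step 1: grab "R:" plus the digits that follow (same idea as p1)
step1 <- regmatches(rating_row, regexpr("R:\\s*\\d+", rating_row, perl = TRUE))

# Step 2: keep only the digits from that match (same idea as p2)
step2 <- regmatches(step1, regexpr("\\d+", step1, perl = TRUE))

pre_rating <- as.numeric(step2)  # 1794
```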
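# Extract each player's opponents (pair numbers) for the seven rounds

# Patterns

A1 <- "\\|[0-9].*"  # Everything from the total-points field onward (player rows only)
A2 <- "\\s\\d{1,2}"  # A whitespace character followed by a one- or two-digit pair number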

# String extraction

AveRating <- unlist(str_extract_all(chess, A1))

AveRating <- str_replace_all(AveRating, "\\s{1,2}\\|", "0|") # Mark unplayed rounds (no opponent number) with 0
AveRating <- str_extract_all(AveRating, A2)
# Put the opponent numbers into a matrix: one row per player, one column per round
# This makes the later calculations easier

AveRating_matrix <- matrix(unlist(AveRating), byrow = TRUE, nrow = length(AveRating))
# Convert the string values in the matrix to numeric
# Note that apply() returns its result transposed,
# so t() is used to restore the original dimensions

AveRating_matrix_t <- t(apply(AveRating_matrix, 1, as.numeric))
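The opponent-number extraction can be hard to follow from the patterns alone, so the sketch below traces the same three steps on the first player's row from the file preview. It is written with base R equivalents of the stringr calls, and the variable names are hypothetical.

```r
# First player's row, copied from the file preview above
player_row <- "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"

# Step 1 (pattern A1): keep everything from the total-points field onward
step1 <- regmatches(player_row, regexpr("\\|[0-9].*", player_row, perl = TRUE))

# Step 2: turn trailing blanks before a "|" into a 0 marker
step2 <- gsub("\\s{1,2}\\|", "0|", step1, perl = TRUE)

# Step 3 (pattern A2): pull out each whitespace-plus-digits token
step3 <- regmatches(step2, gregexpr("\\s\\d{1,2}", step2, perl = TRUE))[[1]]

# as.numeric() ignores the leading whitespace in each token
opponents <- as.numeric(step3)  # 39 21 18 14 7 12 4
```

In this row the only field that ends in blanks is the points field, which becomes `6.00` and is then skipped by the third step because no whitespace precedes its digits.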
# Replace each opponent number with that opponent's pre-rating.
# A 0 marks an unplayed round; set it to NA so that the mean
# considers only the games actually played.

for (r in seq_len(nrow(AveRating_matrix_t))) {
  for (c in seq_len(ncol(AveRating_matrix_t))) {
    if (AveRating_matrix_t[r, c] == 0) {
      AveRating_matrix_t[r, c] <- NA
    } else {
      AveRating_matrix_t[r, c] <- PreRating[AveRating_matrix_t[r, c]]
    }
  }
}
# Calculate each player's average opponent pre-rating with rowMeans,
# ignoring unplayed rounds (NA)

OpponentsAvg <- rowMeans(AveRating_matrix_t, na.rm = TRUE)

OpponentsAvg
##  [1] 1605.286 1469.286 1563.571 1573.571 1500.857 1518.714 1372.143 1468.429
##  [9] 1523.143 1554.143 1467.571 1506.167 1497.857 1515.000 1483.857 1385.800
## [17] 1498.571 1480.000 1426.286 1410.857 1470.429 1300.333 1213.857 1357.000
## [25] 1363.286 1506.857 1221.667 1522.143 1313.500 1144.143 1259.857 1378.714
## [33] 1276.857 1375.286 1149.714 1388.167 1384.800 1539.167 1429.571 1390.571
## [41] 1248.500 1149.857 1106.571 1327.000 1152.000 1357.714 1392.000 1355.800
## [49] 1285.800 1296.000 1356.143 1494.571 1345.333 1206.167 1406.000 1414.400
## [57] 1363.000 1391.000 1319.000 1330.200 1327.286 1186.000 1350.200 1263.000
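The lookup and averaging can also be written without explicit loops. The sketch below is one possible vectorised version, offered as an assumption about how it might be done and not as the code used in this report. It reuses the 64 pre-ratings printed earlier and the opponent numbers of the first two players from the file preview. The third row of `opp` is made up solely to show how unplayed rounds (0) are handled.

```r
# Pre-ratings for all 64 players, copied from the PreRating output above
pre <- c(1794, 1553, 1384, 1716, 1655, 1686, 1649, 1641, 1411, 1365, 1712, 1663,
         1666, 1610, 1220, 1604, 1629, 1600, 1564, 1595, 1563, 1555, 1363, 1229,
         1745, 1579, 1552, 1507, 1602, 1522, 1494, 1441, 1449, 1399, 1438, 1355,
         980, 1423, 1436, 1348, 1403, 1332, 1283, 1199, 1242, 377, 1362, 1382,
         1291, 1056, 1011, 935, 1393, 1270, 1186, 1153, 1092, 917, 853, 967,
         955, 1530, 1175, 1163)

# Opponent pair numbers: rows 1 and 2 come from the file preview,
# row 3 is hypothetical (one game played, six rounds unplayed)
opp <- rbind(c(39, 21, 18, 14, 7, 12, 4),
             c(63, 58, 4, 17, 16, 20, 7),
             c(39, 0, 0, 0, 0, 0, 0))

opp[opp == 0] <- NA                               # unplayed rounds become NA
ratings <- matrix(pre[opp], nrow = nrow(opp))     # look up every opponent at once
opp_avg <- rowMeans(ratings, na.rm = TRUE)        # average over played games only

round(opp_avg)  # 1605 1469 1436
```

The first two values agree with the first two entries of OpponentsAvg above.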
# Round the opponents pre-rating average

OpponentsAvg <- round(OpponentsAvg, 0)

OpponentsAvg
##  [1] 1605 1469 1564 1574 1501 1519 1372 1468 1523 1554 1468 1506 1498 1515 1484
## [16] 1386 1499 1480 1426 1411 1470 1300 1214 1357 1363 1507 1222 1522 1314 1144
## [31] 1260 1379 1277 1375 1150 1388 1385 1539 1430 1391 1248 1150 1107 1327 1152
## [46] 1358 1392 1356 1286 1296 1356 1495 1345 1206 1406 1414 1363 1391 1319 1330
## [61] 1327 1186 1350 1263
# Construct a data frame to hold the results

results <- data.frame(PlayerName, PlayerState, PlayerPoints, PreRating, OpponentsAvg)

# Rename the columns

colnames(results) <- c("Player's Name", "Player's State", "Total Number of Point", "Player's Pre-Rating", "Opponents Pre-Rating Avg")

Chess Tournament Info

This section shows the tournament results in a table.

results %>%
  kbl(caption = "Chess Tournament Info", align = 'c') %>%
  kable_material(c("striped", "hover")) %>%
  row_spec(0, color = "indigo")
Chess Tournament Info

| Player’s Name | Player’s State | Total Number of Point | Player’s Pre-Rating | Opponents Pre-Rating Avg |
| --- | --- | --- | --- | --- |
| GARY HUA | ON | 6.0 | 1794 | 1605 |
| DAKSHESH DARURI | MI | 6.0 | 1553 | 1469 |
| ADITYA BAJAJ | MI | 6.0 | 1384 | 1564 |
| PATRICK H SCHILLING | MI | 5.5 | 1716 | 1574 |
| HANSHI ZUO | MI | 5.5 | 1655 | 1501 |
| HANSEN SONG | OH | 5.0 | 1686 | 1519 |
| GARY DEE SWATHELL | MI | 5.0 | 1649 | 1372 |
| EZEKIEL HOUGHTON | MI | 5.0 | 1641 | 1468 |
| STEFANO LEE | ON | 5.0 | 1411 | 1523 |
| ANVIT RAO | MI | 5.0 | 1365 | 1554 |
| CAMERON WILLIAM MC | MI | 4.5 | 1712 | 1468 |
| KENNETH J TACK | MI | 4.5 | 1663 | 1506 |
| TORRANCE HENRY JR | MI | 4.5 | 1666 | 1498 |
| BRADLEY SHAW | MI | 4.5 | 1610 | 1515 |
| ZACHARY JAMES HOUGHTON | MI | 4.5 | 1220 | 1484 |
| MIKE NIKITIN | MI | 4.0 | 1604 | 1386 |
| RONALD GRZEGORCZYK | MI | 4.0 | 1629 | 1499 |
| DAVID SUNDEEN | MI | 4.0 | 1600 | 1480 |
| DIPANKAR ROY | MI | 4.0 | 1564 | 1426 |
| JASON ZHENG | MI | 4.0 | 1595 | 1411 |
| DINH DANG BUI | ON | 4.0 | 1563 | 1470 |
| EUGENE L MCCLURE | MI | 4.0 | 1555 | 1300 |
| ALAN BUI | ON | 4.0 | 1363 | 1214 |
| MICHAEL R ALDRICH | MI | 4.0 | 1229 | 1357 |
| LOREN SCHWIEBERT | MI | 3.5 | 1745 | 1363 |
| MAX ZHU | ON | 3.5 | 1579 | 1507 |
| GAURAV GIDWANI | MI | 3.5 | 1552 | 1222 |
| SOFIA ADINA STANESCU | MI | 3.5 | 1507 | 1522 |
| CHIEDOZIE OKORIE | MI | 3.5 | 1602 | 1314 |
| GEORGE AVERY JONES | ON | 3.5 | 1522 | 1144 |
| RISHI SHETTY | MI | 3.5 | 1494 | 1260 |
| JOSHUA PHILIP MATHEWS | ON | 3.5 | 1441 | 1379 |
| JADE GE | MI | 3.5 | 1449 | 1277 |
| MICHAEL JEFFERY THOMAS | MI | 3.5 | 1399 | 1375 |
| JOSHUA DAVID LEE | MI | 3.5 | 1438 | 1150 |
| SIDDHARTH JHA | MI | 3.5 | 1355 | 1388 |
| AMIYATOSH PWNANANDAM | MI | 3.5 | 980 | 1385 |
| BRIAN LIU | MI | 3.0 | 1423 | 1539 |
| JOEL R HENDON | MI | 3.0 | 1436 | 1430 |
| FOREST ZHANG | MI | 3.0 | 1348 | 1391 |
| KYLE WILLIAM MURPHY | MI | 3.0 | 1403 | 1248 |
| JARED GE | MI | 3.0 | 1332 | 1150 |
| ROBERT GLEN VASEY | MI | 3.0 | 1283 | 1107 |
| JUSTIN D SCHILLING | MI | 3.0 | 1199 | 1327 |
| DEREK YAN | MI | 3.0 | 1242 | 1152 |
| JACOB ALEXANDER LAVALLEY | MI | 3.0 | 377 | 1358 |
| ERIC WRIGHT | MI | 2.5 | 1362 | 1392 |
| DANIEL KHAIN | MI | 2.5 | 1382 | 1356 |
| MICHAEL J MARTIN | MI | 2.5 | 1291 | 1286 |
| SHIVAM JHA | MI | 2.5 | 1056 | 1296 |
| TEJAS AYYAGARI | MI | 2.5 | 1011 | 1356 |
| ETHAN GUO | MI | 2.5 | 935 | 1495 |
| JOSE C YBARRA | MI | 2.0 | 1393 | 1345 |
| LARRY HODGE | MI | 2.0 | 1270 | 1206 |
| ALEX KONG | MI | 2.0 | 1186 | 1406 |
| MARISA RICCI | MI | 2.0 | 1153 | 1414 |
| MICHAEL LU | MI | 2.0 | 1092 | 1363 |
| VIRAJ MOHILE | MI | 2.0 | 917 | 1391 |
| SEAN M MC | MI | 2.0 | 853 | 1319 |
| JULIA SHEN | MI | 1.5 | 967 | 1330 |
| JEZZEL FARKAS | ON | 1.5 | 955 | 1327 |
| ASHWIN BALAJI | MI | 1.0 | 1530 | 1186 |
| THOMAS JOSEPH HOSMER | MI | 1.0 | 1175 | 1350 |
| BEN LI | MI | 1.0 | 1163 | 1263 |

Export to csv

# Export to csv

write.csv(results, "chessgame.csv", row.names = FALSE )
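Since the column names contain spaces and apostrophes, it can be worth confirming that they survive a round trip through the CSV. The sketch below writes a two-row stand-in for `results` to a temporary file and reads it back. The data frame `demo` and the temporary path are illustrative assumptions; `check.names = FALSE` is what stops `read.csv` from rewriting the headers into syntactic names.

```r
# Two-row stand-in for the results data frame (values from the table above)
demo <- data.frame(
  name   = c("GARY HUA", "DAKSHESH DARURI"),
  state  = c("ON", "MI"),
  points = c(6.0, 6.0),
  pre    = c(1794, 1553),
  opp    = c(1605, 1469)
)
colnames(demo) <- c("Player's Name", "Player's State", "Total Number of Point",
                    "Player's Pre-Rating", "Opponents Pre-Rating Avg")

# Write to a temporary file so the real chessgame.csv is left untouched
tmp <- tempfile(fileext = ".csv")
write.csv(demo, tmp, row.names = FALSE)

# Read it back, keeping the original column names as they are
check <- read.csv(tmp, check.names = FALSE)
names(check)
```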

Extra work: ELO calculation

The expected score for each player is calculated against the average pre-rating of that player’s opponents.

The formula below gives the expected score, which is treated here as the probability that the player wins a game:

\(elo = \frac{1}{1+10^{\frac{B-A}{400}}}\)

where A is the Player’s Pre-Rating and B is the Opponents Pre-Rating Avg.

The rating update rule is:

New rating = rating + 32(score - expected score) (1)

Writing N for the new rating, R for the old rating, and S for the score, (1) becomes: N = R + 32(S - elo)

Solving for S:

\(S = \frac{N-R}{32} + elo\) (2)

Equation (2) is used to calculate the score, S.
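As a worked example of these two formulas, the sketch below plugs in the first player's figures from this report (pre-rating 1794, opponents' average 1605, post-rating 1817). The variable names are illustrative, and K = 32 is the factor used in equation (1).

```r
A <- 1794  # Player's Pre-Rating (Gary Hua)
B <- 1605  # Opponents Pre-Rating Avg
N <- 1817  # Player's post-rating
K <- 32    # factor from equation (1)

# Expected score against the average opponent
expected <- 1 / (1 + 10^((B - A) / 400))

# Equation (2): score implied by the rating change
implied_score <- (N - A) / K + expected

round(expected, 3)       # 0.748
round(implied_score, 1)  # 1.5
```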

ELO calculation

# ELO Calculation

results <- results %>%
  mutate(elo_rating = 1 / (1 + 10^((`Opponents Pre-Rating Avg` - `Player's Pre-Rating`) / 400)))

# Round the expected score to 3 decimal places
# (the percentage conversion is left commented out, so values stay between 0 and 1)

# results$elo_rating <- 100*results$elo_rating
results$elo_rating <- round(results$elo_rating, 3)

Player Post-Rating extraction

# Post-Rating extraction
# Extract every rating-like token first (pre and post ratings)

PostRating <- str_extract_all(chess, "(( \\:)|(\\>))?.?\\d{1,}P*\\.?")

# Detect which tokens are post ratings (they follow ">")

PostRating_detect <- str_detect(unlist(PostRating), "\\>.?\\b\\d{3,4}P?\\b")

# Keep only the post-rating tokens

PostRating_new <- unlist(PostRating)[PostRating_detect]


# Clean up the post ratings: drop the ">" and any provisional "P" marker

PostRating_new <- str_replace_all(PostRating_new, "([>P])", "")

# String to numeric
PostRating_new <- as.numeric(PostRating_new)
# Add post rating to the data frame

results <- results %>%
  mutate(PostRating_new)
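For comparison, the post-rating can also be captured directly from the digits that follow the `->` arrow. The sketch below is a hedged alternative, not the extraction used above, and it runs on the two rating rows visible in the file preview. It has not been checked against rows that carry a provisional `P` marker, so treat that case as untested.

```r
# Rating rows for the first two players, copied from the file preview above
rating_rows <- c(
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
)

# Capture the digits that follow "->" and discard the rest of the row
post_rating <- as.numeric(sub(".*->\\s*(\\d+).*", "\\1", rating_rows, perl = TRUE))

post_rating  # 1817 1663
```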

Player’s actual score

# Solve equation (2) for each player's actual score

actual_score <- (results$PostRating_new - results$`Player's Pre-Rating`) / 32 + results$elo_rating
# Add a column of actual scores

results <- results %>%
  mutate(actual_score)

# Round the actual score to 1 decimal place

results$actual_score <- round(results$actual_score, 1)
# Sort players by actual score, highest first

new_results <- results %>%
  arrange(desc(actual_score))

# Rename the columns
# Note: the "ELO rating (%)" column holds the expected score as a
# proportion between 0 and 1, not a percentage

colnames(new_results) <- c("Player's Name", "Player's State", "Total Number of Point", "Player's Pre-Rating", "Opponents Pre-Rating Avg", "ELO rating (%)", "Player's Post-Rating", "Actual Score Points")

Tournament table per actual score

# Result in table.

new_results %>%
  kbl(caption = "Chess Tournament Info per actual score", align = 'c') %>%
  kable_material(c("striped", "hover")) %>%
  footnote(general = "Jacob Alexander Lavalley scored the most points relative to his expected result.") %>%
  row_spec(0, color = "indigo")
Chess Tournament Info per actual score

| Player’s Name | Player’s State | Total Number of Point | Player’s Pre-Rating | Opponents Pre-Rating Avg | ELO rating (%) | Player’s Post-Rating | Actual Score Points |
| --- | --- | --- | --- | --- | --- | --- | --- |
| JACOB ALEXANDER LAVALLEY | MI | 3.0 | 377 | 1358 | 0.004 | 1076 | 21.8 |
| ADITYA BAJAJ | MI | 6.0 | 1384 | 1564 | 0.262 | 1640 | 8.3 |
| ZACHARY JAMES HOUGHTON | MI | 4.5 | 1220 | 1484 | 0.180 | 1416 | 6.3 |
| ANVIT RAO | MI | 5.0 | 1365 | 1554 | 0.252 | 1544 | 5.8 |
| STEFANO LEE | ON | 5.0 | 1411 | 1523 | 0.344 | 1564 | 5.1 |
| ETHAN GUO | MI | 2.5 | 935 | 1495 | 0.038 | 1092 | 4.9 |
| DAKSHESH DARURI | MI | 6.0 | 1553 | 1469 | 0.619 | 1663 | 4.1 |
| AMIYATOSH PWNANANDAM | MI | 3.5 | 980 | 1385 | 0.089 | 1077 | 3.1 |
| TEJAS AYYAGARI | MI | 2.5 | 1011 | 1356 | 0.121 | 1097 | 2.8 |
| MICHAEL R ALDRICH | MI | 4.0 | 1229 | 1357 | 0.324 | 1300 | 2.5 |
| SHIVAM JHA | MI | 2.5 | 1056 | 1296 | 0.201 | 1111 | 1.9 |
| HANSHI ZUO | MI | 5.5 | 1655 | 1501 | 0.708 | 1690 | 1.8 |
| PATRICK H SCHILLING | MI | 5.5 | 1716 | 1574 | 0.694 | 1744 | 1.6 |
| GARY DEE SWATHELL | MI | 5.0 | 1649 | 1372 | 0.831 | 1673 | 1.6 |
| GARY HUA | ON | 6.0 | 1794 | 1605 | 0.748 | 1817 | 1.5 |
| EZEKIEL HOUGHTON | MI | 5.0 | 1641 | 1468 | 0.730 | 1657 | 1.2 |
| MIKE NIKITIN | MI | 4.0 | 1604 | 1386 | 0.778 | 1613 | 1.1 |
| ALAN BUI | ON | 4.0 | 1363 | 1214 | 0.702 | 1371 | 1.0 |
| ASHWIN BALAJI | MI | 1.0 | 1530 | 1186 | 0.879 | 1535 | 1.0 |
| KENNETH J TACK | MI | 4.5 | 1663 | 1506 | 0.712 | 1670 | 0.9 |
| BRADLEY SHAW | MI | 4.5 | 1610 | 1515 | 0.633 | 1618 | 0.9 |
| DIPANKAR ROY | MI | 4.0 | 1564 | 1426 | 0.689 | 1570 | 0.9 |
| JEZZEL FARKAS | ON | 1.5 | 955 | 1327 | 0.105 | 979 | 0.9 |
| HANSEN SONG | OH | 5.0 | 1686 | 1519 | 0.723 | 1687 | 0.8 |
| SIDDHARTH JHA | MI | 3.5 | 1355 | 1388 | 0.453 | 1367 | 0.8 |
| BRIAN LIU | MI | 3.0 | 1423 | 1539 | 0.339 | 1439 | 0.8 |
| VIRAJ MOHILE | MI | 2.0 | 917 | 1391 | 0.061 | 941 | 0.8 |
| SEAN M MC | MI | 2.0 | 853 | 1319 | 0.064 | 878 | 0.8 |
| DAVID SUNDEEN | MI | 4.0 | 1600 | 1480 | 0.666 | 1600 | 0.7 |
| SOFIA ADINA STANESCU | MI | 3.5 | 1507 | 1522 | 0.478 | 1513 | 0.7 |
| TORRANCE HENRY JR | MI | 4.5 | 1666 | 1498 | 0.725 | 1662 | 0.6 |
| DINH DANG BUI | ON | 4.0 | 1563 | 1470 | 0.631 | 1562 | 0.6 |
| MICHAEL JEFFERY THOMAS | MI | 3.5 | 1399 | 1375 | 0.534 | 1400 | 0.6 |
| JULIA SHEN | MI | 1.5 | 967 | 1330 | 0.110 | 984 | 0.6 |
| GAURAV GIDWANI | MI | 3.5 | 1552 | 1222 | 0.870 | 1539 | 0.5 |
| FOREST ZHANG | MI | 3.0 | 1348 | 1391 | 0.438 | 1346 | 0.4 |
| CAMERON WILLIAM MC | MI | 4.5 | 1712 | 1468 | 0.803 | 1696 | 0.3 |
| JOSHUA PHILIP MATHEWS | ON | 3.5 | 1441 | 1379 | 0.588 | 1433 | 0.3 |
| JUSTIN D SCHILLING | MI | 3.0 | 1199 | 1327 | 0.324 | 1199 | 0.3 |
| RONALD GRZEGORCZYK | MI | 4.0 | 1629 | 1499 | 0.679 | 1610 | 0.1 |
| MAX ZHU | ON | 3.5 | 1579 | 1507 | 0.602 | 1564 | 0.1 |
| EUGENE L MCCLURE | MI | 4.0 | 1555 | 1300 | 0.813 | 1529 | 0.0 |
| JASON ZHENG | MI | 4.0 | 1595 | 1411 | 0.743 | 1569 | -0.1 |
| JADE GE | MI | 3.5 | 1449 | 1277 | 0.729 | 1421 | -0.1 |
| JOEL R HENDON | MI | 3.0 | 1436 | 1430 | 0.509 | 1413 | -0.2 |
| ERIC WRIGHT | MI | 2.5 | 1362 | 1392 | 0.457 | 1341 | -0.2 |
| MARISA RICCI | MI | 2.0 | 1153 | 1414 | 0.182 | 1140 | -0.2 |
| MICHAEL LU | MI | 2.0 | 1092 | 1363 | 0.174 | 1079 | -0.2 |
| ROBERT GLEN VASEY | MI | 3.0 | 1283 | 1107 | 0.734 | 1244 | -0.5 |
| MICHAEL J MARTIN | MI | 2.5 | 1291 | 1286 | 0.507 | 1259 | -0.5 |
| JOSE C YBARRA | MI | 2.0 | 1393 | 1345 | 0.569 | 1359 | -0.5 |
| ALEX KONG | MI | 2.0 | 1186 | 1406 | 0.220 | 1163 | -0.5 |
| JOSHUA DAVID LEE | MI | 3.5 | 1438 | 1150 | 0.840 | 1392 | -0.6 |
| RISHI SHETTY | MI | 3.5 | 1494 | 1260 | 0.794 | 1444 | -0.8 |
| DANIEL KHAIN | MI | 2.5 | 1382 | 1356 | 0.537 | 1335 | -0.9 |
| DEREK YAN | MI | 3.0 | 1242 | 1152 | 0.627 | 1191 | -1.0 |
| LOREN SCHWIEBERT | MI | 3.5 | 1745 | 1363 | 0.900 | 1681 | -1.1 |
| KYLE WILLIAM MURPHY | MI | 3.0 | 1403 | 1248 | 0.709 | 1341 | -1.2 |
| BEN LI | MI | 1.0 | 1163 | 1263 | 0.360 | 1112 | -1.2 |
| THOMAS JOSEPH HOSMER | MI | 1.0 | 1175 | 1350 | 0.267 | 1125 | -1.3 |
| GEORGE AVERY JONES | ON | 3.5 | 1522 | 1144 | 0.898 | 1444 | -1.5 |
| JARED GE | MI | 3.0 | 1332 | 1150 | 0.740 | 1256 | -1.6 |
| LARRY HODGE | MI | 2.0 | 1270 | 1206 | 0.591 | 1200 | -1.6 |
| CHIEDOZIE OKORIE | MI | 3.5 | 1602 | 1314 | 0.840 | 1508 | -2.1 |

Note:
Jacob Alexander Lavalley scored the most points relative to his expected result.
