This project involves in reading Chess Tournament Cross Table and generating a CSV file with the following information for all of the players: Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents.
Average Pre Chess Rating of Opponents is calculated as sum of pre-tournament opponents’ ratings divided by total number of games played.
Libraries used for this project.
library(readr) # used for reading the text file
library(stringr) # used for extracting text using regular expressions
library(data.table) # used for saving out into tabel format
library(sqldf) # used for querying and to generate aggregate output
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
library(utils) # used for generating CSV file
Download and save the tournamentinfo.txt, to local working directory from website. https://bbhosted.cuny.edu/bbcswebdav/pid-28524631-dt-content-rid-126144833_1/courses/SPS01_DATA_607_01_1172_1/SPS01_DATA_607_01_1169_1_ImportedContent_20160815114002/SPS01_DATA_607_01_1162_1_ImportedContent_20160126062155/tournamentinfo.txt
Load Chess Tournament Cross Table file.
# Get local directory and load the file
localDir <- getwd()
chessFile <- paste(str_trim(localDir),"//tournamentinfo.txt",sep="")
# Skip top 3 lines as it contains header info
chess.text<-read_lines(chessFile,skip = 3)
head(chess.text,10)
## [1] "-----------------------------------------------------------------------------------------"
## [2] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [3] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
## [4] "-----------------------------------------------------------------------------------------"
## [5] " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
## [6] " MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |"
## [7] "-----------------------------------------------------------------------------------------"
## [8] " 3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12|"
## [9] " MI | 14959604 / R: 1384 ->1640 |N:2 |W |B |W |B |W |B |W |"
## [10] "-----------------------------------------------------------------------------------------"
There are total of 64 players. File needs formatting in order to perform calculations and generate CSV output.
# First level of formatting removes hypen("-") characters
chess.text.level1 <- gsub("-","",chess.text)
head(chess.text.level1, 6)
## [1] ""
## [2] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [3] " ON | 15445895 / R: 1794 >1817 |N:2 |W |B |W |B |W |B |W |"
## [4] ""
## [5] " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
## [6] " MI | 14598900 / R: 1553 >1663 |N:2 |B |W |B |W |B |W |B |"
# Second level of formatting removes empty lines
# This makes a continous data without newline characters
chess.text.level2 <- paste(unlist(chess.text.level1), collapse = " ")
head(chess.text.level2,10)
## [1] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4| ON | 15445895 / R: 1794 >1817 |N:2 |W |B |W |B |W |B |W | 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7| MI | 14598900 / R: 1553 >1663 |N:2 |B |W |B |W |B |W |B | 3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12| MI | 14959604 / R: 1384 >1640 |N:2 |W |B |W |B |W |B |W | 4 | PATRICK H SCHILLING |5.5 |W 23|D 28|W 2|W 26|D 5|W 19|D 1| MI | 12616049 / R: 1716 >1744 |N:2 |W |B |W |B |W |B |B | 5 | HANSHI ZUO |5.5 |W 45|W 37|D 12|D 13|D 4|W 14|W 17| MI | 14601533 / R: 1655 >1690 |N:2 |B |W |B |W |B |W |B | 6 | HANSEN SONG |5.0 |W 34|D 29|L 11|W 35|D 10|W 27|W 21| OH | 15055204 / R: 1686 >1687 |N:3 |W |B |W |B |B |W |B | 7 | GARY DEE SWATHELL |5.0 |W 57|W 46|W 13|W 11|L 1|W 9|L 2| MI | 11146376 / R: 1649 >1673 |N:3 |W |B |W |B |B |W |W | 8 | EZEKIEL HOUGHTON |5.0 |W 3|W 32|L 14|L 9|W 47|W 28|W 19| MI | 15142253 / R: 1641P17>1657P24 |N:3 |B |W |B |W |B |W |W | 9 | STEFANO LEE |5.0 |W 25|L 18|W 59|W 8|W 26|L 7|W 20| ON | 14954524 / R: 1411 >1564 |N:2 |W |B |W |B |W |B |B | 10 | ANVIT RAO |5.0 |D 16|L 19|W 55|W 31|D 6|W 25|W 18| MI | 14150362 / R: 1365 >1544 |N:3 |W |W |B |B |W |B |W | 11 | CAMERON WILLIAM MC LEMAN |4.5 |D 38|W 56|W 6|L 7|L 3|W 34|W 26| MI | 12581589 / R: 1712 >1696 |N:3 |B |W |B |W |B |W |B | 12 | KENNETH J TACK |4.5 |W 42|W 33|D 5|W 38|H |D 1|L 3| MI | 12681257 / R: 1663 >1670 |N:3 |W |B |W |B | |W |B | 13 | TORRANCE HENRY JR |4.5 |W 36|W 27|L 7|D 5|W 33|L 3|W 32| MI | 15082995 / R: 1666 >1662 |N:3 |B |W |B |B |W |W |B | 14 | BRADLEY SHAW |4.5 |W 54|W 44|W 8|L 1|D 27|L 5|W 31| MI | 10131499 / R: 1610 >1618 |N:3 |W |B |W |W |B |B |W | 15 | ZACHARY JAMES HOUGHTON |4.5 |D 19|L 16|W 30|L 22|W 54|W 33|W 38| MI | 15619130 / R: 1220P13>1416P20 |N:3 |B |B |W |W |B |B |W | 16 | MIKE NIKITIN |4.0 |D 10|W 15|H |W 39|L 2|W 36|U | MI | 10295068 / R: 1604 >1613 |N:3 |B |W | |B |W |B | | 17 | RONALD GRZEGORCZYK |4.0 |W 48|W 41|L 26|L 2|W 23|W 22|L 5| MI | 10297702 / R: 1629 >1610 |N:3 |W |B |W |B |W |B |W | 18 | DAVID SUNDEEN |4.0 |W 47|W 9|L 1|W 32|L 19|W 38|L 10| MI | 11342094 / R: 1600 >1600 |N:3 |B |W |B |W |B |W |B | 19 | DIPANKAR ROY |4.0 |D 15|W 10|W 52|D 28|W 18|L 4|L 8| MI | 14862333 / R: 1564 >1570 |N:3 |W |B |W |B |W |W |B | 20 | JASON ZHENG |4.0 |L 40|W 49|W 23|W 41|W 28|L 2|L 9| MI | 14529060 / R: 1595 >1569 |N:4 |W |B |W |B |W |B |W | 21 | DINH DANG BUI |4.0 |W 43|L 1|W 47|L 3|W 40|W 39|L 6| ON | 15495066 / R: 1563P22>1562 |N:3 |B |W |B |W |W |B |W | 22 | EUGENE L MCCLURE |4.0 |W 64|D 52|L 28|W 15|H |L 17|W 40| MI | 12405534 / R: 1555 >1529 |N:4 |W |B |W |B | |W |B | 23 | ALAN BUI |4.0 |L 4|W 43|L 20|W 58|L 17|W 37|W 46| ON | 15030142 / R: 1363 >1371 | |B |W |B |W |B |W |B | 24 | MICHAEL R ALDRICH |4.0 |L 28|L 47|W 43|L 25|W 60|W 44|W 39| MI | 13469010 / R: 1229 >1300 |N:4 |B |W |B |B |W |W |B | 25 | LOREN SCHWIEBERT |3.5 |L 9|W 53|L 3|W 24|D 34|L 10|W 47| MI | 12486656 / R: 1745 >1681 |N:4 |B |W |B |W |B |W |B | 26 | MAX ZHU |3.5 |W 49|W 40|W 17|L 4|L 9|D 32|L 11| ON | 15131520 / R: 1579 >1564 |N:4 |B |W |B |W |B |W |W | 27 | GAURAV GIDWANI |3.5 |W 51|L 13|W 46|W 37|D 14|L 6|U | MI | 14476567 / R: 1552 >1539 |N:4 |W |B |W |B |W |B | | 28 | SOFIA ADINA STANESCUBELLU |3.5 |W 24|D 4|W 22|D 19|L 20|L 8|D 36| MI | 14882954 / R: 1507 >1513 |N:3 |W |W |B |W |B |B |W | 29 | CHIEDOZIE OKORIE |3.5 |W 50|D 6|L 38|L 34|W 52|W 48|U | MI | 15323285 / R: 1602P6 >1508P12 |N:4 |B |W |B |W |W |B | | 30 | GEORGE AVERY JONES |3.5 |L 52|D 64|L 15|W 55|L 31|W 61|W 50| ON | 12577178 / R: 1522 >1444 | |W |B |B |W |W |B |B | 31 | RISHI SHETTY |3.5 |L 58|D 55|W 64|L 10|W 30|W 50|L 14| MI | 15131618 / R: 1494 >1444 | |B |W |B |W |B |W |B | 32 | JOSHUA PHILIP MATHEWS |3.5 |W 61|L 8|W 44|L 18|W 51|D 26|L 13| ON | 14073750 / R: 1441 >1433 |N:4 |W |B |W |B |W |B |W | 33 | JADE GE |3.5 |W 60|L 12|W 50|D 36|L 13|L 15|W 51| MI | 14691842 / R: 1449 >1421 | |B |W |B |W |B |W |B | 34 | MICHAEL JEFFERY THOMAS |3.5 |L 6|W 60|L 37|W 29|D 25|L 11|W 52| MI | 15051807 / R: 1399 >1400 | |B |W |B |B |W |B |W | 35 | JOSHUA DAVID LEE |3.5 |L 46|L 38|W 56|L 6|W 57|D 52|W 48| MI | 14601397 / R: 1438 >1392 | |W |W |B |W |B |B |W | 36 | SIDDHARTH JHA |3.5 |L 13|W 57|W 51|D 33|H |L 16|D 28| MI | 14773163 / R: 1355 >1367 |N:4 |W |B |W |B | |W |B | 37 | AMIYATOSH PWNANANDAM |3.5 |B |L 5|W 34|L 27|H |L 23|W 61| MI | 15489571 / R: 980P12>1077P17 | | |B |W |W | |B |W | 38 | BRIAN LIU |3.0 |D 11|W 35|W 29|L 12|H |L 18|L 15| MI | 15108523 / R: 1423 >1439 |N:4 |W |B |W |W | |B |B | 39 | JOEL R HENDON |3.0 |L 1|W 54|W 40|L 16|W 44|L 21|L 24| MI | 12923035 / R: 1436P23>1413 |N:4 |B |W |B |W |B |W |W | 40 | FOREST ZHANG |3.0 |W 20|L 26|L 39|W 59|L 21|W 56|L 22| MI | 14892710 / R: 1348 >1346 | |B |B |W |W |B |W |W | 41 | KYLE WILLIAM MURPHY |3.0 |W 59|L 17|W 58|L 20|X |U |U | MI | 15761443 / R: 1403P5 >1341P9 | |B |W |B |W | | | | 42 | JARED GE |3.0 |L 12|L 50|L 57|D 60|D 61|W 64|W 56| MI | 14462326 / R: 1332 >1256 | |B |W |B |B |W |W |B | 43 | ROBERT GLEN VASEY |3.0 |L 21|L 23|L 24|W 63|W 59|L 46|W 55| MI | 14101068 / R: 1283 >1244 | |W |B |W |W |B |B |W | 44 | JUSTIN D SCHILLING |3.0 |B |L 14|L 32|W 53|L 39|L 24|W 59| MI | 15323504 / R: 1199 >1199 | | |W |B |B |W |B |W | 45 | DEREK YAN |3.0 |L 5|L 51|D 60|L 56|W 63|D 55|W 58| MI | 15372807 / R: 1242 >1191 | |W |B |W |B |W |B |W | 46 | JACOB ALEXANDER LAVALLEY |3.0 |W 35|L 7|L 27|L 50|W 64|W 43|L 23| MI | 15490981 / R: 377P3 >1076P10 | |B |W |B |W |B |W |W | 47 | ERIC WRIGHT |2.5 |L 18|W 24|L 21|W 61|L 8|D 51|L 25| MI | 12533115 / R: 1362 >1341 | |W |B |W |B |W |B |W | 48 | DANIEL KHAIN |2.5 |L 17|W 63|H |D 52|H |L 29|L 35| MI | 14369165 / R: 1382 >1335 | |B |W | |B | |W |B | 49 | MICHAEL J MARTIN |2.5 |L 26|L 20|D 63|D 64|W 58|H |U | MI | 12531685 / R: 1291P12>1259P17 | |W |W |B |W |B | | | 50 | SHIVAM JHA |2.5 |L 29|W 42|L 33|W 46|H |L 31|L 30| MI | 14773178 / R: 1056 >1111 | |W |B |W |B | |B |W | 51 | TEJAS AYYAGARI |2.5 |L 27|W 45|L 36|W 57|L 32|D 47|L 33| MI | 15205474 / R: 1011 >1097 | |B |W |B |W |B |W |W | 52 | ETHAN GUO |2.5 |W 30|D 22|L 19|D 48|L 29|D 35|L 34| MI | 14918803 / R: 935 >1092 |N:4 |B |W |B |W |B |W |B | 53 | JOSE C YBARRA |2.0 |H |L 25|H |L 44|U |W 57|U | MI | 12578849 / R: 1393 >1359 | | |B | |W | |W | | 54 | LARRY HODGE |2.0 |L 14|L 39|L 61|B |L 15|L 59|W 64| MI | 12836773 / R: 1270 >1200 | |B |B |W | |W |B |W | 55 | ALEX KONG |2.0 |L 62|D 31|L 10|L 30|B |D 45|L 43| MI | 15412571 / R: 1186 >1163 | |W |B |W |B | |W |B | 56 | MARISA RICCI |2.0 |H |L 11|L 35|W 45|H |L 40|L 42| MI | 14679887 / R: 1153 >1140 | | |B |W |W | |B |W | 57 | MICHAEL LU |2.0 |L 7|L 36|W 42|L 51|L 35|L 53|B | MI | 15113330 / R: 1092 >1079 | |B |W |W |B |W |B | | 58 | VIRAJ MOHILE |2.0 |W 31|L 2|L 41|L 23|L 49|B |L 45| MI | 14700365 / R: 917 > 941 | |W |B |W |B |W | |B | 59 | SEAN M MC CORMICK |2.0 |L 41|B |L 9|L 40|L 43|W 54|L 44| MI | 12841036 / R: 853 > 878 | |W | |B |B |W |W |B | 60 | JULIA SHEN |1.5 |L 33|L 34|D 45|D 42|L 24|H |U | MI | 14579262 / R: 967 > 984 | |W |B |B |W |B | | | 61 | JEZZEL FARKAS |1.5 |L 32|L 3|W 54|L 47|D 42|L 30|L 37| ON | 15771592 / R: 955P11> 979P18 | |B |W |B |W |B |W |B | 62 | ASHWIN BALAJI |1.0 |W 55|U |U |U |U |U |U | MI | 15219542 / R: 1530 >1535 | |B | | | | | | | 63 | THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U | MI | 15057092 / R: 1175 >1125 | |W |B |W |B |B | | | 64 | BEN LI |1.0 |L 22|D 30|L 31|D 49|L 46|L 42|L 54| MI | 15006561 / R: 1163 >1112 | |B |W |W |B |W |B |B | "
# Add extra characters at the end of text data. This helps read complete data.
# If extra characters are not added, last line will be skipped.
chess.text.level2 <- paste(chess.text.level2,"_*_*_*_")
# Extract the text using pattern repeating 180 characters
# Idea is to combine 2 lines of each player's info into single line
# Once combined, information about single player takes 180 characters
pattern <- "[[:print:]]{180}"
chess.text.level3 <- unlist(str_extract_all(chess.text.level2, pattern = pattern))
head(chess.text.level3,3)
## [1] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4| ON | 15445895 / R: 1794 >1817 |N:2 |W |B |W |B |W |B |W | "
## [2] " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7| MI | 14598900 / R: 1553 >1663 |N:2 |B |W |B |W |B |W |B | "
## [3] " 3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12| MI | 14959604 / R: 1384 >1640 |N:2 |W |B |W |B |W |B |W | "
# Each player has about 20 columns of information. Split the single string into 20 columns
chess.data <- data.frame(unlist(str_split_fixed(chess.text.level3, "\\|", 21)))
head(chess.data, 5)
## X1 X2 X3 X4 X5 X6 X7
## 1 1 GARY HUA 6.0 W 39 W 21 W 18 W 14
## 2 2 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17
## 3 3 ADITYA BAJAJ 6.0 L 8 W 61 W 25 W 21
## 4 4 PATRICK H SCHILLING 5.5 W 23 D 28 W 2 W 26
## 5 5 HANSHI ZUO 5.5 W 45 W 37 D 12 D 13
## X8 X9 X10 X11 X12 X13 X14
## 1 W 7 D 12 D 4 ON 15445895 / R: 1794 >1817 N:2 W
## 2 W 16 W 20 W 7 MI 14598900 / R: 1553 >1663 N:2 B
## 3 W 11 W 13 W 12 MI 14959604 / R: 1384 >1640 N:2 W
## 4 D 5 W 19 D 1 MI 12616049 / R: 1716 >1744 N:2 W
## 5 D 4 W 14 W 17 MI 14601533 / R: 1655 >1690 N:2 B
## X15 X16 X17 X18 X19 X20 X21
## 1 B W B W B W
## 2 W B W B W B
## 3 B W B W B W
## 4 B W B W B B
## 5 W B W B W B
# Using regular expressions extract the pre match rating for each player
pattern = "(R:\\s+[[:digit:]]{3,4})"
chess.preGameRating <- unlist(str_extract(string = chess.data[,12], pattern = pattern))
# Get the pre-game rating
pattern = "[[:digit:]]{3,4}"
chess.preGameRating <- str_extract(string = chess.preGameRating, pattern = pattern)
chess.preGameRating <- data.frame(chess.data, preGameRating = chess.preGameRating)
head(chess.preGameRating, 5)
## X1 X2 X3 X4 X5 X6 X7
## 1 1 GARY HUA 6.0 W 39 W 21 W 18 W 14
## 2 2 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17
## 3 3 ADITYA BAJAJ 6.0 L 8 W 61 W 25 W 21
## 4 4 PATRICK H SCHILLING 5.5 W 23 D 28 W 2 W 26
## 5 5 HANSHI ZUO 5.5 W 45 W 37 D 12 D 13
## X8 X9 X10 X11 X12 X13 X14
## 1 W 7 D 12 D 4 ON 15445895 / R: 1794 >1817 N:2 W
## 2 W 16 W 20 W 7 MI 14598900 / R: 1553 >1663 N:2 B
## 3 W 11 W 13 W 12 MI 14959604 / R: 1384 >1640 N:2 W
## 4 D 5 W 19 D 1 MI 12616049 / R: 1716 >1744 N:2 W
## 5 D 4 W 14 W 17 MI 14601533 / R: 1655 >1690 N:2 B
## X15 X16 X17 X18 X19 X20 X21 preGameRating
## 1 B W B W B W 1794
## 2 W B W B W B 1553
## 3 B W B W B W 1384
## 4 B W B W B B 1716
## 5 W B W B W B 1655
# Format the column type
chess.preGameRating$X1 <- as.numeric(as.character(chess.preGameRating$X1))
chess.preGameRating$preGameRating <- as.numeric(as.character(chess.preGameRating$preGameRating))
# Extract the information about opponent for each round
# Opponent player info is set to "0" for all "NA" values
pattern = "[[:digit:]]{1,2}"
chess.preGameRating$G1opponent = ifelse(is.na(str_match(string = chess.preGameRating$X4, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X4, pattern = pattern)))
chess.preGameRating$G2opponent = ifelse(is.na(str_match(string = chess.preGameRating$X5, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X5, pattern = pattern)))
chess.preGameRating$G3opponent = ifelse(is.na(str_match(string = chess.preGameRating$X6, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X6, pattern = pattern)))
chess.preGameRating$G4opponent = ifelse(is.na(str_match(string = chess.preGameRating$X7, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X7, pattern = pattern)))
chess.preGameRating$G5opponent = ifelse(is.na(str_match(string = chess.preGameRating$X8, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X8, pattern = pattern)))
chess.preGameRating$G6opponent = ifelse(is.na(str_match(string = chess.preGameRating$X9, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X9, pattern = pattern)))
chess.preGameRating$G7opponent = ifelse(is.na(str_match(string = chess.preGameRating$X10, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X10, pattern = pattern)))
head(chess.preGameRating,10)
## X1 X2 X3 X4 X5 X6 X7
## 1 1 GARY HUA 6.0 W 39 W 21 W 18 W 14
## 2 2 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17
## 3 3 ADITYA BAJAJ 6.0 L 8 W 61 W 25 W 21
## 4 4 PATRICK H SCHILLING 5.5 W 23 D 28 W 2 W 26
## 5 5 HANSHI ZUO 5.5 W 45 W 37 D 12 D 13
## 6 6 HANSEN SONG 5.0 W 34 D 29 L 11 W 35
## 7 7 GARY DEE SWATHELL 5.0 W 57 W 46 W 13 W 11
## 8 8 EZEKIEL HOUGHTON 5.0 W 3 W 32 L 14 L 9
## 9 9 STEFANO LEE 5.0 W 25 L 18 W 59 W 8
## 10 10 ANVIT RAO 5.0 D 16 L 19 W 55 W 31
## X8 X9 X10 X11 X12 X13 X14
## 1 W 7 D 12 D 4 ON 15445895 / R: 1794 >1817 N:2 W
## 2 W 16 W 20 W 7 MI 14598900 / R: 1553 >1663 N:2 B
## 3 W 11 W 13 W 12 MI 14959604 / R: 1384 >1640 N:2 W
## 4 D 5 W 19 D 1 MI 12616049 / R: 1716 >1744 N:2 W
## 5 D 4 W 14 W 17 MI 14601533 / R: 1655 >1690 N:2 B
## 6 D 10 W 27 W 21 OH 15055204 / R: 1686 >1687 N:3 W
## 7 L 1 W 9 L 2 MI 11146376 / R: 1649 >1673 N:3 W
## 8 W 47 W 28 W 19 MI 15142253 / R: 1641P17>1657P24 N:3 B
## 9 W 26 L 7 W 20 ON 14954524 / R: 1411 >1564 N:2 W
## 10 D 6 W 25 W 18 MI 14150362 / R: 1365 >1544 N:3 W
## X15 X16 X17 X18 X19 X20 X21 preGameRating G1opponent
## 1 B W B W B W 1794 39
## 2 W B W B W B 1553 63
## 3 B W B W B W 1384 8
## 4 B W B W B B 1716 23
## 5 W B W B W B 1655 45
## 6 B W B B W B 1686 34
## 7 B W B B W W 1649 57
## 8 W B W B W W 1641 3
## 9 B W B W B B 1411 25
## 10 W B B W B W 1365 16
## G2opponent G3opponent G4opponent G5opponent G6opponent G7opponent
## 1 21 18 14 7 12 4
## 2 58 4 17 16 20 7
## 3 61 25 21 11 13 12
## 4 28 2 26 5 19 1
## 5 37 12 13 4 14 17
## 6 29 11 35 10 27 21
## 7 46 13 11 1 9 2
## 8 32 14 9 47 28 19
## 9 18 59 8 26 7 20
## 10 19 55 31 6 25 18
chessTable <- data.table(chess.preGameRating)
chess.rating <- sqldf("SELECT ct1.*,SUM(ct2.preGameRating) AS sumOfOpponentPreGameRating,
SUM(CASE WHEN ct1.G1opponent > 0 THEN 1
WHEN ct1.G2opponent > 0 THEN 1
WHEN ct1.G3opponent > 0 THEN 1
WHEN ct1.G4opponent > 0 THEN 1
WHEN ct1.G5opponent > 0 THEN 1
WHEN ct1.G6opponent > 0 THEN 1
WHEN ct1.G7opponent > 0 THEN 1
ELSE 0 END) AS numberOfGamesPlayed
FROM chessTable ct1, chessTable ct2
WHERE ct2.X1 IN(ct1.G1opponent,ct1.G2opponent,ct1.G3opponent,ct1.G4opponent,ct1.G5opponent,ct1.G6opponent,ct1.G7opponent) GROUP BY ct1.X1")
## Loading required package: tcltk
## Warning: Quoted identifiers should have class SQL, use DBI::SQL() if the
## caller performs the quoting.
head(chess.rating,5)
## X1 X2 X3 X4 X5 X6 X7 X8
## 1 1 GARY HUA 6.0 W 39 W 21 W 18 W 14 W 7
## 2 2 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17 W 16
## 3 3 ADITYA BAJAJ 6.0 L 8 W 61 W 25 W 21 W 11
## 4 4 PATRICK H SCHILLING 5.5 W 23 D 28 W 2 W 26 D 5
## 5 5 HANSHI ZUO 5.5 W 45 W 37 D 12 D 13 D 4
## X9 X10 X11 X12 X13 X14 X15
## 1 D 12 D 4 ON 15445895 / R: 1794 >1817 N:2 W B
## 2 W 20 W 7 MI 14598900 / R: 1553 >1663 N:2 B W
## 3 W 13 W 12 MI 14959604 / R: 1384 >1640 N:2 W B
## 4 W 19 D 1 MI 12616049 / R: 1716 >1744 N:2 W B
## 5 W 14 W 17 MI 14601533 / R: 1655 >1690 N:2 B W
## X16 X17 X18 X19 X20 X21 preGameRating G1opponent G2opponent
## 1 W B W B W 1794 39 21
## 2 B W B W B 1553 63 58
## 3 W B W B W 1384 8 61
## 4 W B W B B 1716 23 28
## 5 B W B W B 1655 45 37
## G3opponent G4opponent G5opponent G6opponent G7opponent
## 1 18 14 7 12 4
## 2 4 17 16 20 7
## 3 25 21 11 13 12
## 4 2 26 5 19 1
## 5 12 13 4 14 17
## sumOfOpponentPreGameRating numberOfGamesPlayed
## 1 11237 7
## 2 10285 7
## 3 10945 7
## 4 11015 7
## 5 10506 7
# Average opponent pre game rating (rounded).
chess.rating$avgOppPreGameRating <- round(chess.rating$sumOfOpponentPreGameRating / chess.rating$numberOfGamesPlayed,0)
head(chess.rating,10)
## X1 X2 X3 X4 X5 X6 X7
## 1 1 GARY HUA 6.0 W 39 W 21 W 18 W 14
## 2 2 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17
## 3 3 ADITYA BAJAJ 6.0 L 8 W 61 W 25 W 21
## 4 4 PATRICK H SCHILLING 5.5 W 23 D 28 W 2 W 26
## 5 5 HANSHI ZUO 5.5 W 45 W 37 D 12 D 13
## 6 6 HANSEN SONG 5.0 W 34 D 29 L 11 W 35
## 7 7 GARY DEE SWATHELL 5.0 W 57 W 46 W 13 W 11
## 8 8 EZEKIEL HOUGHTON 5.0 W 3 W 32 L 14 L 9
## 9 9 STEFANO LEE 5.0 W 25 L 18 W 59 W 8
## 10 10 ANVIT RAO 5.0 D 16 L 19 W 55 W 31
## X8 X9 X10 X11 X12 X13 X14
## 1 W 7 D 12 D 4 ON 15445895 / R: 1794 >1817 N:2 W
## 2 W 16 W 20 W 7 MI 14598900 / R: 1553 >1663 N:2 B
## 3 W 11 W 13 W 12 MI 14959604 / R: 1384 >1640 N:2 W
## 4 D 5 W 19 D 1 MI 12616049 / R: 1716 >1744 N:2 W
## 5 D 4 W 14 W 17 MI 14601533 / R: 1655 >1690 N:2 B
## 6 D 10 W 27 W 21 OH 15055204 / R: 1686 >1687 N:3 W
## 7 L 1 W 9 L 2 MI 11146376 / R: 1649 >1673 N:3 W
## 8 W 47 W 28 W 19 MI 15142253 / R: 1641P17>1657P24 N:3 B
## 9 W 26 L 7 W 20 ON 14954524 / R: 1411 >1564 N:2 W
## 10 D 6 W 25 W 18 MI 14150362 / R: 1365 >1544 N:3 W
## X15 X16 X17 X18 X19 X20 X21 preGameRating G1opponent
## 1 B W B W B W 1794 39
## 2 W B W B W B 1553 63
## 3 B W B W B W 1384 8
## 4 B W B W B B 1716 23
## 5 W B W B W B 1655 45
## 6 B W B B W B 1686 34
## 7 B W B B W W 1649 57
## 8 W B W B W W 1641 3
## 9 B W B W B B 1411 25
## 10 W B B W B W 1365 16
## G2opponent G3opponent G4opponent G5opponent G6opponent G7opponent
## 1 21 18 14 7 12 4
## 2 58 4 17 16 20 7
## 3 61 25 21 11 13 12
## 4 28 2 26 5 19 1
## 5 37 12 13 4 14 17
## 6 29 11 35 10 27 21
## 7 46 13 11 1 9 2
## 8 32 14 9 47 28 19
## 9 18 59 8 26 7 20
## 10 19 55 31 6 25 18
## sumOfOpponentPreGameRating numberOfGamesPlayed avgOppPreGameRating
## 1 11237 7 1605
## 2 10285 7 1469
## 3 10945 7 1564
## 4 11015 7 1574
## 5 10506 7 1501
## 6 10631 7 1519
## 7 9605 7 1372
## 8 10279 7 1468
## 9 10662 7 1523
## 10 10879 7 1554
# Change column names to readable format
names(chess.rating)[names(chess.rating) == "X1"] <- "playerNumber"
names(chess.rating)[names(chess.rating) == "X2"] <- "playerName"
names(chess.rating)[names(chess.rating) == "X11"] <- "playerState"
names(chess.rating)[names(chess.rating) == "X3"] <- "playerPoints"
names(chess.rating)[names(chess.rating) == "preGameRating"] <- "playerPreGameRating"
# Format the column values remove extra spaces
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
chess.rating$playerState <- trim(chess.rating$playerState)
chess.rating$playerPoints <- trim(chess.rating$playerPoints)
chess.rating$playerName <- trim(chess.rating$playerName)
# Create subset of data to be written to CSV file
chess.csvdata <- subset(chess.rating,select = c(playerNumber,playerName,playerState,playerPoints,playerPreGameRating,avgOppPreGameRating))
head(chess.csvdata,10)
## playerNumber playerName playerState playerPoints
## 1 1 GARY HUA ON 6.0
## 2 2 DAKSHESH DARURI MI 6.0
## 3 3 ADITYA BAJAJ MI 6.0
## 4 4 PATRICK H SCHILLING MI 5.5
## 5 5 HANSHI ZUO MI 5.5
## 6 6 HANSEN SONG OH 5.0
## 7 7 GARY DEE SWATHELL MI 5.0
## 8 8 EZEKIEL HOUGHTON MI 5.0
## 9 9 STEFANO LEE ON 5.0
## 10 10 ANVIT RAO MI 5.0
## playerPreGameRating avgOppPreGameRating
## 1 1794 1605
## 2 1553 1469
## 3 1384 1564
## 4 1716 1574
## 5 1655 1501
## 6 1686 1519
## 7 1649 1372
## 8 1641 1468
## 9 1411 1523
## 10 1365 1554
# Generate CSV files
chessOutFile <- paste(str_trim(localDir),"//chessOutFile.csv",sep="")
write.table(chess.csvdata, file = chessOutFile, row.names = FALSE, sep = ",")
References:
SQL from within R - https://github.com/ggrothendieck/sqldf#example-4-join
Trim function - http://stackoverflow.com/questions/2261079/how-to-trim-leading-and-trailing-whitespace-in-r
Regular expressions - Handling and Processing Strings in R, Gaston Sanchez