This project involves in reading Chess Tournament Cross Table and generating a CSV file with the following information for all of the players: Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents.

Average Pre Chess Rating of Opponents is calculated as sum of pre-tournament opponents’ ratings divided by total number of games played.

Libraries used for this project.

library(readr)      # used for reading the text file
library(stringr)    # used for extracting text using regular expressions
library(data.table) # used for saving out into tabel format
library(sqldf)      # used for querying and to generate aggregate output
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
library(utils)      # used for generating CSV file

Download and save the tournamentinfo.txt, to local working directory from website. https://bbhosted.cuny.edu/bbcswebdav/pid-28524631-dt-content-rid-126144833_1/courses/SPS01_DATA_607_01_1172_1/SPS01_DATA_607_01_1169_1_ImportedContent_20160815114002/SPS01_DATA_607_01_1162_1_ImportedContent_20160126062155/tournamentinfo.txt

Load Chess Tournament Cross Table file.

# Get local directory and load the file

localDir <- getwd()
chessFile <- paste(str_trim(localDir),"//tournamentinfo.txt",sep="")

# Skip top 3 lines as it contains header info

chess.text<-read_lines(chessFile,skip = 3)
head(chess.text,10)
##  [1] "-----------------------------------------------------------------------------------------"
##  [2] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
##  [3] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
##  [4] "-----------------------------------------------------------------------------------------"
##  [5] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|"
##  [6] "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
##  [7] "-----------------------------------------------------------------------------------------"
##  [8] "    3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|"
##  [9] "   MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
## [10] "-----------------------------------------------------------------------------------------"

There are total of 64 players. File needs formatting in order to perform calculations and generate CSV output.

# First level of formatting removes hypen("-") characters

chess.text.level1 <- gsub("-","",chess.text)
head(chess.text.level1, 6)
## [1] ""                                                                                         
## [2] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"
## [3] "   ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |" 
## [4] ""                                                                                         
## [5] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|"
## [6] "   MI | 14598900 / R: 1553   >1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
# Second level of formatting removes empty lines
# This makes a continous data without newline characters 

chess.text.level2 <- paste(unlist(chess.text.level1), collapse = " ")
head(chess.text.level2,10)
## [1] "     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|    ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |      2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|    MI | 14598900 / R: 1553   >1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |      3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|    MI | 14959604 / R: 1384   >1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |      4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|    MI | 12616049 / R: 1716   >1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |      5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|    MI | 14601533 / R: 1655   >1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |      6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21|    OH | 15055204 / R: 1686   >1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |      7 | GARY DEE SWATHELL               |5.0  |W  57|W  46|W  13|W  11|L   1|W   9|L   2|    MI | 11146376 / R: 1649   >1673     |N:3  |W    |B    |W    |B    |B    |W    |W    |      8 | EZEKIEL HOUGHTON                |5.0  |W   3|W  32|L  14|L   9|W  47|W  28|W  19|    MI | 15142253 / R: 1641P17>1657P24  |N:3  |B    |W    |B    |W    |B    |W    |W    |      9 | STEFANO LEE                     |5.0  |W  25|L  18|W  59|W   8|W  26|L   7|W  20|    ON | 14954524 / R: 1411   >1564     |N:2  |W    |B    |W    |B    |W    |B    |B    |     10 | ANVIT RAO                       |5.0  |D  16|L  19|W  55|W  31|D   6|W  25|W  18|    MI | 14150362 / R: 1365   >1544     |N:3  |W    |W    |B    |B    |W    |B    |W    |     11 | CAMERON WILLIAM MC LEMAN        |4.5  |D  38|W  56|W   6|L   7|L   3|W  34|W  26|    MI | 12581589 / R: 1712   >1696     |N:3  |B    |W    |B    |W    |B    |W    |B    |     12 | KENNETH J TACK                  |4.5  |W  42|W  33|D   5|W  38|H    |D   1|L   3|    MI | 12681257 / R: 1663   >1670     |N:3  |W    |B    |W    |B    |     |W    |B    |     13 | TORRANCE HENRY JR               |4.5  |W  36|W  27|L   7|D   5|W  33|L   3|W  32|    MI | 15082995 / R: 1666   >1662     |N:3  |B    |W    |B    |B    |W    |W    |B    |     14 | BRADLEY SHAW                    |4.5  |W  54|W  44|W   8|L   1|D  27|L   5|W  31|    MI | 10131499 / R: 1610   >1618     |N:3  |W    |B    |W    |W    |B    |B    |W    |     15 | ZACHARY JAMES HOUGHTON          |4.5  |D  19|L  16|W  30|L  22|W  54|W  33|W  38|    MI | 15619130 / R: 1220P13>1416P20  |N:3  |B    |B    |W    |W    |B    |B    |W    |     16 | MIKE NIKITIN                    |4.0  |D  10|W  15|H    |W  39|L   2|W  36|U    |    MI | 10295068 / R: 1604   >1613     |N:3  |B    |W    |     |B    |W    |B    |     |     17 | RONALD GRZEGORCZYK              |4.0  |W  48|W  41|L  26|L   2|W  23|W  22|L   5|    MI | 10297702 / R: 1629   >1610     |N:3  |W    |B    |W    |B    |W    |B    |W    |     18 | DAVID SUNDEEN                   |4.0  |W  47|W   9|L   1|W  32|L  19|W  38|L  10|    MI | 11342094 / R: 1600   >1600     |N:3  |B    |W    |B    |W    |B    |W    |B    |     19 | DIPANKAR ROY                    |4.0  |D  15|W  10|W  52|D  28|W  18|L   4|L   8|    MI | 14862333 / R: 1564   >1570     |N:3  |W    |B    |W    |B    |W    |W    |B    |     20 | JASON ZHENG                     |4.0  |L  40|W  49|W  23|W  41|W  28|L   2|L   9|    MI | 14529060 / R: 1595   >1569     |N:4  |W    |B    |W    |B    |W    |B    |W    |     21 | DINH DANG BUI                   |4.0  |W  43|L   1|W  47|L   3|W  40|W  39|L   6|    ON | 15495066 / R: 1563P22>1562     |N:3  |B    |W    |B    |W    |W    |B    |W    |     22 | EUGENE L MCCLURE                |4.0  |W  64|D  52|L  28|W  15|H    |L  17|W  40|    MI | 12405534 / R: 1555   >1529     |N:4  |W    |B    |W    |B    |     |W    |B    |     23 | ALAN BUI                        |4.0  |L   4|W  43|L  20|W  58|L  17|W  37|W  46|    ON | 15030142 / R: 1363   >1371     |     |B    |W    |B    |W    |B    |W    |B    |     24 | MICHAEL R ALDRICH               |4.0  |L  28|L  47|W  43|L  25|W  60|W  44|W  39|    MI | 13469010 / R: 1229   >1300     |N:4  |B    |W    |B    |B    |W    |W    |B    |     25 | LOREN SCHWIEBERT                |3.5  |L   9|W  53|L   3|W  24|D  34|L  10|W  47|    MI | 12486656 / R: 1745   >1681     |N:4  |B    |W    |B    |W    |B    |W    |B    |     26 | MAX ZHU                         |3.5  |W  49|W  40|W  17|L   4|L   9|D  32|L  11|    ON | 15131520 / R: 1579   >1564     |N:4  |B    |W    |B    |W    |B    |W    |W    |     27 | GAURAV GIDWANI                  |3.5  |W  51|L  13|W  46|W  37|D  14|L   6|U    |    MI | 14476567 / R: 1552   >1539     |N:4  |W    |B    |W    |B    |W    |B    |     |     28 | SOFIA ADINA STANESCUBELLU      |3.5  |W  24|D   4|W  22|D  19|L  20|L   8|D  36|    MI | 14882954 / R: 1507   >1513     |N:3  |W    |W    |B    |W    |B    |B    |W    |     29 | CHIEDOZIE OKORIE                |3.5  |W  50|D   6|L  38|L  34|W  52|W  48|U    |    MI | 15323285 / R: 1602P6 >1508P12  |N:4  |B    |W    |B    |W    |W    |B    |     |     30 | GEORGE AVERY JONES              |3.5  |L  52|D  64|L  15|W  55|L  31|W  61|W  50|    ON | 12577178 / R: 1522   >1444     |     |W    |B    |B    |W    |W    |B    |B    |     31 | RISHI SHETTY                    |3.5  |L  58|D  55|W  64|L  10|W  30|W  50|L  14|    MI | 15131618 / R: 1494   >1444     |     |B    |W    |B    |W    |B    |W    |B    |     32 | JOSHUA PHILIP MATHEWS           |3.5  |W  61|L   8|W  44|L  18|W  51|D  26|L  13|    ON | 14073750 / R: 1441   >1433     |N:4  |W    |B    |W    |B    |W    |B    |W    |     33 | JADE GE                         |3.5  |W  60|L  12|W  50|D  36|L  13|L  15|W  51|    MI | 14691842 / R: 1449   >1421     |     |B    |W    |B    |W    |B    |W    |B    |     34 | MICHAEL JEFFERY THOMAS          |3.5  |L   6|W  60|L  37|W  29|D  25|L  11|W  52|    MI | 15051807 / R: 1399   >1400     |     |B    |W    |B    |B    |W    |B    |W    |     35 | JOSHUA DAVID LEE                |3.5  |L  46|L  38|W  56|L   6|W  57|D  52|W  48|    MI | 14601397 / R: 1438   >1392     |     |W    |W    |B    |W    |B    |B    |W    |     36 | SIDDHARTH JHA                   |3.5  |L  13|W  57|W  51|D  33|H    |L  16|D  28|    MI | 14773163 / R: 1355   >1367     |N:4  |W    |B    |W    |B    |     |W    |B    |     37 | AMIYATOSH PWNANANDAM            |3.5  |B    |L   5|W  34|L  27|H    |L  23|W  61|    MI | 15489571 / R:  980P12>1077P17  |     |     |B    |W    |W    |     |B    |W    |     38 | BRIAN LIU                       |3.0  |D  11|W  35|W  29|L  12|H    |L  18|L  15|    MI | 15108523 / R: 1423   >1439     |N:4  |W    |B    |W    |W    |     |B    |B    |     39 | JOEL R HENDON                   |3.0  |L   1|W  54|W  40|L  16|W  44|L  21|L  24|    MI | 12923035 / R: 1436P23>1413     |N:4  |B    |W    |B    |W    |B    |W    |W    |     40 | FOREST ZHANG                    |3.0  |W  20|L  26|L  39|W  59|L  21|W  56|L  22|    MI | 14892710 / R: 1348   >1346     |     |B    |B    |W    |W    |B    |W    |W    |     41 | KYLE WILLIAM MURPHY             |3.0  |W  59|L  17|W  58|L  20|X    |U    |U    |    MI | 15761443 / R: 1403P5 >1341P9   |     |B    |W    |B    |W    |     |     |     |     42 | JARED GE                        |3.0  |L  12|L  50|L  57|D  60|D  61|W  64|W  56|    MI | 14462326 / R: 1332   >1256     |     |B    |W    |B    |B    |W    |W    |B    |     43 | ROBERT GLEN VASEY               |3.0  |L  21|L  23|L  24|W  63|W  59|L  46|W  55|    MI | 14101068 / R: 1283   >1244     |     |W    |B    |W    |W    |B    |B    |W    |     44 | JUSTIN D SCHILLING              |3.0  |B    |L  14|L  32|W  53|L  39|L  24|W  59|    MI | 15323504 / R: 1199   >1199     |     |     |W    |B    |B    |W    |B    |W    |     45 | DEREK YAN                       |3.0  |L   5|L  51|D  60|L  56|W  63|D  55|W  58|    MI | 15372807 / R: 1242   >1191     |     |W    |B    |W    |B    |W    |B    |W    |     46 | JACOB ALEXANDER LAVALLEY        |3.0  |W  35|L   7|L  27|L  50|W  64|W  43|L  23|    MI | 15490981 / R:  377P3 >1076P10  |     |B    |W    |B    |W    |B    |W    |W    |     47 | ERIC WRIGHT                     |2.5  |L  18|W  24|L  21|W  61|L   8|D  51|L  25|    MI | 12533115 / R: 1362   >1341     |     |W    |B    |W    |B    |W    |B    |W    |     48 | DANIEL KHAIN                    |2.5  |L  17|W  63|H    |D  52|H    |L  29|L  35|    MI | 14369165 / R: 1382   >1335     |     |B    |W    |     |B    |     |W    |B    |     49 | MICHAEL J MARTIN                |2.5  |L  26|L  20|D  63|D  64|W  58|H    |U    |    MI | 12531685 / R: 1291P12>1259P17  |     |W    |W    |B    |W    |B    |     |     |     50 | SHIVAM JHA                      |2.5  |L  29|W  42|L  33|W  46|H    |L  31|L  30|    MI | 14773178 / R: 1056   >1111     |     |W    |B    |W    |B    |     |B    |W    |     51 | TEJAS AYYAGARI                  |2.5  |L  27|W  45|L  36|W  57|L  32|D  47|L  33|    MI | 15205474 / R: 1011   >1097     |     |B    |W    |B    |W    |B    |W    |W    |     52 | ETHAN GUO                       |2.5  |W  30|D  22|L  19|D  48|L  29|D  35|L  34|    MI | 14918803 / R:  935   >1092     |N:4  |B    |W    |B    |W    |B    |W    |B    |     53 | JOSE C YBARRA                   |2.0  |H    |L  25|H    |L  44|U    |W  57|U    |    MI | 12578849 / R: 1393   >1359     |     |     |B    |     |W    |     |W    |     |     54 | LARRY HODGE                     |2.0  |L  14|L  39|L  61|B    |L  15|L  59|W  64|    MI | 12836773 / R: 1270   >1200     |     |B    |B    |W    |     |W    |B    |W    |     55 | ALEX KONG                       |2.0  |L  62|D  31|L  10|L  30|B    |D  45|L  43|    MI | 15412571 / R: 1186   >1163     |     |W    |B    |W    |B    |     |W    |B    |     56 | MARISA RICCI                    |2.0  |H    |L  11|L  35|W  45|H    |L  40|L  42|    MI | 14679887 / R: 1153   >1140     |     |     |B    |W    |W    |     |B    |W    |     57 | MICHAEL LU                      |2.0  |L   7|L  36|W  42|L  51|L  35|L  53|B    |    MI | 15113330 / R: 1092   >1079     |     |B    |W    |W    |B    |W    |B    |     |     58 | VIRAJ MOHILE                    |2.0  |W  31|L   2|L  41|L  23|L  49|B    |L  45|    MI | 14700365 / R:  917   > 941     |     |W    |B    |W    |B    |W    |     |B    |     59 | SEAN M MC CORMICK               |2.0  |L  41|B    |L   9|L  40|L  43|W  54|L  44|    MI | 12841036 / R:  853   > 878     |     |W    |     |B    |B    |W    |W    |B    |     60 | JULIA SHEN                      |1.5  |L  33|L  34|D  45|D  42|L  24|H    |U    |    MI | 14579262 / R:  967   > 984     |     |W    |B    |B    |W    |B    |     |     |     61 | JEZZEL FARKAS                   |1.5  |L  32|L   3|W  54|L  47|D  42|L  30|L  37|    ON | 15771592 / R:  955P11> 979P18  |     |B    |W    |B    |W    |B    |W    |B    |     62 | ASHWIN BALAJI                   |1.0  |W  55|U    |U    |U    |U    |U    |U    |    MI | 15219542 / R: 1530   >1535     |     |B    |     |     |     |     |     |     |     63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |    MI | 15057092 / R: 1175   >1125     |     |W    |B    |W    |B    |B    |     |     |     64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|    MI | 15006561 / R: 1163   >1112     |     |B    |W    |W    |B    |W    |B    |B    | "
# Add extra characters at the end of text data. This helps read complete data. 
# If extra characters are not added, last line will be skipped. 

chess.text.level2 <- paste(chess.text.level2,"_*_*_*_")

# Extract the text using pattern repeating 180 characters
# Idea is to combine 2 lines of each player's info into single line
# Once combined, information about single player takes 180 characters

pattern <- "[[:print:]]{180}"
chess.text.level3 <- unlist(str_extract_all(chess.text.level2, pattern = pattern))
head(chess.text.level3,3)
## [1] "     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|    ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    | "
## [2] "     2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|    MI | 14598900 / R: 1553   >1663     |N:2  |B    |W    |B    |W    |B    |W    |B    | "
## [3] "     3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|    MI | 14959604 / R: 1384   >1640     |N:2  |W    |B    |W    |B    |W    |B    |W    | "
# Each player has about 20 columns of information. Split the single string into 20 columns

chess.data <- data.frame(unlist(str_split_fixed(chess.text.level3, "\\|", 21)))
head(chess.data, 5)
##        X1                                X2    X3    X4    X5    X6    X7
## 1      1   GARY HUA                         6.0   W  39 W  21 W  18 W  14
## 2      2   DAKSHESH DARURI                  6.0   W  63 W  58 L   4 W  17
## 3      3   ADITYA BAJAJ                     6.0   L   8 W  61 W  25 W  21
## 4      4   PATRICK H SCHILLING              5.5   W  23 D  28 W   2 W  26
## 5      5   HANSHI ZUO                       5.5   W  45 W  37 D  12 D  13
##      X8    X9   X10     X11                              X12   X13   X14
## 1 W   7 D  12 D   4     ON   15445895 / R: 1794   >1817      N:2   W    
## 2 W  16 W  20 W   7     MI   14598900 / R: 1553   >1663      N:2   B    
## 3 W  11 W  13 W  12     MI   14959604 / R: 1384   >1640      N:2   W    
## 4 D   5 W  19 D   1     MI   12616049 / R: 1716   >1744      N:2   W    
## 5 D   4 W  14 W  17     MI   14601533 / R: 1655   >1690      N:2   B    
##     X15   X16   X17   X18   X19   X20 X21
## 1 B     W     B     W     B     W        
## 2 W     B     W     B     W     B        
## 3 B     W     B     W     B     W        
## 4 B     W     B     W     B     B        
## 5 W     B     W     B     W     B
# Using regular expressions extract the pre match rating for each player

pattern = "(R:\\s+[[:digit:]]{3,4})"
chess.preGameRating <- unlist(str_extract(string = chess.data[,12], pattern = pattern))

# Get the pre-game rating
pattern = "[[:digit:]]{3,4}"
chess.preGameRating <- str_extract(string = chess.preGameRating, pattern = pattern)
chess.preGameRating <- data.frame(chess.data, preGameRating = chess.preGameRating)

head(chess.preGameRating, 5)
##        X1                                X2    X3    X4    X5    X6    X7
## 1      1   GARY HUA                         6.0   W  39 W  21 W  18 W  14
## 2      2   DAKSHESH DARURI                  6.0   W  63 W  58 L   4 W  17
## 3      3   ADITYA BAJAJ                     6.0   L   8 W  61 W  25 W  21
## 4      4   PATRICK H SCHILLING              5.5   W  23 D  28 W   2 W  26
## 5      5   HANSHI ZUO                       5.5   W  45 W  37 D  12 D  13
##      X8    X9   X10     X11                              X12   X13   X14
## 1 W   7 D  12 D   4     ON   15445895 / R: 1794   >1817      N:2   W    
## 2 W  16 W  20 W   7     MI   14598900 / R: 1553   >1663      N:2   B    
## 3 W  11 W  13 W  12     MI   14959604 / R: 1384   >1640      N:2   W    
## 4 D   5 W  19 D   1     MI   12616049 / R: 1716   >1744      N:2   W    
## 5 D   4 W  14 W  17     MI   14601533 / R: 1655   >1690      N:2   B    
##     X15   X16   X17   X18   X19   X20 X21 preGameRating
## 1 B     W     B     W     B     W                  1794
## 2 W     B     W     B     W     B                  1553
## 3 B     W     B     W     B     W                  1384
## 4 B     W     B     W     B     B                  1716
## 5 W     B     W     B     W     B                  1655
# Format the column type

chess.preGameRating$X1 <- as.numeric(as.character(chess.preGameRating$X1))
chess.preGameRating$preGameRating <- as.numeric(as.character(chess.preGameRating$preGameRating))
# Extract the information about opponent for each round
# Opponent player info is set to "0" for all "NA" values

pattern = "[[:digit:]]{1,2}"

chess.preGameRating$G1opponent = ifelse(is.na(str_match(string = chess.preGameRating$X4, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X4, pattern = pattern)))
chess.preGameRating$G2opponent = ifelse(is.na(str_match(string = chess.preGameRating$X5, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X5, pattern = pattern)))
chess.preGameRating$G3opponent = ifelse(is.na(str_match(string = chess.preGameRating$X6, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X6, pattern = pattern)))
chess.preGameRating$G4opponent = ifelse(is.na(str_match(string = chess.preGameRating$X7, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X7, pattern = pattern)))
chess.preGameRating$G5opponent = ifelse(is.na(str_match(string = chess.preGameRating$X8, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X8, pattern = pattern)))
chess.preGameRating$G6opponent = ifelse(is.na(str_match(string = chess.preGameRating$X9, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X9, pattern = pattern)))
chess.preGameRating$G7opponent = ifelse(is.na(str_match(string = chess.preGameRating$X10, pattern = pattern)), 0, as.numeric(str_extract(string = chess.preGameRating$X10, pattern = pattern)))

head(chess.preGameRating,10)
##    X1                                X2    X3    X4    X5    X6    X7
## 1   1  GARY HUA                         6.0   W  39 W  21 W  18 W  14
## 2   2  DAKSHESH DARURI                  6.0   W  63 W  58 L   4 W  17
## 3   3  ADITYA BAJAJ                     6.0   L   8 W  61 W  25 W  21
## 4   4  PATRICK H SCHILLING              5.5   W  23 D  28 W   2 W  26
## 5   5  HANSHI ZUO                       5.5   W  45 W  37 D  12 D  13
## 6   6  HANSEN SONG                      5.0   W  34 D  29 L  11 W  35
## 7   7  GARY DEE SWATHELL                5.0   W  57 W  46 W  13 W  11
## 8   8  EZEKIEL HOUGHTON                 5.0   W   3 W  32 L  14 L   9
## 9   9  STEFANO LEE                      5.0   W  25 L  18 W  59 W   8
## 10 10  ANVIT RAO                        5.0   D  16 L  19 W  55 W  31
##       X8    X9   X10     X11                              X12   X13   X14
## 1  W   7 D  12 D   4     ON   15445895 / R: 1794   >1817      N:2   W    
## 2  W  16 W  20 W   7     MI   14598900 / R: 1553   >1663      N:2   B    
## 3  W  11 W  13 W  12     MI   14959604 / R: 1384   >1640      N:2   W    
## 4  D   5 W  19 D   1     MI   12616049 / R: 1716   >1744      N:2   W    
## 5  D   4 W  14 W  17     MI   14601533 / R: 1655   >1690      N:2   B    
## 6  D  10 W  27 W  21     OH   15055204 / R: 1686   >1687      N:3   W    
## 7  L   1 W   9 L   2     MI   11146376 / R: 1649   >1673      N:3   W    
## 8  W  47 W  28 W  19     MI   15142253 / R: 1641P17>1657P24   N:3   B    
## 9  W  26 L   7 W  20     ON   14954524 / R: 1411   >1564      N:2   W    
## 10 D   6 W  25 W  18     MI   14150362 / R: 1365   >1544      N:3   W    
##      X15   X16   X17   X18   X19   X20 X21 preGameRating G1opponent
## 1  B     W     B     W     B     W                  1794         39
## 2  W     B     W     B     W     B                  1553         63
## 3  B     W     B     W     B     W                  1384          8
## 4  B     W     B     W     B     B                  1716         23
## 5  W     B     W     B     W     B                  1655         45
## 6  B     W     B     B     W     B                  1686         34
## 7  B     W     B     B     W     W                  1649         57
## 8  W     B     W     B     W     W                  1641          3
## 9  B     W     B     W     B     B                  1411         25
## 10 W     B     B     W     B     W                  1365         16
##    G2opponent G3opponent G4opponent G5opponent G6opponent G7opponent
## 1          21         18         14          7         12          4
## 2          58          4         17         16         20          7
## 3          61         25         21         11         13         12
## 4          28          2         26          5         19          1
## 5          37         12         13          4         14         17
## 6          29         11         35         10         27         21
## 7          46         13         11          1          9          2
## 8          32         14          9         47         28         19
## 9          18         59          8         26          7         20
## 10         19         55         31          6         25         18
chessTable <- data.table(chess.preGameRating)

chess.rating <- sqldf("SELECT ct1.*,SUM(ct2.preGameRating) AS sumOfOpponentPreGameRating,
                          SUM(CASE WHEN ct1.G1opponent > 0 THEN 1 
                          WHEN ct1.G2opponent > 0 THEN 1
                          WHEN ct1.G3opponent > 0 THEN 1
                          WHEN ct1.G4opponent > 0 THEN 1
                          WHEN ct1.G5opponent > 0 THEN 1
                          WHEN ct1.G6opponent > 0 THEN 1
                          WHEN ct1.G7opponent > 0 THEN 1
                          ELSE 0 END) AS numberOfGamesPlayed
                      FROM chessTable ct1, chessTable ct2 
                      WHERE ct2.X1 IN(ct1.G1opponent,ct1.G2opponent,ct1.G3opponent,ct1.G4opponent,ct1.G5opponent,ct1.G6opponent,ct1.G7opponent) GROUP BY ct1.X1")
## Loading required package: tcltk
## Warning: Quoted identifiers should have class SQL, use DBI::SQL() if the
## caller performs the quoting.
head(chess.rating,5)
##   X1                                X2    X3    X4    X5    X6    X7    X8
## 1  1  GARY HUA                         6.0   W  39 W  21 W  18 W  14 W   7
## 2  2  DAKSHESH DARURI                  6.0   W  63 W  58 L   4 W  17 W  16
## 3  3  ADITYA BAJAJ                     6.0   L   8 W  61 W  25 W  21 W  11
## 4  4  PATRICK H SCHILLING              5.5   W  23 D  28 W   2 W  26 D   5
## 5  5  HANSHI ZUO                       5.5   W  45 W  37 D  12 D  13 D   4
##      X9   X10     X11                              X12   X13   X14   X15
## 1 D  12 D   4     ON   15445895 / R: 1794   >1817      N:2   W     B    
## 2 W  20 W   7     MI   14598900 / R: 1553   >1663      N:2   B     W    
## 3 W  13 W  12     MI   14959604 / R: 1384   >1640      N:2   W     B    
## 4 W  19 D   1     MI   12616049 / R: 1716   >1744      N:2   W     B    
## 5 W  14 W  17     MI   14601533 / R: 1655   >1690      N:2   B     W    
##     X16   X17   X18   X19   X20 X21 preGameRating G1opponent G2opponent
## 1 W     B     W     B     W                  1794         39         21
## 2 B     W     B     W     B                  1553         63         58
## 3 W     B     W     B     W                  1384          8         61
## 4 W     B     W     B     B                  1716         23         28
## 5 B     W     B     W     B                  1655         45         37
##   G3opponent G4opponent G5opponent G6opponent G7opponent
## 1         18         14          7         12          4
## 2          4         17         16         20          7
## 3         25         21         11         13         12
## 4          2         26          5         19          1
## 5         12         13          4         14         17
##   sumOfOpponentPreGameRating numberOfGamesPlayed
## 1                      11237                   7
## 2                      10285                   7
## 3                      10945                   7
## 4                      11015                   7
## 5                      10506                   7
# Average opponent pre game rating (rounded).  

chess.rating$avgOppPreGameRating <- round(chess.rating$sumOfOpponentPreGameRating / chess.rating$numberOfGamesPlayed,0)
head(chess.rating,10)
##    X1                                X2    X3    X4    X5    X6    X7
## 1   1  GARY HUA                         6.0   W  39 W  21 W  18 W  14
## 2   2  DAKSHESH DARURI                  6.0   W  63 W  58 L   4 W  17
## 3   3  ADITYA BAJAJ                     6.0   L   8 W  61 W  25 W  21
## 4   4  PATRICK H SCHILLING              5.5   W  23 D  28 W   2 W  26
## 5   5  HANSHI ZUO                       5.5   W  45 W  37 D  12 D  13
## 6   6  HANSEN SONG                      5.0   W  34 D  29 L  11 W  35
## 7   7  GARY DEE SWATHELL                5.0   W  57 W  46 W  13 W  11
## 8   8  EZEKIEL HOUGHTON                 5.0   W   3 W  32 L  14 L   9
## 9   9  STEFANO LEE                      5.0   W  25 L  18 W  59 W   8
## 10 10  ANVIT RAO                        5.0   D  16 L  19 W  55 W  31
##       X8    X9   X10     X11                              X12   X13   X14
## 1  W   7 D  12 D   4     ON   15445895 / R: 1794   >1817      N:2   W    
## 2  W  16 W  20 W   7     MI   14598900 / R: 1553   >1663      N:2   B    
## 3  W  11 W  13 W  12     MI   14959604 / R: 1384   >1640      N:2   W    
## 4  D   5 W  19 D   1     MI   12616049 / R: 1716   >1744      N:2   W    
## 5  D   4 W  14 W  17     MI   14601533 / R: 1655   >1690      N:2   B    
## 6  D  10 W  27 W  21     OH   15055204 / R: 1686   >1687      N:3   W    
## 7  L   1 W   9 L   2     MI   11146376 / R: 1649   >1673      N:3   W    
## 8  W  47 W  28 W  19     MI   15142253 / R: 1641P17>1657P24   N:3   B    
## 9  W  26 L   7 W  20     ON   14954524 / R: 1411   >1564      N:2   W    
## 10 D   6 W  25 W  18     MI   14150362 / R: 1365   >1544      N:3   W    
##      X15   X16   X17   X18   X19   X20 X21 preGameRating G1opponent
## 1  B     W     B     W     B     W                  1794         39
## 2  W     B     W     B     W     B                  1553         63
## 3  B     W     B     W     B     W                  1384          8
## 4  B     W     B     W     B     B                  1716         23
## 5  W     B     W     B     W     B                  1655         45
## 6  B     W     B     B     W     B                  1686         34
## 7  B     W     B     B     W     W                  1649         57
## 8  W     B     W     B     W     W                  1641          3
## 9  B     W     B     W     B     B                  1411         25
## 10 W     B     B     W     B     W                  1365         16
##    G2opponent G3opponent G4opponent G5opponent G6opponent G7opponent
## 1          21         18         14          7         12          4
## 2          58          4         17         16         20          7
## 3          61         25         21         11         13         12
## 4          28          2         26          5         19          1
## 5          37         12         13          4         14         17
## 6          29         11         35         10         27         21
## 7          46         13         11          1          9          2
## 8          32         14          9         47         28         19
## 9          18         59          8         26          7         20
## 10         19         55         31          6         25         18
##    sumOfOpponentPreGameRating numberOfGamesPlayed avgOppPreGameRating
## 1                       11237                   7                1605
## 2                       10285                   7                1469
## 3                       10945                   7                1564
## 4                       11015                   7                1574
## 5                       10506                   7                1501
## 6                       10631                   7                1519
## 7                        9605                   7                1372
## 8                       10279                   7                1468
## 9                       10662                   7                1523
## 10                      10879                   7                1554
# Change column names to readable format

names(chess.rating)[names(chess.rating) == "X1"] <- "playerNumber"
names(chess.rating)[names(chess.rating) == "X2"] <- "playerName"
names(chess.rating)[names(chess.rating) == "X11"] <- "playerState"
names(chess.rating)[names(chess.rating) == "X3"] <- "playerPoints"
names(chess.rating)[names(chess.rating) == "preGameRating"] <- "playerPreGameRating"


# Format the column values remove extra spaces

trim <- function (x) gsub("^\\s+|\\s+$", "", x)
chess.rating$playerState <- trim(chess.rating$playerState)
chess.rating$playerPoints <- trim(chess.rating$playerPoints)
chess.rating$playerName <- trim(chess.rating$playerName)

# Create subset of data to be written to CSV file

chess.csvdata <- subset(chess.rating,select = c(playerNumber,playerName,playerState,playerPoints,playerPreGameRating,avgOppPreGameRating))

head(chess.csvdata,10)
##    playerNumber          playerName playerState playerPoints
## 1             1            GARY HUA          ON          6.0
## 2             2     DAKSHESH DARURI          MI          6.0
## 3             3        ADITYA BAJAJ          MI          6.0
## 4             4 PATRICK H SCHILLING          MI          5.5
## 5             5          HANSHI ZUO          MI          5.5
## 6             6         HANSEN SONG          OH          5.0
## 7             7   GARY DEE SWATHELL          MI          5.0
## 8             8    EZEKIEL HOUGHTON          MI          5.0
## 9             9         STEFANO LEE          ON          5.0
## 10           10           ANVIT RAO          MI          5.0
##    playerPreGameRating avgOppPreGameRating
## 1                 1794                1605
## 2                 1553                1469
## 3                 1384                1564
## 4                 1716                1574
## 5                 1655                1501
## 6                 1686                1519
## 7                 1649                1372
## 8                 1641                1468
## 9                 1411                1523
## 10                1365                1554
# Generate CSV files

chessOutFile <- paste(str_trim(localDir),"//chessOutFile.csv",sep="")
write.table(chess.csvdata, file = chessOutFile, row.names = FALSE, sep = ",")

References:

  1. SQL from within R - https://github.com/ggrothendieck/sqldf#example-4-join

  2. Trim function - http://stackoverflow.com/questions/2261079/how-to-trim-leading-and-trailing-whitespace-in-r

  3. Regular expressions - Handling and Processing Strings in R, Gaston Sanchez