Author: Romerl Elizes

Preface

This was a very big, challenging project, and I was very enthusiastic about it. I could easily have done this project in Java or C#, but doing it in R was quite a challenge. However, I see some commonalities between those languages and the R functions I discovered while doing this project.

The best way to solve this project was to use a divide-and-conquer strategy: focus entirely on solving one small problem before moving on to the next. That made implementing the project a lot faster. I followed all requirements in the Project 1 specs.

In summary, I loaded the entire file into the R Markdown document, parsed it, and placed all of its components in a large data frame. I added all the required modified fields and dummy fields to the data frame so that I could easily do my calculations according to the Project 1 specs. Once I was satisfied that I had all the required, correct data, I created a smaller data frame containing only the final fields I needed. I then wrote that data frame to a CSV file and displayed the contents of that CSV file for all users to see.

A more detailed account of the project follows in the rest of this R Markdown document.

I. Load Appropriate Libraries

library(stringr)

II. Load Data

I stored the tournamentinfo.txt file in my GitHub account. I learned in a previous class that, in order to read the file properly, I had to use the URL pointing to the raw data; otherwise, the read does not work. The result is a string vector, with each string representing one line of the input file.

# ref: [HOW]
urlfile <-"https://raw.githubusercontent.com/RommyGraphs/MSDA/master/DATA607/tournamentinfo.txt"
rawdata <- readLines(urlfile)

## Warning in readLines(urlfile): incomplete final line found on
## 'https://raw.githubusercontent.com/RommyGraphs/MSDA/master/DATA607/
## tournamentinfo.txt'
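
The warning above only reports that the last line of the file has no trailing newline; the line is still read. As a minimal sketch (the temporary file and its two lines are hypothetical stand-ins for the real URL), readLines accepts warn = FALSE to silence that message:

```r
# Hypothetical stand-in for tournamentinfo.txt.
# cat() does not append a final newline, so the last line is "incomplete".
tmp <- tempfile(fileext = ".txt")
cat("first line\nsecond line", file = tmp)

# warn = FALSE suppresses the "incomplete final line" warning;
# both lines are still returned.
lines <- readLines(tmp, warn = FALSE)
print(lines)

unlink(tmp)
```

The same argument could be passed when reading from the raw GitHub URL; whether to silence the warning is a matter of taste.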

III. Create a Data Frame from the Input File

This was the first major obstacle for my project. My goal was to combine every pair of lines into one row of the data frame that I would use.

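A. Eliminate lines that have “-----”

The input file contains unneeded text, in particular the header text and the separator lines made of dashes. I eliminated the lines containing “-----” using a basic R combination of grepl and paste0. The two header lines with the “Player Name” heading survive this step; they are combined into the first element of the string vector and skipped later, when the data frame is built.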

# ref: [REM]
rawdatac <- rawdata[!grepl(paste0("-----", collapse = "|"), rawdata)]
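
As a small illustration of this filter, the sketch below applies the same idea to a made-up five-line sample (sample_lines and is_separator are hypothetical names, and the lines are shortened). It passes fixed = TRUE so the dashes are matched literally rather than as a regular expression, which is an assumption about what is wanted here:

```r
# Shortened, made-up sample in the same layout as the input file.
sample_lines <- c(
  "-----------------------------",
  " Pair | Player Name     |Total|",
  " Num  | USCF ID / Rtg   | Pts |",
  "-----------------------------",
  "    1 | GARY HUA        |6.0  |"
)

# TRUE for every line that contains a run of five dashes.
is_separator <- grepl("-----", sample_lines, fixed = TRUE)

# Keep only the lines that are not separators.
kept <- sample_lines[!is_separator]
print(kept)
```

Note that the two heading lines are still present in kept, just as they are in rawdatac.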

B. Loop through the String Vector and combine each odd and even line

Every pair of lines in the input file contains the information for one player. The objective for this part of the project was to combine each pair in order to create one row per player. Instead of extracting only the information I thought I would need, I simply combined every pair of lines into one string in my destination string vector. I printed the results to the screen for demonstration purposes.

stringvector <- c()
sizerawdatac <- length(rawdatac)
index <- 1
vectori <- 1
while (index < sizerawdatac) {
  # Odd-numbered lines start a player record; the next line completes it.
  if (index %% 2 != 0) {
    stringcand <- paste(rawdatac[index], rawdatac[index + 1], " ")
    stringvector[vectori] <- stringcand
    vectori <- vectori + 1
  }
  index <- index + 1
}
stringvector

##  [1] " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round|   Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  |   "
##  [2] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|    ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |  "  
##  [3] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|    MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |  "  
##  [4] "    3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|    MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |  "  
##  [5] "    4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|    MI | 12616049 / R: 1716   ->1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |  "  
##  [6] "    5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|    MI | 14601533 / R: 1655   ->1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |  "  
##  [7] "    6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21|    OH | 15055204 / R: 1686   ->1687     |N:3  |W    |B    |W    |B    |B    |W    |B    |  "  
##  [8] "    7 | GARY DEE SWATHELL               |5.0  |W  57|W  46|W  13|W  11|L   1|W   9|L   2|    MI | 11146376 / R: 1649   ->1673     |N:3  |W    |B    |W    |B    |B    |W    |W    |  "  
##  [9] "    8 | EZEKIEL HOUGHTON                |5.0  |W   3|W  32|L  14|L   9|W  47|W  28|W  19|    MI | 15142253 / R: 1641P17->1657P24  |N:3  |B    |W    |B    |W    |B    |W    |W    |  "  
## [10] "    9 | STEFANO LEE                     |5.0  |W  25|L  18|W  59|W   8|W  26|L   7|W  20|    ON | 14954524 / R: 1411   ->1564     |N:2  |W    |B    |W    |B    |W    |B    |B    |  "  
## [11] "   10 | ANVIT RAO                       |5.0  |D  16|L  19|W  55|W  31|D   6|W  25|W  18|    MI | 14150362 / R: 1365   ->1544     |N:3  |W    |W    |B    |B    |W    |B    |W    |  "  
## [12] "   11 | CAMERON WILLIAM MC LEMAN        |4.5  |D  38|W  56|W   6|L   7|L   3|W  34|W  26|    MI | 12581589 / R: 1712   ->1696     |N:3  |B    |W    |B    |W    |B    |W    |B    |  "  
## [13] "   12 | KENNETH J TACK                  |4.5  |W  42|W  33|D   5|W  38|H    |D   1|L   3|    MI | 12681257 / R: 1663   ->1670     |N:3  |W    |B    |W    |B    |     |W    |B    |  "  
## [14] "   13 | TORRANCE HENRY JR               |4.5  |W  36|W  27|L   7|D   5|W  33|L   3|W  32|    MI | 15082995 / R: 1666   ->1662     |N:3  |B    |W    |B    |B    |W    |W    |B    |  "  
## [15] "   14 | BRADLEY SHAW                    |4.5  |W  54|W  44|W   8|L   1|D  27|L   5|W  31|    MI | 10131499 / R: 1610   ->1618     |N:3  |W    |B    |W    |W    |B    |B    |W    |  "  
## [16] "   15 | ZACHARY JAMES HOUGHTON          |4.5  |D  19|L  16|W  30|L  22|W  54|W  33|W  38|    MI | 15619130 / R: 1220P13->1416P20  |N:3  |B    |B    |W    |W    |B    |B    |W    |  "  
## [17] "   16 | MIKE NIKITIN                    |4.0  |D  10|W  15|H    |W  39|L   2|W  36|U    |    MI | 10295068 / R: 1604   ->1613     |N:3  |B    |W    |     |B    |W    |B    |     |  "  
## [18] "   17 | RONALD GRZEGORCZYK              |4.0  |W  48|W  41|L  26|L   2|W  23|W  22|L   5|    MI | 10297702 / R: 1629   ->1610     |N:3  |W    |B    |W    |B    |W    |B    |W    |  "  
## [19] "   18 | DAVID SUNDEEN                   |4.0  |W  47|W   9|L   1|W  32|L  19|W  38|L  10|    MI | 11342094 / R: 1600   ->1600     |N:3  |B    |W    |B    |W    |B    |W    |B    |  "  
## [20] "   19 | DIPANKAR ROY                    |4.0  |D  15|W  10|W  52|D  28|W  18|L   4|L   8|    MI | 14862333 / R: 1564   ->1570     |N:3  |W    |B    |W    |B    |W    |W    |B    |  "  
## [21] "   20 | JASON ZHENG                     |4.0  |L  40|W  49|W  23|W  41|W  28|L   2|L   9|    MI | 14529060 / R: 1595   ->1569     |N:4  |W    |B    |W    |B    |W    |B    |W    |  "  
## [22] "   21 | DINH DANG BUI                   |4.0  |W  43|L   1|W  47|L   3|W  40|W  39|L   6|    ON | 15495066 / R: 1563P22->1562     |N:3  |B    |W    |B    |W    |W    |B    |W    |  "  
## [23] "   22 | EUGENE L MCCLURE                |4.0  |W  64|D  52|L  28|W  15|H    |L  17|W  40|    MI | 12405534 / R: 1555   ->1529     |N:4  |W    |B    |W    |B    |     |W    |B    |  "  
## [24] "   23 | ALAN BUI                        |4.0  |L   4|W  43|L  20|W  58|L  17|W  37|W  46|    ON | 15030142 / R: 1363   ->1371     |     |B    |W    |B    |W    |B    |W    |B    |  "  
## [25] "   24 | MICHAEL R ALDRICH               |4.0  |L  28|L  47|W  43|L  25|W  60|W  44|W  39|    MI | 13469010 / R: 1229   ->1300     |N:4  |B    |W    |B    |B    |W    |W    |B    |  "  
## [26] "   25 | LOREN SCHWIEBERT                |3.5  |L   9|W  53|L   3|W  24|D  34|L  10|W  47|    MI | 12486656 / R: 1745   ->1681     |N:4  |B    |W    |B    |W    |B    |W    |B    |  "  
## [27] "   26 | MAX ZHU                         |3.5  |W  49|W  40|W  17|L   4|L   9|D  32|L  11|    ON | 15131520 / R: 1579   ->1564     |N:4  |B    |W    |B    |W    |B    |W    |W    |  "  
## [28] "   27 | GAURAV GIDWANI                  |3.5  |W  51|L  13|W  46|W  37|D  14|L   6|U    |    MI | 14476567 / R: 1552   ->1539     |N:4  |W    |B    |W    |B    |W    |B    |     |  "  
## [29] "   28 | SOFIA ADINA STANESCU-BELLU      |3.5  |W  24|D   4|W  22|D  19|L  20|L   8|D  36|    MI | 14882954 / R: 1507   ->1513     |N:3  |W    |W    |B    |W    |B    |B    |W    |  "  
## [30] "   29 | CHIEDOZIE OKORIE                |3.5  |W  50|D   6|L  38|L  34|W  52|W  48|U    |    MI | 15323285 / R: 1602P6 ->1508P12  |N:4  |B    |W    |B    |W    |W    |B    |     |  "  
## [31] "   30 | GEORGE AVERY JONES              |3.5  |L  52|D  64|L  15|W  55|L  31|W  61|W  50|    ON | 12577178 / R: 1522   ->1444     |     |W    |B    |B    |W    |W    |B    |B    |  "  
## [32] "   31 | RISHI SHETTY                    |3.5  |L  58|D  55|W  64|L  10|W  30|W  50|L  14|    MI | 15131618 / R: 1494   ->1444     |     |B    |W    |B    |W    |B    |W    |B    |  "  
## [33] "   32 | JOSHUA PHILIP MATHEWS           |3.5  |W  61|L   8|W  44|L  18|W  51|D  26|L  13|    ON | 14073750 / R: 1441   ->1433     |N:4  |W    |B    |W    |B    |W    |B    |W    |  "  
## [34] "   33 | JADE GE                         |3.5  |W  60|L  12|W  50|D  36|L  13|L  15|W  51|    MI | 14691842 / R: 1449   ->1421     |     |B    |W    |B    |W    |B    |W    |B    |  "  
## [35] "   34 | MICHAEL JEFFERY THOMAS          |3.5  |L   6|W  60|L  37|W  29|D  25|L  11|W  52|    MI | 15051807 / R: 1399   ->1400     |     |B    |W    |B    |B    |W    |B    |W    |  "  
## [36] "   35 | JOSHUA DAVID LEE                |3.5  |L  46|L  38|W  56|L   6|W  57|D  52|W  48|    MI | 14601397 / R: 1438   ->1392     |     |W    |W    |B    |W    |B    |B    |W    |  "  
## [37] "   36 | SIDDHARTH JHA                   |3.5  |L  13|W  57|W  51|D  33|H    |L  16|D  28|    MI | 14773163 / R: 1355   ->1367     |N:4  |W    |B    |W    |B    |     |W    |B    |  "  
## [38] "   37 | AMIYATOSH PWNANANDAM            |3.5  |B    |L   5|W  34|L  27|H    |L  23|W  61|    MI | 15489571 / R:  980P12->1077P17  |     |     |B    |W    |W    |     |B    |W    |  "  
## [39] "   38 | BRIAN LIU                       |3.0  |D  11|W  35|W  29|L  12|H    |L  18|L  15|    MI | 15108523 / R: 1423   ->1439     |N:4  |W    |B    |W    |W    |     |B    |B    |  "  
## [40] "   39 | JOEL R HENDON                   |3.0  |L   1|W  54|W  40|L  16|W  44|L  21|L  24|    MI | 12923035 / R: 1436P23->1413     |N:4  |B    |W    |B    |W    |B    |W    |W    |  "  
## [41] "   40 | FOREST ZHANG                    |3.0  |W  20|L  26|L  39|W  59|L  21|W  56|L  22|    MI | 14892710 / R: 1348   ->1346     |     |B    |B    |W    |W    |B    |W    |W    |  "  
## [42] "   41 | KYLE WILLIAM MURPHY             |3.0  |W  59|L  17|W  58|L  20|X    |U    |U    |    MI | 15761443 / R: 1403P5 ->1341P9   |     |B    |W    |B    |W    |     |     |     |  "  
## [43] "   42 | JARED GE                        |3.0  |L  12|L  50|L  57|D  60|D  61|W  64|W  56|    MI | 14462326 / R: 1332   ->1256     |     |B    |W    |B    |B    |W    |W    |B    |  "  
## [44] "   43 | ROBERT GLEN VASEY               |3.0  |L  21|L  23|L  24|W  63|W  59|L  46|W  55|    MI | 14101068 / R: 1283   ->1244     |     |W    |B    |W    |W    |B    |B    |W    |  "  
## [45] "   44 | JUSTIN D SCHILLING              |3.0  |B    |L  14|L  32|W  53|L  39|L  24|W  59|    MI | 15323504 / R: 1199   ->1199     |     |     |W    |B    |B    |W    |B    |W    |  "  
## [46] "   45 | DEREK YAN                       |3.0  |L   5|L  51|D  60|L  56|W  63|D  55|W  58|    MI | 15372807 / R: 1242   ->1191     |     |W    |B    |W    |B    |W    |B    |W    |  "  
## [47] "   46 | JACOB ALEXANDER LAVALLEY        |3.0  |W  35|L   7|L  27|L  50|W  64|W  43|L  23|    MI | 15490981 / R:  377P3 ->1076P10  |     |B    |W    |B    |W    |B    |W    |W    |  "  
## [48] "   47 | ERIC WRIGHT                     |2.5  |L  18|W  24|L  21|W  61|L   8|D  51|L  25|    MI | 12533115 / R: 1362   ->1341     |     |W    |B    |W    |B    |W    |B    |W    |  "  
## [49] "   48 | DANIEL KHAIN                    |2.5  |L  17|W  63|H    |D  52|H    |L  29|L  35|    MI | 14369165 / R: 1382   ->1335     |     |B    |W    |     |B    |     |W    |B    |  "  
## [50] "   49 | MICHAEL J MARTIN                |2.5  |L  26|L  20|D  63|D  64|W  58|H    |U    |    MI | 12531685 / R: 1291P12->1259P17  |     |W    |W    |B    |W    |B    |     |     |  "  
## [51] "   50 | SHIVAM JHA                      |2.5  |L  29|W  42|L  33|W  46|H    |L  31|L  30|    MI | 14773178 / R: 1056   ->1111     |     |W    |B    |W    |B    |     |B    |W    |  "  
## [52] "   51 | TEJAS AYYAGARI                  |2.5  |L  27|W  45|L  36|W  57|L  32|D  47|L  33|    MI | 15205474 / R: 1011   ->1097     |     |B    |W    |B    |W    |B    |W    |W    |  "  
## [53] "   52 | ETHAN GUO                       |2.5  |W  30|D  22|L  19|D  48|L  29|D  35|L  34|    MI | 14918803 / R:  935   ->1092     |N:4  |B    |W    |B    |W    |B    |W    |B    |  "  
## [54] "   53 | JOSE C YBARRA                   |2.0  |H    |L  25|H    |L  44|U    |W  57|U    |    MI | 12578849 / R: 1393   ->1359     |     |     |B    |     |W    |     |W    |     |  "  
## [55] "   54 | LARRY HODGE                     |2.0  |L  14|L  39|L  61|B    |L  15|L  59|W  64|    MI | 12836773 / R: 1270   ->1200     |     |B    |B    |W    |     |W    |B    |W    |  "  
## [56] "   55 | ALEX KONG                       |2.0  |L  62|D  31|L  10|L  30|B    |D  45|L  43|    MI | 15412571 / R: 1186   ->1163     |     |W    |B    |W    |B    |     |W    |B    |  "  
## [57] "   56 | MARISA RICCI                    |2.0  |H    |L  11|L  35|W  45|H    |L  40|L  42|    MI | 14679887 / R: 1153   ->1140     |     |     |B    |W    |W    |     |B    |W    |  "  
## [58] "   57 | MICHAEL LU                      |2.0  |L   7|L  36|W  42|L  51|L  35|L  53|B    |    MI | 15113330 / R: 1092   ->1079     |     |B    |W    |W    |B    |W    |B    |     |  "  
## [59] "   58 | VIRAJ MOHILE                    |2.0  |W  31|L   2|L  41|L  23|L  49|B    |L  45|    MI | 14700365 / R:  917   -> 941     |     |W    |B    |W    |B    |W    |     |B    |  "  
## [60] "   59 | SEAN M MC CORMICK               |2.0  |L  41|B    |L   9|L  40|L  43|W  54|L  44|    MI | 12841036 / R:  853   -> 878     |     |W    |     |B    |B    |W    |W    |B    |  "  
## [61] "   60 | JULIA SHEN                      |1.5  |L  33|L  34|D  45|D  42|L  24|H    |U    |    MI | 14579262 / R:  967   -> 984     |     |W    |B    |B    |W    |B    |     |     |  "  
## [62] "   61 | JEZZEL FARKAS                   |1.5  |L  32|L   3|W  54|L  47|D  42|L  30|L  37|    ON | 15771592 / R:  955P11-> 979P18  |     |B    |W    |B    |W    |B    |W    |B    |  "  
## [63] "   62 | ASHWIN BALAJI                   |1.0  |W  55|U    |U    |U    |U    |U    |U    |    MI | 15219542 / R: 1530   ->1535     |     |B    |     |     |     |     |     |     |  "  
## [64] "   63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |    MI | 15057092 / R: 1175   ->1125     |     |W    |B    |W    |B    |B    |     |     |  "  
## [65] "   64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|    MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |  "
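
The pairing of odd and even lines can also be written without an explicit loop. The following is a sketch of that alternative, not the approach used above; lines_in is a shortened, hypothetical sample, and the code assumes the vector has an even number of lines (at least two):

```r
# Shortened sample: two players, two lines each.
lines_in <- c(
  "    1 | GARY HUA        |6.0  |",
  "   ON | 15445895        |N:2  |",
  "    2 | DAKSHESH DARURI |6.0  |",
  "   MI | 14598900        |N:2  |"
)

# Positions of the first line of each pair: 1, 3, 5, ...
odd <- seq(1, length(lines_in) - 1, by = 2)

# paste() is vectorized, so each odd line is joined with the line after it.
paired <- paste(lines_in[odd], lines_in[odd + 1])
print(paired)
```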

C. Iterate through each element of the String Vector and make columnar table elements

Now that each player’s data is on a single line, we can put the data into a data frame. Here, I created the initial data frame as empty because I don’t know in advance how many rows I will be generating. The first element of the string vector contains all of the heading information, so I ignored it when adding rows to the data frame. When adding to the data frame, I used strsplit to split the fields into their respective columns. Luckily, “|” already separated the fields I needed, and this made the job easier. I trimmed each element to remove any extraneous white space.

#[CRE]
dataframe <- data.frame(
  id = character(), #1
  name = character(), #2
  total = character(), #3
  rdph1 = character(), #4
  rdph2 = character(), #5
  rdph3 = character(), #6
  rdph4 = character(), #7
  rdph5 = character(), #8
  rdph6 = character(), #9
  rdph7 = character(), #10
  num = character(), #11
  idph = character(), #12
  pts = character(), #13
  rdcolor1 = character(), #14
  rdcolor2 = character(), #15
  rdcolor3 = character(), #16
  rdcolor4 = character(), #17
  rdcolor5 = character(), #18
  rdcolor6 = character(), #19
  rdcolor7 = character(), #20
  emptyph = character(), #21
  stringsAsFactors = FALSE
)
for (i in 1:length(stringvector)) {
  if (i != 1) { # Get rid of heading string because we won't need it.
    strcand <- stringvector[i]
    listelement <- strsplit(strcand, "\\|")[[1]]
    trimmedlistelement <- str_trim(listelement)
    dataframe[nrow(dataframe) +1, ] <- trimmedlistelement
  }
}
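
Growing the data frame one row at a time works well for a file of this size. As a hedged alternative sketch, all rows can be split and bound in one step; records is a shortened, hypothetical sample with only three fields, and base R's trimws stands in for str_trim so the example needs no packages:

```r
# Shortened sample records with three fields each.
records <- c(
  "    1 | GARY HUA        |6.0  |",
  "    2 | DAKSHESH DARURI |6.0  |"
)

# Split every record on "|" and trim the white space around each field.
fields <- lapply(strsplit(records, "\\|"), trimws)

# Stack the per-record vectors into a character matrix, then convert.
players <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(players) <- c("id", "name", "total")
print(players)
```

This relies on every record splitting into the same number of fields, which holds for the combined lines shown earlier.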

IV. Add Columns and Conduct Appropriate Calculations for the Data Frame

The data frame has been made, and now it’s time to add work columns (fields) to it, to create the functions needed for the calculations, and to execute the calculations required by the Project 1 specs.

A. Create Calculation Functions

I created these calculation functions as I addressed each field requirement for the project.

1. Define Function findSumOfAllWinsLossesDrawsForRound

Per the video and project specs, I should only count Wins, Losses, or Draws for each round. This function returns either a 0 or 1 for that round. 1 means that a game was played. 0 means no game was played for this round.

findSumOfAllWinsLossesDrawsForRound <- function(curround) {
  sumvalue <- str_count(curround,"W") + str_count(curround,"L") + str_count(curround,"D")
  return (sumvalue)
}
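
To make the 0-or-1 behavior concrete, here is a minimal base R sketch that marks which round fields count as a played game. It uses grepl instead of summing str_count calls, which gives the same answer only under the assumption that a round field contains at most one result letter; rounds is a hypothetical sample using the codes that appear in the data:

```r
# One example of each result code that appears in the round columns.
rounds <- c("W  39", "L   4", "D  12", "H", "U", "B", "X")

# 1 when the field contains W, L, or D (a game was played), otherwise 0.
played <- as.integer(grepl("[WLD]", rounds))
print(played)
```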

2. Define Function findNumberOfGamesPlayed

This function returns the total number of games played by a player.

findNumberOfGamesPlayed <- function(round1,
                                    round2,
                                    round3,
                                    round4,
                                    round5,
                                    round6,
                                    round7) {
  numgames <- findSumOfAllWinsLossesDrawsForRound(round1) +
    findSumOfAllWinsLossesDrawsForRound(round2) +
    findSumOfAllWinsLossesDrawsForRound(round3) +
    findSumOfAllWinsLossesDrawsForRound(round4) +
    findSumOfAllWinsLossesDrawsForRound(round5) +
    findSumOfAllWinsLossesDrawsForRound(round6) +
    findSumOfAllWinsLossesDrawsForRound(round7)
  return(numgames)
}
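
The same count can be sketched for all players at once by treating the round columns as a table and summing across each row. This is an illustrative alternative with a hypothetical three-round, two-player data frame named rounds, not the function defined above:

```r
# Two players, three rounds (the real data has seven round columns).
rounds <- data.frame(
  rdph1 = c("W  39", "W  55"),
  rdph2 = c("D  12", "U"),
  rdph3 = c("H", "U"),
  stringsAsFactors = FALSE
)

# One logical column per round: TRUE when a game was played (W, L, or D).
played <- sapply(rounds, function(col) grepl("[WLD]", col))

# Summing each row counts the games played by each player.
numgames <- rowSums(played)
print(numgames)
```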

3. Define Function extractOpponentID

This function returns the opponent ID from the current round field. The incoming field is in the format “X somenumber”, where X is W, L, D, H, and so on, and somenumber is the opponent ID. I only extract the opponent ID. I know there is a space between X and somenumber, and I use that space to get the opponent ID with the str_extract function. If the round has no opponent, the function returns 0.

extractOpponentID <- function(curround) {
  opponentid <- str_extract(curround," \\d*$")
  opponentid <- str_trim(opponentid)
  opponentid <- ifelse(!is.na(opponentid),opponentid,0)
  return(opponentid)
}
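
As a quick illustration of the expected inputs and outputs, the sketch below does a comparable extraction in base R by removing the leading result letter and the spaces after it. It is an alternative under the assumption that every field starts with a single capital letter, and it returns integers rather than strings; rounds is a hypothetical sample:

```r
# Sample round fields: two with opponents, two without.
rounds <- c("W  39", "D   4", "H", "U")

# Drop the result letter and following spaces; what remains is the ID or "".
ids <- as.integer(sub("^[A-Z]\\s*", "", rounds))

# Fields without an opponent become NA; use 0 for them instead.
ids[is.na(ids)] <- 0L
print(ids)
```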

4. Define Function extractPlayerRating

This function returns the player’s pre-tournament rating as required by the Project 1 specs. The input string can be in the format “11146376 / R: 1649 ->1673” or “15142253 / R: 1641P17->1657P24”. The goal of this function is to get the number between R: and ->. If the resulting string contains P, I have to strip the P and everything after it from the rating. The function below uses a combination of the str_extract, regexec, and regmatches functions to return the required player rating.

extractPlayerRating <- function(ratingstring) {
  candrating <- str_extract(ratingstring,"R:\\s*\\w*\\s*->")
  # [EXT]
  candrating <- regmatches(candrating,regexec("R:(.*?)->",candrating))[[1]][2]
  candrating <- str_trim(candrating)
  if (str_count(candrating,"P")>0)
    candrating <- regmatches(candrating,regexec("(.*?)P",candrating))[[1]][2]
  return(candrating)
}
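
For comparison, the pre-tournament rating can also be captured with a single regular expression. This is a sketch of an alternative, not the function above; it assumes each string contains exactly one “R:” followed by the rating digits, and the three sample strings are taken from the combined lines shown earlier:

```r
# Three rating fields from the tournament data, with and without "P".
rating_fields <- c(
  "15445895 / R: 1794   ->1817",
  "15142253 / R: 1641P17->1657P24",
  "15490981 / R:  377P3 ->1076P10"
)

# Capture the first run of digits after "R:"; anything after it
# (a provisional "P" suffix, the arrow, the post rating) is discarded.
pre_rating <- as.integer(sub(".*R:\\s*(\\d+).*", "\\1", rating_fields))
print(pre_rating)
```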

5. Define Function retrieveOpponentRating

This function returns the opponent rating associated with the opponent ID. The opponent rating was already created by the extractPlayerRating function and this is just a matter of returning the player rating value based on opponent ID.

retrieveOpponentRating <- function(opponentid) {
  opponentrating <- 0
  if (opponentid != 0)
    opponentrating <- dataframe$playerrating[dataframe$id == opponentid]
  opponentrating <- as.numeric(opponentrating)
  return (opponentrating)
}
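
The lookup by opponent ID can be sketched for a whole column at once with match, which returns the row position of each ID. This is an illustrative alternative on a hypothetical three-player data frame named players (its ratings are taken from the first three players in the data):

```r
# Small lookup table: player ID and pre-tournament rating, both as strings.
players <- data.frame(
  id = c("1", "2", "3"),
  playerrating = c("1794", "1553", "1384"),
  stringsAsFactors = FALSE
)

# Opponent IDs for one round; "0" means there was no opponent.
opponent <- c("3", "0", "1")

# match() gives the row of each opponent, or NA when the ID is not found.
ratings <- as.numeric(players$playerrating[match(opponent, players$id)])

# Rounds without an opponent get a rating of 0.
ratings[is.na(ratings)] <- 0
print(ratings)
```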

6. Define Function calculateAverageOpponentRating

This function takes in the ratings of all opponents (0 where an opponent does not exist), adds them up, and divides the sum by the number of games played. The result, rounded to a whole number, is the average opponent rating as required by the Project 1 specs.

calculateAverageOpponentRating <- function(opponentrating1,
                                           opponentrating2,
                                           opponentrating3,
                                           opponentrating4,
                                           opponentrating5,
                                           opponentrating6,
                                           opponentrating7,
                                           numgames) {
  calcvalue <- (opponentrating1 + opponentrating2 + opponentrating3 +
                  opponentrating4 + opponentrating5 + opponentrating6 +
                  opponentrating7) / numgames
  calcvalue <- round(calcvalue, 0)
  return(calcvalue)
}
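
A worked example of the arithmetic, using ratings from the tournament data: Gary Hua played all seven rounds, while Kenneth J Tack had a half-point bye in round 5, so his fifth rating is 0 and his divisor is 6. The variable names are hypothetical:

```r
# Gary Hua's opponents (39, 21, 18, 14, 7, 12, 4), pre-tournament ratings.
gary_opponents <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)
gary_avg <- round(sum(gary_opponents) / 7, 0)

# Kenneth J Tack's opponents (42, 33, 5, 38, bye, 1, 3); the bye counts as 0
# in the sum and is left out of the number of games played.
tack_opponents <- c(1332, 1449, 1655, 1423, 0, 1794, 1384)
tack_avg <- round(sum(tack_opponents) / 6, 0)

print(c(gary_avg, tack_avg))
```

Dividing by the number of games played rather than by 7 is what keeps byes from dragging the average down.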

7. Define Function createModifiedName

All of the Player Name strings in the Input file are capitalized. According to the Project 1 specs, the expected output is that only the first letter of first, middle (initial), and last names are upper case. This function converts the capitalized name into the appropriate result. For example, “ROMERL ELIZES” becomes “Romerl Elizes” OR “CAMERON WILLIAM MC LEMAN” becomes “Cameron William Mc Leman”.

The function first separates the name into a string vector. For each part, I convert every character from the second position to the end of the string to lower case. Then I join the parts back together into the name required by the Project 1 specs.

I also got to see the big difference between paste and paste0, which are both useful and powerful. By default, paste joins strings with a space between them, while paste0 joins them with no separator.

createModifiedName <- function(name) {
  nameVector <- strsplit(name, " ")[[1]]
  modifiedName <- ""
  for (i in 1:length(nameVector)) {
    testName <- nameVector[i]
    candName <- ""
    if (nchar(testName)==1)
      candName <- testName
    else {
      candName <- substr(testName,1,1)
      otherLetters <- substr(testName,2,nchar(testName))
      otherLetters <- tolower(otherLetters)
      candName <- paste0(candName, otherLetters)
    }
    modifiedName <- paste (modifiedName, candName)
  }
  
  modifiedName <- str_trim(modifiedName)

  return(modifiedName)
}
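
The same conversion can be sketched with one regular expression. This is an alternative, not the function above: with perl = TRUE, gsub accepts \\L in the replacement to lower-case what follows. The pattern treats each run of non-space characters as one word, so a hyphenated surname is lower-cased after its first letter, the same as in the loop version:

```r
# Sample names from the tournament data.
names_in <- c(
  "CAMERON WILLIAM MC LEMAN",
  "KENNETH J TACK",
  "SOFIA ADINA STANESCU-BELLU"
)

# \\1 is the first character of each word (kept as is);
# \\L\\2 is the rest of the word, converted to lower case.
proper <- gsub("(\\S)(\\S*)", "\\1\\L\\2", names_in, perl = TRUE)
print(proper)
```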

B. Add Field playerrating and extract it from the ID placeholder

I added the playerrating field and found that mapply automatically calls the extractPlayerRating function on each value of the ID placeholder field I created initially. mapply is a very powerful function.

dataframe$playerrating <- mapply(extractPlayerRating,dataframe$idph)

C. Add Field numgames and Calculate Number of Games Played

I added the numgames field and used mapply to call the findNumberOfGamesPlayed function with all seven round strings as input.

# [MAC]
dataframe$numgames <- mapply(findNumberOfGamesPlayed,dataframe$rdph1,dataframe$rdph2,dataframe$rdph3,dataframe$rdph4,dataframe$rdph5,dataframe$rdph6,dataframe$rdph7)
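
For readers new to mapply, here is a minimal sketch of how it walks several vectors in parallel, calling the function once per position. The helper functions weighted and first_word are hypothetical and exist only for this illustration. The second half shows that when the first vector is an unnamed character vector, mapply uses its values as names on the result unless USE.NAMES = FALSE is given:

```r
# Called once per position: (1, 4), then (2, 5), then (3, 6).
weighted <- function(a, b) a + 10 * b
pairwise <- mapply(weighted, c(1, 2, 3), c(4, 5, 6))
print(pairwise)

# With a character input, the result carries the inputs as names.
first_word <- function(s) strsplit(s, " ")[[1]][1]
named <- mapply(first_word, c("GARY HUA", "BEN LI"))
unnamed <- mapply(first_word, c("GARY HUA", "BEN LI"), USE.NAMES = FALSE)
print(names(named))
print(unnamed)
```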

D. Add opponent ID Fields and extract them from the Round Placeholder fields

I added 7 opponent fields and used mapply to retrieve their respective opponent IDs.

dataframe$opponent1 <- mapply(extractOpponentID,dataframe$rdph1)
dataframe$opponent2 <- mapply(extractOpponentID,dataframe$rdph2)
dataframe$opponent3 <- mapply(extractOpponentID,dataframe$rdph3)
dataframe$opponent4 <- mapply(extractOpponentID,dataframe$rdph4)
dataframe$opponent5 <- mapply(extractOpponentID,dataframe$rdph5)
dataframe$opponent6 <- mapply(extractOpponentID,dataframe$rdph6)
dataframe$opponent7 <- mapply(extractOpponentID,dataframe$rdph7)
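
Because the seven assignments differ only in the round number, they could also be generated in a loop that builds the column names with paste0. The following is a sketch of that idea on a hypothetical two-round data frame named rounds; opponent_of is a simplified base R stand-in for the extraction function, not the function defined earlier:

```r
# Two players, two rounds (the real data has seven round columns).
rounds <- data.frame(
  rdph1 = c("W  39", "H"),
  rdph2 = c("D   4", "L  12"),
  stringsAsFactors = FALSE
)

# Simplified stand-in: strip the result letter, use 0 when no opponent exists.
opponent_of <- function(x) {
  ids <- as.integer(sub("^[A-Z]\\s*", "", x))
  ids[is.na(ids)] <- 0L
  ids
}

# Build "opponent1", "opponent2", ... from "rdph1", "rdph2", ...
for (k in 1:2) {
  rounds[[paste0("opponent", k)]] <- opponent_of(rounds[[paste0("rdph", k)]])
}
print(rounds)
```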

E. Add opponent rating Fields and retrieve them based on opponent ID

I added 7 opponent rating fields and used mapply to retrieve the rating of each opponent.

dataframe$opponentrating1 <- mapply(retrieveOpponentRating,dataframe$opponent1)
dataframe$opponentrating2 <- mapply(retrieveOpponentRating,dataframe$opponent2)
dataframe$opponentrating3 <- mapply(retrieveOpponentRating,dataframe$opponent3)
dataframe$opponentrating4 <- mapply(retrieveOpponentRating,dataframe$opponent4)
dataframe$opponentrating5 <- mapply(retrieveOpponentRating,dataframe$opponent5)
dataframe$opponentrating6 <- mapply(retrieveOpponentRating,dataframe$opponent6)
dataframe$opponentrating7 <- mapply(retrieveOpponentRating,dataframe$opponent7)

F. Add Field opponentavgrating and calculate it based on the opponent ratings and numgames

Based on the newly created temporary fields holding all opponent ratings and the number of games played, I added an opponentavgrating field and used mapply to compute the average opponent rating by calling the calculateAverageOpponentRating function.

dataframe$opponentavgrating <- mapply(calculateAverageOpponentRating,
                                      dataframe$opponentrating1,
                                      dataframe$opponentrating2,
                                      dataframe$opponentrating3,
                                      dataframe$opponentrating4,
                                      dataframe$opponentrating5,
                                      dataframe$opponentrating6,
                                      dataframe$opponentrating7,
                                      dataframe$numgames)

G. Add Field modifiedname and convert it based on the upper and lower case requirements of the project

I added the modifiedname field and used mapply to return the modified name with proper upper and lower case placement as required by the Project 1 specs.

dataframe$modifiedname <- mapply(createModifiedName,dataframe$name)

V. Create Smaller Data Frame containing all Required Project Values

I created a smaller data frame containing only the fields required by the Project 1 specs: the modified player name, the player state, total points, player pre-chess rating, and average pre-chess rating of opponents.

dataframecand <- data.frame(dataframe$modifiedname,dataframe$num,dataframe$total, dataframe$playerrating, dataframe$opponentavgrating)
names(dataframecand) <- c("Player Name", "Player State", "Total Points", "Player Pre-Chess Rating", "Avg. Pre-Chess Ratings of Opponents")
dataframecand

##                   Player Name Player State Total Points
## 1                    Gary Hua           ON          6.0
## 2             Dakshesh Daruri           MI          6.0
## 3                Aditya Bajaj           MI          6.0
## 4         Patrick H Schilling           MI          5.5
## 5                  Hanshi Zuo           MI          5.5
## 6                 Hansen Song           OH          5.0
## 7           Gary Dee Swathell           MI          5.0
## 8            Ezekiel Houghton           MI          5.0
## 9                 Stefano Lee           ON          5.0
## 10                  Anvit Rao           MI          5.0
## 11   Cameron William Mc Leman           MI          4.5
## 12             Kenneth J Tack           MI          4.5
## 13          Torrance Henry Jr           MI          4.5
## 14               Bradley Shaw           MI          4.5
## 15     Zachary James Houghton           MI          4.5
## 16               Mike Nikitin           MI          4.0
## 17         Ronald Grzegorczyk           MI          4.0
## 18              David Sundeen           MI          4.0
## 19               Dipankar Roy           MI          4.0
## 20                Jason Zheng           MI          4.0
## 21              Dinh Dang Bui           ON          4.0
## 22           Eugene L Mcclure           MI          4.0
## 23                   Alan Bui           ON          4.0
## 24          Michael R Aldrich           MI          4.0
## 25           Loren Schwiebert           MI          3.5
## 26                    Max Zhu           ON          3.5
## 27             Gaurav Gidwani           MI          3.5
## 28 Sofia Adina Stanescu-bellu           MI          3.5
## 29           Chiedozie Okorie           MI          3.5
## 30         George Avery Jones           ON          3.5
## 31               Rishi Shetty           MI          3.5
## 32      Joshua Philip Mathews           ON          3.5
## 33                    Jade Ge           MI          3.5
## 34     Michael Jeffery Thomas           MI          3.5
## 35           Joshua David Lee           MI          3.5
## 36              Siddharth Jha           MI          3.5
## 37       Amiyatosh Pwnanandam           MI          3.5
## 38                  Brian Liu           MI          3.0
## 39              Joel R Hendon           MI          3.0
## 40               Forest Zhang           MI          3.0
## 41        Kyle William Murphy           MI          3.0
## 42                   Jared Ge           MI          3.0
## 43          Robert Glen Vasey           MI          3.0
## 44         Justin D Schilling           MI          3.0
## 45                  Derek Yan           MI          3.0
## 46   Jacob Alexander Lavalley           MI          3.0
## 47                Eric Wright           MI          2.5
## 48               Daniel Khain           MI          2.5
## 49           Michael J Martin           MI          2.5
## 50                 Shivam Jha           MI          2.5
## 51             Tejas Ayyagari           MI          2.5
## 52                  Ethan Guo           MI          2.5
## 53              Jose C Ybarra           MI          2.0
## 54                Larry Hodge           MI          2.0
## 55                  Alex Kong           MI          2.0
## 56               Marisa Ricci           MI          2.0
## 57                 Michael Lu           MI          2.0
## 58               Viraj Mohile           MI          2.0
## 59          Sean M Mc Cormick           MI          2.0
## 60                 Julia Shen           MI          1.5
## 61              Jezzel Farkas           ON          1.5
## 62              Ashwin Balaji           MI          1.0
## 63       Thomas Joseph Hosmer           MI          1.0
## 64                     Ben Li           MI          1.0
##    Player Pre-Chess Rating Avg. Pre-Chess Ratings of Opponents
## 1                     1794                                1605
## 2                     1553                                1469
## 3                     1384                                1564
## 4                     1716                                1574
## 5                     1655                                1501
## 6                     1686                                1519
## 7                     1649                                1372
## 8                     1641                                1468
## 9                     1411                                1523
## 10                    1365                                1554
## 11                    1712                                1468
## 12                    1663                                1506
## 13                    1666                                1498
## 14                    1610                                1515
## 15                    1220                                1484
## 16                    1604                                1386
## 17                    1629                                1499
## 18                    1600                                1480
## 19                    1564                                1426
## 20                    1595                                1411
## 21                    1563                                1470
## 22                    1555                                1300
## 23                    1363                                1214
## 24                    1229                                1357
## 25                    1745                                1363
## 26                    1579                                1507
## 27                    1552                                1222
## 28                    1507                                1522
## 29                    1602                                1314
## 30                    1522                                1144
## 31                    1494                                1260
## 32                    1441                                1379
## 33                    1449                                1277
## 34                    1399                                1375
## 35                    1438                                1150
## 36                    1355                                1388
## 37                     980                                1385
## 38                    1423                                1539
## 39                    1436                                1430
## 40                    1348                                1391
## 41                    1403                                1248
## 42                    1332                                1150
## 43                    1283                                1107
## 44                    1199                                1327
## 45                    1242                                1152
## 46                     377                                1358
## 47                    1362                                1392
## 48                    1382                                1356
## 49                    1291                                1286
## 50                    1056                                1296
## 51                    1011                                1356
## 52                     935                                1495
## 53                    1393                                1345
## 54                    1270                                1206
## 55                    1186                                1406
## 56                    1153                                1414
## 57                    1092                                1363
## 58                     917                                1391
## 59                     853                                1319
## 60                     967                                1330
## 61                     955                                1327
## 62                    1530                                1186
## 63                    1175                                1350
## 64                    1163                                1263

VI. Export Output to a CSV File on the Local Machine and Print the File Contents

Per the Project 1 requirements, I wrote the smaller data frame, which holds only the final fields, to a CSV file on the local machine. I then printed the contents of that file here so that readers can see the resulting values.

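# ref: [KER]
# Write the final data frame to a CSV file in the working directory
write.csv(dataframecand, "Elizes_Project1.csv")
# Read the file back and print it line by line
cat(readLines("Elizes_Project1.csv"), sep = "\n")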

## "","Player Name","Player State","Total Points","Player Pre-Chess Rating","Avg. Pre-Chess Ratings of Opponents"
## "1","Gary Hua","ON","6.0","1794",1605
## "2","Dakshesh Daruri","MI","6.0","1553",1469
## "3","Aditya Bajaj","MI","6.0","1384",1564
## "4","Patrick H Schilling","MI","5.5","1716",1574
## "5","Hanshi Zuo","MI","5.5","1655",1501
## "6","Hansen Song","OH","5.0","1686",1519
## "7","Gary Dee Swathell","MI","5.0","1649",1372
## "8","Ezekiel Houghton","MI","5.0","1641",1468
## "9","Stefano Lee","ON","5.0","1411",1523
## "10","Anvit Rao","MI","5.0","1365",1554
## "11","Cameron William Mc Leman","MI","4.5","1712",1468
## "12","Kenneth J Tack","MI","4.5","1663",1506
## "13","Torrance Henry Jr","MI","4.5","1666",1498
## "14","Bradley Shaw","MI","4.5","1610",1515
## "15","Zachary James Houghton","MI","4.5","1220",1484
## "16","Mike Nikitin","MI","4.0","1604",1386
## "17","Ronald Grzegorczyk","MI","4.0","1629",1499
## "18","David Sundeen","MI","4.0","1600",1480
## "19","Dipankar Roy","MI","4.0","1564",1426
## "20","Jason Zheng","MI","4.0","1595",1411
## "21","Dinh Dang Bui","ON","4.0","1563",1470
## "22","Eugene L Mcclure","MI","4.0","1555",1300
## "23","Alan Bui","ON","4.0","1363",1214
## "24","Michael R Aldrich","MI","4.0","1229",1357
## "25","Loren Schwiebert","MI","3.5","1745",1363
## "26","Max Zhu","ON","3.5","1579",1507
## "27","Gaurav Gidwani","MI","3.5","1552",1222
## "28","Sofia Adina Stanescu-bellu","MI","3.5","1507",1522
## "29","Chiedozie Okorie","MI","3.5","1602",1314
## "30","George Avery Jones","ON","3.5","1522",1144
## "31","Rishi Shetty","MI","3.5","1494",1260
## "32","Joshua Philip Mathews","ON","3.5","1441",1379
## "33","Jade Ge","MI","3.5","1449",1277
## "34","Michael Jeffery Thomas","MI","3.5","1399",1375
## "35","Joshua David Lee","MI","3.5","1438",1150
## "36","Siddharth Jha","MI","3.5","1355",1388
## "37","Amiyatosh Pwnanandam","MI","3.5","980",1385
## "38","Brian Liu","MI","3.0","1423",1539
## "39","Joel R Hendon","MI","3.0","1436",1430
## "40","Forest Zhang","MI","3.0","1348",1391
## "41","Kyle William Murphy","MI","3.0","1403",1248
## "42","Jared Ge","MI","3.0","1332",1150
## "43","Robert Glen Vasey","MI","3.0","1283",1107
## "44","Justin D Schilling","MI","3.0","1199",1327
## "45","Derek Yan","MI","3.0","1242",1152
## "46","Jacob Alexander Lavalley","MI","3.0","377",1358
## "47","Eric Wright","MI","2.5","1362",1392
## "48","Daniel Khain","MI","2.5","1382",1356
## "49","Michael J Martin","MI","2.5","1291",1286
## "50","Shivam Jha","MI","2.5","1056",1296
## "51","Tejas Ayyagari","MI","2.5","1011",1356
## "52","Ethan Guo","MI","2.5","935",1495
## "53","Jose C Ybarra","MI","2.0","1393",1345
## "54","Larry Hodge","MI","2.0","1270",1206
## "55","Alex Kong","MI","2.0","1186",1406
## "56","Marisa Ricci","MI","2.0","1153",1414
## "57","Michael Lu","MI","2.0","1092",1363
## "58","Viraj Mohile","MI","2.0","917",1391
## "59","Sean M Mc Cormick","MI","2.0","853",1319
## "60","Julia Shen","MI","1.5","967",1330
## "61","Jezzel Farkas","ON","1.5","955",1327
## "62","Ashwin Balaji","MI","1.0","1530",1186
## "63","Thomas Joseph Hosmer","MI","1.0","1175",1350
## "64","Ben Li","MI","1.0","1163",1263
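
The printed file starts with an unnamed column of row numbers, and the Total Points and Player Pre-Chess Rating values are quoted, which suggests those columns were stored as character strings (or factors) in the data frame. The following is a minimal sketch, not the project's actual code, of one way to produce a leaner file. It assumes a hypothetical three-row data frame named demo whose values are copied from the output above, and it writes to a temporary file (demo_file) rather than Elizes_Project1.csv.

```r
# Hypothetical stand-in for the final data frame; the three rows are
# copied from the printed output, and the name "demo" is an assumption.
demo <- data.frame(
  "Player Name" = c("Gary Hua", "Dakshesh Daruri", "Aditya Bajaj"),
  "Player State" = c("ON", "MI", "MI"),
  "Total Points" = c("6.0", "6.0", "6.0"),
  "Player Pre-Chess Rating" = c("1794", "1553", "1384"),
  "Avg. Pre-Chess Ratings of Opponents" = c(1605, 1469, 1564),
  check.names = FALSE,       # keep the column names with spaces as-is
  stringsAsFactors = FALSE
)

# Convert the character columns that actually hold numbers
demo[["Total Points"]] <- as.numeric(demo[["Total Points"]])
demo[["Player Pre-Chess Rating"]] <- as.integer(demo[["Player Pre-Chess Rating"]])

# Write without the row-number column, then print the file contents
demo_file <- tempfile(fileext = ".csv")
write.csv(demo, demo_file, row.names = FALSE)
cat(readLines(demo_file), sep = "\n")

# Read the file back to confirm the columns survive the round trip
roundtrip <- read.csv(demo_file, check.names = FALSE, stringsAsFactors = FALSE)
str(roundtrip)
```

With row.names = FALSE the leading "" column disappears, and numeric columns are written without quotes. One trade-off is that a numeric Total Points value such as 6.0 is written as 6, so keeping that column as character may be preferable if the trailing ".0" matters for display.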

References

[ADD] Add row to Dataframe. Retrieved from website: https://stackoverflow.com/questions/28467068/add-row-to-dataframe

[CRE] Creating an Empty Dataframe. Retrieved from website: https://stackoverflow.com/questions/10689055/create-an-empty-data-frame

[EXT] Extract Words between Symbols in R. Retrieved from website: https://discuss.analyticsvidhya.com/t/extract-words-between-symbols-in-r/1469

[HOW] How to Read Text Files and Create a Data Frame in R. Retrieved from website: https://stackoverflow.com/questions/33384095/how-to-read-text-files-and-create-a-data-frame-in-r

[KER] Kerns, Jay. Introduction to Probability and Statistics Using R. 2011. Retrieved from website: http://www.atmos.albany.edu/facstaff/timm/ATM315spring14/R/IPSUR.pdf

[MAC] Machlis, Sharon. 4 Data Wrangling Tasks in R for Advanced Beginners. ComputerWorld. 2015. Retrieved from website: https://www.computerworld.com/article/2486425/business-intelligence/business-intelligence-4-data-wrangling-tasks-in-r-for-advanced-beginners.html

[REM] Remove entries from string vector containing specific characters in R. Retrieved from website: https://stackoverflow.com/questions/40885360/remove-entries-from-string-vector-containing-specific-characters-in-r

[SAN] Sanchez, Gaston. Handling and Processing Strings in R. 2013. Trowchez Editions. Berkeley, CA.

[VEN] Venables, W; Smith, D; R Core Team. An Introduction to R. 2017. Notes on R: A Programming Environment for Data Analysis and Graphics. Retrieved from website: https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf

DATA607 - Project 1