Project 1

Install necessary packages. stringr for splitting and tidyr for splitting columns

install.packages("stringr", repos = "http://cran.us.r-project.org")

## Installing package into 'C:/Users/syeds/OneDrive/Documents/R/win-library/3.6'
## (as 'lib' is unspecified)

install.packages("tidyr", repos = "http://cran.us.r-project.org")

## Installing package into 'C:/Users/syeds/OneDrive/Documents/R/win-library/3.6'
## (as 'lib' is unspecified)

require(stringr)

## Loading required package: stringr

require(tidyr)

## Loading required package: tidyr

Read file into rawData data frame and name column as Info.

file <- "https://raw.githubusercontent.com/sadia-perveen/Project-1-DATA607/master/tournamentinfo.txt"
rawData <- read.delim(file)
names(rawData) <- c("Info")
head(rawData)

##                                                                                         Info
## 1  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 2  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 3  -----------------------------------------------------------------------------------------
## 4      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 5     ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 6  -----------------------------------------------------------------------------------------

Convert rawData data frame into a single string value and save it to rawData value.

rawData <- paste(rawData$Info, collapse = "")

Then, split rawData string between the dashes and save it into playerData data frame. And set column name to AllPlayerData. Splitting between the dashes creates a seperate row for each players data.

playerData <- data.frame(str_split(rawData, "\\-{2,}"))
names(playerData) <- c("AllPlayerData")
head(playerData$AllPlayerData)

## [1]  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round|  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## [2]     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |  
## [3]     2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |  
## [4]     3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|   MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |  
## [5]     4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|   MI | 12616049 / R: 1716   ->1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |  
## [6]     5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|   MI | 14601533 / R: 1655   ->1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |  
## 66 Levels:  ...

Seperating data into columns based on apparent patterns for each row in PlayerData data frame and storing it into PlayerDataFrame. Then furthermore seperating PlayerDataFrame columns into multiple columns basked on apparent patterns.

playerDataFrame <- playerData %>% separate(AllPlayerData, c("PlayerNum", "PlayerName", "TotalPts", "Round1", "Round2", "Round3" , "Round4", "Round5", "Round6", "Round7", "State", "UscfInfo"), sep = "\\|")
playerDataFrame <- playerDataFrame %>% separate(UscfInfo, c("UscfId","regInfo"), sep = "\\/")
playerDataFrame <- playerDataFrame %>% separate(regInfo, c("Pre","Post"), sep = "\\->")
playerDataFrame <- playerDataFrame %>% separate(Round1, c("Round1Status","Round1Opponent"), sep = "\\  ")
playerDataFrame <- playerDataFrame %>% separate(Round2, c("Round2Status","Round2Opponent"), sep = "\\  ")
playerDataFrame <- playerDataFrame %>% separate(Round3, c("Round3Status","Round3Opponent"), sep = "\\  ")
playerDataFrame <- playerDataFrame %>% separate(Round4, c("Round4Status","Round4Opponent"), sep = "\\  ")
playerDataFrame <- playerDataFrame %>% separate(Round5, c("Round5Status","Round5Opponent"), sep = "\\  ")
playerDataFrame <- playerDataFrame %>% separate(Round6, c("Round6Status","Round6Opponent"), sep = "\\  ")
playerDataFrame <- playerDataFrame %>% separate(Round7, c("Round7Status","Round7Opponent"), sep = "\\  ")
head(playerDataFrame)

##   PlayerNum                        PlayerName TotalPts Round1Status
## 1     Pair   Player Name                         Total        Round
## 2        1   GARY HUA                            6.0              W
## 3        2   DAKSHESH DARURI                     6.0              W
## 4        3   ADITYA BAJAJ                        6.0              L
## 5        4   PATRICK H SCHILLING                 5.5              W
## 6        5   HANSHI ZUO                          5.5              W
##   Round1Opponent Round2Status Round2Opponent Round3Status Round3Opponent
## 1           <NA>        Round           <NA>        Round           <NA>
## 2             39            W             21            W             18
## 3             63            W             58            L              4
## 4              8            W             61            W             25
## 5             23            D             28            W              2
## 6             45            W             37            D             12
##   Round4Status Round4Opponent Round5Status Round5Opponent Round6Status
## 1        Round           <NA>        Round           <NA>        Round
## 2            W             14            W              7            D
## 3            W             17            W             16            W
## 4            W             21            W             11            W
## 5            W             26            D              5            W
## 6            D             13            D              4            W
##   Round6Opponent Round7Status Round7Opponent   State     UscfId
## 1           <NA>        Round           <NA>   Num     USCF ID 
## 2             12            D              4     ON   15445895 
## 3             20            W              7     MI   14598900 
## 4             13            W             12     MI   14959604 
## 5             19            D              1     MI   12616049 
## 6             14            W             17     MI   14601533 
##           Pre         Post
## 1    Rtg (Pre Post)       
## 2  R: 1794       1817     
## 3  R: 1553       1663     
## 4  R: 1384       1640     
## 5  R: 1716       1744     
## 6  R: 1655       1690

Removing the header row and ending empty row.

playerDataFrame <- playerDataFrame[-1, ]
playerDataFrame <- playerDataFrame[-nrow(playerDataFrame),]
head(playerDataFrame)

##   PlayerNum                        PlayerName TotalPts Round1Status
## 2        1   GARY HUA                            6.0              W
## 3        2   DAKSHESH DARURI                     6.0              W
## 4        3   ADITYA BAJAJ                        6.0              L
## 5        4   PATRICK H SCHILLING                 5.5              W
## 6        5   HANSHI ZUO                          5.5              W
## 7        6   HANSEN SONG                         5.0              W
##   Round1Opponent Round2Status Round2Opponent Round3Status Round3Opponent
## 2             39            W             21            W             18
## 3             63            W             58            L              4
## 4              8            W             61            W             25
## 5             23            D             28            W              2
## 6             45            W             37            D             12
## 7             34            D             29            L             11
##   Round4Status Round4Opponent Round5Status Round5Opponent Round6Status
## 2            W             14            W              7            D
## 3            W             17            W             16            W
## 4            W             21            W             11            W
## 5            W             26            D              5            W
## 6            D             13            D              4            W
## 7            W             35            D             10            W
##   Round6Opponent Round7Status Round7Opponent  State     UscfId         Pre
## 2             12            D              4    ON   15445895   R: 1794   
## 3             20            W              7    MI   14598900   R: 1553   
## 4             13            W             12    MI   14959604   R: 1384   
## 5             19            D              1    MI   12616049   R: 1716   
## 6             14            W             17    MI   14601533   R: 1655   
## 7             27            W             21    OH   15055204   R: 1686   
##        Post
## 2 1817     
## 3 1663     
## 4 1640     
## 5 1744     
## 6 1690     
## 7 1687

seperate out only the digits in playerDataFrame$Pre value using str_extract and pattern matching with ‘\d{1,}’. str_extract gets the first digit only value found with the pattern of atleast 1 digit.

playerDataFrame$Pre <- str_extract(playerDataFrame$Pre, '\\d{1,}')

Trim out any trailing or leading spaces in playerDataFrame$PlayerNum value to allow matching to player’s round opponents, also trim out trailing and leading spacing in all the opponent values.

playerDataFrame$PlayerNum <- trimws(playerDataFrame$PlayerNum)
playerDataFrame$Round1Opponent <- trimws(playerDataFrame$Round1Opponent)
playerDataFrame$Round2Opponent <- trimws(playerDataFrame$Round2Opponent)
playerDataFrame$Round3Opponent <- trimws(playerDataFrame$Round3Opponent)
playerDataFrame$Round4Opponent <- trimws(playerDataFrame$Round4Opponent)
playerDataFrame$Round5Opponent <- trimws(playerDataFrame$Round5Opponent)
playerDataFrame$Round6Opponent <- trimws(playerDataFrame$Round6Opponent)
playerDataFrame$Round7Opponent <- trimws(playerDataFrame$Round7Opponent)

Add a varaible AvgRating to PlayerDataFrame to store the value that will be generated.

playerDataFrame$AvgRating <- 0

Use a for loop to go through the data frame and pull all opponent ids for each player. Pull the Pre Chess rating for the found opponents and use a counter to keep track of all found opponents. Use a counter to store the sum of all found opponents Pre Chess Rating. Finally, calculte the avergae by dividing the sum by the total found opponents and store value in AvgRating value of PlayerDataFrame.

for (index in 1:nrow(playerDataFrame)){
  #roundCount value stores all found opponents for the current player.
  roundCount <- 0;
  #currTotal value stores the calcuated value of all found opponent's Pre Chess Rating.
  currTotal <- 0;
  #get all opponent rows for current player for all rounds.
  opponent1 <- playerDataFrame[index, "Round1Opponent"]
  opponent2 <- playerDataFrame[index, "Round2Opponent"]
  opponent3 <- playerDataFrame[index, "Round3Opponent"]
  opponent4 <- playerDataFrame[index, "Round4Opponent"]
  opponent5 <- playerDataFrame[index, "Round5Opponent"]
  opponent6 <- playerDataFrame[index, "Round6Opponent"]
  opponent7 <- playerDataFrame[index, "Round7Opponent"]
  
  #if opponent is found (!= "") then get the pre chess rating of opponent, increment roundCount and add pre chess rating to currTotal.
  if(opponent1 != ""){
    get <- as.integer(playerDataFrame[which(playerDataFrame$PlayerNum == opponent1),]$Pre)
    roundCount <- roundCount + 1
    currTotal = currTotal + get
  }
  if(opponent2 != ""){
    get <- as.integer(playerDataFrame[which(playerDataFrame$PlayerNum == opponent2),]$Pre)
    roundCount <- roundCount + 1
    currTotal = currTotal + get
  }
  if(opponent3 != ""){
    get <- as.integer(playerDataFrame[which(playerDataFrame$PlayerNum == opponent3),]$Pre)
    roundCount <- roundCount + 1
    currTotal = currTotal + get
  }
  if(opponent4 != ""){
    get <- as.integer(playerDataFrame[which(playerDataFrame$PlayerNum == opponent4),]$Pre)
    roundCount <- roundCount + 1
    currTotal = currTotal + get
  }
  if(opponent5 != ""){
    get <- as.integer(playerDataFrame[which(playerDataFrame$PlayerNum == opponent5),]$Pre)
    roundCount <- roundCount + 1
    currTotal = currTotal + get
  }
  if(opponent6 != ""){
    get <- as.integer(playerDataFrame[which(playerDataFrame$PlayerNum == opponent6),]$Pre)
    roundCount <- roundCount + 1
    currTotal = currTotal + get
  }
  if(opponent7 != ""){
    get <- as.integer(playerDataFrame[which(playerDataFrame$PlayerNum == opponent7),]$Pre)
    roundCount <- roundCount + 1
    currTotal = currTotal + get
  }
  #get the average chess rating for current player and store in AvgRating variable.
  playerDataFrame[index, "AvgRating"] <- trunc(currTotal/roundCount)
  
}

head(playerDataFrame)

##   PlayerNum                        PlayerName TotalPts Round1Status
## 2         1  GARY HUA                            6.0              W
## 3         2  DAKSHESH DARURI                     6.0              W
## 4         3  ADITYA BAJAJ                        6.0              L
## 5         4  PATRICK H SCHILLING                 5.5              W
## 6         5  HANSHI ZUO                          5.5              W
## 7         6  HANSEN SONG                         5.0              W
##   Round1Opponent Round2Status Round2Opponent Round3Status Round3Opponent
## 2             39            W             21            W             18
## 3             63            W             58            L              4
## 4              8            W             61            W             25
## 5             23            D             28            W              2
## 6             45            W             37            D             12
## 7             34            D             29            L             11
##   Round4Status Round4Opponent Round5Status Round5Opponent Round6Status
## 2            W             14            W              7            D
## 3            W             17            W             16            W
## 4            W             21            W             11            W
## 5            W             26            D              5            W
## 6            D             13            D              4            W
## 7            W             35            D             10            W
##   Round6Opponent Round7Status Round7Opponent  State     UscfId  Pre
## 2             12            D              4    ON   15445895  1794
## 3             20            W              7    MI   14598900  1553
## 4             13            W             12    MI   14959604  1384
## 5             19            D              1    MI   12616049  1716
## 6             14            W             17    MI   14601533  1655
## 7             27            W             21    OH   15055204  1686
##        Post AvgRating
## 2 1817           1605
## 3 1663           1469
## 4 1640           1563
## 5 1744           1573
## 6 1690           1500
## 7 1687           1518

Write required variables (columns) to csv file.

playerDataFrame[,c("PlayerName", "State", "TotalPts", "Pre", "AvgRating")]

##                           PlayerName  State TotalPts  Pre AvgRating
## 2   GARY HUA                            ON     6.0   1794      1605
## 3   DAKSHESH DARURI                     MI     6.0   1553      1469
## 4   ADITYA BAJAJ                        MI     6.0   1384      1563
## 5   PATRICK H SCHILLING                 MI     5.5   1716      1573
## 6   HANSHI ZUO                          MI     5.5   1655      1500
## 7   HANSEN SONG                         OH     5.0   1686      1518
## 8   GARY DEE SWATHELL                   MI     5.0   1649      1372
## 9   EZEKIEL HOUGHTON                    MI     5.0   1641      1468
## 10  STEFANO LEE                         ON     5.0   1411      1523
## 11  ANVIT RAO                           MI     5.0   1365      1554
## 12  CAMERON WILLIAM MC LEMAN            MI     4.5   1712      1467
## 13  KENNETH J TACK                      MI     4.5   1663      1506
## 14  TORRANCE HENRY JR                   MI     4.5   1666      1497
## 15  BRADLEY SHAW                        MI     4.5   1610      1515
## 16  ZACHARY JAMES HOUGHTON              MI     4.5   1220      1483
## 17  MIKE NIKITIN                        MI     4.0   1604      1385
## 18  RONALD GRZEGORCZYK                  MI     4.0   1629      1498
## 19  DAVID SUNDEEN                       MI     4.0   1600      1480
## 20  DIPANKAR ROY                        MI     4.0   1564      1426
## 21  JASON ZHENG                         MI     4.0   1595      1410
## 22  DINH DANG BUI                       ON     4.0   1563      1470
## 23  EUGENE L MCCLURE                    MI     4.0   1555      1300
## 24  ALAN BUI                            ON     4.0   1363      1213
## 25  MICHAEL R ALDRICH                   MI     4.0   1229      1357
## 26  LOREN SCHWIEBERT                    MI     3.5   1745      1363
## 27  MAX ZHU                             ON     3.5   1579      1506
## 28  GAURAV GIDWANI                      MI     3.5   1552      1221
## 29  SOFIA ADINA STANESCU-BELLU          MI     3.5   1507      1522
## 30  CHIEDOZIE OKORIE                    MI     3.5   1602      1313
## 31  GEORGE AVERY JONES                  ON     3.5   1522      1144
## 32  RISHI SHETTY                        MI     3.5   1494      1259
## 33  JOSHUA PHILIP MATHEWS               ON     3.5   1441      1378
## 34  JADE GE                             MI     3.5   1449      1276
## 35  MICHAEL JEFFERY THOMAS              MI     3.5   1399      1375
## 36  JOSHUA DAVID LEE                    MI     3.5   1438      1149
## 37  SIDDHARTH JHA                       MI     3.5   1355      1388
## 38  AMIYATOSH PWNANANDAM                MI     3.5    980      1384
## 39  BRIAN LIU                           MI     3.0   1423      1539
## 40  JOEL R HENDON                       MI     3.0   1436      1429
## 41  FOREST ZHANG                        MI     3.0   1348      1390
## 42  KYLE WILLIAM MURPHY                 MI     3.0   1403      1248
## 43  JARED GE                            MI     3.0   1332      1149
## 44  ROBERT GLEN VASEY                   MI     3.0   1283      1106
## 45  JUSTIN D SCHILLING                  MI     3.0   1199      1327
## 46  DEREK YAN                           MI     3.0   1242      1152
## 47  JACOB ALEXANDER LAVALLEY            MI     3.0    377      1357
## 48  ERIC WRIGHT                         MI     2.5   1362      1392
## 49  DANIEL KHAIN                        MI     2.5   1382      1355
## 50  MICHAEL J MARTIN                    MI     2.5   1291      1285
## 51  SHIVAM JHA                          MI     2.5   1056      1296
## 52  TEJAS AYYAGARI                      MI     2.5   1011      1356
## 53  ETHAN GUO                           MI     2.5    935      1494
## 54  JOSE C YBARRA                       MI     2.0   1393      1345
## 55  LARRY HODGE                         MI     2.0   1270      1206
## 56  ALEX KONG                           MI     2.0   1186      1406
## 57  MARISA RICCI                        MI     2.0   1153      1414
## 58  MICHAEL LU                          MI     2.0   1092      1363
## 59  VIRAJ MOHILE                        MI     2.0    917      1391
## 60  SEAN M MC CORMICK                   MI     2.0    853      1319
## 61  JULIA SHEN                          MI     1.5    967      1330
## 62  JEZZEL FARKAS                       ON     1.5    955      1327
## 63  ASHWIN BALAJI                       MI     1.0   1530      1186
## 64  THOMAS JOSEPH HOSMER                MI     1.0   1175      1350
## 65  BEN LI                              MI     1.0   1163      1263

write.csv(playerDataFrame[,c("PlayerName", "State", "TotalPts", "Pre", "AvgRating")], file="chessOutputFile.csv", row.names=FALSE)

Project 1

Sadia Perveen

9/22/2019