Dplyer will be used to transform the csv file. The file is imported using the ‘|’ separator to clean up the file upon import. This creates a pseudo-tabular dataframe.
library (dplyr)
#Import the CSV
sourcecsv <- read.table("https://raw.githubusercontent.com/johnnydrodriguez/data_607_week4/main/tournamentinfo.txt", header = FALSE, sep = '|', na.strings="", fill = TRUE)
knitr::kable(head(sourcecsv))
V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 |
---|---|---|---|---|---|---|---|---|---|---|
—————————————————————————————– | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Pair | Player Name | Total | Round | Round | Round | Round | Round | Round | Round | NA |
Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | NA |
—————————————————————————————– | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1 | GARY HUA | 6.0 | W 39 | W 21 | W 18 | W 14 | W 7 | D 12 | D 4 | NA |
ON | 15445895 / R: 1794 ->1817 | N:2 | W | B | W | B | W | B | W | NA |
To begin cleaning up the source file, a column of NA’s was removed, rows with NA values were removed, and the first two unneeded rows of the imported file were removed as well.
Once these NA rows and columns were eliminated, the remaining columns could be renamed.
And finally, leading and trailing white spaces in several columns were removed.
#Drops the unneeded NA column
sourcecsv <- sourcecsv[-c(11)]
#Drops the unneeded NA rows
sourcecsv <- na.omit(sourcecsv)
#Drops the unneeded first 2 rows
sourcecsv <- sourcecsv[-c(1:2),]
#Renames the remaining columns
colnames(sourcecsv) <- c('ID', 'Name', 'Points', 'Round1', 'Round2', 'Round3', 'Round4', 'Round5', 'Round6', 'Round7')
#Removes the leading and trailing white spaces
sourcecsv$ID <- trimws(sourcecsv$ID, which = c("both"))
sourcecsv$Name <- trimws(sourcecsv$Name, which = c("both"))
sourcecsv$Points <- trimws(sourcecsv$Points, which = c("both"))
knitr::kable(head(sourcecsv))
ID | Name | Points | Round1 | Round2 | Round3 | Round4 | Round5 | Round6 | Round7 | |
---|---|---|---|---|---|---|---|---|---|---|
5 | 1 | GARY HUA | 6.0 | W 39 | W 21 | W 18 | W 14 | W 7 | D 12 | D 4 |
6 | ON | 15445895 / R: 1794 ->1817 | N:2 | W | B | W | B | W | B | W |
8 | 2 | DAKSHESH DARURI | 6.0 | W 63 | W 58 | L 4 | W 17 | W 16 | W 20 | W 7 |
9 | MI | 14598900 / R: 1553 ->1663 | N:2 | B | W | B | W | B | W | B |
11 | 3 | ADITYA BAJAJ | 6.0 | L 8 | W 61 | W 25 | W 21 | W 11 | W 13 | W 12 |
12 | MI | 14959604 / R: 1384 ->1640 | N:2 | W | B | W | B | W | B | W |
Because of the alternating pattern of the sourcecsv dataframe rows, the sourcecsv dataframe was split based on the ID column. This allowed for two dataframes: one based on the actual ID (numerical) values and the other based on the (non-numerical) State values.
The first table retained the column names of the original sourcecsv. The second table column names were updated to reflect their column data.
Once both tables were split, they were re-merged based on index to create a new flat, wide table. The merged table is where all additional transformation and calculation will occur.
#Creates table 1 based on numerical ID data values the column
chess1 <- sourcecsv %>% filter(!(is.na(as.numeric(sourcecsv$ID))))
#Creates table 2 based on non numerical State values in column
chess2 <- sourcecsv %>% filter((is.na(as.numeric(sourcecsv$ID))))
#Renames columns in second split table
colnames(chess2) <- c('State', 'Rank', 'Nvalue', 'R1Color', 'R2Color', 'R3Color', 'R4Color', 'R5Color', 'R6Color', 'R7Color')
#Merges the split tables into a single flat, wide table by index.
chess3 <- merge(chess1, chess2, by=0)
knitr::kable(head(chess3))
Row.names | ID | Name | Points | Round1 | Round2 | Round3 | Round4 | Round5 | Round6 | Round7 | State | Rank | Nvalue | R1Color | R2Color | R3Color | R4Color | R5Color | R6Color | R7Color |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | GARY HUA | 6.0 | W 39 | W 21 | W 18 | W 14 | W 7 | D 12 | D 4 | ON | 15445895 / R: 1794 ->1817 | N:2 | W | B | W | B | W | B | W |
10 | 10 | ANVIT RAO | 5.0 | D 16 | L 19 | W 55 | W 31 | D 6 | W 25 | W 18 | MI | 14150362 / R: 1365 ->1544 | N:3 | W | W | B | B | W | B | W |
11 | 11 | CAMERON WILLIAM MC LEMAN | 4.5 | D 38 | W 56 | W 6 | L 7 | L 3 | W 34 | W 26 | MI | 12581589 / R: 1712 ->1696 | N:3 | B | W | B | W | B | W | B |
12 | 12 | KENNETH J TACK | 4.5 | W 42 | W 33 | D 5 | W 38 | H | D 1 | L 3 | MI | 12681257 / R: 1663 ->1670 | N:3 | W | B | W | B | W | B | |
13 | 13 | TORRANCE HENRY JR | 4.5 | W 36 | W 27 | L 7 | D 5 | W 33 | L 3 | W 32 | MI | 15082995 / R: 1666 ->1662 | N:3 | B | W | B | B | W | W | B |
14 | 14 | BRADLEY SHAW | 4.5 | W 54 | W 44 | W 8 | L 1 | D 27 | L 5 | W 31 | MI | 10131499 / R: 1610 ->1618 | N:3 | W | B | W | W | B | B | W |
Since the Opponent ID column for every round is needed for the calculation, the single Round columns containing this information were split into the two separate Round Result and Round Opponent columns.
chess3 <- tidyr::separate(data = chess3, col = Round1, sep = " ", into = c('R1Result', 'R1Opponent'), remove = TRUE)
chess3 <- tidyr::separate(data = chess3, col = Round2, sep = " ", into = c('R2Result', 'R2Opponent'), remove = TRUE)
chess3 <- tidyr::separate(data = chess3, col = Round3, sep = " ", into = c('R3Result', 'R3Opponent'), remove = TRUE)
chess3 <- tidyr::separate(data = chess3, col = Round4, sep = " ", into = c('R4Result', 'R4Opponent'), remove = TRUE)
chess3 <- tidyr::separate(data = chess3, col = Round5, sep = " ", into = c('R5Result', 'R5Opponent'), remove = TRUE)
chess3 <- tidyr::separate(data = chess3, col = Round6, sep = " ", into = c('R6Result', 'R6Opponent'), remove = TRUE)
chess3 <- tidyr::separate(data = chess3, col = Round7, sep = " ", into = c('R7Result', 'R7Opponent'), remove = TRUE)
knitr::kable(head(chess3[c('Name', 'R1Opponent', 'R2Opponent', 'R3Opponent', 'R4Opponent', 'R5Opponent', 'R6Opponent', 'R7Opponent')]))
Name | R1Opponent | R2Opponent | R3Opponent | R4Opponent | R5Opponent | R6Opponent | R7Opponent |
---|---|---|---|---|---|---|---|
GARY HUA | 39 | 21 | 18 | 14 | 7 | 12 | 4 |
ANVIT RAO | 16 | 19 | 55 | 31 | 6 | 25 | 18 |
CAMERON WILLIAM MC LEMAN | 38 | 56 | 6 | 7 | 3 | 34 | 26 |
KENNETH J TACK | 42 | 33 | 5 | 38 | 1 | 3 | |
TORRANCE HENRY JR | 36 | 27 | 7 | 5 | 33 | 3 | 32 |
BRADLEY SHAW | 54 | 44 | 8 | 1 | 27 | 5 | 31 |
In the initial split of the sourcecsv dataframe, the column containing the PreRating values named ‘Rank’. This column contained multiple separate values that were split from the PreRating value.
To perform this split correctly, the white spaces within the Rank column were removed. This permitted the column split to occur along the “:” and “P” characters as delimiters.
#Removes the internal white spaces between the various values
chess3$Rank <- gsub(" ","",chess3$Rank)
#Splits the initial Rank column into its four distinct values
chess3 <- tidyr::separate(data = chess3,col = Rank, sep = ":", into = c('left', 'right'), remove = TRUE)
chess3 <- tidyr::separate(data = chess3,col = right, sep = "-", into = c('left2', 'right2'), remove = TRUE)
chess3 <- tidyr::separate(data = chess3,col = left2, sep = "P", into = c('left3', 'right3'), remove = TRUE)
#Renames the resulting columns including the PreRating column used in the average opponent calculation
chess3 <- chess3 %>% rename(Rvalue = left, PreRating = left3, Provisional = right3, PostRating = right2)
knitr::kable(head(chess3[c('Name', 'PreRating')]))
Name | PreRating |
---|---|
GARY HUA | 1794 |
ANVIT RAO | 1365 |
CAMERON WILLIAM MC LEMAN | 1712 |
KENNETH J TACK | 1663 |
TORRANCE HENRY JR | 1666 |
BRADLEY SHAW | 1610 |
To perform the remaining transformations, several columns were converted to numeric. This has a secondary effect of removing leading or trailing whitespace characters.
# ID, Points, PreRating, and all Opponent id columns were converted to numeric.
chess3$ID <- as.numeric(as.character(chess3$ID))
chess3$Points <- as.numeric(as.character(chess3$Points))
chess3$PreRating <- as.numeric(as.character(chess3$PreRating))
chess3$R1Opponent <- as.numeric(as.character(chess3$R1Opponent))
chess3$R2Opponent <- as.numeric(as.character(chess3$R2Opponent))
chess3$R3Opponent <- as.numeric(as.character(chess3$R3Opponent))
chess3$R4Opponent <- as.numeric(as.character(chess3$R4Opponent))
chess3$R5Opponent <- as.numeric(as.character(chess3$R5Opponent))
chess3$R6Opponent <- as.numeric(as.character(chess3$R6Opponent))
chess3$R7Opponent <- as.numeric(as.character(chess3$R7Opponent))
str(chess3)
## 'data.frame': 64 obs. of 31 variables:
## $ Row.names : 'AsIs' chr "1" "10" "11" "12" ...
## $ ID : num 1 10 11 12 13 14 15 16 17 18 ...
## $ Name : chr "GARY HUA" "ANVIT RAO" "CAMERON WILLIAM MC LEMAN" "KENNETH J TACK" ...
## $ Points : num 6 5 4.5 4.5 4.5 4.5 4.5 4 4 4 ...
## $ R1Result : chr "W" "D" "D" "W" ...
## $ R1Opponent : num 39 16 38 42 36 54 19 10 48 47 ...
## $ R2Result : chr "W" "L" "W" "W" ...
## $ R2Opponent : num 21 19 56 33 27 44 16 15 41 9 ...
## $ R3Result : chr "W" "W" "W" "D" ...
## $ R3Opponent : num 18 55 6 5 7 8 30 NA 26 1 ...
## $ R4Result : chr "W" "W" "L" "W" ...
## $ R4Opponent : num 14 31 7 38 5 1 22 39 2 32 ...
## $ R5Result : chr "W" "D" "L" "H" ...
## $ R5Opponent : num 7 6 3 NA 33 27 54 2 23 19 ...
## $ R6Result : chr "D" "W" "W" "D" ...
## $ R6Opponent : num 12 25 34 1 3 5 33 36 22 38 ...
## $ R7Result : chr "D" "W" "W" "L" ...
## $ R7Opponent : num 4 18 26 3 32 31 38 NA 5 10 ...
## $ State : chr "ON" "MI" "MI" "MI" ...
## $ Rvalue : chr "15445895/R" "14150362/R" "12581589/R" "12681257/R" ...
## $ PreRating : num 1794 1365 1712 1663 1666 ...
## $ Provisional: chr NA NA NA NA ...
## $ PostRating : chr ">1817" ">1544" ">1696" ">1670" ...
## $ Nvalue : chr "N:2" "N:3" "N:3" "N:3" ...
## $ R1Color : chr "W " "W " "B " "W " ...
## $ R2Color : chr "B " "W " "W " "B " ...
## $ R3Color : chr "W " "B " "B " "W " ...
## $ R4Color : chr "B " "B " "W " "B " ...
## $ R5Color : chr "W " "W " "B " " " ...
## $ R6Color : chr "B " "B " "W " "W " ...
## $ R7Color : chr "W " "W " "B " "B " ...
Through 7 table self merges, a new column was added for every round opponent’s PreRating value. This merge has the affect of matching opponent to their PreRating and inserting the PreRating value in the new column.
After each merge, the resulting columns are renamed to reflect the round data. All NA rows (created by Bi weeks where no opponent is matched) are also deleted after merge.
# Merges Round 1 Opponents and PreRatings and inserts value into Round 1 PreRating column
chess3 <- merge(chess3, chess3[c("ID", "PreRating") ], by.x = "R1Opponent", by.y = "ID", all = TRUE, na.rm = TRUE)
chess3 <- chess3 %>% rename(R1ORating = PreRating.y, PreRating = PreRating.x)
chess3 <- chess3 %>% filter(!is.na(ID))
# Merges Round 2 Opponents and PreRatings and inserts value into Round 2 PreRating column
chess3 <- merge(chess3, chess3[c("ID" , "PreRating") ], by.x = "R2Opponent", by.y = "ID", all = TRUE)
chess3 <- chess3 %>% rename(R2ORating = PreRating.y, PreRating = PreRating.x)
chess3 <- chess3 %>% filter(!is.na(ID))
# Merges Round 3 Opponents and PreRatings and inserts value into Round 3 PreRating column
chess3 <- merge(chess3, chess3[c("ID" , "PreRating") ], by.x = "R3Opponent", by.y = "ID", all = TRUE)
chess3 <- chess3 %>% rename(R3ORating = PreRating.y, PreRating = PreRating.x)
chess3 <- chess3 %>% filter(!is.na(ID))
# Merges Round 4 Opponents and PreRatings and inserts value into Round 4 PreRating column
chess3 <- merge(chess3, chess3[c("ID" , "PreRating") ], by.x = "R4Opponent", by.y = "ID", all = TRUE)
chess3 <- chess3 %>% rename(R4ORating = PreRating.y, PreRating = PreRating.x)
chess3 <- chess3 %>% filter(!is.na(ID))
# Merges Round 5 Opponents and PreRatings and inserts value into Round 5 PreRating column
chess3 <- merge(chess3, chess3[c("ID" , "PreRating") ], by.x = "R5Opponent", by.y = "ID", all = TRUE)
chess3 <- chess3 %>% rename(R5ORating = PreRating.y, PreRating = PreRating.x)
chess3 <- chess3 %>% filter(!is.na(ID))
# Merges Round 6 Opponents and PreRatings and inserts value into Round 6 PreRating column
chess3 <- merge(chess3, chess3[c("ID" , "PreRating") ], by.x = "R6Opponent", by.y = "ID", all = TRUE)
chess3 <- chess3 %>% rename(R6ORating = PreRating.y, PreRating = PreRating.x)
chess3 <- chess3 %>% filter(!is.na(ID))
# Merges Round 7 Opponents and PreRatings and inserts value into Round 7 PreRating column
chess3 <- merge(chess3, chess3[c("ID" , "PreRating") ], by.x = "R7Opponent", by.y = "ID", all = TRUE)
chess3 <- chess3 %>% rename(R7ORating = PreRating.y, PreRating = PreRating.x)
chess3 <- chess3 %>% filter(!is.na(ID))
knitr::kable(head(chess3[c('Name', 'R1ORating', 'R2ORating', 'R3ORating', 'R4ORating', 'R5ORating', 'R6ORating', 'R7ORating')]))
Name | R1ORating | R2ORating | R3ORating | R4ORating | R5ORating | R6ORating | R7ORating |
---|---|---|---|---|---|---|---|
PATRICK H SCHILLING | 1363 | 1507 | 1553 | 1579 | 1655 | 1564 | 1794 |
GARY DEE SWATHELL | 1092 | 377 | 1666 | 1712 | 1794 | 1411 | 1553 |
KENNETH J TACK | 1332 | 1449 | 1655 | 1423 | NA | 1794 | 1384 |
GARY HUA | 1436 | 1563 | 1600 | 1610 | 1649 | 1663 | 1716 |
RONALD GRZEGORCZYK | 1382 | 1403 | 1579 | 1553 | 1363 | 1555 | 1655 |
DINH DANG BUI | 1283 | 1794 | 1362 | 1384 | 1348 | 1436 | 1686 |
Once the dataframe has been completely transformed and the opponent PreRating columns have been added, the Average Opponent PreRating for all opponents for all rounds is calculated.
#Calculates the average opponent PreRating for all opponents, all rounds
chess3 <- mutate(chess3, AvgOpponentRating= round(rowMeans(select(chess3, "R1ORating","R2ORating","R3ORating","R4ORating","R5ORating","R6ORating","R7ORating"), na.rm = TRUE)))
chess3 <- chess3[order(chess3$ID),]
knitr::kable(head(chess3[c('Name', 'State', 'Points', 'PreRating', 'AvgOpponentRating')]))
Name | State | Points | PreRating | AvgOpponentRating | |
---|---|---|---|---|---|
4 | GARY HUA | ON | 6.0 | 1794 | 1605 |
7 | DAKSHESH DARURI | MI | 6.0 | 1553 | 1469 |
12 | ADITYA BAJAJ | MI | 6.0 | 1384 | 1564 |
1 | PATRICK H SCHILLING | MI | 5.5 | 1716 | 1574 |
16 | HANSHI ZUO | MI | 5.5 | 1655 | 1501 |
20 | HANSEN SONG | OH | 5.0 | 1686 | 1519 |
Writes the CSV file with Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents
#Writes output CSV using subset of columns from the transformed dataframe
write.csv(chess3[,c("Name","State", "Points", "PreRating", "AvgOpponentRating")], file="chess_outfile.csv",row.names=FALSE)