In this project, you’re given a text file with chess tournament results where the information has some structure.
Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:
These are the packages that will be necessary to complete the following conversion of the Chess Tournament Data.
The first step is to load the data into r from the text file.
setwd("C:/Users/biguz/Desktop/CUNY Data Science/Fall2020/Data 607/Projects/Project 1")
tournament_data <- readLines("tournamentinfo.txt", warn=FALSE)This data gets read into r as a one large string so we will need to use some string operations to get it to a usable data frame and get it ready to export the data into a csv file.
| tournament_data |
|---|
| —————————————————————————————– |
| Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| |
| Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |
| —————————————————————————————– |
| 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4| |
| ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W | |
| —————————————————————————————– |
We want to change the data so the lines are gone and all the data between the lines shows up in one row and not two.
In this section I used some techniques to change the data from the table shown above to a more readable data frame, where it would be possible to create the CSV file requested.
lines <-
c('-----------------------------------------------------------------------------------------')
tournament_data <- tournament_data[!(tournament_data == lines)]
tournament_data <- str_replace(tournament_data," {2}","")#Create dataframe
tournament_raw_df <- data.frame(do.call(rbind, strsplit(tournament_data, "|", fixed=TRUE)))
tournament_raw_df <- tournament_raw_df[-c(11)]#Merge row data
for (column in names(tournament_raw_df)){
for (i in 1:nrow(tournament_raw_df)){
if((i %% 2) > 0){
tournament_raw_df[i,column] <-
paste(tournament_raw_df[i,column], tournament_raw_df[i+1,column],sep="|")}}}
#Removing odd rows
toKeep <- seq(3,nrow(tournament_raw_df),2)
tournament_raw_df <- tournament_raw_df[toKeep,]#Creating new clean dataframe
df_args <- c(tournament_raw_df, sep="|")
tournament_raw_df <- do.call(paste,df_args)
tournament_df <- data.frame(do.call(rbind, strsplit(tournament_raw_df, "|", fixed=TRUE)))
#Renaming columns
tournament_df <- rename(tournament_df,
c("Pair Num" = "X1", "State" = "X2", "Player Name" = "X3", "USCF ID / Rtg (Pre->Post)" = "X4",
"Total Pts" = "X5","N Count" = "X6","Round 1 Outcome" = "X7", "Round 1 Pieces" = "X8",
"Round 2 Outcome" = "X9", "Round 2 Pieces" = "X10", "Round 3 Outcome" = "X11",
"Round 3 Pieces" = "X12", "Round 4 Outcome" = "X13", "Round 4 Pieces" = "X14",
"Round 5 Outcome" = "X15", "Round 5 Pieces" = "X16", "Round 6 Outcome" = "X17",
"Round 6 Pieces" = "X18", "Round 7 Outcome" = "X19", "Round 7 Pieces" = "X20"))for (col in names(tournament_df)){
if (grepl("Pieces",col)){
tournament_df[,c(col)] = ifelse(grepl("W",tournament_df[,c(col)]),"White","Black")}}#Creating ID column
tournament_df$USCFID <-
unlist(lapply(
strsplit(as.character(tournament_df[,c("USCF ID / Rtg (Pre->Post)")]), " / "), '[', 1))
#Creating pre-post column
tournament_df$"Rtg (Pre->Post)" <-
unlist(lapply(
strsplit(as.character(tournament_df[,c("USCF ID / Rtg (Pre->Post)")]), " / "), '[', 2))
#Removing old id/rating column
tournament_df <- tournament_df[, -which(names(tournament_df) %in% "USCF ID / Rtg (Pre->Post)")]
#Creating pre rating column
tournament_df$PreRating <-
unlist(lapply(
strsplit(as.character(tournament_df[,c("Rtg (Pre->Post)")]), "->"), '[', 1))
#Creating post rating column
tournament_df$PostRating <-
unlist(lapply(
strsplit(as.character(tournament_df[,c("Rtg (Pre->Post)")]), "->"), '[', 2))
#Removing old pre/post rating column
tournament_df <- tournament_df[,-which(names(tournament_df) %in% "Rtg (Pre->Post)")]#Remove "R:" from pre rating number
tournament_df$PreRating <- gsub("R: ", "", tournament_df$PreRating)
#Remove P from pre and post rating
tournament_df$PreRating <- gsub("()P.*", "", tournament_df$PreRating)
tournament_df$PostRating <- gsub("()P.*", "", tournament_df$PostRating)#Create new columns and populate values
for (col in names(tournament_df)){
if (grepl("Outcome",col)){
#Opponent column
tournament_df[,c(gsub("Outcome", "Opponent", col))] <-
unlist(lapply(strsplit(as.character(tournament_df[,c(col)]), " "), '[', 2))
#Outcome column
tournament_df[,c(col)] <- gsub("([A-Z]).*","\\1",tournament_df[,c(col)])}}
#Clean up white spaces
for (i in names(tournament_df)) {
tournament_df[[i]] <-
trimws(tournament_df[[i]], which = c("both", "left", "right"), whitespace = "[ \t\r\n]")}for (i in seq(nrow(tournament_df))){
for (col in names(tournament_df)){
if (grepl("Opponent",col)){
tournament_df[i,c(gsub("Opponent","Opp PreRtg",col))] <-
ifelse(
is.na(tournament_df[i,col]), NA,
tournament_df$PreRating[tournament_df$`Pair Num` == tournament_df[i,col]])}}}| Pair Num | State | Player Name | USCFID | PreRating | PostRating | Total Pts | N Count | Round 1 Opponent | Round 1 Opp PreRtg | Round 1 Outcome | Round 1 Pieces | Round 2 Opponent | Round 2 Opp PreRtg | Round 2 Outcome | Round 2 Pieces | Round 3 Opponent | Round 3 Opp PreRtg | Round 3 Outcome | Round 3 Pieces | Round 4 Opponent | Round 4 Opp PreRtg | Round 4 Outcome | Round 4 Pieces | Round 5 Opponent | Round 5 Opp PreRtg | Round 5 Outcome | Round 5 Pieces | Round 6 Opponent | Round 6 Opp PreRtg | Round 6 Outcome | Round 6 Pieces | Round 7 Opponent | Round 7 Opp PreRtg | Round 7 Outcome | Round 7 Pieces |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ON | GARY HUA | 15445895 | 1794 | 1817 | 6.0 | N:2 | 39 | 1436 | W | White | 21 | 1563 | W | Black | 18 | 1600 | W | White | 14 | 1610 | W | Black | 7 | 1649 | W | White | 12 | 1663 | D | Black | 4 | 1716 | D | White |
| 2 | MI | DAKSHESH DARURI | 14598900 | 1553 | 1663 | 6.0 | N:2 | 63 | 1175 | W | Black | 58 | 917 | W | White | 4 | 1716 | L | Black | 17 | 1629 | W | White | 16 | 1604 | W | Black | 20 | 1595 | W | White | 7 | 1649 | W | Black |
| 3 | MI | ADITYA BAJAJ | 14959604 | 1384 | 1640 | 6.0 | N:2 | 8 | 1641 | L | White | 61 | 955 | W | Black | 25 | 1745 | W | White | 21 | 1563 | W | Black | 11 | 1712 | W | White | 13 | 1666 | W | Black | 12 | 1663 | W | White |
| 4 | MI | PATRICK H SCHILLING | 12616049 | 1716 | 1744 | 5.5 | N:2 | 23 | 1363 | W | White | 28 | 1507 | D | Black | 2 | 1553 | W | White | 26 | 1579 | W | Black | 5 | 1655 | D | White | 19 | 1564 | W | Black | 1 | 1794 | D | Black |
| 5 | MI | HANSHI ZUO | 14601533 | 1655 | 1690 | 5.5 | N:2 | 45 | 1242 | W | Black | 37 | 980 | W | White | 12 | 1663 | D | Black | 13 | 1666 | D | White | 4 | 1716 | D | Black | 14 | 1610 | W | White | 17 | 1629 | W | Black |
For this project we were asked to generate a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:
The first 4 columns of the csv file were completed in the previous section. I used this section to create the final column for the CSV table Average Pre Chess Rating of Opponents.
for (col in names(tournament_df)){
if (grepl("PreRtg",col)){
tournament_df[,c(col)] <- as.numeric(tournament_df[,c(col)])}}#Creating pre rating sum
tournament_df$OppPreRtgSum <-
rowSums(tournament_df[,c("Round 1 Opp PreRtg", "Round 2 Opp PreRtg", "Round 3 Opp PreRtg",
"Round 4 Opp PreRtg", "Round 5 Opp PreRtg", "Round 6 Opp PreRtg",
"Round 7 Opp PreRtg")], na.rm = TRUE)
#Creating games played
tournament_df$GamesPlayed <-
rowSums(!is.na(tournament_df[,c("Round 1 Opp PreRtg", "Round 2 Opp PreRtg", "Round 3 Opp PreRtg",
"Round 4 Opp PreRtg", "Round 5 Opp PreRtg", "Round 6 Opp PreRtg",
"Round 7 Opp PreRtg")]))| Player’s Name | Player’s State | Total Number of Points | Player’s Pre-Rating | Average Pre Chess Rating of Opponents |
|---|---|---|---|---|
| GARY HUA | ON | 6.0 | 1794 | 1605.286 |
| DAKSHESH DARURI | MI | 6.0 | 1553 | 1469.286 |
| ADITYA BAJAJ | MI | 6.0 | 1384 | 1563.571 |
| PATRICK H SCHILLING | MI | 5.5 | 1716 | 1573.571 |
| HANSHI ZUO | MI | 5.5 | 1655 | 1500.857 |
| HANSEN SONG | OH | 5.0 | 1686 | 1518.714 |
| GARY DEE SWATHELL | MI | 5.0 | 1649 | 1372.143 |
| EZEKIEL HOUGHTON | MI | 5.0 | 1641 | 1468.429 |
| STEFANO LEE | ON | 5.0 | 1411 | 1523.143 |
| ANVIT RAO | MI | 5.0 | 1365 | 1554.143 |
| CAMERON WILLIAM MC LEMAN | MI | 4.5 | 1712 | 1467.571 |
| KENNETH J TACK | MI | 4.5 | 1663 | 1506.167 |
| TORRANCE HENRY JR | MI | 4.5 | 1666 | 1497.857 |
| BRADLEY SHAW | MI | 4.5 | 1610 | 1515.000 |
| ZACHARY JAMES HOUGHTON | MI | 4.5 | 1220 | 1483.857 |
| MIKE NIKITIN | MI | 4.0 | 1604 | 1385.800 |
| RONALD GRZEGORCZYK | MI | 4.0 | 1629 | 1498.571 |
| DAVID SUNDEEN | MI | 4.0 | 1600 | 1480.000 |
| DIPANKAR ROY | MI | 4.0 | 1564 | 1426.286 |
| JASON ZHENG | MI | 4.0 | 1595 | 1410.857 |
| DINH DANG BUI | ON | 4.0 | 1563 | 1470.429 |
| EUGENE L MCCLURE | MI | 4.0 | 1555 | 1300.333 |
| ALAN BUI | ON | 4.0 | 1363 | 1213.857 |
| MICHAEL R ALDRICH | MI | 4.0 | 1229 | 1357.000 |
| LOREN SCHWIEBERT | MI | 3.5 | 1745 | 1363.286 |
| MAX ZHU | ON | 3.5 | 1579 | 1506.857 |
| GAURAV GIDWANI | MI | 3.5 | 1552 | 1221.667 |
| SOFIA ADINA STANESCU-BELLU | MI | 3.5 | 1507 | 1522.143 |
| CHIEDOZIE OKORIE | MI | 3.5 | 1602 | 1313.500 |
| GEORGE AVERY JONES | ON | 3.5 | 1522 | 1144.143 |
| RISHI SHETTY | MI | 3.5 | 1494 | 1259.857 |
| JOSHUA PHILIP MATHEWS | ON | 3.5 | 1441 | 1378.714 |
| JADE GE | MI | 3.5 | 1449 | 1276.857 |
| MICHAEL JEFFERY THOMAS | MI | 3.5 | 1399 | 1375.286 |
| JOSHUA DAVID LEE | MI | 3.5 | 1438 | 1149.714 |
| SIDDHARTH JHA | MI | 3.5 | 1355 | 1388.167 |
| AMIYATOSH PWNANANDAM | MI | 3.5 | 980 | 1384.800 |
| BRIAN LIU | MI | 3.0 | 1423 | 1539.167 |
| JOEL R HENDON | MI | 3.0 | 1436 | 1429.571 |
| FOREST ZHANG | MI | 3.0 | 1348 | 1390.571 |
| KYLE WILLIAM MURPHY | MI | 3.0 | 1403 | 1248.500 |
| JARED GE | MI | 3.0 | 1332 | 1149.857 |
| ROBERT GLEN VASEY | MI | 3.0 | 1283 | 1106.571 |
| JUSTIN D SCHILLING | MI | 3.0 | 1199 | 1327.000 |
| DEREK YAN | MI | 3.0 | 1242 | 1152.000 |
| JACOB ALEXANDER LAVALLEY | MI | 3.0 | 377 | 1357.714 |
| ERIC WRIGHT | MI | 2.5 | 1362 | 1392.000 |
| DANIEL KHAIN | MI | 2.5 | 1382 | 1355.800 |
| MICHAEL J MARTIN | MI | 2.5 | 1291 | 1285.800 |
| SHIVAM JHA | MI | 2.5 | 1056 | 1296.000 |
| TEJAS AYYAGARI | MI | 2.5 | 1011 | 1356.143 |
| ETHAN GUO | MI | 2.5 | 935 | 1494.571 |
| JOSE C YBARRA | MI | 2.0 | 1393 | 1345.333 |
| LARRY HODGE | MI | 2.0 | 1270 | 1206.167 |
| ALEX KONG | MI | 2.0 | 1186 | 1406.000 |
| MARISA RICCI | MI | 2.0 | 1153 | 1414.400 |
| MICHAEL LU | MI | 2.0 | 1092 | 1363.000 |
| VIRAJ MOHILE | MI | 2.0 | 917 | 1391.000 |
| SEAN M MC CORMICK | MI | 2.0 | 853 | 1319.000 |
| JULIA SHEN | MI | 1.5 | 967 | 1330.200 |
| JEZZEL FARKAS | ON | 1.5 | 955 | 1327.286 |
| ASHWIN BALAJI | MI | 1.0 | 1530 | 1186.000 |
| THOMAS JOSEPH HOSMER | MI | 1.0 | 1175 | 1350.200 |
| BEN LI | MI | 1.0 | 1163 | 1263.000 |