My approach for this first project is to first establish a way to read the txt file. I think using the read lines function in the stringr library will help read the text file. I will need to use a while loop that will run for as long as there is text in the text file. Within that while loop will parse through the text and use an if loop to put information in appropriated column values. Then create a data frame with those columns.
The main challenge I see with this project is making sure the information get placed in the correct columns. Making sure the parsing the information with a correct character. Learning how to parse the information is what I think is the main challenge here then displaying the data frame and making sure that it is correct.
library(stringr)
## Warning: package 'stringr' was built under R version 4.4.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Chess_lines <- readLines("C:/Users/typem/Documents/GitHub/Data607_Project1/tournamentinfo.txt")
## Warning in
## readLines("C:/Users/typem/Documents/GitHub/Data607_Project1/tournamentinfo.txt"):
## incomplete final line found on
## 'C:/Users/typem/Documents/GitHub/Data607_Project1/tournamentinfo.txt'
Chess_lines
## [1] "-----------------------------------------------------------------------------------------"
## [2] " Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | "
## [4] "-----------------------------------------------------------------------------------------"
## [5] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [6] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
## [7] "-----------------------------------------------------------------------------------------"
## [8] " 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
## [9] " MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |"
## [10] "-----------------------------------------------------------------------------------------"
## [11] " 3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12|"
## [12] " MI | 14959604 / R: 1384 ->1640 |N:2 |W |B |W |B |W |B |W |"
## [13] "-----------------------------------------------------------------------------------------"
## [14] " 4 | PATRICK H SCHILLING |5.5 |W 23|D 28|W 2|W 26|D 5|W 19|D 1|"
## [15] " MI | 12616049 / R: 1716 ->1744 |N:2 |W |B |W |B |W |B |B |"
## [16] "-----------------------------------------------------------------------------------------"
## [17] " 5 | HANSHI ZUO |5.5 |W 45|W 37|D 12|D 13|D 4|W 14|W 17|"
## [18] " MI | 14601533 / R: 1655 ->1690 |N:2 |B |W |B |W |B |W |B |"
## [19] "-----------------------------------------------------------------------------------------"
## [20] " 6 | HANSEN SONG |5.0 |W 34|D 29|L 11|W 35|D 10|W 27|W 21|"
## [21] " OH | 15055204 / R: 1686 ->1687 |N:3 |W |B |W |B |B |W |B |"
## [22] "-----------------------------------------------------------------------------------------"
## [23] " 7 | GARY DEE SWATHELL |5.0 |W 57|W 46|W 13|W 11|L 1|W 9|L 2|"
## [24] " MI | 11146376 / R: 1649 ->1673 |N:3 |W |B |W |B |B |W |W |"
## [25] "-----------------------------------------------------------------------------------------"
## [26] " 8 | EZEKIEL HOUGHTON |5.0 |W 3|W 32|L 14|L 9|W 47|W 28|W 19|"
## [27] " MI | 15142253 / R: 1641P17->1657P24 |N:3 |B |W |B |W |B |W |W |"
## [28] "-----------------------------------------------------------------------------------------"
## [29] " 9 | STEFANO LEE |5.0 |W 25|L 18|W 59|W 8|W 26|L 7|W 20|"
## [30] " ON | 14954524 / R: 1411 ->1564 |N:2 |W |B |W |B |W |B |B |"
## [31] "-----------------------------------------------------------------------------------------"
## [32] " 10 | ANVIT RAO |5.0 |D 16|L 19|W 55|W 31|D 6|W 25|W 18|"
## [33] " MI | 14150362 / R: 1365 ->1544 |N:3 |W |W |B |B |W |B |W |"
## [34] "-----------------------------------------------------------------------------------------"
## [35] " 11 | CAMERON WILLIAM MC LEMAN |4.5 |D 38|W 56|W 6|L 7|L 3|W 34|W 26|"
## [36] " MI | 12581589 / R: 1712 ->1696 |N:3 |B |W |B |W |B |W |B |"
## [37] "-----------------------------------------------------------------------------------------"
## [38] " 12 | KENNETH J TACK |4.5 |W 42|W 33|D 5|W 38|H |D 1|L 3|"
## [39] " MI | 12681257 / R: 1663 ->1670 |N:3 |W |B |W |B | |W |B |"
## [40] "-----------------------------------------------------------------------------------------"
## [41] " 13 | TORRANCE HENRY JR |4.5 |W 36|W 27|L 7|D 5|W 33|L 3|W 32|"
## [42] " MI | 15082995 / R: 1666 ->1662 |N:3 |B |W |B |B |W |W |B |"
## [43] "-----------------------------------------------------------------------------------------"
## [44] " 14 | BRADLEY SHAW |4.5 |W 54|W 44|W 8|L 1|D 27|L 5|W 31|"
## [45] " MI | 10131499 / R: 1610 ->1618 |N:3 |W |B |W |W |B |B |W |"
## [46] "-----------------------------------------------------------------------------------------"
## [47] " 15 | ZACHARY JAMES HOUGHTON |4.5 |D 19|L 16|W 30|L 22|W 54|W 33|W 38|"
## [48] " MI | 15619130 / R: 1220P13->1416P20 |N:3 |B |B |W |W |B |B |W |"
## [49] "-----------------------------------------------------------------------------------------"
## [50] " 16 | MIKE NIKITIN |4.0 |D 10|W 15|H |W 39|L 2|W 36|U |"
## [51] " MI | 10295068 / R: 1604 ->1613 |N:3 |B |W | |B |W |B | |"
## [52] "-----------------------------------------------------------------------------------------"
## [53] " 17 | RONALD GRZEGORCZYK |4.0 |W 48|W 41|L 26|L 2|W 23|W 22|L 5|"
## [54] " MI | 10297702 / R: 1629 ->1610 |N:3 |W |B |W |B |W |B |W |"
## [55] "-----------------------------------------------------------------------------------------"
## [56] " 18 | DAVID SUNDEEN |4.0 |W 47|W 9|L 1|W 32|L 19|W 38|L 10|"
## [57] " MI | 11342094 / R: 1600 ->1600 |N:3 |B |W |B |W |B |W |B |"
## [58] "-----------------------------------------------------------------------------------------"
## [59] " 19 | DIPANKAR ROY |4.0 |D 15|W 10|W 52|D 28|W 18|L 4|L 8|"
## [60] " MI | 14862333 / R: 1564 ->1570 |N:3 |W |B |W |B |W |W |B |"
## [61] "-----------------------------------------------------------------------------------------"
## [62] " 20 | JASON ZHENG |4.0 |L 40|W 49|W 23|W 41|W 28|L 2|L 9|"
## [63] " MI | 14529060 / R: 1595 ->1569 |N:4 |W |B |W |B |W |B |W |"
## [64] "-----------------------------------------------------------------------------------------"
## [65] " 21 | DINH DANG BUI |4.0 |W 43|L 1|W 47|L 3|W 40|W 39|L 6|"
## [66] " ON | 15495066 / R: 1563P22->1562 |N:3 |B |W |B |W |W |B |W |"
## [67] "-----------------------------------------------------------------------------------------"
## [68] " 22 | EUGENE L MCCLURE |4.0 |W 64|D 52|L 28|W 15|H |L 17|W 40|"
## [69] " MI | 12405534 / R: 1555 ->1529 |N:4 |W |B |W |B | |W |B |"
## [70] "-----------------------------------------------------------------------------------------"
## [71] " 23 | ALAN BUI |4.0 |L 4|W 43|L 20|W 58|L 17|W 37|W 46|"
## [72] " ON | 15030142 / R: 1363 ->1371 | |B |W |B |W |B |W |B |"
## [73] "-----------------------------------------------------------------------------------------"
## [74] " 24 | MICHAEL R ALDRICH |4.0 |L 28|L 47|W 43|L 25|W 60|W 44|W 39|"
## [75] " MI | 13469010 / R: 1229 ->1300 |N:4 |B |W |B |B |W |W |B |"
## [76] "-----------------------------------------------------------------------------------------"
## [77] " 25 | LOREN SCHWIEBERT |3.5 |L 9|W 53|L 3|W 24|D 34|L 10|W 47|"
## [78] " MI | 12486656 / R: 1745 ->1681 |N:4 |B |W |B |W |B |W |B |"
## [79] "-----------------------------------------------------------------------------------------"
## [80] " 26 | MAX ZHU |3.5 |W 49|W 40|W 17|L 4|L 9|D 32|L 11|"
## [81] " ON | 15131520 / R: 1579 ->1564 |N:4 |B |W |B |W |B |W |W |"
## [82] "-----------------------------------------------------------------------------------------"
## [83] " 27 | GAURAV GIDWANI |3.5 |W 51|L 13|W 46|W 37|D 14|L 6|U |"
## [84] " MI | 14476567 / R: 1552 ->1539 |N:4 |W |B |W |B |W |B | |"
## [85] "-----------------------------------------------------------------------------------------"
## [86] " 28 | SOFIA ADINA STANESCU-BELLU |3.5 |W 24|D 4|W 22|D 19|L 20|L 8|D 36|"
## [87] " MI | 14882954 / R: 1507 ->1513 |N:3 |W |W |B |W |B |B |W |"
## [88] "-----------------------------------------------------------------------------------------"
## [89] " 29 | CHIEDOZIE OKORIE |3.5 |W 50|D 6|L 38|L 34|W 52|W 48|U |"
## [90] " MI | 15323285 / R: 1602P6 ->1508P12 |N:4 |B |W |B |W |W |B | |"
## [91] "-----------------------------------------------------------------------------------------"
## [92] " 30 | GEORGE AVERY JONES |3.5 |L 52|D 64|L 15|W 55|L 31|W 61|W 50|"
## [93] " ON | 12577178 / R: 1522 ->1444 | |W |B |B |W |W |B |B |"
## [94] "-----------------------------------------------------------------------------------------"
## [95] " 31 | RISHI SHETTY |3.5 |L 58|D 55|W 64|L 10|W 30|W 50|L 14|"
## [96] " MI | 15131618 / R: 1494 ->1444 | |B |W |B |W |B |W |B |"
## [97] "-----------------------------------------------------------------------------------------"
## [98] " 32 | JOSHUA PHILIP MATHEWS |3.5 |W 61|L 8|W 44|L 18|W 51|D 26|L 13|"
## [99] " ON | 14073750 / R: 1441 ->1433 |N:4 |W |B |W |B |W |B |W |"
## [100] "-----------------------------------------------------------------------------------------"
## [101] " 33 | JADE GE |3.5 |W 60|L 12|W 50|D 36|L 13|L 15|W 51|"
## [102] " MI | 14691842 / R: 1449 ->1421 | |B |W |B |W |B |W |B |"
## [103] "-----------------------------------------------------------------------------------------"
## [104] " 34 | MICHAEL JEFFERY THOMAS |3.5 |L 6|W 60|L 37|W 29|D 25|L 11|W 52|"
## [105] " MI | 15051807 / R: 1399 ->1400 | |B |W |B |B |W |B |W |"
## [106] "-----------------------------------------------------------------------------------------"
## [107] " 35 | JOSHUA DAVID LEE |3.5 |L 46|L 38|W 56|L 6|W 57|D 52|W 48|"
## [108] " MI | 14601397 / R: 1438 ->1392 | |W |W |B |W |B |B |W |"
## [109] "-----------------------------------------------------------------------------------------"
## [110] " 36 | SIDDHARTH JHA |3.5 |L 13|W 57|W 51|D 33|H |L 16|D 28|"
## [111] " MI | 14773163 / R: 1355 ->1367 |N:4 |W |B |W |B | |W |B |"
## [112] "-----------------------------------------------------------------------------------------"
## [113] " 37 | AMIYATOSH PWNANANDAM |3.5 |B |L 5|W 34|L 27|H |L 23|W 61|"
## [114] " MI | 15489571 / R: 980P12->1077P17 | | |B |W |W | |B |W |"
## [115] "-----------------------------------------------------------------------------------------"
## [116] " 38 | BRIAN LIU |3.0 |D 11|W 35|W 29|L 12|H |L 18|L 15|"
## [117] " MI | 15108523 / R: 1423 ->1439 |N:4 |W |B |W |W | |B |B |"
## [118] "-----------------------------------------------------------------------------------------"
## [119] " 39 | JOEL R HENDON |3.0 |L 1|W 54|W 40|L 16|W 44|L 21|L 24|"
## [120] " MI | 12923035 / R: 1436P23->1413 |N:4 |B |W |B |W |B |W |W |"
## [121] "-----------------------------------------------------------------------------------------"
## [122] " 40 | FOREST ZHANG |3.0 |W 20|L 26|L 39|W 59|L 21|W 56|L 22|"
## [123] " MI | 14892710 / R: 1348 ->1346 | |B |B |W |W |B |W |W |"
## [124] "-----------------------------------------------------------------------------------------"
## [125] " 41 | KYLE WILLIAM MURPHY |3.0 |W 59|L 17|W 58|L 20|X |U |U |"
## [126] " MI | 15761443 / R: 1403P5 ->1341P9 | |B |W |B |W | | | |"
## [127] "-----------------------------------------------------------------------------------------"
## [128] " 42 | JARED GE |3.0 |L 12|L 50|L 57|D 60|D 61|W 64|W 56|"
## [129] " MI | 14462326 / R: 1332 ->1256 | |B |W |B |B |W |W |B |"
## [130] "-----------------------------------------------------------------------------------------"
## [131] " 43 | ROBERT GLEN VASEY |3.0 |L 21|L 23|L 24|W 63|W 59|L 46|W 55|"
## [132] " MI | 14101068 / R: 1283 ->1244 | |W |B |W |W |B |B |W |"
## [133] "-----------------------------------------------------------------------------------------"
## [134] " 44 | JUSTIN D SCHILLING |3.0 |B |L 14|L 32|W 53|L 39|L 24|W 59|"
## [135] " MI | 15323504 / R: 1199 ->1199 | | |W |B |B |W |B |W |"
## [136] "-----------------------------------------------------------------------------------------"
## [137] " 45 | DEREK YAN |3.0 |L 5|L 51|D 60|L 56|W 63|D 55|W 58|"
## [138] " MI | 15372807 / R: 1242 ->1191 | |W |B |W |B |W |B |W |"
## [139] "-----------------------------------------------------------------------------------------"
## [140] " 46 | JACOB ALEXANDER LAVALLEY |3.0 |W 35|L 7|L 27|L 50|W 64|W 43|L 23|"
## [141] " MI | 15490981 / R: 377P3 ->1076P10 | |B |W |B |W |B |W |W |"
## [142] "-----------------------------------------------------------------------------------------"
## [143] " 47 | ERIC WRIGHT |2.5 |L 18|W 24|L 21|W 61|L 8|D 51|L 25|"
## [144] " MI | 12533115 / R: 1362 ->1341 | |W |B |W |B |W |B |W |"
## [145] "-----------------------------------------------------------------------------------------"
## [146] " 48 | DANIEL KHAIN |2.5 |L 17|W 63|H |D 52|H |L 29|L 35|"
## [147] " MI | 14369165 / R: 1382 ->1335 | |B |W | |B | |W |B |"
## [148] "-----------------------------------------------------------------------------------------"
## [149] " 49 | MICHAEL J MARTIN |2.5 |L 26|L 20|D 63|D 64|W 58|H |U |"
## [150] " MI | 12531685 / R: 1291P12->1259P17 | |W |W |B |W |B | | |"
## [151] "-----------------------------------------------------------------------------------------"
## [152] " 50 | SHIVAM JHA |2.5 |L 29|W 42|L 33|W 46|H |L 31|L 30|"
## [153] " MI | 14773178 / R: 1056 ->1111 | |W |B |W |B | |B |W |"
## [154] "-----------------------------------------------------------------------------------------"
## [155] " 51 | TEJAS AYYAGARI |2.5 |L 27|W 45|L 36|W 57|L 32|D 47|L 33|"
## [156] " MI | 15205474 / R: 1011 ->1097 | |B |W |B |W |B |W |W |"
## [157] "-----------------------------------------------------------------------------------------"
## [158] " 52 | ETHAN GUO |2.5 |W 30|D 22|L 19|D 48|L 29|D 35|L 34|"
## [159] " MI | 14918803 / R: 935 ->1092 |N:4 |B |W |B |W |B |W |B |"
## [160] "-----------------------------------------------------------------------------------------"
## [161] " 53 | JOSE C YBARRA |2.0 |H |L 25|H |L 44|U |W 57|U |"
## [162] " MI | 12578849 / R: 1393 ->1359 | | |B | |W | |W | |"
## [163] "-----------------------------------------------------------------------------------------"
## [164] " 54 | LARRY HODGE |2.0 |L 14|L 39|L 61|B |L 15|L 59|W 64|"
## [165] " MI | 12836773 / R: 1270 ->1200 | |B |B |W | |W |B |W |"
## [166] "-----------------------------------------------------------------------------------------"
## [167] " 55 | ALEX KONG |2.0 |L 62|D 31|L 10|L 30|B |D 45|L 43|"
## [168] " MI | 15412571 / R: 1186 ->1163 | |W |B |W |B | |W |B |"
## [169] "-----------------------------------------------------------------------------------------"
## [170] " 56 | MARISA RICCI |2.0 |H |L 11|L 35|W 45|H |L 40|L 42|"
## [171] " MI | 14679887 / R: 1153 ->1140 | | |B |W |W | |B |W |"
## [172] "-----------------------------------------------------------------------------------------"
## [173] " 57 | MICHAEL LU |2.0 |L 7|L 36|W 42|L 51|L 35|L 53|B |"
## [174] " MI | 15113330 / R: 1092 ->1079 | |B |W |W |B |W |B | |"
## [175] "-----------------------------------------------------------------------------------------"
## [176] " 58 | VIRAJ MOHILE |2.0 |W 31|L 2|L 41|L 23|L 49|B |L 45|"
## [177] " MI | 14700365 / R: 917 -> 941 | |W |B |W |B |W | |B |"
## [178] "-----------------------------------------------------------------------------------------"
## [179] " 59 | SEAN M MC CORMICK |2.0 |L 41|B |L 9|L 40|L 43|W 54|L 44|"
## [180] " MI | 12841036 / R: 853 -> 878 | |W | |B |B |W |W |B |"
## [181] "-----------------------------------------------------------------------------------------"
## [182] " 60 | JULIA SHEN |1.5 |L 33|L 34|D 45|D 42|L 24|H |U |"
## [183] " MI | 14579262 / R: 967 -> 984 | |W |B |B |W |B | | |"
## [184] "-----------------------------------------------------------------------------------------"
## [185] " 61 | JEZZEL FARKAS |1.5 |L 32|L 3|W 54|L 47|D 42|L 30|L 37|"
## [186] " ON | 15771592 / R: 955P11-> 979P18 | |B |W |B |W |B |W |B |"
## [187] "-----------------------------------------------------------------------------------------"
## [188] " 62 | ASHWIN BALAJI |1.0 |W 55|U |U |U |U |U |U |"
## [189] " MI | 15219542 / R: 1530 ->1535 | |B | | | | | | |"
## [190] "-----------------------------------------------------------------------------------------"
## [191] " 63 | THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U |"
## [192] " MI | 15057092 / R: 1175 ->1125 | |W |B |W |B |B | | |"
## [193] "-----------------------------------------------------------------------------------------"
## [194] " 64 | BEN LI |1.0 |L 22|D 30|L 31|D 49|L 46|L 42|L 54|"
## [195] " MI | 15006561 / R: 1163 ->1112 | |B |W |W |B |W |B |B |"
## [196] "-----------------------------------------------------------------------------------------"
records <- list()
i <- 1
while (i <= length(Chess_lines)) {
line <- Chess_lines[i]
# if statement to parse information from each player data line
if (grepl("^\\s{0,4}\\d+\\s*\\|", line)) {
parts <- strsplit(line, "\\|")[[1]]
pair_num <- trimws(parts[1])
player_name <- trimws(parts[2])
total_pts <- trimws(parts[3])
# Extract round results from parts 4-10
rounds <- trimws(parts[4:min(10, length(parts))])
while (length(rounds) < 7) rounds <- c(rounds, "") #padding in the case of a player having less than 7 rounds
# The while loop skips separator lines to find player information
j <- i + 1
while (j <= length(lines) && grepl("^-+$", trimws(lines[j]))) j <- j + 1
#Extracts info, state, pre rating and post rating
info <- if (j <= length(lines)) lines[j] else ""
info_parts <- strsplit(info, "\\|")[[1]]
state <- trimws(str_extract(info_parts[1], "[A-Z]{2}"))
pre_rating <- trimws(str_extract(info_parts[2], "(?<=R:\\s{0,5})\\d+"))
post_rating <- trimws(str_extract(info_parts[2], "(?<=->)\\s*\\d+"))
records[[length(records) + 1]] <- c(
pair_num, player_name, state,
pre_rating, post_rating, total_pts,
rounds
)
i <- j + 1
} else {
i <- i + 1
}
}
#Creating a data frame to then mutate the average opp rating from each round
chessdb <- as.data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
colnames(chessdb) <- c("Pair_Num", "Player_Name", "State",
"Pre_Rating", "Post_Rating", "Total_Pts",
"Round1", "Round2", "Round3", "Round4", "Round5", "Round6", "Round7")
rating_lookup <- setNames(as.integer(gsub("P\\d+", "", chessdb$Pre_Rating)), chessdb$Pair_Num)
get_opp_num <- function(round_cell) {
m <- regmatches(round_cell, regexpr("\\d+", round_cell))
if (length(m) == 0) return(NA)
as.integer(m)
}
#mutate to calculate the average of of the opponents ratings
chess_summary <- chessdb %>%
mutate(Avg_Opp_Pre_Rating = apply(
chessdb[, c("Round1","Round2","Round3","Round4","Round5","Round6","Round7")],
1,
function(rounds) {
opp_nums <- sapply(rounds, get_opp_num)
opp_nums <- opp_nums[!is.na(opp_nums)]
opp_ratings <- rating_lookup[as.character(opp_nums)]
opp_ratings <- opp_ratings[!is.na(opp_ratings)]
round(mean(opp_ratings))
}
)) %>%
select(Player_Name, State, Total_Pts, Pre_Rating, Avg_Opp_Pre_Rating)
write.csv(chess_summary, "tournament_summary.csv", row.names = FALSE)
This project was a little challenging for me. The main challenge for me was trying to figure out how to get the average rating of the opponents. This project was good practice to understand how to parse through a text file and convert it into a csv file. Figuring out what functions and syntax was required to go through the text file and extract information felt very useful. I figured the best way to calculate the average was to use a mutate because that is what I have been using recently for other projects. The main problem with that was that I first had to match player row numbers and pull their ratings then store it to calculate the average.