At first, I tried importing directly from the link provided for the
assignment. However, this resulted in an error due to an “incomplete
final line” - basically, the text in the original link doesn’t end in a
\n
. I had to recreate the file in my local directory to add
one and import from there.
Once the .txt file was ready, I used read_lines()
to
read the lines of the file into a character vector, then filtered out
any lines that consisted solely of dashes. Next, using
read.table()
to indicate the correct column separator, I
created a data frame that resembled (on the surface anyway) the
structure of the table from the original file.
chess_table <- read_lines("tournamentinfo.txt")
chess_table <- chess_table[!grepl("^[-]+$", chess_table)]
chess_data_raw <- read.table(text = chess_table, sep = "|")
glimpse(chess_data_raw)
## Rows: 130
## Columns: 11
## $ V1 <chr> " Pair ", " Num ", " 1 ", " ON ", " 2 ", " MI ", " 3…
## $ V2 <chr> " Player Name ", " USCF ID / Rtg (Pre->Post) …
## $ V3 <chr> "Total", " Pts ", "6.0 ", "N:2 ", "6.0 ", "N:2 ", "6.0 ", "N:…
## $ V4 <chr> "Round", " 1 ", "W 39", "W ", "W 63", "B ", "L 8", "W …
## $ V5 <chr> "Round", " 2 ", "W 21", "B ", "W 58", "W ", "W 61", "B …
## $ V6 <chr> "Round", " 3 ", "W 18", "W ", "L 4", "B ", "W 25", "W …
## $ V7 <chr> "Round", " 4 ", "W 14", "B ", "W 17", "W ", "W 21", "B …
## $ V8 <chr> "Round", " 5 ", "W 7", "W ", "W 16", "B ", "W 11", "W …
## $ V9 <chr> "Round", " 6 ", "D 12", "B ", "W 20", "W ", "W 13", "B …
## $ V10 <chr> "Round", " 7 ", "D 4", "W ", "W 7", "B ", "W 12", "W …
## $ V11 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
Since each observation was split between 2 rows, I subset the odds and evens, and removed the headers.
odd_rows <- chess_data_raw[seq_len(nrow(chess_data_raw)) %% 2 == 1,]
odd_rows <- odd_rows[-1, -11]
even_rows <- chess_data_raw[seq_len(nrow(chess_data_raw)) %% 2 == 0,]
even_rows <- even_rows[-1, -11]
Then, I began building a new data frame with the first 4 required variables: Player’s Name, Player’s State, Total Number of Points, and Player’s Pre-Rating.
The prerating
column was the most complicated and
required that first I extract the string between the “R:” and the arrow
“->” in each row. Taking this raw string value, I then checked each
for a “P” and removed the rest of the string if one appeared. Finally,
the values could be converted to numeric for calculations for the final
column.
player_id <- as.numeric(odd_rows[, 1])
player_name <- odd_rows[, 2]
player_state <- even_rows[, 1]
total_points <- as.numeric(odd_rows[, 3])
# extract pre-rating string
prerating <- str_trim(str_extract(even_rows[, 2], "(?<=R:\\s)(.*?)(?=\\s*->)"))
# strip the "P"s if they exist
prerating <- ifelse(
str_detect(prerating, "P"),
str_remove_all(prerating, "P.*"),
prerating
)
# change type to numeric
prerating <- as.numeric(prerating)
# clean new data frame
chess_data <- data.frame(player_id, player_name, player_state, total_points, prerating)
knitr::kable(head(chess_data, n = 5))
player_id | player_name | player_state | total_points | prerating |
---|---|---|---|---|
1 | GARY HUA | ON | 6.0 | 1794 |
2 | DAKSHESH DARURI | MI | 6.0 | 1553 |
3 | ADITYA BAJAJ | MI | 6.0 | 1384 |
4 | PATRICK H SCHILLING | MI | 5.5 | 1716 |
5 | HANSHI ZUO | MI | 5.5 | 1655 |
For last column - the average pre-rating for each player’s opponents
- I wanted a second table (a sort of “draft” for calculations) for the
opponent IDs for each round. The digits were extracted, then converted
to numeric, and a column added for player ID. Then, I merged them with
some of the chess_data
columns into a single wide
table.
# strip anything not a digit
full_tourn <- odd_rows[,4:10] %>%
mutate(across(everything(), ~gsub("\\D", "", .)))
# change type to numeric and rename columns, add player ID
full_tourn <- full_tourn %>%
mutate(across(everything(), ~as.numeric(.)))
colnames(full_tourn) <- c("Round_1", "Round_2", "Round_3", "Round_4", "Round_5", "Round_6", "Round_7")
full_tourn$player_id <- player_id
# combine into 1 wide data frame
full_tourn <- merge(chess_data[, c(1, 2, 5)], full_tourn, by = "player_id")
knitr::kable(head(full_tourn, n = 5))
player_id | player_name | prerating | Round_1 | Round_2 | Round_3 | Round_4 | Round_5 | Round_6 | Round_7 |
---|---|---|---|---|---|---|---|---|---|
1 | GARY HUA | 1794 | 39 | 21 | 18 | 14 | 7 | 12 | 4 |
2 | DAKSHESH DARURI | 1553 | 63 | 58 | 4 | 17 | 16 | 20 | 7 |
3 | ADITYA BAJAJ | 1384 | 8 | 61 | 25 | 21 | 11 | 13 | 12 |
4 | PATRICK H SCHILLING | 1716 | 23 | 28 | 2 | 26 | 5 | 19 | 1 |
5 | HANSHI ZUO | 1655 | 45 | 37 | 12 | 13 | 4 | 14 | 17 |
Finally, using the player_id
in each
Round_*
variable, I replaced the value with the pre-rating
for the matching player, and added a column for the opponents’ average
pre-rating. I appended that last column to the output table.
# helper function
get_rating <- function(id) {
if (is.na(id)) {
return(NA)
} else {
player_match <- filter(full_tourn, player_id == id)
return(player_match$prerating)
}
}
# replace player IDs with player pre-ratings
full_tourn[, 4:10] <- apply(full_tourn[, 4:10], c(1,2), get_rating)
# add a row for the average
full_tourn <- full_tourn %>%
mutate(opp_pre = round(rowMeans(full_tourn[, 4:10], na.rm = TRUE)))
# include column on output table
chess_data$opp_pre <- full_tourn$opp_pre
colnames(chess_data) <- c("Player's ID", "Player’s Name", "Player’s State", "Total Number of Points", "Player’s Pre-Rating", "Average Pre Chess Rating of Opponents")
write_csv(chess_data, "tournamentinfo.csv")
knitr::kable(chess_data)
Player’s ID | Player’s Name | Player’s State | Total Number of Points | Player’s Pre-Rating | Average Pre Chess Rating of Opponents |
---|---|---|---|---|---|
1 | GARY HUA | ON | 6.0 | 1794 | 1605 |
2 | DAKSHESH DARURI | MI | 6.0 | 1553 | 1469 |
3 | ADITYA BAJAJ | MI | 6.0 | 1384 | 1564 |
4 | PATRICK H SCHILLING | MI | 5.5 | 1716 | 1574 |
5 | HANSHI ZUO | MI | 5.5 | 1655 | 1501 |
6 | HANSEN SONG | OH | 5.0 | 1686 | 1519 |
7 | GARY DEE SWATHELL | MI | 5.0 | 1649 | 1372 |
8 | EZEKIEL HOUGHTON | MI | 5.0 | 1641 | 1468 |
9 | STEFANO LEE | ON | 5.0 | 1411 | 1523 |
10 | ANVIT RAO | MI | 5.0 | 1365 | 1554 |
11 | CAMERON WILLIAM MC LEMAN | MI | 4.5 | 1712 | 1468 |
12 | KENNETH J TACK | MI | 4.5 | 1663 | 1506 |
13 | TORRANCE HENRY JR | MI | 4.5 | 1666 | 1498 |
14 | BRADLEY SHAW | MI | 4.5 | 1610 | 1515 |
15 | ZACHARY JAMES HOUGHTON | MI | 4.5 | 1220 | 1484 |
16 | MIKE NIKITIN | MI | 4.0 | 1604 | 1386 |
17 | RONALD GRZEGORCZYK | MI | 4.0 | 1629 | 1499 |
18 | DAVID SUNDEEN | MI | 4.0 | 1600 | 1480 |
19 | DIPANKAR ROY | MI | 4.0 | 1564 | 1426 |
20 | JASON ZHENG | MI | 4.0 | 1595 | 1411 |
21 | DINH DANG BUI | ON | 4.0 | 1563 | 1470 |
22 | EUGENE L MCCLURE | MI | 4.0 | 1555 | 1300 |
23 | ALAN BUI | ON | 4.0 | 1363 | 1214 |
24 | MICHAEL R ALDRICH | MI | 4.0 | 1229 | 1357 |
25 | LOREN SCHWIEBERT | MI | 3.5 | 1745 | 1363 |
26 | MAX ZHU | ON | 3.5 | 1579 | 1507 |
27 | GAURAV GIDWANI | MI | 3.5 | 1552 | 1222 |
28 | SOFIA ADINA STANESCU-BELLU | MI | 3.5 | 1507 | 1522 |
29 | CHIEDOZIE OKORIE | MI | 3.5 | 1602 | 1314 |
30 | GEORGE AVERY JONES | ON | 3.5 | 1522 | 1144 |
31 | RISHI SHETTY | MI | 3.5 | 1494 | 1260 |
32 | JOSHUA PHILIP MATHEWS | ON | 3.5 | 1441 | 1379 |
33 | JADE GE | MI | 3.5 | 1449 | 1277 |
34 | MICHAEL JEFFERY THOMAS | MI | 3.5 | 1399 | 1375 |
35 | JOSHUA DAVID LEE | MI | 3.5 | 1438 | 1150 |
36 | SIDDHARTH JHA | MI | 3.5 | 1355 | 1388 |
37 | AMIYATOSH PWNANANDAM | MI | 3.5 | 980 | 1385 |
38 | BRIAN LIU | MI | 3.0 | 1423 | 1539 |
39 | JOEL R HENDON | MI | 3.0 | 1436 | 1430 |
40 | FOREST ZHANG | MI | 3.0 | 1348 | 1391 |
41 | KYLE WILLIAM MURPHY | MI | 3.0 | 1403 | 1248 |
42 | JARED GE | MI | 3.0 | 1332 | 1150 |
43 | ROBERT GLEN VASEY | MI | 3.0 | 1283 | 1107 |
44 | JUSTIN D SCHILLING | MI | 3.0 | 1199 | 1327 |
45 | DEREK YAN | MI | 3.0 | 1242 | 1152 |
46 | JACOB ALEXANDER LAVALLEY | MI | 3.0 | 377 | 1358 |
47 | ERIC WRIGHT | MI | 2.5 | 1362 | 1392 |
48 | DANIEL KHAIN | MI | 2.5 | 1382 | 1356 |
49 | MICHAEL J MARTIN | MI | 2.5 | 1291 | 1286 |
50 | SHIVAM JHA | MI | 2.5 | 1056 | 1296 |
51 | TEJAS AYYAGARI | MI | 2.5 | 1011 | 1356 |
52 | ETHAN GUO | MI | 2.5 | 935 | 1495 |
53 | JOSE C YBARRA | MI | 2.0 | 1393 | 1345 |
54 | LARRY HODGE | MI | 2.0 | 1270 | 1206 |
55 | ALEX KONG | MI | 2.0 | 1186 | 1406 |
56 | MARISA RICCI | MI | 2.0 | 1153 | 1414 |
57 | MICHAEL LU | MI | 2.0 | 1092 | 1363 |
58 | VIRAJ MOHILE | MI | 2.0 | 917 | 1391 |
59 | SEAN M MC CORMICK | MI | 2.0 | 853 | 1319 |
60 | JULIA SHEN | MI | 1.5 | 967 | 1330 |
61 | JEZZEL FARKAS | ON | 1.5 | 955 | 1327 |
62 | ASHWIN BALAJI | MI | 1.0 | 1530 | 1186 |
63 | THOMAS JOSEPH HOSMER | MI | 1.0 | 1175 | 1350 |
64 | BEN LI | MI | 1.0 | 1163 | 1263 |