This a project that reads chess tournament results and writes a summarisation to a csv. To begin the data is read in skipping the first several rows that contain what would be the headers. We also want to filter out the dividing rows (those consisting of the “-” character) and trim any leading or trailing whitespace in the column values. This will normalize the data and make it easier to manipulate
chess_raw <- read_delim("tournamentinfo.txt", delim = "|", skip = 4, col_names = FALSE) |>
filter(!str_detect(X1, "^-+-$")) |>
mutate(across(where(is.character), str_trim))
chess_raw
## # A tibble: 128 × 11
## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
## 1 1 GARY HUA 6.0 W 39 W 21 W 18 W 14 W 7 D 12 D 4 NA
## 2 ON 15445895 / R: 17… N:2 W B W B W B W NA
## 3 2 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17 W 16 W 20 W 7 NA
## 4 MI 14598900 / R: 15… N:2 B W B W B W B NA
## 5 3 ADITYA BAJAJ 6.0 L 8 W 61 W 25 W 21 W 11 W 13 W 12 NA
## 6 MI 14959604 / R: 13… N:2 W B W B W B W NA
## 7 4 PATRICK H SCHILL… 5.5 W 23 D 28 W 2 W 26 D 5 W 19 D 1 NA
## 8 MI 12616049 / R: 17… N:2 W B W B W B B NA
## 9 5 HANSHI ZUO 5.5 W 45 W 37 D 12 D 13 D 4 W 14 W 17 NA
## 10 MI 14601533 / R: 16… N:2 B W B W B W B NA
## # ℹ 118 more rows
Each entry in the original data is split into two rows in the dataframe. It is necessary to split the odd rows (those with a number in the first column) and even rows (those with the state in the first column) into separate dataframes in order to accurately label the data. By indexing the second dataframe by row number the dataframes can be rejoined in the future.
first_row <- chess_raw |>
filter(str_detect(X1, r"{^\d+}")) |>
mutate(across(X1, as.integer))
second_row <- chess_raw |>
filter(str_detect(X1, r"{^\D+}")) |>
mutate(id = row_number())
first_row
## # A tibble: 64 × 11
## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
## 1 1 GARY HUA 6.0 W 39 W 21 W 18 W 14 W 7 D 12 D 4 NA
## 2 2 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17 W 16 W 20 W 7 NA
## 3 3 ADITYA BAJAJ 6.0 L 8 W 61 W 25 W 21 W 11 W 13 W 12 NA
## 4 4 PATRICK H SCHILL… 5.5 W 23 D 28 W 2 W 26 D 5 W 19 D 1 NA
## 5 5 HANSHI ZUO 5.5 W 45 W 37 D 12 D 13 D 4 W 14 W 17 NA
## 6 6 HANSEN SONG 5.0 W 34 D 29 L 11 W 35 D 10 W 27 W 21 NA
## 7 7 GARY DEE SWATHELL 5.0 W 57 W 46 W 13 W 11 L 1 W 9 L 2 NA
## 8 8 EZEKIEL HOUGHTON 5.0 W 3 W 32 L 14 L 9 W 47 W 28 W 19 NA
## 9 9 STEFANO LEE 5.0 W 25 L 18 W 59 W 8 W 26 L 7 W 20 NA
## 10 10 ANVIT RAO 5.0 D 16 L 19 W 55 W 31 D 6 W 25 W 18 NA
## # ℹ 54 more rows
second_row
## # A tibble: 64 × 12
## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 id
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <int>
## 1 ON 15445895 /… N:2 W B W B W B W NA 1
## 2 MI 14598900 /… N:2 B W B W B W B NA 2
## 3 MI 14959604 /… N:2 W B W B W B W NA 3
## 4 MI 12616049 /… N:2 W B W B W B B NA 4
## 5 MI 14601533 /… N:2 B W B W B W B NA 5
## 6 OH 15055204 /… N:3 W B W B B W B NA 6
## 7 MI 11146376 /… N:3 W B W B B W W NA 7
## 8 MI 15142253 /… N:3 B W B W B W W NA 8
## 9 ON 14954524 /… N:2 W B W B W B B NA 9
## 10 MI 14150362 /… N:3 W W B B W B W NA 10
## # ℹ 54 more rows
Now that the data is separated it can be accurately labeled and parsed. Using functions such as parse_number makes it easy to ignore the non-numeric characters in many of the fields. The dataframes can then be combined by joining on the id column.
first_row_rename <- first_row |>
select(!X11) |>
rename(id = "X1", name = "X2", score = "X3", r1 = "X4", r2 = "X5", r3 = "X6", r4 = "X7", r5 = "X8", r6 = "X9", r7 = "X10") |>
mutate(across(matches(r"{r\d}"), parse_number))
second_row_rename <- second_row |>
rename(state = "X1", rating = "X2") |>
mutate(prev_rating = parse_number(str_extract(rating, r"-{R:\W+\d{1,4}}-"))) |>
select(id, state, prev_rating)
working_df <- inner_join(first_row_rename, second_row_rename)
working_df
## # A tibble: 64 × 12
## id name score r1 r2 r3 r4 r5 r6 r7 state prev_rating
## <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 1 GARY… 6.0 39 21 18 14 7 12 4 ON 1794
## 2 2 DAKS… 6.0 63 58 4 17 16 20 7 MI 1553
## 3 3 ADIT… 6.0 8 61 25 21 11 13 12 MI 1384
## 4 4 PATR… 5.5 23 28 2 26 5 19 1 MI 1716
## 5 5 HANS… 5.5 45 37 12 13 4 14 17 MI 1655
## 6 6 HANS… 5.0 34 29 11 35 10 27 21 OH 1686
## 7 7 GARY… 5.0 57 46 13 11 1 9 2 MI 1649
## 8 8 EZEK… 5.0 3 32 14 9 47 28 19 MI 1641
## 9 9 STEF… 5.0 25 18 59 8 26 7 20 ON 1411
## 10 10 ANVI… 5.0 16 19 55 31 6 25 18 MI 1365
## # ℹ 54 more rows
The opponent id number in columns r1-r6 can be easily replaced with the initial rating of the opponent, making a row wise average a simple operation. Remembering to drop NA values is important to account for matches not played and prevent the NAs from passing to the final value.
averaged_op <- working_df |>
mutate(across(r1:r7, ~ working_df$prev_rating[.x])) |>
mutate(mean = round(rowMeans(pick(r1:r7), na.rm = TRUE)))
averaged_op
## # A tibble: 64 × 13
## id name score r1 r2 r3 r4 r5 r6 r7 state prev_rating
## <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 1 GARY… 6.0 1436 1563 1600 1610 1649 1663 1716 ON 1794
## 2 2 DAKS… 6.0 1175 917 1716 1629 1604 1595 1649 MI 1553
## 3 3 ADIT… 6.0 1641 955 1745 1563 1712 1666 1663 MI 1384
## 4 4 PATR… 5.5 1363 1507 1553 1579 1655 1564 1794 MI 1716
## 5 5 HANS… 5.5 1242 980 1663 1666 1716 1610 1629 MI 1655
## 6 6 HANS… 5.0 1399 1602 1712 1438 1365 1552 1563 OH 1686
## 7 7 GARY… 5.0 1092 377 1666 1712 1794 1411 1553 MI 1649
## 8 8 EZEK… 5.0 1384 1441 1610 1411 1362 1507 1564 MI 1641
## 9 9 STEF… 5.0 1745 1600 853 1641 1579 1649 1595 ON 1411
## 10 10 ANVI… 5.0 1604 1564 1186 1494 1686 1745 1600 MI 1365
## # ℹ 54 more rows
## # ℹ 1 more variable: mean <dbl>
Renaming and reordering the columns to match the desired format, the results are finally written to a csv.
final_df <- averaged_op |>
select(name, state, score, prev_rating, mean) |>
rename(`Player’s Name` = name, `Player’s State` = state, `Total Number of Points` = score, `Player’s Pre-Rating` = prev_rating, `Average Pre Chess Rating of Opponents` = mean)
write_csv(final_df, "chess_summary.csv")