I will take the provided structured .txt file and upload it into my github repository to ensure my work is reproducible. I will examine the file’s information using tidyverse syntax in RStudio and ultimately generate a .csv file with columns for:
“Player’s Name”
“Total # of Points”
“Player’s State”
“Player’s Pre-Rating”
“Average Pre Chess Rating of Opponents”
Given that the .txt file is formatted in a multi-line structure with pipe delimiters, I believe I will be most challenged by trying to manipulate the columns to create a clean .csv output.
In addition to that, the column for “Average Pre Chess Rating of Opponents” consists of summing all of the player’s opponents’ Pre-Rating score and dividing by the total number of games played. This includes any wins, losses or draws. I will hand-calculate at least two test cases against my coded results to verify my work.
LLM support and enhancement
I will use Google Gemini to assist me in generating the correct code to read the structured .txt file and create the needed columns for my .csv output. Because the original .txt file has some mixed formatting, I will likely have to perform the column extraction in a couple of steps. Once those columns are generated, I will then calculate the average pre chess opponent rating.
AI citation -
(Google DeepMind. (2026). Gemini Pro [Large language model].
I ensure that I am able to use tidyverse syntax in my code by loading the necessary package.
library (tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
I examine the information by loading the data and then previewing the top 5 players.
# Read the .txt file with all lines raw_chess_data <-read_lines("https://raw.githubusercontent.com/DRA-SPS27/DATA-607---Project-1/refs/heads/main/tournamentinfo.txt")
# Preview first 5 players head(raw_chess_data, 19)
# Remove the header rows (the first two rows after separators are gone) headerless_data <- clean_chess_data[-(1:2)]
Separate multi-line rows
The data follows a pattern: Row A contains the name and points, while Row B contains the state and rating. I will use indexing to split them.
# All of the odd numbered rows are the "Name" rows row_group1 <- headerless_data[seq(1, length(headerless_data), 2)] # All of the even numbered rows are the "State/Rating" rows row_group2 <- headerless_data[seq(2, length(headerless_data), 2)]
Extract specific fields
I use read_delim on these subsets to turn the text into columns based on the pipe (|) delimiter.
library(readr)
# Extract Name and Total Points df1 <-read_delim(paste(row_group1, collapse ="\n"), delim ="|", col_names =FALSE, trim_ws =TRUE) %>%select(Player_Name = X2, Total_Points = X3)
Warning: The `file` argument of `vroom()` must use `I()` for literal data as of vroom
1.5.0.
# Bad:
vroom("X,Y\n1.5,2.3\n")
# Good:
vroom(I("X,Y\n1.5,2.3\n"))
ℹ The deprecated feature was likely used in the readr package.
Please report the issue at <https://github.com/tidyverse/readr/issues>.
Rows: 64 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: "|"
chr (8): X2, X4, X5, X6, X7, X8, X9, X10
dbl (2): X1, X3
lgl (1): X11
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Extract State and Pre-Rating # I will use regex to pull just the numbers before the "->" in the rating column df2 <-read_delim(paste(row_group2, collapse ="\n"), delim ="|", col_names =FALSE, trim_ws =TRUE) %>%mutate(State = X1, Pre_Rating =str_extract(X2, "(?<=R: )\\s*\\d+")) %>%select (State, Pre_Rating)
Rows: 64 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: "|"
chr (10): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10
lgl (1): X11
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Combine needed columns
I bind the columns together and generate a new data frame with the created columns.
# Merge the two pieces side-by-side chess_tournament_data <-bind_cols(df1, df2)
Calculate the Average Pre Chess Rating of Opponents
To calculate the Average Opponent Rating, I need to treat the tournament data like a relational database where I will have each player’s unqiue ID along with their opponent’s ID for each round. This will allow me to look-up their pre rating score and bring in the average based on games played.
Extract Opponent IDs
I need to pull the tournament rounds from the row_group1 I created earlier. The opponent IDs are listed after the win/loss/draw indicator (e.g., “W 39” means they played player 39).
Rows: 64 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: "|"
chr (8): X2, X4, X5, X6, X7, X8, X9, X10
dbl (2): X1, X3
lgl (1): X11
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Create a rating look-up table
I need a simple table where each Player ID is linked to their Pre Rating so I can look up how “strong” each opponent was.
# Use the df2 created previously to make a lookup map rating_lookup <- df1 %>%mutate(Player_ID =row_number(), Pre_Rating =as.numeric(df2$Pre_Rating)) %>%select(Player_ID, Pre_Rating)
Map ratings and calculate average
I will now join the opponent IDs with their actual ratings and calculate the mean for each player
# Join the tables to find the rating for every opponent played avg_opp_rating <- opponents_df %>%left_join(rating_lookup, by =c("Opponent_ID"="Player_ID")) %>%group_by(Player_ID) %>%summarize(avg_opp_rating =round(mean(Pre_Rating, na.rm =TRUE), 0))
# Attach this to final data frame chess_tournament_data <- chess_tournament_data %>%mutate(Player_ID =row_number()) %>%left_join(avg_opp_rating, by ="Player_ID") %>%select(-Player_ID) #Remove the ID helper column before export
# Show preview of newly created data frame library(gt) gt(chess_tournament_data)
Player_Name
Total_Points
State
Pre_Rating
avg_opp_rating
GARY HUA
6.0
ON
1794
1605
DAKSHESH DARURI
6.0
MI
1553
1469
ADITYA BAJAJ
6.0
MI
1384
1564
PATRICK H SCHILLING
5.5
MI
1716
1574
HANSHI ZUO
5.5
MI
1655
1501
HANSEN SONG
5.0
OH
1686
1519
GARY DEE SWATHELL
5.0
MI
1649
1372
EZEKIEL HOUGHTON
5.0
MI
1641
1468
STEFANO LEE
5.0
ON
1411
1523
ANVIT RAO
5.0
MI
1365
1554
CAMERON WILLIAM MC LEMAN
4.5
MI
1712
1468
KENNETH J TACK
4.5
MI
1663
1506
TORRANCE HENRY JR
4.5
MI
1666
1498
BRADLEY SHAW
4.5
MI
1610
1515
ZACHARY JAMES HOUGHTON
4.5
MI
1220
1484
MIKE NIKITIN
4.0
MI
1604
1386
RONALD GRZEGORCZYK
4.0
MI
1629
1499
DAVID SUNDEEN
4.0
MI
1600
1480
DIPANKAR ROY
4.0
MI
1564
1426
JASON ZHENG
4.0
MI
1595
1411
DINH DANG BUI
4.0
ON
1563
1470
EUGENE L MCCLURE
4.0
MI
1555
1300
ALAN BUI
4.0
ON
1363
1214
MICHAEL R ALDRICH
4.0
MI
1229
1357
LOREN SCHWIEBERT
3.5
MI
1745
1363
MAX ZHU
3.5
ON
1579
1507
GAURAV GIDWANI
3.5
MI
1552
1222
SOFIA ADINA STANESCU-BELLU
3.5
MI
1507
1522
CHIEDOZIE OKORIE
3.5
MI
1602
1314
GEORGE AVERY JONES
3.5
ON
1522
1144
RISHI SHETTY
3.5
MI
1494
1260
JOSHUA PHILIP MATHEWS
3.5
ON
1441
1379
JADE GE
3.5
MI
1449
1277
MICHAEL JEFFERY THOMAS
3.5
MI
1399
1375
JOSHUA DAVID LEE
3.5
MI
1438
1150
SIDDHARTH JHA
3.5
MI
1355
1388
AMIYATOSH PWNANANDAM
3.5
MI
980
1385
BRIAN LIU
3.0
MI
1423
1539
JOEL R HENDON
3.0
MI
1436
1430
FOREST ZHANG
3.0
MI
1348
1391
KYLE WILLIAM MURPHY
3.0
MI
1403
1248
JARED GE
3.0
MI
1332
1150
ROBERT GLEN VASEY
3.0
MI
1283
1107
JUSTIN D SCHILLING
3.0
MI
1199
1327
DEREK YAN
3.0
MI
1242
1152
JACOB ALEXANDER LAVALLEY
3.0
MI
377
1358
ERIC WRIGHT
2.5
MI
1362
1392
DANIEL KHAIN
2.5
MI
1382
1356
MICHAEL J MARTIN
2.5
MI
1291
1286
SHIVAM JHA
2.5
MI
1056
1296
TEJAS AYYAGARI
2.5
MI
1011
1356
ETHAN GUO
2.5
MI
935
1495
JOSE C YBARRA
2.0
MI
1393
1345
LARRY HODGE
2.0
MI
1270
1206
ALEX KONG
2.0
MI
1186
1406
MARISA RICCI
2.0
MI
1153
1414
MICHAEL LU
2.0
MI
1092
1363
VIRAJ MOHILE
2.0
MI
917
1391
SEAN M MC CORMICK
2.0
MI
853
1319
JULIA SHEN
1.5
MI
967
1330
JEZZEL FARKAS
1.5
ON
955
1327
ASHWIN BALAJI
1.0
MI
1530
1186
THOMAS JOSEPH HOSMER
1.0
MI
1175
1350
BEN LI
1.0
MI
1163
1263
Manual review of average
I will hand-calculate at least two test cases against my coded results to verify my work.
Test Case #1 -
Player 3, Aditya Bajaj, played all games. Their opponents’ ratings are:
1641, 955, 1745, 1563, 1712, 1666, 1663
This is a total of 10,945 divided by 7 is 1564
Test Case #2 -
Player 16, Mike Nikitin, played only 5 games. Their opponents’ ratings are:
1365, 1220, 1436, 1553, 1355
This is a total of 6,929 divided by 5 is 1386
Both tests verified that my code was correct!
Export .csv file
# Write to csv write_csv(chess_tournament_data, "chess_tournament_results.csv")