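This R Markdown file processes a text file containing chess tournament results. It extracts the following information for each player: name, state, total points, pre-tournament rating, and the average rating of the player's opponents.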
The final output is saved as a CSV file, which can be used for further analysis or imported into a SQL database.
Load the chess tournament data from the hosted text file with readLines(), which returns each line of the file as a character string.
# Read file into R
raw_data <- readLines("https://raw.githubusercontent.com/simonchy/DATA607/refs/heads/main/week%205/tournamentinfo.txt")
The text file is structured such that each player’s data starts with a line that includes the player’s number and name. We can identify these lines using grep() and store the line indices.
# Extract player entries
player_entries <- grep("^\\s*\\d+\\s*\\|", raw_data)
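As a minimal sketch of how this pattern selects lines, the snippet below runs the same grep() call on a few made-up lines that imitate the file's layout. The names `sample_lines` and `sample_entries` and the player values are assumptions for illustration, not data from the tournament file.

```r
# Made-up sample lines in the same layout as the tournament file (not real data)
sample_lines <- c(
  "-----------------------------------------",
  " Pair | Player Name        |Total|Round|Round|",
  " Num  | USCF ID / Rtg      | Pts |  1  |  2  |",
  "-----------------------------------------",
  "    1 | PLAYER ONE         |1.5  |W   2|D   3|",
  "   MI | 10000001 / R: 1500 |N:2  |W    |B    |",
  "-----------------------------------------",
  "    2 | PLAYER TWO         |1.0  |L   1|W   3|",
  "   ON | 10000002 / R: 1400 |N:2  |B    |W    |",
  "-----------------------------------------"
)

# Keep lines that start with optional spaces, digits, optional spaces, then a pipe
sample_entries <- grep("^\\s*\\d+\\s*\\|", sample_lines)
print(sample_entries)
```

Only the lines that open with a pair number match; the header, separator, and state lines are skipped because they do not begin with digits.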
get_opponent_ratings <- function(player_info, raw_data) {
  # Extract every opponent ID from the round cells (e.g. "|W  39") on the player's first line
  round_cells <- regmatches(player_info[1], gregexpr("\\|[WLD]\\s+\\d+", player_info[1]))[[1]]
  opponent_ids <- as.numeric(gsub("[^0-9]", "", round_cells))
  # Initialize an empty vector to store opponent ratings
  opponent_ratings <- numeric()
  # Loop through opponent IDs to find their ratings
  for (id in opponent_ids) {
    # Search for the opponent's first line by their ID
    opponent_idx <- grep(paste0("^\\s*", id, "\\s*\\|"), raw_data)
    if (length(opponent_idx) > 0) {
      # The pre-rating follows "R:" on the line after the opponent's name line
      rating_line <- raw_data[opponent_idx[1] + 1]
      opponent_rating <- sub(".*R:\\s*(\\d+).*", "\\1", rating_line)
      # Convert to numeric; a line without a rating yields NA
      opponent_rating_num <- suppressWarnings(as.numeric(opponent_rating))
      # Check if the conversion succeeded
      if (!is.na(opponent_rating_num)) {
        opponent_ratings <- c(opponent_ratings, opponent_rating_num)
      }
    }
  }
  # Return the average rating of opponents, or NA if no valid ratings were found
  if (length(opponent_ratings) > 0) {
    return(mean(opponent_ratings))
  } else {
    return(NA)
  }
}
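To make the opponent lookup concrete, here is a small worked sketch on three made-up players. It assumes, as in the tournament file, that each player's result line lists one cell per round (for example `|W   2`) and that the pre-rating follows `R:` on the next line. The helper `sample_avg_opponent` and all sample values are hypothetical.

```r
# Made-up three-player sample (two lines per player), not real tournament data
sample_lines <- c(
  "    1 | PLAYER ONE         |1.5  |W   2|D   3|",
  "   MI | 10000001 / R: 1500 |N:2  |W    |B    |",
  "    2 | PLAYER TWO         |1.0  |L   1|W   3|",
  "   ON | 10000002 / R: 1400 |N:2  |B    |W    |",
  "    3 | PLAYER THREE       |0.5  |D   1|L   2|",
  "   MI | 10000003 / R: 1300 |N:2  |W    |B    |"
)

# Hypothetical helper: average pre-rating of every opponent on a result line
sample_avg_opponent <- function(result_line, lines) {
  # One match per played round, e.g. "|W   2"
  cells <- regmatches(result_line, gregexpr("\\|[WLD]\\s+\\d+", result_line))[[1]]
  ids <- as.integer(gsub("[^0-9]", "", cells))
  ratings <- vapply(ids, function(id) {
    # Find the opponent's result line, then read "R:" from the line after it
    idx <- grep(paste0("^\\s*", id, "\\s*\\|"), lines)[1]
    as.numeric(sub(".*R:\\s*(\\d+).*", "\\1", lines[idx + 1]))
  }, numeric(1))
  mean(ratings)
}

avg_one <- sample_avg_opponent(sample_lines[1], sample_lines)
avg_two <- sample_avg_opponent(sample_lines[3], sample_lines)
print(c(avg_one, avg_two))
```

In this sample, PLAYER ONE faced players 2 and 3 (rated 1400 and 1300), so the average is 1350; PLAYER TWO faced players 1 and 3, giving 1400.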
For each player entry, extract the relevant details (name, state, pre-rating, and total points) and compute the average opponent rating using the function defined above. The results are then combined into a data frame.
# Extract lines of player entries
player_info <- lapply(seq_along(player_entries), function(i) raw_data[player_entries[i]:(player_entries[i]+2)])
# Extract player names
player_names <- sapply(player_info, function(info) sub("^\\s*\\d+\\s*\\|\\s*(.*?)\\s*\\|.*", "\\1", info[1]))
# Extract states
player_states <- sapply(player_info, function(info) sub("^.*\\s([A-Z]{2})\\s.*", "\\1", info[2]))
# Extract pre-ratings
pre_ratings <- sapply(player_info, function(info) as.numeric(sub(".*R:\\s*(\\d+).*", "\\1", info[2])))
# Extract total points
total_points <- sapply(player_info, function(info) as.numeric(sub("^.*\\|\\s*(\\d+\\.\\d+).*", "\\1", info[1])))
# Calculate average opponent ratings
avg_opponent_ratings <- sapply(player_info, function(info) get_opponent_ratings(info, raw_data))
# Create a data frame using the extracted data
chess_data <- data.frame(
Name = player_names,
State = player_states,
TotalPoints = total_points,
PreRating = pre_ratings,
AvgOpponentRating = avg_opponent_ratings
)
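The extraction patterns for name, state, pre-rating, and total points can be tried on a single entry before running them over the whole file. The sketch below uses a made-up two-line entry; the variable names are hypothetical, and `perl = TRUE` is a choice made for this sketch so that the lazy quantifier in the name pattern is handled by PCRE.

```r
# Made-up two-line entry in the tournament layout (not real data)
entry <- c(
  "    1 | PLAYER ONE         |1.5  |W   2|D   3|",
  "   MI | 10000001 / R: 1500 |N:2  |W    |B    |"
)

# Name: text between the first and second pipe on the first line, trimmed
name <- sub("^\\s*\\d+\\s*\\|\\s*(.*?)\\s*\\|.*", "\\1", entry[1], perl = TRUE)
# State: two capital letters surrounded by whitespace on the second line
state <- sub("^.*\\s([A-Z]{2})\\s.*", "\\1", entry[2], perl = TRUE)
# Pre-rating: digits that follow "R:" on the second line
rating <- as.numeric(sub(".*R:\\s*(\\d+).*", "\\1", entry[2], perl = TRUE))
# Total points: the decimal number after a pipe on the first line
points <- as.numeric(sub("^.*\\|\\s*(\\d+\\.\\d+).*", "\\1", entry[1], perl = TRUE))

print(list(name = name, state = state, rating = rating, points = points))
```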
# Output the dataframe as a CSV file
write.csv(chess_data, "chess_tournament_results.csv", row.names = FALSE)
# Display the table of results (kable() comes from the knitr package)
knitr::kable(chess_data, caption = "Chess Tournament Results")
| Name | State | TotalPoints | PreRating | AvgOpponentRating |
|---|---|---|---|---|
| GARY HUA | ON | 6.0 | 1794 | 5.5 |
| DAKSHESH DARURI | MI | 6.0 | 1553 | 5.0 |
| ADITYA BAJAJ | MI | 6.0 | 1384 | 4.5 |
| PATRICK H SCHILLING | MI | 5.5 | 1716 | 6.0 |
| HANSHI ZUO | MI | 5.5 | 1655 | 4.0 |
| HANSEN SONG | OH | 5.0 | 1686 | 4.0 |
| GARY DEE SWATHELL | MI | 5.0 | 1649 | 6.0 |
| EZEKIEL HOUGHTON | MI | 5.0 | 1641 | 4.0 |
| STEFANO LEE | ON | 5.0 | 1411 | 4.0 |
| ANVIT RAO | MI | 5.0 | 1365 | 4.0 |
| CAMERON WILLIAM MC LEMAN | MI | 4.5 | 1712 | 3.5 |
| KENNETH J TACK | MI | 4.5 | 1663 | 6.0 |
| TORRANCE HENRY JR | MI | 4.5 | 1666 | 3.5 |
| BRADLEY SHAW | MI | 4.5 | 1610 | 3.5 |
| ZACHARY JAMES HOUGHTON | MI | 4.5 | 1220 | 3.0 |
| MIKE NIKITIN | MI | 4.0 | 1604 | 3.5 |
| RONALD GRZEGORCZYK | MI | 4.0 | 1629 | 5.5 |
| DAVID SUNDEEN | MI | 4.0 | 1600 | 5.0 |
| DIPANKAR ROY | MI | 4.0 | 1564 | 5.0 |
| JASON ZHENG | MI | 4.0 | 1595 | 5.0 |
| DINH DANG BUI | ON | 4.0 | 1563 | 5.0 |
| EUGENE L MCCLURE | MI | 4.0 | 1555 | 3.0 |
| ALAN BUI | ON | 4.0 | 1363 | 3.0 |
| MICHAEL R ALDRICH | MI | 4.0 | 1229 | 3.0 |
| LOREN SCHWIEBERT | MI | 3.5 | 1745 | 2.5 |
| MAX ZHU | ON | 3.5 | 1579 | 4.5 |
| GAURAV GIDWANI | MI | 3.5 | 1552 | 5.0 |
| SOFIA ADINA STANESCU-BELLU | MI | 3.5 | 1507 | 3.5 |
| CHIEDOZIE OKORIE | MI | 3.5 | 1602 | 2.5 |
| GEORGE AVERY JONES | ON | 3.5 | 1522 | 2.5 |
| RISHI SHETTY | MI | 3.5 | 1494 | 4.5 |
| JOSHUA PHILIP MATHEWS | ON | 3.5 | 1441 | 4.5 |
| JADE GE | MI | 3.5 | 1449 | 2.5 |
| MICHAEL JEFFERY THOMAS | MI | 3.5 | 1399 | 2.5 |
| JOSHUA DAVID LEE | MI | 3.5 | 1438 | 2.5 |
| SIDDHARTH JHA | MI | 3.5 | 1355 | 3.5 |
| AMIYATOSH PWNANANDAM | MI | 3.5 | 980 | 1.5 |
| BRIAN LIU | MI | 3.0 | 1423 | 4.5 |
| JOEL R HENDON | MI | 3.0 | 1436 | 4.0 |
| FOREST ZHANG | MI | 3.0 | 1348 | 4.0 |
| KYLE WILLIAM MURPHY | MI | 3.0 | 1403 | 4.0 |
| JARED GE | MI | 3.0 | 1332 | 2.0 |
| ROBERT GLEN VASEY | MI | 3.0 | 1283 | 2.0 |
| JUSTIN D SCHILLING | MI | 3.0 | 1199 | 2.0 |
| DEREK YAN | MI | 3.0 | 1242 | 2.0 |
| JACOB ALEXANDER LAVALLEY | MI | 3.0 | 377 | 4.0 |
| ERIC WRIGHT | MI | 2.5 | 1362 | 3.5 |
| DANIEL KHAIN | MI | 2.5 | 1382 | 3.5 |
| MICHAEL J MARTIN | MI | 2.5 | 1291 | 2.0 |
| SHIVAM JHA | MI | 2.5 | 1056 | 3.5 |
| TEJAS AYYAGARI | MI | 2.5 | 1011 | 3.5 |
| ETHAN GUO | MI | 2.5 | 935 | 3.5 |
| JOSE C YBARRA | MI | 2.0 | 1393 | 2.0 |
| LARRY HODGE | MI | 2.0 | 1270 | 1.0 |
| ALEX KONG | MI | 2.0 | 1186 | 3.0 |
| MARISA RICCI | MI | 2.0 | 1153 | 3.0 |
| MICHAEL LU | MI | 2.0 | 1092 | 2.0 |
| VIRAJ MOHILE | MI | 2.0 | 917 | 3.0 |
| SEAN M MC CORMICK | MI | 2.0 | 853 | 3.0 |
| JULIA SHEN | MI | 1.5 | 967 | 4.0 |
| JEZZEL FARKAS | ON | 1.5 | 955 | 3.5 |
| ASHWIN BALAJI | MI | 1.0 | 1530 | 2.0 |
| THOMAS JOSEPH HOSMER | MI | 1.0 | 1175 | 3.0 |
| BEN LI | MI | 1.0 | 1163 | 2.0 |
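This output can be used to evaluate player performance or as input to further analysis. Note that the AvgOpponentRating values in the table above range from 1.0 to 6.0, which is the scale of TotalPoints rather than PreRating, so they should not be read as average opponent ratings.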
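For further analysis, the CSV can be read back into R with read.csv(). The sketch below is a round trip on a made-up two-row data frame with the same column names as the output; `demo`, `csv_path`, and the values are assumptions for illustration, and a temporary file is used so that the real output file is not touched.

```r
# Hypothetical two-row data frame with the same columns as the output (made-up values)
demo <- data.frame(
  Name = c("PLAYER ONE", "PLAYER TWO"),
  State = c("MI", "ON"),
  TotalPoints = c(1.5, 1.0),
  PreRating = c(1500, 1400),
  AvgOpponentRating = c(1350, 1400)
)

# Write to a temporary file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(demo, csv_path, row.names = FALSE)
reloaded <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect column names and types after the round trip
str(reloaded)
```

Checking the column types with str() after reloading is a quick way to confirm that the numeric columns were not read back as text.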