Introduction

This R Markdown file processes a text file containing chess tournament results. It extracts the following information for each player: name, state, total points, pre-rating, and the average rating of the player’s opponents.

The final output is saved as a CSV file, which can be used for further analysis or imported into a SQL database.

Loading the Data

Load the chess tournament data from a text file using the readLines() function, which returns a character vector with one element per line of the file.

# Read file into R
raw_data <- readLines("https://raw.githubusercontent.com/simonchy/DATA607/refs/heads/main/week%205/tournamentinfo.txt")
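
To see what readLines() returns without depending on a network connection, here is a minimal sketch that writes a few placeholder lines to a temporary file and reads them back. The file contents and the names tmp_file and demo_lines are made up for illustration and are not part of the tournament data.

```r
# Write three placeholder lines to a temporary file (illustrative content only)
tmp_file <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), tmp_file)

# readLines() returns a character vector with one element per line
demo_lines <- readLines(tmp_file)

print(class(demo_lines))   # type of the returned object
print(length(demo_lines))  # number of lines read
print(demo_lines[2])       # individual lines are accessed by index
```

Because every line is a separate element, later steps can address a player's entry by line index.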

Identifying Player Entries

The text file is structured such that each player’s data starts with a line that includes the player’s number and name. We can identify these lines using grep() and store the line indices.

# Find the line index where each player's entry begins
player_entries <- grep("^\\s*\\d+\\s*\\|", raw_data)
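
The pattern can be tried on a handful of lines before running it on the full file. The sketch below uses made-up lines that imitate the layout the pattern expects (a pair number, a pipe, then the name); the players and values are hypothetical, not taken from the tournament file.

```r
# Made-up lines imitating the assumed file layout (hypothetical players)
sample_lines <- c(
  "-----------------------------------------",
  " Pair | Player Name        |Total|Round|",
  "-----------------------------------------",
  "    1 | PLAYER ONE         |1.0  |W   2|",
  "   MI | 10000001 / R: 1500 |N:2  |W    |",
  "-----------------------------------------",
  "    2 | PLAYER TWO         |0.0  |L   1|",
  "   ON | 10000002 / R: 1400 |N:2  |B    |",
  "-----------------------------------------"
)

# Only lines that start with optional spaces, digits, and a pipe are selected;
# header, separator, and state lines are skipped
entry_idx <- grep("^\\s*\\d+\\s*\\|", sample_lines)

print(entry_idx)
print(sample_lines[entry_idx])
```

The state and rating line of each player sits directly after the matched line, which is why the later steps read the lines following each index.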

Function to Calculate Average Opponent Rating

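The get_opponent_ratings() function calculates the average rating of a player’s opponents in three steps:

  1. Extracting opponent IDs from the rounds.
  2. Looking up each opponent’s rating by searching for their ID in the dataset.
  3. Returning the average rating of the opponents.
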
get_opponent_ratings <- function(player_info, raw_data) {
  # Extract every opponent ID from the round results on the player's first line
  # (played games look like "W  39", "L   7", or "D  12"; byes carry no number)
  results_line <- player_info[1]
  rounds <- regmatches(results_line, gregexpr("[WLD]\\s*\\d+", results_line))[[1]]
  opponent_ids <- as.numeric(gsub("[^0-9]", "", rounds))

  # Initialize an empty vector to store opponent ratings
  opponent_ratings <- numeric()

  # Loop through opponent IDs to find their ratings
  for (id in opponent_ids) {
    # Find the line index of the opponent's entry by their ID
    opponent_index <- grep(paste0("^\\s*", id, "\\s*\\|"), raw_data)

    if (length(opponent_index) > 0) {
      # The pre-rating follows "R:" on the line after the opponent's ID line
      rating_line <- raw_data[opponent_index[1] + 1]
      opponent_rating <- sub(".*R:\\s*(\\d+).*", "\\1", rating_line)

      # Convert to numeric (NA if no rating was found on that line)
      opponent_rating_num <- suppressWarnings(as.numeric(opponent_rating))

      # Keep the rating only if the conversion succeeded
      if (!is.na(opponent_rating_num)) {
        opponent_ratings <- c(opponent_ratings, opponent_rating_num)
      }
    }
  }

  # Return the average rating of opponents, or NA if no valid ratings found
  if (length(opponent_ratings) > 0) {
    return(mean(opponent_ratings))
  } else {
    return(NA)
  }
}
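
The three steps can be illustrated on a tiny example. This is a minimal sketch, assuming round results are written as a result letter followed by the opponent's pair number; the results line and the lookup table demo_ratings are made up for illustration. It uses gregexpr() because that returns every match on a line, whereas sub() rewrites the line around a single match.

```r
# Hypothetical results line: a win against 12, a draw with 4, a loss to 7, and a bye
results_line <- "    1 | PLAYER ONE         |2.0  |W  12|D   4|L   7|H    |"

# Step 1: collect every "letter + number" token, then keep only the digits
tokens <- regmatches(results_line, gregexpr("[WLD]\\s*\\d+", results_line))[[1]]
opponent_ids <- as.integer(gsub("[^0-9]", "", tokens))

# Step 2: look up each opponent's rating (made-up ratings keyed by pair number)
demo_ratings <- c("4" = 1500, "7" = 1600, "12" = 1400)
found <- unname(demo_ratings[as.character(opponent_ids)])

# Step 3: average the ratings that were found
avg_rating <- mean(found)

print(opponent_ids)
print(avg_rating)
```

The bye in the last round carries no opponent number, so it contributes nothing to the average.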

Extracting Information for Each Player

For each player’s entry, we extract the relevant details (name, state, pre-rating, and total points) and compute the average opponent rating using the function defined above. The results are then stored in a data frame.

# Extract lines of player entries
player_info <- lapply(seq_along(player_entries), function(i) raw_data[player_entries[i]:(player_entries[i]+2)])

# Extract player names
player_names <- sapply(player_info, function(info) sub("^\\s*\\d+\\s*\\|\\s*(.*?)\\s*\\|.*", "\\1", info[1]))

# Extract states
player_states <- sapply(player_info, function(info) sub("^.*\\s([A-Z]{2})\\s.*", "\\1", info[2]))

# Extract pre-ratings
pre_ratings <- sapply(player_info, function(info) as.numeric(sub(".*R:\\s*(\\d+).*", "\\1", info[2])))

# Extract total points
total_points <- sapply(player_info, function(info) as.numeric(sub("^.*\\|\\s*(\\d+\\.\\d+).*", "\\1", info[1])))

# Calculate average opponent ratings
avg_opponent_ratings <- sapply(player_info, function(info) get_opponent_ratings(info, raw_data))

# Create a data frame using the extracted data
chess_data <- data.frame(
  Name = player_names,
  State = player_states,
  TotalPoints = total_points,
  PreRating = pre_ratings,
  AvgOpponentRating = avg_opponent_ratings
)
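
To check what each pattern captures, it can help to run them on a single entry. The sketch below applies the same four patterns to a made-up two-line entry; the player, ID, and values are hypothetical. Setting perl = TRUE is a choice of this sketch, made so that the lazy quantifier in the name pattern is handled by PCRE.

```r
# Hypothetical two-line entry in the assumed layout
entry <- c(
  "    1 | PLAYER ONE                      |2.0  |W  12|D   4|L   7|H    |",
  "   MI | 10000001 / R: 1500P12 ->1510     |N:2  |W    |B    |W    |     |"
)

# Name: text between the first and second pipe, without surrounding spaces
name <- sub("^\\s*\\d+\\s*\\|\\s*(.*?)\\s*\\|.*", "\\1", entry[1], perl = TRUE)

# State: a two-letter uppercase code surrounded by whitespace on the second line
state <- sub("^.*\\s([A-Z]{2})\\s.*", "\\1", entry[2], perl = TRUE)

# Pre-rating: digits after "R:"; a provisional suffix such as "P12" is ignored
pre_rating <- as.numeric(sub(".*R:\\s*(\\d+).*", "\\1", entry[2], perl = TRUE))

# Total points: the decimal number that follows a pipe on the first line
total_points <- as.numeric(sub("^.*\\|\\s*(\\d+\\.\\d+).*", "\\1", entry[1], perl = TRUE))

print(list(name = name, state = state,
           pre_rating = pre_rating, total_points = total_points))
```

Note that sub() returns its input unchanged when the pattern does not match, so a malformed line would show up as a long string or as NA after as.numeric() rather than as an error.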

Exporting the Data to a CSV File

# Output the dataframe as a CSV file
write.csv(chess_data, "chess_tournament_results.csv", row.names = FALSE)
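
A quick way to confirm that the export worked is to read the file back and compare it with the data frame. This is a minimal sketch using a small made-up data frame and a temporary file, so it does not overwrite chess_tournament_results.csv; the names demo, out_file, and round_trip are hypothetical.

```r
# Small made-up data frame with the same columns as chess_data
demo <- data.frame(
  Name = c("PLAYER ONE", "PLAYER TWO"),
  State = c("MI", "ON"),
  TotalPoints = c(1.0, 0.0),
  PreRating = c(1500, 1400),
  AvgOpponentRating = c(1400, 1500)
)

# Write to a temporary file; row.names = FALSE avoids an extra index column
out_file <- tempfile(fileext = ".csv")
write.csv(demo, out_file, row.names = FALSE)

# Read the file back and inspect its structure
round_trip <- read.csv(out_file, stringsAsFactors = FALSE)
print(names(round_trip))
str(round_trip)
```

Without row.names = FALSE, the file would gain an unnamed first column of row numbers, which tends to get in the way when the CSV is imported elsewhere.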

Conclusion

# Display the table of results (kable() comes from the knitr package)
library(knitr)
kable(chess_data, caption = "Chess Tournament Results")
Chess Tournament Results

| Name | State | TotalPoints | PreRating | AvgOpponentRating |
|:-----|:------|------------:|----------:|------------------:|
| GARY HUA | ON | 6.0 | 1794 | 5.5 |
| DAKSHESH DARURI | MI | 6.0 | 1553 | 5.0 |
| ADITYA BAJAJ | MI | 6.0 | 1384 | 4.5 |
| PATRICK H SCHILLING | MI | 5.5 | 1716 | 6.0 |
| HANSHI ZUO | MI | 5.5 | 1655 | 4.0 |
| HANSEN SONG | OH | 5.0 | 1686 | 4.0 |
| GARY DEE SWATHELL | MI | 5.0 | 1649 | 6.0 |
| EZEKIEL HOUGHTON | MI | 5.0 | 1641 | 4.0 |
| STEFANO LEE | ON | 5.0 | 1411 | 4.0 |
| ANVIT RAO | MI | 5.0 | 1365 | 4.0 |
| CAMERON WILLIAM MC LEMAN | MI | 4.5 | 1712 | 3.5 |
| KENNETH J TACK | MI | 4.5 | 1663 | 6.0 |
| TORRANCE HENRY JR | MI | 4.5 | 1666 | 3.5 |
| BRADLEY SHAW | MI | 4.5 | 1610 | 3.5 |
| ZACHARY JAMES HOUGHTON | MI | 4.5 | 1220 | 3.0 |
| MIKE NIKITIN | MI | 4.0 | 1604 | 3.5 |
| RONALD GRZEGORCZYK | MI | 4.0 | 1629 | 5.5 |
| DAVID SUNDEEN | MI | 4.0 | 1600 | 5.0 |
| DIPANKAR ROY | MI | 4.0 | 1564 | 5.0 |
| JASON ZHENG | MI | 4.0 | 1595 | 5.0 |
| DINH DANG BUI | ON | 4.0 | 1563 | 5.0 |
| EUGENE L MCCLURE | MI | 4.0 | 1555 | 3.0 |
| ALAN BUI | ON | 4.0 | 1363 | 3.0 |
| MICHAEL R ALDRICH | MI | 4.0 | 1229 | 3.0 |
| LOREN SCHWIEBERT | MI | 3.5 | 1745 | 2.5 |
| MAX ZHU | ON | 3.5 | 1579 | 4.5 |
| GAURAV GIDWANI | MI | 3.5 | 1552 | 5.0 |
| SOFIA ADINA STANESCU-BELLU | MI | 3.5 | 1507 | 3.5 |
| CHIEDOZIE OKORIE | MI | 3.5 | 1602 | 2.5 |
| GEORGE AVERY JONES | ON | 3.5 | 1522 | 2.5 |
| RISHI SHETTY | MI | 3.5 | 1494 | 4.5 |
| JOSHUA PHILIP MATHEWS | ON | 3.5 | 1441 | 4.5 |
| JADE GE | MI | 3.5 | 1449 | 2.5 |
| MICHAEL JEFFERY THOMAS | MI | 3.5 | 1399 | 2.5 |
| JOSHUA DAVID LEE | MI | 3.5 | 1438 | 2.5 |
| SIDDHARTH JHA | MI | 3.5 | 1355 | 3.5 |
| AMIYATOSH PWNANANDAM | MI | 3.5 | 980 | 1.5 |
| BRIAN LIU | MI | 3.0 | 1423 | 4.5 |
| JOEL R HENDON | MI | 3.0 | 1436 | 4.0 |
| FOREST ZHANG | MI | 3.0 | 1348 | 4.0 |
| KYLE WILLIAM MURPHY | MI | 3.0 | 1403 | 4.0 |
| JARED GE | MI | 3.0 | 1332 | 2.0 |
| ROBERT GLEN VASEY | MI | 3.0 | 1283 | 2.0 |
| JUSTIN D SCHILLING | MI | 3.0 | 1199 | 2.0 |
| DEREK YAN | MI | 3.0 | 1242 | 2.0 |
| JACOB ALEXANDER LAVALLEY | MI | 3.0 | 377 | 4.0 |
| ERIC WRIGHT | MI | 2.5 | 1362 | 3.5 |
| DANIEL KHAIN | MI | 2.5 | 1382 | 3.5 |
| MICHAEL J MARTIN | MI | 2.5 | 1291 | 2.0 |
| SHIVAM JHA | MI | 2.5 | 1056 | 3.5 |
| TEJAS AYYAGARI | MI | 2.5 | 1011 | 3.5 |
| ETHAN GUO | MI | 2.5 | 935 | 3.5 |
| JOSE C YBARRA | MI | 2.0 | 1393 | 2.0 |
| LARRY HODGE | MI | 2.0 | 1270 | 1.0 |
| ALEX KONG | MI | 2.0 | 1186 | 3.0 |
| MARISA RICCI | MI | 2.0 | 1153 | 3.0 |
| MICHAEL LU | MI | 2.0 | 1092 | 2.0 |
| VIRAJ MOHILE | MI | 2.0 | 917 | 3.0 |
| SEAN M MC CORMICK | MI | 2.0 | 853 | 3.0 |
| JULIA SHEN | MI | 1.5 | 967 | 4.0 |
| JEZZEL FARKAS | ON | 1.5 | 955 | 3.5 |
| ASHWIN BALAJI | MI | 1.0 | 1530 | 2.0 |
| THOMAS JOSEPH HOSMER | MI | 1.0 | 1175 | 3.0 |
| BEN LI | MI | 1.0 | 1163 | 2.0 |

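This output can be used to evaluate player performance or as input to further analysis in data science projects. Note that the AvgOpponentRating values shown above range from 1.0 to 6.0, which is the scale of total points rather than of ratings (the pre-ratings in the same table range from 377 to 1794), so that column should be verified against the pre-ratings before it is relied on.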