Project 1

Introduction

This R Markdown file processes a text file containing chess tournament results. It extracts the following information for each player:

Player’s Name
Player’s State
Total Number of Points
Player’s Pre-Tournament Rating
Average Pre-Tournament Rating of Opponents

The final output is saved as a CSV file, which can be used for further analysis or imported into a SQL database.

Loading the Data

Load the chess tournament data from a text file with readLines(), which returns a character vector with one element per line of the file.

# Read file into R
raw_data <- readLines("https://raw.githubusercontent.com/simonchy/DATA607/refs/heads/main/week%205/tournamentinfo.txt")
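To make the shape of the result concrete, here is a minimal sketch that reads a small sample through textConnection() instead of the URL. The contents of sample_text (names, IDs, ratings, header wording) are hypothetical and only imitate the layout of the tournament file.

```r
# Hypothetical sample imitating the file layout (all values are made up)
sample_text <- c(
  "-----------------------------------------------------------------",
  " Pair | Player Name                     |Total|Round|Round|Round|",
  " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |",
  "-----------------------------------------------------------------",
  "    1 | PLAYER ONE                      |2.5  |W   2|D   3|W   4|",
  "   MI | 10000001 / R: 1500   ->1520     |N:2  |W    |B    |W    |",
  "-----------------------------------------------------------------"
)

# readLines() accepts any connection, so a textConnection stands in for the URL
con <- textConnection(sample_text)
sample_lines <- readLines(con)
close(con)

length(sample_lines)  # one element per input line
class(sample_lines)   # "character"
```

Working from a small in-memory sample like this makes it easier to test the regular expressions used later without downloading the file each time.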

Identifying Player Entries

In the text file, each player's record spans two lines: the first begins with the player's number followed by the name, and the second holds the state and ratings. grep() returns the indices of the first lines, which we store for later use.

# Extract player entries 
player_entries <- grep("^\\s*\\d+\\s*\\|", raw_data)
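As an illustration of what this pattern selects, the sketch below applies it to a few hypothetical lines (a header, separators, and two made-up players). Only lines that start with optional spaces, digits, and a pipe are matched.

```r
# Hypothetical lines: a header, separators, and two players (two lines each)
sample_lines <- c(
  " Pair | Player Name                     |Total|Round|Round|",
  "-------------------------------------------------------------",
  "    1 | PLAYER ONE                      |1.5  |W   2|H    |",
  "   MI | 10000001 / R: 1500   ->1520     |N:2  |W    |     |",
  "-------------------------------------------------------------",
  "    2 | PLAYER TWO                      |0.0  |L   1|U    |",
  "   ON | 10000002 / R: 1400P12->1390P14  |N:3  |B    |     |",
  "-------------------------------------------------------------"
)

# Same pattern as above: optional spaces, digits, optional spaces, then a pipe
entry_idx <- grep("^\\s*\\d+\\s*\\|", sample_lines)

entry_idx                # 3 6
sample_lines[entry_idx]  # the two lines that begin with a player number
```

The header line and the state lines are skipped because they begin with letters rather than digits.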

Function to Calculate Average Opponent Rating

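The helper function get_opponent_ratings() works in three steps:

Extract the opponent IDs from the round results on the player's first line.
Look up each opponent's pre-tournament rating by searching for their ID in the dataset.
Return the average of those ratings.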

get_opponent_ratings <- function(player_info, raw_data) {
  # Round results sit on the first line of the player's block, one cell per
  # round (a W, L, or D followed by the opponent's ID).
  # gregexpr() collects every such cell rather than only one per line.
  round_cells <- regmatches(player_info[1],
                            gregexpr("[WLD]\\s+\\d+", player_info[1]))[[1]]
  opponent_ids <- as.numeric(sub("[WLD]\\s+", "", round_cells))

  # Initialize an empty vector to store opponent ratings
  opponent_ratings <- numeric()

  # Loop through opponent IDs to find their pre-tournament ratings
  for (id in opponent_ids) {
    # Find the opponent's first line by ID; the rating is on the line after it
    opponent_index <- grep(paste0("^\\s*", id, "\\s*\\|"), raw_data)

    if (length(opponent_index) > 0) {
      rating_line <- raw_data[opponent_index[1] + 1]

      # The pre-tournament rating is the run of digits following "R:"
      opponent_rating <- sub(".*R:\\s*(\\d+).*", "\\1", rating_line)

      # Convert to numeric; a line without a rating becomes NA
      opponent_rating_num <- suppressWarnings(as.numeric(opponent_rating))

      # Keep the rating only if the conversion succeeded
      if (!is.na(opponent_rating_num)) {
        opponent_ratings <- c(opponent_ratings, opponent_rating_num)
      }
    }
  }

  # Return the average rating of opponents, or NA if no valid ratings were found
  if (length(opponent_ratings) > 0) {
    return(mean(opponent_ratings))
  } else {
    return(NA)
  }
}
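The calculation can be checked by hand on a tiny example. The sketch below is a compact, self-contained variant of the same idea, not the function above: the helper name avg_opponent_pre_rating and the three players in toy are hypothetical. Each player meets the other two, so the expected average is the mean of the other two pre-ratings.

```r
# Hypothetical three-player tournament in the same two-line layout
toy <- c(
  "    1 | PLAYER ONE                      |2.0  |W   2|W   3|",
  "   MI | 10000001 / R: 1500   ->1520     |N:2  |W    |B    |",
  "-------------------------------------------------------------",
  "    2 | PLAYER TWO                      |1.0  |L   1|W   3|",
  "   ON | 10000002 / R: 1400   ->1395     |N:3  |B    |W    |",
  "-------------------------------------------------------------",
  "    3 | PLAYER THREE                    |0.0  |L   1|L   2|",
  "   OH | 10000003 / R: 1300P5 ->1280P7   |N:4  |W    |B    |",
  "-------------------------------------------------------------"
)

# Hypothetical helper: average pre-rating of the opponents on one entry line
avg_opponent_pre_rating <- function(entry_line, all_lines) {
  # Collect every "W/L/D <id>" cell on the player's first line
  cells <- regmatches(entry_line, gregexpr("[WLD]\\s+\\d+", entry_line))[[1]]
  ids <- as.integer(sub("[WLD]\\s+", "", cells))

  ratings <- vapply(ids, function(id) {
    idx <- grep(paste0("^\\s*", id, "\\s*\\|"), all_lines)[1]
    # The pre-rating follows "R:" on the line after the opponent's entry line
    as.numeric(sub(".*R:\\s*(\\d+).*", "\\1", all_lines[idx + 1]))
  }, numeric(1))

  if (length(ratings) > 0) mean(ratings) else NA_real_
}

avg_opponent_pre_rating(toy[1], toy)  # (1400 + 1300) / 2 = 1350
avg_opponent_pre_rating(toy[4], toy)  # (1500 + 1300) / 2 = 1400
avg_opponent_pre_rating(toy[7], toy)  # (1500 + 1400) / 2 = 1450
```

A useful sanity check on real data is that the averages fall in the same range as the pre-ratings themselves; values on the scale of tournament points would indicate that the wrong column is being read.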

Extracting Information for Each Player

For each player entry, collect the lines of that player's block, extract the relevant details (name, state, pre-rating, and total points), and compute the average opponent rating with the function defined above. The results are then combined into a data frame.

# Extract lines of player entries
player_info <- lapply(seq_along(player_entries), function(i) raw_data[player_entries[i]:(player_entries[i]+2)])

# Extract player names
player_names <- sapply(player_info, function(info) sub("^\\s*\\d+\\s*\\|\\s*(.*?)\\s*\\|.*", "\\1", info[1]))

# Extract states
player_states <- sapply(player_info, function(info) sub("^.*\\s([A-Z]{2})\\s.*", "\\1", info[2]))

# Extract pre-ratings
pre_ratings <- sapply(player_info, function(info) as.numeric(sub(".*R:\\s*(\\d+).*", "\\1", info[2])))

# Extract total points
total_points <- sapply(player_info, function(info) as.numeric(sub("^.*\\|\\s*(\\d+\\.\\d+).*", "\\1", info[1])))

# Calculate average opponent ratings
avg_opponent_ratings <- sapply(player_info, function(info) get_opponent_ratings(info, raw_data))

# Create a data frame using the extracted data
chess_data <- data.frame(
  Name = player_names,
  State = player_states,
  TotalPoints = total_points,
  PreRating = pre_ratings,
  AvgOpponentRating = avg_opponent_ratings
)
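To see what each pattern captures, the sketch below applies the same four regular expressions to a single hypothetical player block. The player, ID, and ratings are made up; the rating carries a letter suffix (1400P12) to show that only the leading digits after "R:" are kept.

```r
# Hypothetical two-line block for one player (values are made up)
block <- c(
  "    2 | PLAYER TWO                      |0.5  |L   1|D   3|",
  "   ON | 10000002 / R: 1400P12->1390P14  |N:3  |B    |W    |"
)

# Name: text between the first and second pipe, without surrounding spaces
name <- sub("^\\s*\\d+\\s*\\|\\s*(.*?)\\s*\\|.*", "\\1", block[1])

# State: the two capital letters surrounded by whitespace on the second line
state <- sub("^.*\\s([A-Z]{2})\\s.*", "\\1", block[2])

# Total points: the only decimal number on the first line
points <- as.numeric(sub("^.*\\|\\s*(\\d+\\.\\d+).*", "\\1", block[1]))

# Pre-rating: digits right after "R:"; the "P12" suffix is dropped
pre_rating <- as.numeric(sub(".*R:\\s*(\\d+).*", "\\1", block[2]))

data.frame(Name = name, State = state, TotalPoints = points, PreRating = pre_rating)
```

Checking the patterns on one block first makes it easier to spot a pattern that silently returns the whole line, which is what sub() does when nothing matches.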

Exporting the Data to a CSV File

# Output the dataframe as a CSV file
write.csv(chess_data, "chess_tournament_results.csv", row.names = FALSE)
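A quick way to confirm the export is to read the file back and compare it with the data frame. The sketch below does this with a small hypothetical data frame and a temporary file, so it does not touch chess_tournament_results.csv; the rows in results are made up.

```r
# Hypothetical results with the same columns as chess_data
results <- data.frame(
  Name = c("PLAYER ONE", "PLAYER TWO"),
  State = c("MI", "ON"),
  TotalPoints = c(1.5, 0.5),
  PreRating = c(1500, 1400),
  AvgOpponentRating = c(1400, 1500),
  stringsAsFactors = FALSE
)

# Write to a temporary file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(results, csv_path, row.names = FALSE)
reloaded <- read.csv(csv_path, stringsAsFactors = FALSE)

names(reloaded)  # same five column names, in the same order
nrow(reloaded)   # same number of rows as the original
```

With row.names = FALSE the file contains only the five data columns, which keeps it straightforward to load into a SQL table later.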

Conclusion

# Load knitr for kable(), then display the table of results
library(knitr)
kable(chess_data, caption = "Chess Tournament Results")

Chess Tournament Results
Name	State	TotalPoints	PreRating	AvgOpponentRating
GARY HUA	ON	6.0	1794	5.5
DAKSHESH DARURI	MI	6.0	1553	5.0
ADITYA BAJAJ	MI	6.0	1384	4.5
PATRICK H SCHILLING	MI	5.5	1716	6.0
HANSHI ZUO	MI	5.5	1655	4.0
HANSEN SONG	OH	5.0	1686	4.0
GARY DEE SWATHELL	MI	5.0	1649	6.0
EZEKIEL HOUGHTON	MI	5.0	1641	4.0
STEFANO LEE	ON	5.0	1411	4.0
ANVIT RAO	MI	5.0	1365	4.0
CAMERON WILLIAM MC LEMAN	MI	4.5	1712	3.5
KENNETH J TACK	MI	4.5	1663	6.0
TORRANCE HENRY JR	MI	4.5	1666	3.5
BRADLEY SHAW	MI	4.5	1610	3.5
ZACHARY JAMES HOUGHTON	MI	4.5	1220	3.0
MIKE NIKITIN	MI	4.0	1604	3.5
RONALD GRZEGORCZYK	MI	4.0	1629	5.5
DAVID SUNDEEN	MI	4.0	1600	5.0
DIPANKAR ROY	MI	4.0	1564	5.0
JASON ZHENG	MI	4.0	1595	5.0
DINH DANG BUI	ON	4.0	1563	5.0
EUGENE L MCCLURE	MI	4.0	1555	3.0
ALAN BUI	ON	4.0	1363	3.0
MICHAEL R ALDRICH	MI	4.0	1229	3.0
LOREN SCHWIEBERT	MI	3.5	1745	2.5
MAX ZHU	ON	3.5	1579	4.5
GAURAV GIDWANI	MI	3.5	1552	5.0
SOFIA ADINA STANESCU-BELLU	MI	3.5	1507	3.5
CHIEDOZIE OKORIE	MI	3.5	1602	2.5
GEORGE AVERY JONES	ON	3.5	1522	2.5
RISHI SHETTY	MI	3.5	1494	4.5
JOSHUA PHILIP MATHEWS	ON	3.5	1441	4.5
JADE GE	MI	3.5	1449	2.5
MICHAEL JEFFERY THOMAS	MI	3.5	1399	2.5
JOSHUA DAVID LEE	MI	3.5	1438	2.5
SIDDHARTH JHA	MI	3.5	1355	3.5
AMIYATOSH PWNANANDAM	MI	3.5	980	1.5
BRIAN LIU	MI	3.0	1423	4.5
JOEL R HENDON	MI	3.0	1436	4.0
FOREST ZHANG	MI	3.0	1348	4.0
KYLE WILLIAM MURPHY	MI	3.0	1403	4.0
JARED GE	MI	3.0	1332	2.0
ROBERT GLEN VASEY	MI	3.0	1283	2.0
JUSTIN D SCHILLING	MI	3.0	1199	2.0
DEREK YAN	MI	3.0	1242	2.0
JACOB ALEXANDER LAVALLEY	MI	3.0	377	4.0
ERIC WRIGHT	MI	2.5	1362	3.5
DANIEL KHAIN	MI	2.5	1382	3.5
MICHAEL J MARTIN	MI	2.5	1291	2.0
SHIVAM JHA	MI	2.5	1056	3.5
TEJAS AYYAGARI	MI	2.5	1011	3.5
ETHAN GUO	MI	2.5	935	3.5
JOSE C YBARRA	MI	2.0	1393	2.0
LARRY HODGE	MI	2.0	1270	1.0
ALEX KONG	MI	2.0	1186	3.0
MARISA RICCI	MI	2.0	1153	3.0
MICHAEL LU	MI	2.0	1092	2.0
VIRAJ MOHILE	MI	2.0	917	3.0
SEAN M MC CORMICK	MI	2.0	853	3.0
JULIA SHEN	MI	1.5	967	4.0
JEZZEL FARKAS	ON	1.5	955	3.5
ASHWIN BALAJI	MI	1.0	1530	2.0
THOMAS JOSEPH HOSMER	MI	1.0	1175	3.0
BEN LI	MI	1.0	1163	2.0

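This output can be used to evaluate player performance or as input to further analysis. AvgOpponentRating is meant to be on the same scale as PreRating; the values between 1.0 and 6.0 in the table above are on the TotalPoints scale instead, so that column should be regenerated before it is used.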