Project 1: Chess Tournament

Author

Kristoff Oliphant

Introduction

We are now up to our first project! In this project we’re given a structured text file containing the results of a chess tournament with 64 players. The values of interest are player name, player state, total number of points, the player’s pre-rating, and the average pre-tournament rating of the player’s opponents. The goal for this project is to create a Quarto file that generates a CSV file which could also be imported into a SQL database.

Planned Workflow

I’ll use tidyverse to pull the raw text into R and then isolate the player data that’s relevant to the task by filtering out the dashed lines and skipping the initial header rows. Ideally this leaves a clean structure in which each player’s data is easy to tell apart. Because each player’s record spans two lines, I’ll split the data into two subsets using odd and even indexing: one subset holds information like name and total points, and the other holds secondary details like state and pre-rating.

After separating the data I’ll use the stringr package to extract specific attributes, targeting the strings between pipe delimiters for names and states. For the ratings I’ll isolate the pre-tournament rating digits and make sure any post-tournament ratings are excluded.

Next I’ll create a lookup table that maps every player ID to their corresponding pre-rating, and write a function that identifies each player’s opponents. The function will look up the opponents’ ratings in the table and calculate their arithmetic mean, using na.rm = TRUE so that “byes” or unrated games don’t interfere with the final averages. Once all variables are extracted and calculated, I’ll merge the components into a single unified data frame and export it with write_csv to generate the completed tournament results CSV file.
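As a rough sketch of the filtering, odd/even split, and extraction steps described above, here is a small example. The sample lines are made up (three invented players, two rounds), and the exact column layout is an assumption about the file. To run without extra packages it uses base R; stringr calls such as str_split and str_extract could be swapped in for strsplit and sub.

```r
# Made-up sample in the assumed layout (not real tournament data).
raw <- c(
  "-------------------------------------------------",
  " Pair | Player Name   |Total|Round|Round|",
  " Num  | USCF ID / Rtg | Pts |  1  |  2  |",
  "-------------------------------------------------",
  "    1 | ALICE EXAMPLE |1.5  |W   2|D   3|",
  "   ON | 10000001 / R: 1500   ->1520 |N:2  |W    |B    |",
  "-------------------------------------------------",
  "    2 | BOB SAMPLE    |0.5  |L   1|D   3|",
  "   MI | 10000002 / R: 1400   ->1390 |N:3  |B    |W    |",
  "-------------------------------------------------",
  "    3 | CARA TEST     |1.0  |D   1|D   2|",
  "   OH | 10000003 / R: 1300   ->1310 |N:3  |W    |B    |",
  "-------------------------------------------------"
)

# Drop the dashed separator lines, then skip the two header rows.
body <- raw[!grepl("^-+$", raw)]
body <- body[-(1:2)]

# Odd lines carry ID, name, and points; even lines carry state and ratings.
line1 <- body[seq(1, length(body), by = 2)]
line2 <- body[seq(2, length(body), by = 2)]

# Split each line on the pipe delimiter and trim the padding around each field.
fields1 <- lapply(strsplit(line1, "|", fixed = TRUE), trimws)
fields2 <- lapply(strsplit(line2, "|", fixed = TRUE), trimws)

players <- data.frame(
  id           = as.integer(sapply(fields1, `[`, 1)),
  name         = sapply(fields1, `[`, 2),
  state        = sapply(fields2, `[`, 1),
  total_points = as.numeric(sapply(fields1, `[`, 3)),
  # Keep only the digits right after "R:", so the post-rating after "->" is ignored.
  pre_rating   = as.integer(sub(".*R: *([0-9]+).*", "\\1", line2)),
  stringsAsFactors = FALSE
)

print(players)
```

Splitting on the pipe keeps the extraction independent of exact column widths, which should help if the real file pads fields differently from this sample.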

Anticipated Challenges

One challenge I anticipate is handling players who had byes during the tournament. Byes need to be accounted for so that they don’t distort the average pre-rating of opponents calculated for each player. I also expect that transforming the raw text, with its dashed separator lines, into a usable data frame in R will be difficult, because the data has to be reorganized without compromising its integrity.
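To make the bye concern concrete, here is a minimal sketch of one way the average could skip rounds with no opponent. The ratings, round strings, and the helper name avg_opp_rating are all invented for illustration, and it assumes a bye shows up as a result code with no opponent number.

```r
# Hypothetical lookup table: the vector position is the player ID.
pre_rating <- c(1500, 1400, 1300)

# Made-up round results per player; "B" and "H" stand in for rounds with no opponent.
rounds <- list(
  c("W   2", "D   3"),
  c("L   1", "B"),
  c("H", "D   1")
)

# avg_opp_rating is an illustrative helper, not part of any package.
avg_opp_rating <- function(round_results, ratings) {
  # Strip everything but digits; a bye leaves an empty string, which becomes NA.
  ids <- as.integer(gsub("[^0-9]", "", round_results))
  # Indexing by NA returns NA, and na.rm = TRUE drops it from the mean.
  mean(ratings[ids], na.rm = TRUE)
}

avg <- sapply(rounds, avg_opp_rating, ratings = pre_rating)
print(avg)
```

With this approach a bye reduces the number of games in the denominator instead of counting as a zero-rated opponent. A player with no rated games at all would get NaN, which would need separate handling.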