Take a chess tournament results text file and create an R Markdown document that generates a structured CSV file. The output CSV should contain:
1. Player’s Name
2. Player’s State
3. Total Number of Points
4. Player’s Pre-Rating
5. Average Pre Chess Rating of Opponents
For example, the first player’s row would be: Gary Hua, ON, 6.0, 1794, 1605.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.4 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Because the file on Blackboard sits behind a login wall, I uploaded it to my own GitHub repository and read it from there.
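Reading the file takes some care because it isn’t a conventional delimited table: each player spans two lines of pipe-separated fields, followed by a separator line. The simplest approach I found was readLines, which returns one string per line of the file.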
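As a minimal sketch of what readLines returns, here is a small hypothetical excerpt in the same layout (made-up player, abbreviated to two rounds) written to a temporary file and read back. Each element of the result is one line, so the four header lines and the three lines per player can be addressed by position.

```r
# Hypothetical excerpt in the assignment file's layout: 4 header lines, then
# 3 lines per player (made-up player, abbreviated to two rounds)
sample_text <- c(
  "----------------------------------------------------------",
  " Pair | Player Name        |Total|Round|Round|",
  " Num  | USCF ID / Rtg (Pre->Post) | Pts |  1  |  2  |",
  "----------------------------------------------------------",
  "    1 | JANE DOE           |2.0  |W   2|W   3|",
  "   ON | 11111111 / R: 1500   ->1520     |N:2  |W    |B    |",
  "----------------------------------------------------------"
)

# Write the excerpt to a temporary file so readLines() reads it like the real one
tmp <- tempfile(fileext = ".txt")
writeLines(sample_text, tmp)

# readLines() returns a character vector with one element per line
chess_sample <- readLines(tmp)
length(chess_sample)   # 7
chess_sample[5]        # the player's first line: pair number, name, points, rounds
```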
chess <- readLines("https://raw.githubusercontent.com/rossboehme/DATA607/main/project1/data607-project1-chess.txt")
My approach is to first remove the header rows and then split the remaining lines into three groups based on the information each line holds. This should make my regexes simpler.
#Remove the four header rows
chess <- chess[-(1:4)]
#Split the lines into three groups by position; each player takes three lines
#Lines 1, 4, 7, etc. contain Pair Number, Player Name, Total Points, and opponent info.
#Aliasing "PairNamePointsOpp"
PairNamePointsOpp <- chess[seq(1, length(chess),3)]
#Lines 2, 5, 8, etc. contain Player State and Pre Rating.
#Aliasing "StateRating"
StateRating <- chess[seq(2, length(chess),3)]
#Lines 3, 6, 9, etc. are separators and don't need to be saved.
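The positional split relies on `seq(start, length, by = 3)` picking every third element. A minimal sketch on a hypothetical vector of placeholder strings (not the real file) shows the pattern, along with the fact that a zero inside a negative index is simply ignored, so `-c(0:4)` and `-(1:4)` drop the same four elements.

```r
# Hypothetical body lines: each player contributes a name line,
# a state/rating line, and a separator
body <- c("p1 name line", "p1 state line", "----",
          "p2 name line", "p2 state line", "----")

# Every third element starting at 1 (or 2) picks out one line type
name_lines  <- body[seq(1, length(body), by = 3)]
state_lines <- body[seq(2, length(body), by = 3)]

name_lines   # "p1 name line" "p2 name line"
state_lines  # "p1 state line" "p2 state line"

# Index 0 is ignored in subsetting, so both forms drop the four header lines
all_lines <- c("h1", "h2", "h3", "h4", body)
stopifnot(identical(all_lines[-c(0:4)], all_lines[-(1:4)]))
```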
#Note: I put my regexes on lines by themselves so I can see them more easily
pair_num <- as.integer(unlist(str_extract_all(PairNamePointsOpp,
"(?<=\\s{3,4})\\d{1,2}(?=\\s)"
)))
#Everything between the first and second pipes, so four-word and hyphenated names stay whole
name <- trimws(str_extract(PairNamePointsOpp,
"(?<=\\|)[^|]+"
))
points <- as.numeric(unlist(str_extract(PairNamePointsOpp,
"\\d+\\.\\d+"
)))
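An alternative to one regex per field is to split each player line on the `|` delimiter so every field lands in its own column. A minimal sketch on a hypothetical line (the made-up name deliberately has four words and a hyphen) also shows why a letters-plus-spaces pattern such as `([A-Z]+\\s){2,3}` can truncate names.

```r
library(stringr)

# Hypothetical player line in the tournament layout (made-up values)
player_line <- "   12 | ANNA MARIA DE SOUZA-LIMA        |4.5  |W   3|L   7|D  10|"

# Split on the literal pipe; the first three columns are pair number, name, points
fields <- str_split_fixed(player_line, fixed("|"), n = 4)

pair_num <- as.integer(trimws(fields[, 1]))
name     <- trimws(fields[, 2])
points   <- as.numeric(trimws(fields[, 3]))

# The letters-and-spaces pattern caps the word count and stops at the hyphen
str_extract(player_line, "([A-Z]+\\s){2,3}")   # "ANNA MARIA DE "
```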
#Digits immediately followed by "|" are opponent pair numbers; the lookahead keeps the pipe out of the match
opponents <- as.integer(unlist(str_extract_all(PairNamePointsOpp,
"[0-9]+(?=\\|)"
)))
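A single `str_extract_all()` call with a lookahead is enough to pull the opponent numbers; wrapping a second `str_extract_all()` around the list returned by the first is what triggers the "not an atomic vector; coercing" warning. A minimal sketch on hypothetical lines (made-up players, three rounds, with a bye, a half-point bye and an unplayed round):

```r
library(stringr)

# Hypothetical player lines (made-up values, three rounds)
player_lines <- c(
  "    1 | JANE DOE     |3.0  |W   2|W   3|B    |",
  "    2 | JOHN ROE     |1.0  |L   1|H    |D   3|"
)

# Digits right before "|" are opponent pair numbers; the pair number itself
# ("1 |") and the points ("3.0  |") are followed by spaces, so they don't match
opp_list  <- str_extract_all(player_lines, "[0-9]+(?=\\|)")
opponents <- as.integer(unlist(opp_list))

opponents           # 2 3 1 3
lengths(opp_list)   # 2 2  (games with an opponent, per player)
```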
state <- str_extract(StateRating,
"[[A-Z]]+"
)
# I'll pull two versions of the pre-rating:
# 1) The full version, which keeps any provisional suffix, e.g. EZEKIEL HOUGHTON's 1641P17. I'll call this simply "pre_rating".
# 2) Because I need to average each player's opponents' pre-ratings, I'll also pull an adjusted ("adj") version with only the leading digits and use that for my calculations.
# The pre-rating is the value right after "R:"; the value after "->" is the post-tournament rating.
pre_rating <- str_match(StateRating,
"R:\\s*([0-9A-Z]+)"
)[, 2]
pre_rating_adj <- as.integer(str_match(StateRating,
"R:\\s*([0-9]+)"
)[, 2])
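Anchoring on `R:` rather than on the `->` arrow decides which rating comes back: the text after `->` is the post-tournament rating. A minimal sketch on hypothetical rating lines (made-up values, including a provisional rating and a three-digit rating with extra padding) uses `str_match()`, whose second column holds the capture group. A capture group is used here because ICU, the regex engine behind stringr, requires look-behind patterns to have a bounded length, so `(?<=R:\\s*)` would not compile.

```r
library(stringr)

# Hypothetical state/rating lines (made-up values); the second is provisional
rating_lines <- c(
  "   ON | 11111111 / R: 1500   ->1520     |N:2  |W    |B    |",
  "   MI | 22222222 / R: 1400P12->1450P15  |     |B    |W    |",
  "   MI | 33333333 / R:  950   -> 980     |     |W    |B    |"
)

# Column 2 of str_match() is the capture group: the rating right after "R:"
pre_rating     <- str_match(rating_lines, "R:\\s*([0-9A-Z]+)")[, 2]
pre_rating_adj <- as.integer(str_match(rating_lines, "R:\\s*([0-9]+)")[, 2])
# For comparison, anchoring on the arrow yields the post-tournament rating
post_rating    <- as.integer(str_match(rating_lines, "->\\s*([0-9]+)")[, 2])

pre_rating      # "1500" "1400P12" "950"
pre_rating_adj  # 1500 1400 950
post_rating     # 1520 1450 980
```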
chess_cleaned <- data.frame(name,state,points,pre_rating)
As a final step, I need to add the Average Pre Chess Rating of Opponents to the chess_cleaned df created above. Players average 6.375 opponents each, which is fewer than the 7 rounds in the tournament, so some players missed at least one match.
#Average of 6.375 opponents for every player
length(opponents) / length(name)
## [1] 6.375
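The 6.375 figure can be turned into a count of empty round slots. A short worked check, taking the 64 players listed in the final table and 7 rounds per player:

```r
n_players     <- 64     # rows in the final chess_cleaned table
n_rounds      <- 7      # rounds in the tournament
avg_opponents <- 6.375  # length(opponents) / length(name)

games_with_opponent <- avg_opponents * n_players                   # 408
empty_slots         <- n_players * n_rounds - games_with_opponent  # 448 - 408 = 40
```

So 40 of the 448 player-round slots have no opponent, and those are the slots that need NA handling when averaging.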
As a solution, I’ll build a matrix with one row per player and one column per round, holding the pair number of that round’s opponent. Rounds without an opponent become NA.
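#Split each line on "|"; columns 4-10 hold rounds 1-7
col_df <- str_split(PairNamePointsOpp, pattern = "\\|",simplify = TRUE)
#Trailing digits in a round cell are the opponent's pair number; cells without digits become NA
opp_matrix <- matrix(as.integer(str_extract(as.vector(col_df[,4:10]), pattern = "[0-9]+$")), ncol = 7)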
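A minimal sketch of the same construction on three hypothetical players (made-up values, three rounds instead of seven) shows the NA layout and how `rowSums(!is.na(...))` counts the games each player actually played. `as.vector()` flattens the round columns column by column, and `matrix(..., ncol = ...)` refills column by column, so each row still belongs to the same player.

```r
library(stringr)

# Hypothetical player lines with three rounds (the real file has seven)
player_lines <- c(
  "    1 | JANE DOE     |3.0  |W   2|W   3|B    |",
  "    2 | JOHN ROE     |1.0  |L   1|H    |D   3|",
  "    3 | ALEX POE     |0.5  |U    |L   1|D   2|"
)

cols <- str_split(player_lines, fixed("|"), simplify = TRUE)
round_cols <- cols[, 4:6]   # rounds 1-3 sit in columns 4-6 here (4:10 for seven rounds)

# Trailing digits are the opponent's pair number; cells without digits become NA
opp_matrix <- matrix(as.integer(str_extract(as.vector(round_cols), "[0-9]+$")),
                     ncol = ncol(round_cols))

opp_matrix
rowSums(!is.na(opp_matrix))   # games actually played per player: 2 2 2
```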
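I’ll loop over the rows of that matrix and, for each player, average the opponents’ adjusted pre-ratings while skipping NA values. Because the pair numbers run from 1 to 64 in row order, a pair number doubles as an index into pre_rating_adj.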
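avg_opp_pre_rating <- numeric(nrow(opp_matrix))
for(i in seq_len(nrow(opp_matrix))){
#Pair numbers index straight into pre_rating_adj; NA opponents are dropped by na.rm
avg_opp_pre_rating[i] <- round(mean(pre_rating_adj[opp_matrix[i,]], na.rm = TRUE),0)
}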
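The same averaging can be done without an explicit loop. A minimal sketch on hypothetical inputs (three made-up ratings in pair-number order and the opponent matrix for three players): look up every opponent’s rating at once, reshape to the matrix’s dimensions, and let `rowMeans(na.rm = TRUE)` average only the rounds actually played. Like the loop, this assumes pair number equals row position.

```r
# Hypothetical inputs: pre-ratings in pair-number order, and each player's
# opponents by pair number (NA = no game that round)
pre_rating_adj <- c(1500L, 1400L, 950L)
opp_matrix <- matrix(c(2L, 1L, NA,
                       3L, NA, 1L,
                       NA, 3L, 2L), ncol = 3)

# Look up each opponent's rating (NA positions give NA), then restore the shape
opp_ratings <- matrix(pre_rating_adj[as.vector(opp_matrix)],
                      nrow = nrow(opp_matrix))

# Average across each row, ignoring rounds without an opponent
avg_opp_pre_rating <- round(rowMeans(opp_ratings, na.rm = TRUE))
avg_opp_pre_rating   # 1175 1225 1450
```

A player with no recorded opponents would get `NaN` from either version, since the mean of an empty set of ratings is undefined.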
Finally, I’ll add this column to my df.
chess_cleaned$avg_opp_pre_rating = avg_opp_pre_rating
chess_cleaned
## name state points pre_rating avg_opp_pre_rating
## 1 GARY HUA ON 6.0 1817 1611
## 2 DAKSHESH DARURI MI 6.0 1663 1468
## 3 ADITYA BAJAJ MI 6.0 1640 1558
## 4 PATRICK H SCHILLING MI 5.5 1744 1598
## 5 HANSHI ZUO MI 5.5 1690 1510
## 6 HANSEN SONG OH 5.0 1687 1520
## 7 GARY DEE SWATHELL MI 5.0 1673 1508
## 8 EZEKIEL HOUGHTON MI 5.0 1657P24 1526
## 9 STEFANO LEE ON 5.0 1564 1517
## 10 ANVIT RAO MI 5.0 1544 1537
## 11 CAMERON WILLIAM MC MI 4.5 1696 1506
## 12 KENNETH J TACK MI 4.5 1670 1544
## 13 TORRANCE HENRY JR MI 4.5 1662 1538
## 14 BRADLEY SHAW MI 4.5 1618 1507
## 15 ZACHARY JAMES HOUGHTON MI 4.5 1416P20 1459
## 16 MIKE NIKITIN MI 4.0 1613 1481
## 17 RONALD GRZEGORCZYK MI 4.0 1610 1499
## 18 DAVID SUNDEEN MI 4.0 1600 1530
## 19 DIPANKAR ROY MI 4.0 1570 1509
## 20 JASON ZHENG MI 4.0 1569 1437
## 21 DINH DANG BUI ON 4.0 1562 1498
## 22 EUGENE L MCCLURE MI 4.0 1529 1348
## 23 ALAN BUI ON 4.0 1371 1323
## 24 MICHAEL R ALDRICH MI 4.0 1300 1339
## 25 LOREN SCHWIEBERT MI 3.5 1681 1450
## 26 MAX ZHU ON 3.5 1564 1522
## 27 GAURAV GIDWANI MI 3.5 1539 1370
## 28 SOFIA ADINA MI 3.5 1513 1534
## 29 CHIEDOZIE OKORIE MI 3.5 1508P12 1344
## 30 GEORGE AVERY JONES ON 3.5 1444 1188
## 31 RISHI SHETTY MI 3.5 1444 1276
## 32 JOSHUA PHILIP MATHEWS ON 3.5 1433 1394
## 33 JADE GE MI 3.5 1421 1330
## 34 MICHAEL JEFFERY THOMAS MI 3.5 1400 1389
## 35 JOSHUA DAVID LEE MI 3.5 1392 1264
## 36 SIDDHARTH JHA MI 3.5 1367 1398
## 37 AMIYATOSH PWNANANDAM MI 3.5 1077P17 1396
## 38 BRIAN LIU MI 3.0 1439 1547
## 39 JOEL R HENDON MI 3.0 1413 1434
## 40 FOREST ZHANG MI 3.0 1346 1379
## 41 KYLE WILLIAM MURPHY MI 3.0 1341P9 1250
## 42 JARED GE MI 3.0 1256 1154
## 43 ROBERT GLEN VASEY MI 3.0 1244 1211
## 44 JUSTIN D SCHILLING MI 3.0 1199 1334
## 45 DEREK YAN MI 3.0 1191 1163
## 46 JACOB ALEXANDER LAVALLEY MI 3.0 1076P10 1349
## 47 ERIC WRIGHT MI 2.5 1341 1411
## 48 DANIEL KHAIN MI 2.5 1335 1345
## 49 MICHAEL J MARTIN MI 2.5 1259P17 1262
## 50 SHIVAM JHA MI 2.5 1111 1358
## 51 TEJAS AYYAGARI MI 2.5 1097 1339
## 52 ETHAN GUO MI 2.5 1092 1454
## 53 JOSE C YBARRA MI 2.0 1359 1320
## 54 LARRY HODGE MI 2.0 1200 1236
## 55 ALEX KONG MI 2.0 1163 1400
## 56 MARISA RICCI MI 2.0 1140 1376
## 57 MICHAEL LU MI 2.0 1079 1357
## 58 VIRAJ MOHILE MI 2.0 941 1378
## 59 SEAN M MC MI 2.0 878 1316
## 60 JULIA SHEN MI 1.5 984 1314
## 61 JEZZEL FARKAS ON 1.5 979P18 1342
## 62 ASHWIN BALAJI MI 1.0 1535 1163
## 63 THOMAS JOSEPH HOSMER MI 1.0 1125 1338
## 64 BEN LI MI 1.0 1112 1315
#Change the path to wherever you want the CSV saved; I wrote it to my desktop.
write.csv(chess_cleaned,"C:\\Users\\rossboehme\\Desktop\\chess.csv",row.names=FALSE)
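To check the export without a user-specific path, one option is to write to a temporary file and read it back. A minimal sketch with a hypothetical one-row data frame in the same column layout as chess_cleaned (made-up values); `colClasses` keeps pre_rating as text on the way back in, so provisional suffixes such as "P17" would survive.

```r
# Hypothetical one-row data frame with the same columns as chess_cleaned
df <- data.frame(name = "JANE DOE", state = "ON", points = 3.0,
                 pre_rating = "1500", avg_opp_pre_rating = 1175)

out_path <- tempfile(fileext = ".csv")   # portable: no hard-coded folder
write.csv(df, out_path, row.names = FALSE)

# Read it back, forcing pre_rating to stay character
round_trip <- read.csv(out_path, colClasses = c(pre_rating = "character"))
str(round_trip)
```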