This is my project for reading a chess tournament cross table. I used various data-manipulation tools that mimic the Excel VLOOKUP function to replace the values in the matrix of who a player played against with those opponents' pre-tournament ratings.
require(stringr)
## Loading required package: stringr
require(stringi)
## Loading required package: stringi
library(stringr)
library(qdapTools)
library(stringi)
This section reads in the data and removes unnecessary dashes.
#load text file by reading lines
setwd("/Users/cesarespitia/Documents/CUNYMSDA/Data 607/CUNYDATA607/Project 1")
df <- readLines("tournamentinfo.txt")
## Warning in readLines("tournamentinfo.txt"): incomplete final line found on
## 'tournamentinfo.txt'
df<-df[-(1:4)]
#remove dashes
lines<-c("-----------------------------------------------------------------------------------------")
df<-setdiff(df,lines)
#remove every pair of consecutive spaces (" {2}" matches exactly two spaces at a time)
df<-str_replace_all(df," {2}","")
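One thing to keep in mind with the dash removal above: `setdiff()` only drops lines that equal the given string exactly, and it also returns unique elements, so two identical data lines would collapse into one. A minimal sketch of a pattern-based filter, using a made-up vector (`raw` and its contents are hypothetical, not the tournament file):

```r
# Hypothetical lines: two identical data rows and dash separators of different lengths
raw <- c("-----", "A | 1", "-----", "A | 1", "---------")

# Keep every line that is not made up solely of dashes
kept <- raw[!grepl("^-+$", raw)]

# For comparison: exact-match setdiff drops the duplicate row
# and leaves the longer separator in place
via_setdiff <- setdiff(raw, "-----")

print(kept)
print(via_setdiff)
```

With the actual file this difference did not matter, since every separator has the same length and no two player lines are identical.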
I then noticed that different data lived in the odd and even rows of the remaining data. Odd rows had the ID of the player, name, points and who they played against; even rows had the state and the ratings.
#collapse table by odd and even rows
table<-cbind(rep(1:2,64),df)
table1<-table[(table[,1]=="1"),2]
table2<-table[(table[,1]=="2"),2]
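The same odd/even split can be sketched without building a helper column, by letting R recycle a logical index. The sample lines below are hypothetical stand-ins in the same shape as the cleaned data, not actual file contents:

```r
# Hypothetical cleaned lines: a player row followed by its state/rating row
rows <- c("1 | PLAYER ONE|6.0|W2|",
          "ON | 11111111 / R: 1700->1750|",
          "2 | PLAYER TWO|5.0|L1|",
          "MI | 22222222 / R: 1500->1490|")

# c(TRUE, FALSE) is recycled along the vector, picking positions 1, 3, 5, ...
odd_rows  <- rows[c(TRUE, FALSE)]
# c(FALSE, TRUE) picks positions 2, 4, 6, ...
even_rows <- rows[c(FALSE, TRUE)]

print(odd_rows)
print(even_rows)
```

This avoids hard-coding the number of players (64) and does not reuse the name `table`, which is also a base R function.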
From here, specific information was extracted using the newly learned stringr/stringi tools from the HW. When pulling the player ID, I looked for 1 or 2 digits followed by a space and a ’|’, kept only the complete cases, and then extracted just the digits to drop any extra spaces or ’|’s in the lists.
For the names, I looked for runs of 2 to 40 capital letters bounded by word edges.
#pull PlayerID
ID<-str_extract(table1,"\\d{1,2} \\| ")
ID<-ID[complete.cases(ID)]
ID<-str_extract(ID,"\\d{1,2}")
#pull names
names<- str_extract_all(table1,"\\b[A-Z]{2,40}\\b")
names<- sapply(names, paste0, collapse=" ")
#pull state
ST<-str_extract(table2,"\\w{2} \\| ")
ST<-ST[complete.cases(ST)]
ST<-str_extract(ST,"\\w{2}")
#pull points
pts<- data.frame(str_extract(table1,"\\d\\.\\d"))
#prerating
prerating<-str_extract_all(table2,"\\d{3,4}")
prerating<-matrix(unlist(prerating),ncol=4,byrow=TRUE)
preratings<-prerating[,3]
#Values per match extraction: pulls every 1-2 digit number from the player rows
#(the player ID, the two digits of the points total, then the opponent IDs)
vals<-str_extract_all(table1,"\\d{1,2}")
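The pre-rating step above depends on every even row yielding exactly four 3-4 digit matches. As a hedged alternative, a capture group anchored on the `R:` label can pull the pre-rating directly. The sketch below uses `str_match()` from stringr on two made-up lines (`info` and its values are hypothetical; the second mimics a provisional rating with a `P` suffix):

```r
library(stringr)

# Hypothetical state/rating rows in the same shape as the cleaned even rows
info <- c("ON | 11111111 / R: 1794->1817|N:2|W|B|",
          "MI | 22222222 / R: 377P3->1076P10|N:4|B|W|")

# str_match() returns a character matrix: column 1 is the full match,
# column 2 is the first capture group (the digits right after "R:")
pre <- as.numeric(str_match(info, "R:\\s*(\\d+)")[, 2])
print(pre)
```

Because the pattern reads only the digits that follow `R:`, it does not rely on how many other numbers appear on the line.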
At this point, I had most of my data assembled except for the math. I had some trouble figuring this out, but ended up using functions from qdapTools and stringi along with a lot of data frame and list-to-matrix manipulation.
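#convert data to numeric from list/character (short rows are padded with NA)
temp <- stri_list2matrix(vals, byrow = TRUE)
final <- `dim<-`(as.numeric(temp), dim(temp))
#keep columns 4-10, the opponent IDs; columns 1-3 hold the player ID and the points digits
final <- final[,(4:10)]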
#convert preratings to factor to insert into data
preratings<-as.factor(preratings)
matchvals<-data.frame(ID,preratings,
stringsAsFactors = TRUE)
matchvals[,1]<-as.numeric(as.character(matchvals[[1]]))
matchvals[,2]<-as.numeric(as.character(matchvals[[2]]))
#replace player with pre tournament scores - prerating
k <- 1
scores<-matrix(nrow=64,ncol=7)
while(k<8) {
scores[,k]<-final[,k]%l%matchvals
k<-k+1
}
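The `%l%` lookup from qdapTools behaves like a VLOOKUP: each opponent ID is replaced by the value paired with it in the key table. A rough base R equivalent can be sketched with `match()`. The names `key` and `opponents` and their values are hypothetical and only illustrate the idea:

```r
# Hypothetical key table: player ID and pre-tournament rating
key <- data.frame(ID = c(1, 2, 3), rating = c(1794, 1553, 1384))

# Hypothetical opponent matrix: one row per player, NA for an unplayed round
opponents <- matrix(c(2, 3,
                      1, NA,
                      1, 2), ncol = 2, byrow = TRUE)

# match() gives the row of `key` for each opponent ID (NA stays NA);
# indexing key$rating with it swaps IDs for ratings in one step
ratings <- matrix(key$rating[match(opponents, key$ID)],
                  nrow = nrow(opponents))
print(ratings)
```

Since `match()` is vectorised over the whole matrix, no loop over the rounds is needed in this version.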
Once the opponents' ratings were in a numeric matrix, I could do the math. I first made a copy in which every rating became a 1 and every NA became a 0, so that the row sums give the number of games played.
#manipulate scores to determine number of games played by converting scores to 1 and NA to 0s.
played<-scores
played[!is.na(played)]<-1
played[is.na(played)]<-0
played<-rowSums(played)
#manipulate matrix to remove NAs
scores[is.na(scores)]<-0
#sumrows
scores<-rowSums(scores)
#average pre chess rating of opponents (%/% is integer division, so the average is truncated, not rounded)
PreChess<-scores%/%played
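To make the effect of `%/%` concrete, here is a small sketch comparing the truncated average with a rounded one, and showing that `rowMeans(..., na.rm = TRUE)` handles unplayed rounds without first converting NAs to zeros. The matrix `opp` is hypothetical:

```r
# Hypothetical opponent ratings: two players, three rounds, one unplayed game
opp <- matrix(c(1500, 1601, 1701,
                1400, NA,   1500), nrow = 2, byrow = TRUE)

# Number of games actually played per player
games <- rowSums(!is.na(opp))

# Truncated average, as produced by integer division
avg_trunc <- rowSums(opp, na.rm = TRUE) %/% games

# Rounded average, skipping NAs directly
avg_round <- round(rowMeans(opp, na.rm = TRUE))

print(avg_trunc)
print(avg_round)
```

The first player's true average is about 1600.67, so the two approaches differ by one point there. Which one is wanted depends on how the averages are meant to be reported.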
Once all the data manipulation was done, the data was assembled and shown as a table using the formattable library. Data was not sorted and is shown as it was read in from the text file.
#combined Data
totaldata<-cbind(ID,names,ST,pts,preratings,PreChess)
colnames(totaldata)<-c("ID","Player's Name","Player's State","Total Number of Points","Player's Pre-Rating","Average Pre-Chess Rating")
library(formattable)
formattable(totaldata)
ID | Player’s Name | Player’s State | Total Number of Points | Player’s Pre-Rating | Average Pre-Chess Rating |
---|---|---|---|---|---|
1 | GARY HUA | ON | 6.0 | 1794 | 1605 |
2 | DAKSHESH DARURI | MI | 6.0 | 1553 | 1469 |
3 | ADITYA BAJAJ | MI | 6.0 | 1384 | 1563 |
4 | PATRICK SCHILLING | MI | 5.5 | 1716 | 1573 |
5 | HANSHI ZUO | MI | 5.5 | 1655 | 1500 |
6 | HANSEN SONG | OH | 5.0 | 1686 | 1518 |
7 | GARY DEE SWATHELL | MI | 5.0 | 1649 | 1372 |
8 | EZEKIEL HOUGHTON | MI | 5.0 | 1641 | 1468 |
9 | STEFANO LEE | ON | 5.0 | 1411 | 1523 |
10 | ANVIT RAO | MI | 5.0 | 1365 | 1554 |
11 | CAMERON WILLIAM MC LEMAN | MI | 4.5 | 1712 | 1467 |
12 | KENNETH TACK | MI | 4.5 | 1663 | 1506 |
13 | TORRANCE HENRY JR | MI | 4.5 | 1666 | 1497 |
14 | BRADLEY SHAW | MI | 4.5 | 1610 | 1515 |
15 | ZACHARY JAMES HOUGHTON | MI | 4.5 | 1220 | 1483 |
16 | MIKE NIKITIN | MI | 4.0 | 1604 | 1385 |
17 | RONALD GRZEGORCZYK | MI | 4.0 | 1629 | 1498 |
18 | DAVID SUNDEEN | MI | 4.0 | 1600 | 1480 |
19 | DIPANKAR ROY | MI | 4.0 | 1564 | 1426 |
20 | JASON ZHENG | MI | 4.0 | 1595 | 1410 |
21 | DINH DANG BUI | ON | 4.0 | 1563 | 1470 |
22 | EUGENE MCCLURE | MI | 4.0 | 1555 | 1300 |
23 | ALAN BUI | ON | 4.0 | 1363 | 1213 |
24 | MICHAEL ALDRICH | MI | 4.0 | 1229 | 1357 |
25 | LOREN SCHWIEBERT | MI | 3.5 | 1745 | 1363 |
26 | MAX ZHU | ON | 3.5 | 1579 | 1506 |
27 | GAURAV GIDWANI | MI | 3.5 | 1552 | 1221 |
28 | SOFIA ADINA STANESCU BELLU | MI | 3.5 | 1507 | 1522 |
29 | CHIEDOZIE OKORIE | MI | 3.5 | 1602 | 1313 |
30 | GEORGE AVERY JONES | ON | 3.5 | 1522 | 1144 |
31 | RISHI SHETTY | MI | 3.5 | 1494 | 1259 |
32 | JOSHUA PHILIP MATHEWS | ON | 3.5 | 1441 | 1378 |
33 | JADE GE | MI | 3.5 | 1449 | 1276 |
34 | MICHAEL JEFFERY THOMAS | MI | 3.5 | 1399 | 1375 |
35 | JOSHUA DAVID LEE | MI | 3.5 | 1438 | 1149 |
36 | SIDDHARTH JHA | MI | 3.5 | 1355 | 1388 |
37 | AMIYATOSH PWNANANDAM | MI | 3.5 | 980 | 1384 |
38 | BRIAN LIU | MI | 3.0 | 1423 | 1539 |
39 | JOEL HENDON | MI | 3.0 | 1436 | 1429 |
40 | FOREST ZHANG | MI | 3.0 | 1348 | 1390 |
41 | KYLE WILLIAM MURPHY | MI | 3.0 | 1403 | 1248 |
42 | JARED GE | MI | 3.0 | 1332 | 1149 |
43 | ROBERT GLEN VASEY | MI | 3.0 | 1283 | 1106 |
44 | JUSTIN SCHILLING | MI | 3.0 | 1199 | 1327 |
45 | DEREK YAN | MI | 3.0 | 1242 | 1152 |
46 | JACOB ALEXANDER LAVALLEY | MI | 3.0 | 377 | 1357 |
47 | ERIC WRIGHT | MI | 2.5 | 1362 | 1392 |
48 | DANIEL KHAIN | MI | 2.5 | 1382 | 1355 |
49 | MICHAEL MARTIN | MI | 2.5 | 1291 | 1285 |
50 | SHIVAM JHA | MI | 2.5 | 1056 | 1296 |
51 | TEJAS AYYAGARI | MI | 2.5 | 1011 | 1356 |
52 | ETHAN GUO | MI | 2.5 | 935 | 1494 |
53 | JOSE YBARRA | MI | 2.0 | 1393 | 1345 |
54 | LARRY HODGE | MI | 2.0 | 1270 | 1206 |
55 | ALEX KONG | MI | 2.0 | 1186 | 1406 |
56 | MARISA RICCI | MI | 2.0 | 1153 | 1414 |
57 | MICHAEL LU | MI | 2.0 | 1092 | 1363 |
58 | VIRAJ MOHILE | MI | 2.0 | 917 | 1391 |
59 | SEAN MC CORMICK | MI | 2.0 | 853 | 1319 |
60 | JULIA SHEN | MI | 1.5 | 967 | 1330 |
61 | JEZZEL FARKAS | ON | 1.5 | 955 | 1327 |
62 | ASHWIN BALAJI | MI | 1.0 | 1530 | 1186 |
63 | THOMAS JOSEPH HOSMER | MI | 1.0 | 1175 | 1350 |
64 | BEN LI | MI | 1.0 | 1163 | 1263 |