library(stringr)
library(qdapTools)
library(stringi)
Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players: Player’s Name Player’s State Total Number of Points Player’s Pre-Rating and Average Pre Chess Rating of Opponents
For the first player, the information would be:Gary Hua, ON, 6.0,1794, 1605 1605 was calculated by using the pre-tournament opponents’ ratings of 1436, 1563, 1600, 1610, 1649, 1663, 1716, and dividing by the total number of games played.
Below are different block of commands I try to use to create a table and manipulate special characters from the text file; however I found much easier work with a vectors. Those command also removes special characters from a text file.
“tournament.res <- read.table(”https://raw.githubusercontent.com/doradu8030/Data607/master/tournamentinfo.txt“, header=FALSE, sep=”“) tournament.res <- readLines(”tournamentinfo.txt“) str_replace_all(tournament.res,”[[:punct:]]“,” “) tournament.res<- tournament.res[-(1:4)] tournament.res<-setdiff(tournament.res,lines) tournament.res<-str_replace_all(tournament.res,” {2}“,”“) tournament.res<- str_replace_all(tournament.res,”([/])“,”“) gsub(”[^[:alnum:][:blank:]+?&/\-]“,”“, tournament.res) gsub(”[[:punct:]]“,” “, tournament.res) regex (”[[:punct:]]“, tournament.res)”"
I use the command read.delim which removes most of the special characters from the text file
tournament.res <- read.delim("https://raw.githubusercontent.com/doradu8030/Data607/master/tournamentinfo.txt", header=FALSE, sep="|", skip = 4, stringsAsFactors = FALSE)
head(tournament.res)
## V1
## 1 1
## 2 ON
## 3 -----------------------------------------------------------------------------------------
## 4 2
## 5 MI
## 6 -----------------------------------------------------------------------------------------
## V2 V3 V4 V5 V6 V7 V8
## 1 GARY HUA 6.0 W 39 W 21 W 18 W 14 W 7
## 2 15445895 / R: 1794 ->1817 N:2 W B W B W
## 3
## 4 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17 W 16
## 5 14598900 / R: 1553 ->1663 N:2 B W B W B
## 6
## V9 V10 V11
## 1 D 12 D 4 NA
## 2 B W NA
## 3 NA
## 4 W 20 W 7 NA
## 5 W B NA
## 6 NA
After several attempts I created only one data frame with all fields needed.
tournament.rounds <- data.frame(
playername = tournament.res$V2[c(TRUE, FALSE, FALSE)],
playerstate = tournament.res$V1[c(FALSE, TRUE, FALSE)],
totnumpoints = round(as.numeric(tournament.res$V3[c(TRUE, FALSE, FALSE)]), 1),
round1 = as.numeric(str_extract(tournament.res$V4[c(TRUE, FALSE, FALSE)],regex("[:digit:]+"))),
round2 = as.numeric(str_extract(tournament.res$V5[c(TRUE, FALSE, FALSE)],regex("[:digit:]+"))),
round3 = as.numeric(str_extract(tournament.res$V6[c(TRUE, FALSE, FALSE)],regex("[:digit:]+"))),
round4 = as.numeric(str_extract(tournament.res$V7[c(TRUE, FALSE, FALSE)],regex("[:digit:]+"))),
round5 = as.numeric(str_extract(tournament.res$V8[c(TRUE, FALSE, FALSE)],regex("[:digit:]+"))),
round6 = as.numeric(str_extract(tournament.res$V9[c(TRUE, FALSE, FALSE)],regex("[:digit:]+"))),
round7 = as.numeric(str_extract(tournament.res$V10[c(TRUE, FALSE, FALSE)],regex("[:digit:]+"))),
prerating = integer(64),
avgrateoppon = integer(64))
# the definicion of interger only allows the value 64 which corresponde to the number of rows in the data set
How the instruction c(TRUE,FALSE,FALSE) included above works; See the original file
head(tournament.res)
## V1
## 1 1
## 2 ON
## 3 -----------------------------------------------------------------------------------------
## 4 2
## 5 MI
## 6 -----------------------------------------------------------------------------------------
## V2 V3 V4 V5 V6 V7 V8
## 1 GARY HUA 6.0 W 39 W 21 W 18 W 14 W 7
## 2 15445895 / R: 1794 ->1817 N:2 W B W B W
## 3
## 4 DAKSHESH DARURI 6.0 W 63 W 58 L 4 W 17 W 16
## 5 14598900 / R: 1553 ->1663 N:2 B W B W B
## 6
## V9 V10 V11
## 1 D 12 D 4 NA
## 2 B W NA
## 3 NA
## 4 W 20 W 7 NA
## 5 W B NA
## 6 NA
Note that the information needed to be extrated is found every third row; row 1,4,7 and so on. see the changes when I added logical values to the c vector
playername = tournament.res$V2[c(TRUE)]
head(playername)
## [1] " GARY HUA " " 15445895 / R: 1794 ->1817 "
## [3] "" " DAKSHESH DARURI "
## [5] " 14598900 / R: 1553 ->1663 " ""
playername = tournament.res$V2[c(TRUE, FALSE)]
head(playername)
## [1] " GARY HUA " ""
## [3] " 14598900 / R: 1553 ->1663 " " ADITYA BAJAJ "
## [5] "" " 12616049 / R: 1716 ->1744 "
playername = tournament.res$V2[c(TRUE, FALSE, TRUE)]
head(playername)
## [1] " GARY HUA " ""
## [3] " DAKSHESH DARURI " ""
## [5] " ADITYA BAJAJ " ""
Now I will convert as numeric the column “prerating”“; at firt extract the character Ex.”R: 1553" and assig it to a variable then R allows youto convert this variable as numeric and extracted as numeric as well
test <- str_extract(tournament.res$V2[c(FALSE, TRUE, FALSE)], "R:[:blank:]+[:digit:]+")
tournament.rounds$prerating <- as.numeric(str_extract(test, "[:digit:]+"))
head(test)
## [1] "R: 1794" "R: 1553" "R: 1384" "R: 1716" "R: 1655" "R: 1686"
head(tournament.rounds$prerating)
## [1] 1794 1553 1384 1716 1655 1686
The final output file is;
tournament.rounds
## playername playerstate totnumpoints round1
## 1 GARY HUA ON 6.0 39
## 2 DAKSHESH DARURI MI 6.0 63
## 3 ADITYA BAJAJ MI 6.0 8
## 4 PATRICK H SCHILLING MI 5.5 23
## 5 HANSHI ZUO MI 5.5 45
## 6 HANSEN SONG OH 5.0 34
## 7 GARY DEE SWATHELL MI 5.0 57
## 8 EZEKIEL HOUGHTON MI 5.0 3
## 9 STEFANO LEE ON 5.0 25
## 10 ANVIT RAO MI 5.0 16
## 11 CAMERON WILLIAM MC LEMAN MI 4.5 38
## 12 KENNETH J TACK MI 4.5 42
## 13 TORRANCE HENRY JR MI 4.5 36
## 14 BRADLEY SHAW MI 4.5 54
## 15 ZACHARY JAMES HOUGHTON MI 4.5 19
## 16 MIKE NIKITIN MI 4.0 10
## 17 RONALD GRZEGORCZYK MI 4.0 48
## 18 DAVID SUNDEEN MI 4.0 47
## 19 DIPANKAR ROY MI 4.0 15
## 20 JASON ZHENG MI 4.0 40
## 21 DINH DANG BUI ON 4.0 43
## 22 EUGENE L MCCLURE MI 4.0 64
## 23 ALAN BUI ON 4.0 4
## 24 MICHAEL R ALDRICH MI 4.0 28
## 25 LOREN SCHWIEBERT MI 3.5 9
## 26 MAX ZHU ON 3.5 49
## 27 GAURAV GIDWANI MI 3.5 51
## 28 SOFIA ADINA STANESCU-BELLU MI 3.5 24
## 29 CHIEDOZIE OKORIE MI 3.5 50
## 30 GEORGE AVERY JONES ON 3.5 52
## 31 RISHI SHETTY MI 3.5 58
## 32 JOSHUA PHILIP MATHEWS ON 3.5 61
## 33 JADE GE MI 3.5 60
## 34 MICHAEL JEFFERY THOMAS MI 3.5 6
## 35 JOSHUA DAVID LEE MI 3.5 46
## 36 SIDDHARTH JHA MI 3.5 13
## 37 AMIYATOSH PWNANANDAM MI 3.5 NA
## 38 BRIAN LIU MI 3.0 11
## 39 JOEL R HENDON MI 3.0 1
## 40 FOREST ZHANG MI 3.0 20
## 41 KYLE WILLIAM MURPHY MI 3.0 59
## 42 JARED GE MI 3.0 12
## 43 ROBERT GLEN VASEY MI 3.0 21
## 44 JUSTIN D SCHILLING MI 3.0 NA
## 45 DEREK YAN MI 3.0 5
## 46 JACOB ALEXANDER LAVALLEY MI 3.0 35
## 47 ERIC WRIGHT MI 2.5 18
## 48 DANIEL KHAIN MI 2.5 17
## 49 MICHAEL J MARTIN MI 2.5 26
## 50 SHIVAM JHA MI 2.5 29
## 51 TEJAS AYYAGARI MI 2.5 27
## 52 ETHAN GUO MI 2.5 30
## 53 JOSE C YBARRA MI 2.0 NA
## 54 LARRY HODGE MI 2.0 14
## 55 ALEX KONG MI 2.0 62
## 56 MARISA RICCI MI 2.0 NA
## 57 MICHAEL LU MI 2.0 7
## 58 VIRAJ MOHILE MI 2.0 31
## 59 SEAN M MC CORMICK MI 2.0 41
## 60 JULIA SHEN MI 1.5 33
## 61 JEZZEL FARKAS ON 1.5 32
## 62 ASHWIN BALAJI MI 1.0 55
## 63 THOMAS JOSEPH HOSMER MI 1.0 2
## 64 BEN LI MI 1.0 22
## round2 round3 round4 round5 round6 round7 prerating avgrateoppon
## 1 21 18 14 7 12 4 1794 0
## 2 58 4 17 16 20 7 1553 0
## 3 61 25 21 11 13 12 1384 0
## 4 28 2 26 5 19 1 1716 0
## 5 37 12 13 4 14 17 1655 0
## 6 29 11 35 10 27 21 1686 0
## 7 46 13 11 1 9 2 1649 0
## 8 32 14 9 47 28 19 1641 0
## 9 18 59 8 26 7 20 1411 0
## 10 19 55 31 6 25 18 1365 0
## 11 56 6 7 3 34 26 1712 0
## 12 33 5 38 NA 1 3 1663 0
## 13 27 7 5 33 3 32 1666 0
## 14 44 8 1 27 5 31 1610 0
## 15 16 30 22 54 33 38 1220 0
## 16 15 NA 39 2 36 NA 1604 0
## 17 41 26 2 23 22 5 1629 0
## 18 9 1 32 19 38 10 1600 0
## 19 10 52 28 18 4 8 1564 0
## 20 49 23 41 28 2 9 1595 0
## 21 1 47 3 40 39 6 1563 0
## 22 52 28 15 NA 17 40 1555 0
## 23 43 20 58 17 37 46 1363 0
## 24 47 43 25 60 44 39 1229 0
## 25 53 3 24 34 10 47 1745 0
## 26 40 17 4 9 32 11 1579 0
## 27 13 46 37 14 6 NA 1552 0
## 28 4 22 19 20 8 36 1507 0
## 29 6 38 34 52 48 NA 1602 0
## 30 64 15 55 31 61 50 1522 0
## 31 55 64 10 30 50 14 1494 0
## 32 8 44 18 51 26 13 1441 0
## 33 12 50 36 13 15 51 1449 0
## 34 60 37 29 25 11 52 1399 0
## 35 38 56 6 57 52 48 1438 0
## 36 57 51 33 NA 16 28 1355 0
## 37 5 34 27 NA 23 61 980 0
## 38 35 29 12 NA 18 15 1423 0
## 39 54 40 16 44 21 24 1436 0
## 40 26 39 59 21 56 22 1348 0
## 41 17 58 20 NA NA NA 1403 0
## 42 50 57 60 61 64 56 1332 0
## 43 23 24 63 59 46 55 1283 0
## 44 14 32 53 39 24 59 1199 0
## 45 51 60 56 63 55 58 1242 0
## 46 7 27 50 64 43 23 377 0
## 47 24 21 61 8 51 25 1362 0
## 48 63 NA 52 NA 29 35 1382 0
## 49 20 63 64 58 NA NA 1291 0
## 50 42 33 46 NA 31 30 1056 0
## 51 45 36 57 32 47 33 1011 0
## 52 22 19 48 29 35 34 935 0
## 53 25 NA 44 NA 57 NA 1393 0
## 54 39 61 NA 15 59 64 1270 0
## 55 31 10 30 NA 45 43 1186 0
## 56 11 35 45 NA 40 42 1153 0
## 57 36 42 51 35 53 NA 1092 0
## 58 2 41 23 49 NA 45 917 0
## 59 NA 9 40 43 54 44 853 0
## 60 34 45 42 24 NA NA 967 0
## 61 3 54 47 42 30 37 955 0
## 62 NA NA NA NA NA NA 1530 0
## 63 48 49 43 45 NA NA 1175 0
## 64 30 31 49 46 42 54 1163 0
for (i in 1:64) {
tournament.rounds$round1[i] <- tournament.rounds$prerating[tournament.rounds$round1[i]]
tournament.rounds$round2[i] <- tournament.rounds$prerating[tournament.rounds$round2[i]]
tournament.rounds$round3[i] <- tournament.rounds$prerating[tournament.rounds$round3[i]]
tournament.rounds$round4[i] <- tournament.rounds$prerating[tournament.rounds$round4[i]]
tournament.rounds$round5[i] <- tournament.rounds$prerating[tournament.rounds$round5[i]]
tournament.rounds$round6[i] <- tournament.rounds$prerating[tournament.rounds$round6[i]]
tournament.rounds$round7[i] <- tournament.rounds$prerating[tournament.rounds$round7[i]]
}
for (i in 1:64) {
tournament.rounds$avgrateoppon[i] <- rowSums(tournament.rounds[i, 4:10], na.rm = TRUE) %/% 7
}
foutput <- tournament.rounds[c(1,2,3,11,12)]
names(foutput) = c("PlayerName","PlayerState","NumofPoints","Pre-Rating","AvgRatingOp")
foutput
## PlayerName PlayerState NumofPoints Pre-Rating
## 1 GARY HUA ON 6.0 1794
## 2 DAKSHESH DARURI MI 6.0 1553
## 3 ADITYA BAJAJ MI 6.0 1384
## 4 PATRICK H SCHILLING MI 5.5 1716
## 5 HANSHI ZUO MI 5.5 1655
## 6 HANSEN SONG OH 5.0 1686
## 7 GARY DEE SWATHELL MI 5.0 1649
## 8 EZEKIEL HOUGHTON MI 5.0 1641
## 9 STEFANO LEE ON 5.0 1411
## 10 ANVIT RAO MI 5.0 1365
## 11 CAMERON WILLIAM MC LEMAN MI 4.5 1712
## 12 KENNETH J TACK MI 4.5 1663
## 13 TORRANCE HENRY JR MI 4.5 1666
## 14 BRADLEY SHAW MI 4.5 1610
## 15 ZACHARY JAMES HOUGHTON MI 4.5 1220
## 16 MIKE NIKITIN MI 4.0 1604
## 17 RONALD GRZEGORCZYK MI 4.0 1629
## 18 DAVID SUNDEEN MI 4.0 1600
## 19 DIPANKAR ROY MI 4.0 1564
## 20 JASON ZHENG MI 4.0 1595
## 21 DINH DANG BUI ON 4.0 1563
## 22 EUGENE L MCCLURE MI 4.0 1555
## 23 ALAN BUI ON 4.0 1363
## 24 MICHAEL R ALDRICH MI 4.0 1229
## 25 LOREN SCHWIEBERT MI 3.5 1745
## 26 MAX ZHU ON 3.5 1579
## 27 GAURAV GIDWANI MI 3.5 1552
## 28 SOFIA ADINA STANESCU-BELLU MI 3.5 1507
## 29 CHIEDOZIE OKORIE MI 3.5 1602
## 30 GEORGE AVERY JONES ON 3.5 1522
## 31 RISHI SHETTY MI 3.5 1494
## 32 JOSHUA PHILIP MATHEWS ON 3.5 1441
## 33 JADE GE MI 3.5 1449
## 34 MICHAEL JEFFERY THOMAS MI 3.5 1399
## 35 JOSHUA DAVID LEE MI 3.5 1438
## 36 SIDDHARTH JHA MI 3.5 1355
## 37 AMIYATOSH PWNANANDAM MI 3.5 980
## 38 BRIAN LIU MI 3.0 1423
## 39 JOEL R HENDON MI 3.0 1436
## 40 FOREST ZHANG MI 3.0 1348
## 41 KYLE WILLIAM MURPHY MI 3.0 1403
## 42 JARED GE MI 3.0 1332
## 43 ROBERT GLEN VASEY MI 3.0 1283
## 44 JUSTIN D SCHILLING MI 3.0 1199
## 45 DEREK YAN MI 3.0 1242
## 46 JACOB ALEXANDER LAVALLEY MI 3.0 377
## 47 ERIC WRIGHT MI 2.5 1362
## 48 DANIEL KHAIN MI 2.5 1382
## 49 MICHAEL J MARTIN MI 2.5 1291
## 50 SHIVAM JHA MI 2.5 1056
## 51 TEJAS AYYAGARI MI 2.5 1011
## 52 ETHAN GUO MI 2.5 935
## 53 JOSE C YBARRA MI 2.0 1393
## 54 LARRY HODGE MI 2.0 1270
## 55 ALEX KONG MI 2.0 1186
## 56 MARISA RICCI MI 2.0 1153
## 57 MICHAEL LU MI 2.0 1092
## 58 VIRAJ MOHILE MI 2.0 917
## 59 SEAN M MC CORMICK MI 2.0 853
## 60 JULIA SHEN MI 1.5 967
## 61 JEZZEL FARKAS ON 1.5 955
## 62 ASHWIN BALAJI MI 1.0 1530
## 63 THOMAS JOSEPH HOSMER MI 1.0 1175
## 64 BEN LI MI 1.0 1163
## AvgRatingOp
## 1 1605
## 2 1469
## 3 1563
## 4 1573
## 5 1500
## 6 1518
## 7 1372
## 8 1468
## 9 1523
## 10 1554
## 11 1467
## 12 1291
## 13 1497
## 14 1515
## 15 1483
## 16 989
## 17 1498
## 18 1480
## 19 1426
## 20 1410
## 21 1470
## 22 1114
## 23 1213
## 24 1357
## 25 1363
## 26 1506
## 27 1047
## 28 1522
## 29 1125
## 30 1144
## 31 1259
## 32 1378
## 33 1276
## 34 1375
## 35 1149
## 36 1189
## 37 989
## 38 1319
## 39 1429
## 40 1390
## 41 713
## 42 1149
## 43 1106
## 44 1137
## 45 1152
## 46 1357
## 47 1392
## 48 968
## 49 918
## 50 1110
## 51 1356
## 52 1494
## 53 576
## 54 1033
## 55 1205
## 56 1010
## 57 1168
## 58 1192
## 59 1130
## 60 950
## 61 1327
## 62 169
## 63 964
## 64 1263
Then after all information needed is cleaned up, the final file could be save as csv file that could be exported or copy into a postgreSQL or MySQL. Note the R allows to work with dots (.) or underscore (-) in between of a variable names Ex.player.state However, postgreSQL will present this message error;
player.state in posgreSQL
ERROR: syntax error at or near "."
LINE 5: player.name character(20) NOT NULL,
If the final file will be exported to postgreSQL don’t use those separators, postgreSQL wouldn’t recognized (.),(-) as part of a variable name.
write.csv(foutput, file = "C:/Users/Dora/CUNY/DATA607-Data Acquisition/Project1/tournamentfinal.csv")