In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:
Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents
For the first player, the information would be:
Gary Hua, ON, 6.0, 1794, 1605
1605 was calculated by summing the pre-tournament ratings of the opponents (1436, 1563, 1600, 1610, 1649, 1663, 1716) and dividing the sum by the total number of games played.
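As a quick check of the arithmetic above, here is a minimal sketch in base R. The variable names `opp_ratings` and `avg_opp` are only illustrative; the seven ratings are the ones listed in the text.

```r
# Pre-tournament ratings of Gary Hua's seven opponents (from the text above)
opp_ratings <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)

# Sum of the ratings divided by the number of games played,
# rounded to the nearest whole rating point
avg_opp <- round(sum(opp_ratings) / length(opp_ratings))
avg_opp
```

The sum is 11237, and 11237 / 7 is about 1605.29, which rounds to the 1605 shown for the first player.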
To do this assignment we need to load the stringr package.
require(stringr)
## Loading required package: stringr
To read the data, I uploaded the .txt file to my GitHub repository and read it in from the raw URL.
tournament_info <- readLines("https://raw.githubusercontent.com/Luz917/tournamentinfo/master/tournamentinfo.txt", warn = FALSE) ## warn = FALSE suppresses the warning about an incomplete final line
Head of the table
head(tournament_info)
## [1] "-----------------------------------------------------------------------------------------"
## [2] " Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | "
## [4] "-----------------------------------------------------------------------------------------"
## [5] " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [6] " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
Tail of the table
tail(tournament_info)
## [1] " 63 | THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U |"
## [2] " MI | 15057092 / R: 1175 ->1125 | |W |B |W |B |B | | |"
## [3] "-----------------------------------------------------------------------------------------"
## [4] " 64 | BEN LI |1.0 |L 22|D 30|L 31|D 49|L 46|L 42|L 54|"
## [5] " MI | 15006561 / R: 1163 ->1112 | |B |W |W |B |W |B |B |"
## [6] "-----------------------------------------------------------------------------------------"
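We clean the table by removing the dashed separator lines and the column headers, so the remaining lines are ready for the extractions. Note that this also strips the leading pair number from each player's first line.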
tournament_c<-unlist(str_extract_all(tournament_info,"[:alpha:]+.{2,}"))
tournament_c<-tournament_c[c(3:130)]
head(tournament_c)
## [1] "GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [2] "ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
## [3] "DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
## [4] "MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |"
## [5] "ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12|"
## [6] "MI | 14959604 / R: 1384 ->1640 |N:2 |W |B |W |B |W |B |W |"
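The cleaning step above relies on the fixed index range 3:130. As an alternative, here is a hedged sketch that filters the lines by pattern instead and keeps the pair numbers. It uses only the six sample lines from the head() output above, and the names `raw`, `body`, `name_rows`, and `state_rows` are hypothetical, not part of the original code.

```r
library(stringr)

# Sample raw lines copied from the head() output above
raw <- c(
  "-----------------------------------------------------------------------------------------",
  " Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round| ",
  " Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 | ",
  "-----------------------------------------------------------------------------------------",
  " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|",
  " ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
)

# Drop the separator rows, which consist only of dashes
body <- raw[!str_detect(raw, "^-+$")]

# Drop the two header rows, which are the first two remaining lines
body <- body[-(1:2)]

# Each player occupies two consecutive rows: a name row, then a state row
name_rows  <- body[seq(1, length(body), by = 2)]
state_rows <- body[seq(2, length(body), by = 2)]

name_rows
state_rows
```

Because nothing is trimmed from the start of each line, the pair number stays available for later steps. This assumes the file always alternates name rows and state rows between separators, as the sample suggests.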
The first step is to get the player ID. Because the cleaning step removed the pair numbers from every line, I have to enter the IDs by hand as the sequence 1 to 64. The cleaning was probably more aggressive than it needed to be.
player_id<-c(1:64)
player_id
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
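If the pair numbers are kept during cleaning, the IDs can be read from the file instead of typed in. The sketch below assumes uncleaned name rows like those in the head() and tail() output above; `name_rows` and `player_id_from_file` are hypothetical names.

```r
library(stringr)

# Uncleaned name rows copied from the head() and tail() output above
name_rows <- c(
  " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|",
  " 63 | THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U |",
  " 64 | BEN LI |1.0 |L 22|D 30|L 31|D 49|L 46|L 42|L 54|"
)

# str_extract() returns the first match only, and the first run of
# digits on a name row is the pair number
player_id_from_file <- as.integer(str_extract(name_rows, "\\d+"))
player_id_from_file
```

Reading the IDs from the file avoids assuming that the players are numbered consecutively from 1.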
Next we extract all of the players' names.
pname<-unlist(str_extract_all(tournament_c,"[:alpha:]+(\\s\\w+ ([:alpha:])*[:alpha:]*)"))
pname<-str_trim(pname,side = "right") ## this removes the trailing spaces
pname
## [1] "GARY HUA" "DAKSHESH DARURI"
## [3] "ADITYA BAJAJ" "PATRICK H SCHILLING"
## [5] "HANSHI ZUO" "HANSEN SONG"
## [7] "GARY DEE SWATHELL" "EZEKIEL HOUGHTON"
## [9] "STEFANO LEE" "ANVIT RAO"
## [11] "CAMERON WILLIAM MC" "KENNETH J TACK"
## [13] "TORRANCE HENRY JR" "BRADLEY SHAW"
## [15] "ZACHARY JAMES HOUGHTON" "MIKE NIKITIN"
## [17] "RONALD GRZEGORCZYK" "DAVID SUNDEEN"
## [19] "DIPANKAR ROY" "JASON ZHENG"
## [21] "DINH DANG BUI" "EUGENE L MCCLURE"
## [23] "ALAN BUI" "MICHAEL R ALDRICH"
## [25] "LOREN SCHWIEBERT" "MAX ZHU"
## [27] "GAURAV GIDWANI" "SOFIA ADINA STANESCU"
## [29] "CHIEDOZIE OKORIE" "GEORGE AVERY JONES"
## [31] "RISHI SHETTY" "JOSHUA PHILIP MATHEWS"
## [33] "JADE GE" "MICHAEL JEFFERY THOMAS"
## [35] "JOSHUA DAVID LEE" "SIDDHARTH JHA"
## [37] "AMIYATOSH PWNANANDAM" "BRIAN LIU"
## [39] "JOEL R HENDON" "FOREST ZHANG"
## [41] "KYLE WILLIAM MURPHY" "JARED GE"
## [43] "ROBERT GLEN VASEY" "JUSTIN D SCHILLING"
## [45] "DEREK YAN" "JACOB ALEXANDER LAVALLEY"
## [47] "ERIC WRIGHT" "DANIEL KHAIN"
## [49] "MICHAEL J MARTIN" "SHIVAM JHA"
## [51] "TEJAS AYYAGARI" "ETHAN GUO"
## [53] "JOSE C YBARRA" "LARRY HODGE"
## [55] "ALEX KONG" "MARISA RICCI"
## [57] "MICHAEL LU" "VIRAJ MOHILE"
## [59] "SEAN M MC" "JULIA SHEN"
## [61] "JEZZEL FARKAS" "ASHWIN BALAJI"
## [63] "THOMAS JOSEPH HOSMER" "BEN LI"
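The pattern used above matches at most three words, so a name with four or more words would be cut off (entries such as "CAMERON WILLIAM MC" and "SEAN M MC" may be examples of this). A hedged alternative is to take everything before the first | on each name line. In the sketch below, `name_lines` and `pname_alt` are hypothetical names, and the third sample line is invented to show a four-word name.

```r
library(stringr)

name_lines <- c(
  "GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|",
  "THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U |",
  # Hypothetical four-word name, not taken from the tournament file
  "ANNA MARIA DE SOUZA |2.0 |L 2|L 48|D 49|L 43|L 45|H |U |"
)

# Take every character up to (but not including) the first pipe,
# then trim the padding spaces
pname_alt <- str_trim(str_extract(name_lines, "^[^|]+"))
pname_alt
```

This only makes sense on the name lines, so with the cleaned vector it would be applied to the odd-numbered elements, for example `tournament_c[seq(1, length(tournament_c), by = 2)]`.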
The next step is to extract each player's state.
state<-unlist(str_extract_all(tournament_c,"\\b^[:alpha:]{2}\\b"))
state
## [1] "ON" "MI" "MI" "MI" "MI" "OH" "MI" "MI" "ON" "MI" "MI" "MI" "MI" "MI"
## [15] "MI" "MI" "MI" "MI" "MI" "MI" "ON" "MI" "ON" "MI" "MI" "ON" "MI" "MI"
## [29] "MI" "ON" "MI" "ON" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI"
## [43] "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI"
## [57] "MI" "MI" "MI" "MI" "ON" "MI" "MI" "MI"
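The state pattern works because only the second line of each player starts with a standalone two-letter code. A slightly stricter sketch, shown on the first two cleaned lines from above, also requires that the code is followed by a |. The names `sample_lines` and `state_alt` are hypothetical.

```r
library(stringr)

# First two cleaned lines from the head(tournament_c) output above
sample_lines <- c(
  "GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|",
  "ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |"
)

# Two capital letters at the start of the line, followed by optional
# spaces and a pipe; the lookahead keeps the pipe out of the match
state_alt <- str_extract(sample_lines, "^[A-Z]{2}(?=\\s*\\|)")

# Name lines give NA, so drop them
state_alt <- state_alt[!is.na(state_alt)]
state_alt
```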
In the next step we extract each player's total points.
points<-unlist(str_extract_all(tournament_c,"[:digit:][:punct:][:digit:]"))
points
## [1] "6.0" "6.0" "6.0" "5.5" "5.5" "5.0" "5.0" "5.0" "5.0" "5.0" "4.5"
## [12] "4.5" "4.5" "4.5" "4.5" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0"
## [23] "4.0" "4.0" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5"
## [34] "3.5" "3.5" "3.5" "3.5" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0"
## [45] "3.0" "3.0" "2.5" "2.5" "2.5" "2.5" "2.5" "2.5" "2.0" "2.0" "2.0"
## [56] "2.0" "2.0" "2.0" "2.0" "1.5" "1.5" "1.0" "1.0" "1.0"
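The points above are still character strings. If they are going to be used in calculations or loaded into a numeric database column, they can be converted as sketched below. The pattern `\\d\\.\\d` is a narrower stand-in for the `[:punct:]` version, and `name_lines` and `points_num` are hypothetical names.

```r
library(stringr)

# Name lines copied from the output shown earlier
name_lines <- c(
  "GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|",
  "THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U |",
  "DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
)

# One digit, a literal dot, one digit, converted from text to numbers
points_num <- as.numeric(str_extract(name_lines, "\\d\\.\\d"))
points_num
```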
Next we get each player's pre-rating. This is a little more complicated because the pre-rating has to be distinguished from the post-rating, and the simplest way to do that is to include R: in the extraction. Some ratings have fewer than four digits, and some are followed by a P and a few more digits, so I handle each case in a separate step until every value is a three- or four-digit number.
pre_rating<-unlist(str_extract_all(tournament_c,"R:\\s+[:alnum:]*"))
pre_rating<-str_replace_all(pre_rating,"R:","") ## this removes the R:
pre_rating<-str_replace_all(pre_rating,"P\\d+","") ## this removes the P and the digits that follow it
pre_rating<-str_trim(pre_rating,side = "both") ## this removes the remaining spaces
pre_rating
## [1] "1794" "1553" "1384" "1716" "1655" "1686" "1649" "1641" "1411" "1365"
## [11] "1712" "1663" "1666" "1610" "1220" "1604" "1629" "1600" "1564" "1595"
## [21] "1563" "1555" "1363" "1229" "1745" "1579" "1552" "1507" "1602" "1522"
## [31] "1494" "1441" "1449" "1399" "1438" "1355" "980" "1423" "1436" "1348"
## [41] "1403" "1332" "1283" "1199" "1242" "377" "1362" "1382" "1291" "1056"
## [51] "1011" "935" "1393" "1270" "1186" "1153" "1092" "917" "853" "967"
## [61] "955" "1530" "1175" "1163"
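The same result can be reached in one step with a capture group, which also gives numeric values directly. This is only a sketch: `rating_lines` and `pre_rating_num` are hypothetical names, and the third sample line is invented to illustrate a provisional rating with a P suffix.

```r
library(stringr)

rating_lines <- c(
  "ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |",
  "MI | 15057092 / R: 1175 ->1125 | |W |B |W |B |B | | |",
  # Hypothetical line with a three-digit provisional rating (P suffix)
  "MI | 00000000 / R:  980P12->1077P17 | |B |W |W |B |W |B |B |"
)

# Capture only the digits that follow "R:"; anything after them
# (a P suffix, the arrow, the post-rating) is left out of the group
m <- str_match(rating_lines, "R:\\s*(\\d+)")

# Column 1 is the full match, column 2 is the captured group
pre_rating_num <- as.integer(m[, 2])
pre_rating_num
```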
This is the part where I got stuck: I was unable to calculate the average opponent rating, but I did extract the game results (wins, losses, and draws) together with the opponent numbers.
wins_losses<-unlist(str_extract_all(tournament_c,"[WLD]\\s+\\d+")) ## \\s+ allows any amount of spacing, so single-digit opponents such as "W 7" are kept
head(wins_losses) ## shows the first player
## [1] "W 39" "W 21" "W 18" "W 14" "W 7" "D 12"
tail(wins_losses)##shows the last player
## [1] "D 30" "L 31" "D 49" "L 46" "L 42" "L 54"
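One possible way to finish the missing calculation is sketched below. The idea is to keep the opponent numbers grouped per player (rather than flattening them with unlist), use them as indexes into the pre-rating vector, and average the result. The three-player tournament here is invented for illustration, as are the names `name_lines`, `opponents`, and `avg_opp_rating`. It assumes that the pre-rating vector is numeric and ordered by pair number.

```r
library(stringr)

# Hypothetical three-player mini tournament in the same layout as the
# name lines above (names, results, and ratings are invented)
name_lines <- c(
  "PLAYER ONE   |2.0  |W   2|W   3|",
  "PLAYER TWO   |0.5  |L   1|H    |",
  "PLAYER THREE |1.0  |B    |L   1|"
)
pre_rating <- c(1500, 1400, 1300)

# One matrix per player: column 1 is the full match, column 2 is the
# opponent's pair number. Rounds without an opponent (H, B, U) have no
# digits, so they do not match and are not counted as games played.
opponents <- str_match_all(name_lines, "\\|[WLD]\\s+(\\d+)")

avg_opp_rating <- sapply(opponents, function(m) {
  ids <- as.integer(m[, 2])     # opponent pair numbers
  round(mean(pre_rating[ids]))  # look up their pre-ratings and average
})
avg_opp_rating
```

Player one faced players two and three, so the average is (1400 + 1300) / 2 = 1350. The other two only played player one, so their average is 1500. A player with no games at all would get NaN, which would need separate handling.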
Finally, we combine the extracted vectors as columns of a data frame and write it to a .CSV file.
final_table_chess<-data.frame(player_id,pname, state, points,pre_rating)
head(final_table_chess)
## player_id pname state points pre_rating
## 1 1 GARY HUA ON 6.0 1794
## 2 2 DAKSHESH DARURI MI 6.0 1553
## 3 3 ADITYA BAJAJ MI 6.0 1384
## 4 4 PATRICK H SCHILLING MI 5.5 1716
## 5 5 HANSHI ZUO MI 5.5 1655
## 6 6 HANSEN SONG OH 5.0 1686
write.csv(final_table_chess,"Final_Table_Chess.csv",row.names = FALSE)
read.csv("Final_Table_Chess.csv")
## player_id pname state points pre_rating
## 1 1 GARY HUA ON 6.0 1794
## 2 2 DAKSHESH DARURI MI 6.0 1553
## 3 3 ADITYA BAJAJ MI 6.0 1384
## 4 4 PATRICK H SCHILLING MI 5.5 1716
## 5 5 HANSHI ZUO MI 5.5 1655
## 6 6 HANSEN SONG OH 5.0 1686
## 7 7 GARY DEE SWATHELL MI 5.0 1649
## 8 8 EZEKIEL HOUGHTON MI 5.0 1641
## 9 9 STEFANO LEE ON 5.0 1411
## 10 10 ANVIT RAO MI 5.0 1365
## 11 11 CAMERON WILLIAM MC MI 4.5 1712
## 12 12 KENNETH J TACK MI 4.5 1663
## 13 13 TORRANCE HENRY JR MI 4.5 1666
## 14 14 BRADLEY SHAW MI 4.5 1610
## 15 15 ZACHARY JAMES HOUGHTON MI 4.5 1220
## 16 16 MIKE NIKITIN MI 4.0 1604
## 17 17 RONALD GRZEGORCZYK MI 4.0 1629
## 18 18 DAVID SUNDEEN MI 4.0 1600
## 19 19 DIPANKAR ROY MI 4.0 1564
## 20 20 JASON ZHENG MI 4.0 1595
## 21 21 DINH DANG BUI ON 4.0 1563
## 22 22 EUGENE L MCCLURE MI 4.0 1555
## 23 23 ALAN BUI ON 4.0 1363
## 24 24 MICHAEL R ALDRICH MI 4.0 1229
## 25 25 LOREN SCHWIEBERT MI 3.5 1745
## 26 26 MAX ZHU ON 3.5 1579
## 27 27 GAURAV GIDWANI MI 3.5 1552
## 28 28 SOFIA ADINA STANESCU MI 3.5 1507
## 29 29 CHIEDOZIE OKORIE MI 3.5 1602
## 30 30 GEORGE AVERY JONES ON 3.5 1522
## 31 31 RISHI SHETTY MI 3.5 1494
## 32 32 JOSHUA PHILIP MATHEWS ON 3.5 1441
## 33 33 JADE GE MI 3.5 1449
## 34 34 MICHAEL JEFFERY THOMAS MI 3.5 1399
## 35 35 JOSHUA DAVID LEE MI 3.5 1438
## 36 36 SIDDHARTH JHA MI 3.5 1355
## 37 37 AMIYATOSH PWNANANDAM MI 3.5 980
## 38 38 BRIAN LIU MI 3.0 1423
## 39 39 JOEL R HENDON MI 3.0 1436
## 40 40 FOREST ZHANG MI 3.0 1348
## 41 41 KYLE WILLIAM MURPHY MI 3.0 1403
## 42 42 JARED GE MI 3.0 1332
## 43 43 ROBERT GLEN VASEY MI 3.0 1283
## 44 44 JUSTIN D SCHILLING MI 3.0 1199
## 45 45 DEREK YAN MI 3.0 1242
## 46 46 JACOB ALEXANDER LAVALLEY MI 3.0 377
## 47 47 ERIC WRIGHT MI 2.5 1362
## 48 48 DANIEL KHAIN MI 2.5 1382
## 49 49 MICHAEL J MARTIN MI 2.5 1291
## 50 50 SHIVAM JHA MI 2.5 1056
## 51 51 TEJAS AYYAGARI MI 2.5 1011
## 52 52 ETHAN GUO MI 2.5 935
## 53 53 JOSE C YBARRA MI 2.0 1393
## 54 54 LARRY HODGE MI 2.0 1270
## 55 55 ALEX KONG MI 2.0 1186
## 56 56 MARISA RICCI MI 2.0 1153
## 57 57 MICHAEL LU MI 2.0 1092
## 58 58 VIRAJ MOHILE MI 2.0 917
## 59 59 SEAN M MC MI 2.0 853
## 60 60 JULIA SHEN MI 1.5 967
## 61 61 JEZZEL FARKAS ON 1.5 955
## 62 62 ASHWIN BALAJI MI 1.0 1530
## 63 63 THOMAS JOSEPH HOSMER MI 1.0 1175
## 64 64 BEN LI MI 1.0 1163
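The assignment asks for five columns, including the average opponent rating that the table above lacks. As a sketch of the target layout, the block below builds that row for the first player from the values given in the assignment text and checks that it survives a round trip through a .CSV file. The column name `avg_opp_rating`, the object names, and the temporary file path are assumptions made for illustration.

```r
# One row in the layout the assignment asks for, using the values
# given for the first player in the assignment text
final_row <- data.frame(
  pname          = "GARY HUA",
  state          = "ON",
  points         = 6.0,
  pre_rating     = 1794,
  avg_opp_rating = 1605
)

# Write to a temporary file and read it back to confirm the columns
csv_path <- tempfile(fileext = ".csv")
write.csv(final_row, csv_path, row.names = FALSE)
check <- read.csv(csv_path)
check
```

Storing points and ratings as numbers rather than text means the columns are read back as numeric, which is usually what a database import expects.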