Project 1 Data 607

Assignment

In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:

Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents

For the first player, the information would be:

Gary Hua, ON, 6.0, 1794, 1605

1605 was calculated by summing the pre-tournament ratings of the opponents (1436, 1563, 1600, 1610, 1649, 1663, and 1716) and dividing by the total number of games played.
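As a quick check, the same average can be reproduced in R. The seven ratings are the ones listed in the assignment. Rounding to the nearest whole number is an assumption about how 1605 was reported.

```r
# Pre-tournament ratings of Gary Hua's seven opponents, as listed in the assignment
opp_ratings <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)

sum(opp_ratings)                         # 11237
sum(opp_ratings) / length(opp_ratings)   # about 1605.29
round(mean(opp_ratings))                 # 1605
```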

Include the stringr Package

To do this assignment we need the stringr package.

require(stringr)

## Loading required package: stringr

Read the Table into R

To do that, I uploaded the .txt file to my GitHub repository and read it in from its raw URL.

##warn=FALSE suppresses the warning about an incomplete final line
tournament_info <- readLines("https://raw.githubusercontent.com/Luz917/tournamentinfo/master/tournamentinfo.txt", warn=FALSE)

Head of the Table

head(tournament_info)

## [1] "-----------------------------------------------------------------------------------------" 
## [2] " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| "
## [3] " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | "
## [4] "-----------------------------------------------------------------------------------------" 
## [5] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|" 
## [6] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"

Tail of the Table

tail(tournament_info)

## [1] "   63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |"
## [2] "   MI | 15057092 / R: 1175   ->1125     |     |W    |B    |W    |B    |B    |     |     |"
## [3] "-----------------------------------------------------------------------------------------"
## [4] "   64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|"
## [5] "   MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |"
## [6] "-----------------------------------------------------------------------------------------"

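Cleaning the Table

We clean the table by removing the dashed separator lines and the two header rows, to get ready for the extractions. The pattern "[:alpha:]+.{2,}" keeps each line from its first letter to the end of the line. As a result, the separator lines (which contain no letters) disappear, and the pair numbers are cut off the name rows. Keeping elements 3 to 130 then drops the two header rows. That leaves 128 lines, two for each of the 64 players.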

tournament_c<-unlist(str_extract_all(tournament_info,"[:alpha:]+.{2,}"))
tournament_c<-tournament_c[c(3:130)]
head(tournament_c)

## [1] "GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"     
## [2] "ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
## [3] "DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|"     
## [4] "MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |"
## [5] "ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|"     
## [6] "MI | 14959604 / R: 1384   ->1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |"
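An alternative cleaning step keeps each line whole, so the pair numbers survive. It then splits the lines into name rows and rating rows. The sketch below runs on a handful of lines copied from the head of the file.

The variable names `raw`, `body`, `name_rows` and `rating_rows` are hypothetical. The sketch also assumes two things:
- every separator line consists only of dashes;
- the two header rows come first.

```r
library(stringr)

# A few raw lines copied from the head of tournamentinfo.txt
raw <- c(
  "-----------------------------------------------------------------------------------------",
  " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| ",
  " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | ",
  "-----------------------------------------------------------------------------------------",
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "-----------------------------------------------------------------------------------------"
)

# Drop the separator lines (only dashes, possibly followed by whitespace)
no_sep <- raw[!str_detect(raw, "^-+\\s*$")]

# Drop the two header rows, which come first
body <- no_sep[-(1:2)]

# Each player occupies two consecutive lines: the name row, then the rating row
name_rows   <- body[seq(1, length(body), by = 2)]
rating_rows <- body[seq(2, length(body), by = 2)]
```

Applied to the whole of `tournament_info`, this would give 64 name rows and 64 rating rows with the pair numbers still in place.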

Player ID

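The first step is to get the player ID. Because the cleaning step removed the pair numbers, I recreate them as the sequence 1 to 64. This works because the players are listed in pair-number order. In hindsight, I may have cleaned the table too much.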

player_id<-c(1:64)
player_id

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
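If the lines are kept whole, the player ID can be read from the file instead of typed in. Below is a minimal sketch, assuming name rows that still start with the pair number as in the raw file. The two rows are copied from the file.

```r
library(stringr)

# Two name rows copied from the raw file, pair numbers still present
name_rows <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|"
)

# Capture the digits before the first "|"; column 2 of str_match() holds the group
player_id <- as.integer(str_match(name_rows, "^\\s*(\\d+)\\s*\\|")[, 2])
player_id
```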

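Player’s Name

Next, we extract all of the player names. This pattern captures at most three words, so longer names are cut short, as with "CAMERON WILLIAM MC" and "SEAN M MC" in the output below.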

pname<-unlist(str_extract_all(tournament_c,"[:alpha:]+(\\s\\w+ ([:alpha:])*[:alpha:]*)"))
pname<-str_trim(pname,side = "right")##this removes the trailing space left by the pattern
pname

##  [1] "GARY HUA"                 "DAKSHESH DARURI"         
##  [3] "ADITYA BAJAJ"             "PATRICK H SCHILLING"     
##  [5] "HANSHI ZUO"               "HANSEN SONG"             
##  [7] "GARY DEE SWATHELL"        "EZEKIEL HOUGHTON"        
##  [9] "STEFANO LEE"              "ANVIT RAO"               
## [11] "CAMERON WILLIAM MC"       "KENNETH J TACK"          
## [13] "TORRANCE HENRY JR"        "BRADLEY SHAW"            
## [15] "ZACHARY JAMES HOUGHTON"   "MIKE NIKITIN"            
## [17] "RONALD GRZEGORCZYK"       "DAVID SUNDEEN"           
## [19] "DIPANKAR ROY"             "JASON ZHENG"             
## [21] "DINH DANG BUI"            "EUGENE L MCCLURE"        
## [23] "ALAN BUI"                 "MICHAEL R ALDRICH"       
## [25] "LOREN SCHWIEBERT"         "MAX ZHU"                 
## [27] "GAURAV GIDWANI"           "SOFIA ADINA STANESCU"    
## [29] "CHIEDOZIE OKORIE"         "GEORGE AVERY JONES"      
## [31] "RISHI SHETTY"             "JOSHUA PHILIP MATHEWS"   
## [33] "JADE GE"                  "MICHAEL JEFFERY THOMAS"  
## [35] "JOSHUA DAVID LEE"         "SIDDHARTH JHA"           
## [37] "AMIYATOSH PWNANANDAM"     "BRIAN LIU"               
## [39] "JOEL R HENDON"            "FOREST ZHANG"            
## [41] "KYLE WILLIAM MURPHY"      "JARED GE"                
## [43] "ROBERT GLEN VASEY"        "JUSTIN D SCHILLING"      
## [45] "DEREK YAN"                "JACOB ALEXANDER LAVALLEY"
## [47] "ERIC WRIGHT"              "DANIEL KHAIN"            
## [49] "MICHAEL J MARTIN"         "SHIVAM JHA"              
## [51] "TEJAS AYYAGARI"           "ETHAN GUO"               
## [53] "JOSE C YBARRA"            "LARRY HODGE"             
## [55] "ALEX KONG"                "MARISA RICCI"            
## [57] "MICHAEL LU"               "VIRAJ MOHILE"            
## [59] "SEAN M MC"                "JULIA SHEN"              
## [61] "JEZZEL FARKAS"            "ASHWIN BALAJI"           
## [63] "THOMAS JOSEPH HOSMER"     "BEN LI"
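On a name row, the name always sits between the first and second "|". Splitting on "|" is therefore one way to capture names of any length. The sketch below runs on two name rows copied from the file. Here `pname` is a local variable for illustration, not the vector above.

```r
library(stringr)

# Two name rows copied from the raw file
name_rows <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |"
)

# Split every row on the literal "|" into a character matrix (one row per line)
fields <- str_split(name_rows, fixed("|"), simplify = TRUE)

# The second field is the name column; trim the padding on both sides
pname <- str_trim(fields[, 2])
pname
```

Because the split relies on the column layout rather than on the word count, a name with four or more words is kept whole.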

Player’s State

Next, we extract each player’s state.

state<-unlist(str_extract_all(tournament_c,"\\b^[:alpha:]{2}\\b"))
state

##  [1] "ON" "MI" "MI" "MI" "MI" "OH" "MI" "MI" "ON" "MI" "MI" "MI" "MI" "MI"
## [15] "MI" "MI" "MI" "MI" "MI" "MI" "ON" "MI" "ON" "MI" "MI" "ON" "MI" "MI"
## [29] "MI" "ON" "MI" "ON" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI"
## [43] "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI" "MI"
## [57] "MI" "MI" "MI" "MI" "ON" "MI" "MI" "MI"
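A variant searches only the rating rows and captures the two capital letters before the first "|". This avoids relying on the fact that no name in this file starts with a two-letter word. The sketch below uses two rating rows copied from the file.

```r
library(stringr)

# Two rating rows copied from the raw file
rating_rows <- c(
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "   MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |"
)

# The state is the two capital letters before the first "|"
state <- str_match(rating_rows, "^\\s*([A-Z]{2})\\s*\\|")[, 2]
state
```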

Player’s Points

Next, we extract each player’s total points.

points<-unlist(str_extract_all(tournament_c,"[:digit:][:punct:][:digit:]"))
points

##  [1] "6.0" "6.0" "6.0" "5.5" "5.5" "5.0" "5.0" "5.0" "5.0" "5.0" "4.5"
## [12] "4.5" "4.5" "4.5" "4.5" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0" "4.0"
## [23] "4.0" "4.0" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5" "3.5"
## [34] "3.5" "3.5" "3.5" "3.5" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0" "3.0"
## [45] "3.0" "3.0" "2.5" "2.5" "2.5" "2.5" "2.5" "2.5" "2.0" "2.0" "2.0"
## [56] "2.0" "2.0" "2.0" "2.0" "1.5" "1.5" "1.0" "1.0" "1.0"
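The pattern above accepts any punctuation between the digits. The sketch below is a tighter variant that:
- requires a literal dot inside the points column;
- converts the result to numbers, so the CSV stores numbers rather than text.

The rows are copied from the file.

```r
library(stringr)

# Two name rows copied from the raw file
name_rows <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |"
)

# Points look like 6.0 or 5.5 and sit between two "|" characters
points <- as.numeric(str_match(name_rows, "\\|\\s*(\\d+\\.\\d)\\s*\\|")[, 2])
points
```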

Player’s Pre Rating

Next, we get each player’s pre-rating. This is a little more complicated because each rating row holds both the pre- and post-tournament ratings. The only reliable way to tell them apart is to anchor the extraction on "R:". Some ratings have only three digits, and some provisional ratings carry a "P" followed by a few digits. I therefore handled each case in turn:
1. extract the text after "R:";
2. remove the "R:";
3. remove any "P" and the digits after it;
4. trim the whitespace.

This leaves only the three- or four-digit ratings.

pre_rating<-unlist(str_extract_all(tournament_c,"R:\\s+[:alnum:]*"))
pre_rating<-str_replace_all(pre_rating,"R:","")##this removes the R:
pre_rating<-str_replace_all(pre_rating,"P\\d+","") ##this removes P and the digits that follow it
pre_rating<-str_trim(pre_rating,side = "both")
pre_rating

##  [1] "1794" "1553" "1384" "1716" "1655" "1686" "1649" "1641" "1411" "1365"
## [11] "1712" "1663" "1666" "1610" "1220" "1604" "1629" "1600" "1564" "1595"
## [21] "1563" "1555" "1363" "1229" "1745" "1579" "1552" "1507" "1602" "1522"
## [31] "1494" "1441" "1449" "1399" "1438" "1355" "980"  "1423" "1436" "1348"
## [41] "1403" "1332" "1283" "1199" "1242" "377"  "1362" "1382" "1291" "1056"
## [51] "1011" "935"  "1393" "1270" "1186" "1153" "1092" "917"  "853"  "967" 
## [61] "955"  "1530" "1175" "1163"
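The same result can be reached in one step with a capture group that keeps only the digits right after "R:". Any "P" suffix and the post rating are then ignored, and the values become numeric.

The first two strings are rating rows copied from the file. The third is a made-up fragment in the provisional "P" format described above, included only for illustration.

```r
library(stringr)

rating_rows <- c(
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "   MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |",
  "R: 1500P10 ->1520P17"  # hypothetical provisional rating, made up for illustration
)

# Capture only the run of digits after "R:"; the "P10" suffix and "->" post rating are skipped
pre_rating <- as.integer(str_match(rating_rows, "R:\\s*(\\d+)")[, 2])
pre_rating
```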

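Average Opponent Rating

This is the part where I got stuck. I was unable to calculate the average opponent rating, but I did extract each game's result (win, loss, or draw) and the opponent's pair number. The pattern uses \\s+ rather than a fixed number of spaces. This matters because single-digit opponents are padded with three spaces ("W   7"), while two-digit opponents get two ("W  39").

wins_losses<-unlist(str_extract_all(tournament_c,"[WLD]\\s+\\d+"))
head(wins_losses)##first six games of the first player

## [1] "W  39" "W  21" "W  18" "W  14" "W   7" "D  12"

tail(wins_losses)##last six games of the last player

## [1] "D  30" "L  31" "D  49" "L  46" "L  42" "L  54"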
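One way to finish this step is to work per player rather than from a single flattened vector of results. For each name row, the sketch captures the opponent number after every W, L, or D. It then looks up those opponents' pre-ratings by pair number and averages them.

This is a sketch under stated assumptions, not the assignment's reference solution:
- the 64 pre-ratings are copied from the `pre_rating` output above;
- the three name rows are copied from the cleaned table;
- rounding to a whole number is assumed.

```r
library(stringr)

# Pre-ratings indexed by pair number, copied from the pre_rating output above
pre_rating <- c(1794, 1553, 1384, 1716, 1655, 1686, 1649, 1641, 1411, 1365,
                1712, 1663, 1666, 1610, 1220, 1604, 1629, 1600, 1564, 1595,
                1563, 1555, 1363, 1229, 1745, 1579, 1552, 1507, 1602, 1522,
                1494, 1441, 1449, 1399, 1438, 1355,  980, 1423, 1436, 1348,
                1403, 1332, 1283, 1199, 1242,  377, 1362, 1382, 1291, 1056,
                1011,  935, 1393, 1270, 1186, 1153, 1092,  917,  853,  967,
                 955, 1530, 1175, 1163)

# Name rows for three players, as they appear in the cleaned table
name_rows <- c(
  "GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |",
  "BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|"
)

# One matrix per row; column 2 holds the opponent numbers.
# Rounds without an opponent (such as H and U) have no number, so they are skipped
# and the average is taken over games actually played.
opponents <- str_match_all(name_rows, "[WLD]\\s+(\\d+)")

avg_opp_rating <- vapply(opponents, function(m) {
  opp_ids <- as.integer(m[, 2])
  round(mean(pre_rating[opp_ids]))
}, numeric(1))

avg_opp_rating  # 1605 1350 1263
```

On the full data, the inputs would come from the document's own vectors:
- `name_rows` would be the odd-numbered elements of the cleaned table, `tournament_c[seq(1, length(tournament_c), by = 2)]`;
- `pre_rating` would be `as.numeric(pre_rating)`.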

Create the Final Table

We combine the extracted vectors as the columns of a data frame. The average opponent rating is not included, because it was not calculated in the previous step.

final_table_chess<-data.frame(player_id,pname, state, points,pre_rating)
head(final_table_chess)

##   player_id               pname state points pre_rating
## 1         1            GARY HUA    ON    6.0       1794
## 2         2     DAKSHESH DARURI    MI    6.0       1553
## 3         3        ADITYA BAJAJ    MI    6.0       1384
## 4         4 PATRICK H SCHILLING    MI    5.5       1716
## 5         5          HANSHI ZUO    MI    5.5       1655
## 6         6         HANSEN SONG    OH    5.0       1686

Create the CSV File

write.csv(final_table_chess,"Final_Table_Chess.csv",row.names = FALSE)
read.csv("Final_Table_Chess.csv")

##    player_id                    pname state points pre_rating
## 1          1                 GARY HUA    ON    6.0       1794
## 2          2          DAKSHESH DARURI    MI    6.0       1553
## 3          3             ADITYA BAJAJ    MI    6.0       1384
## 4          4      PATRICK H SCHILLING    MI    5.5       1716
## 5          5               HANSHI ZUO    MI    5.5       1655
## 6          6              HANSEN SONG    OH    5.0       1686
## 7          7        GARY DEE SWATHELL    MI    5.0       1649
## 8          8         EZEKIEL HOUGHTON    MI    5.0       1641
## 9          9              STEFANO LEE    ON    5.0       1411
## 10        10                ANVIT RAO    MI    5.0       1365
## 11        11       CAMERON WILLIAM MC    MI    4.5       1712
## 12        12           KENNETH J TACK    MI    4.5       1663
## 13        13        TORRANCE HENRY JR    MI    4.5       1666
## 14        14             BRADLEY SHAW    MI    4.5       1610
## 15        15   ZACHARY JAMES HOUGHTON    MI    4.5       1220
## 16        16             MIKE NIKITIN    MI    4.0       1604
## 17        17       RONALD GRZEGORCZYK    MI    4.0       1629
## 18        18            DAVID SUNDEEN    MI    4.0       1600
## 19        19             DIPANKAR ROY    MI    4.0       1564
## 20        20              JASON ZHENG    MI    4.0       1595
## 21        21            DINH DANG BUI    ON    4.0       1563
## 22        22         EUGENE L MCCLURE    MI    4.0       1555
## 23        23                 ALAN BUI    ON    4.0       1363
## 24        24        MICHAEL R ALDRICH    MI    4.0       1229
## 25        25         LOREN SCHWIEBERT    MI    3.5       1745
## 26        26                  MAX ZHU    ON    3.5       1579
## 27        27           GAURAV GIDWANI    MI    3.5       1552
## 28        28     SOFIA ADINA STANESCU    MI    3.5       1507
## 29        29         CHIEDOZIE OKORIE    MI    3.5       1602
## 30        30       GEORGE AVERY JONES    ON    3.5       1522
## 31        31             RISHI SHETTY    MI    3.5       1494
## 32        32    JOSHUA PHILIP MATHEWS    ON    3.5       1441
## 33        33                  JADE GE    MI    3.5       1449
## 34        34   MICHAEL JEFFERY THOMAS    MI    3.5       1399
## 35        35         JOSHUA DAVID LEE    MI    3.5       1438
## 36        36            SIDDHARTH JHA    MI    3.5       1355
## 37        37     AMIYATOSH PWNANANDAM    MI    3.5        980
## 38        38                BRIAN LIU    MI    3.0       1423
## 39        39            JOEL R HENDON    MI    3.0       1436
## 40        40             FOREST ZHANG    MI    3.0       1348
## 41        41      KYLE WILLIAM MURPHY    MI    3.0       1403
## 42        42                 JARED GE    MI    3.0       1332
## 43        43        ROBERT GLEN VASEY    MI    3.0       1283
## 44        44       JUSTIN D SCHILLING    MI    3.0       1199
## 45        45                DEREK YAN    MI    3.0       1242
## 46        46 JACOB ALEXANDER LAVALLEY    MI    3.0        377
## 47        47              ERIC WRIGHT    MI    2.5       1362
## 48        48             DANIEL KHAIN    MI    2.5       1382
## 49        49         MICHAEL J MARTIN    MI    2.5       1291
## 50        50               SHIVAM JHA    MI    2.5       1056
## 51        51           TEJAS AYYAGARI    MI    2.5       1011
## 52        52                ETHAN GUO    MI    2.5        935
## 53        53            JOSE C YBARRA    MI    2.0       1393
## 54        54              LARRY HODGE    MI    2.0       1270
## 55        55                ALEX KONG    MI    2.0       1186
## 56        56             MARISA RICCI    MI    2.0       1153
## 57        57               MICHAEL LU    MI    2.0       1092
## 58        58             VIRAJ MOHILE    MI    2.0        917
## 59        59                SEAN M MC    MI    2.0        853
## 60        60               JULIA SHEN    MI    1.5        967
## 61        61            JEZZEL FARKAS    ON    1.5        955
## 62        62            ASHWIN BALAJI    MI    1.0       1530
## 63        63     THOMAS JOSEPH HOSMER    MI    1.0       1175
## 64        64                   BEN LI    MI    1.0       1163
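For comparison with the assignment's target layout, the sketch below writes a one-row CSV to a temporary file and reads it back. The row uses the five requested columns in order, filled with the Gary Hua example from the assignment. The column names are hypothetical choices, not ones the assignment prescribes.

```r
# Columns in the order the assignment asks for; values are the Gary Hua example
example_row <- data.frame(
  name           = "Gary Hua",
  state          = "ON",
  total_points   = 6.0,
  pre_rating     = 1794L,
  avg_opp_rating = 1605L
)

# Write to a temporary file so the sketch does not touch the working directory
csv_path <- tempfile(fileext = ".csv")
write.csv(example_row, csv_path, row.names = FALSE)

# Reading the file back shows the numeric columns come through as numbers
back <- read.csv(csv_path)
str(back)
```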