This is the R Markdown file for my Project 1. The problem statement is as follows:

" In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players:

Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents

For the first player, the information would be:

Gary Hua, ON, 6.0, 1794, 1605

1605 was calculated by using the pre-tournament opponents’ ratings of 1436, 1563, 1600, 1610, 1649, 1663, 1716, and dividing by the total number of games played. "
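
As a quick check of that arithmetic: the seven ratings sum to 11237, and 11237 / 7 is about 1605.29, which rounds to 1605. A minimal sketch in R (the variable names are my own, not part of the assignment):

```r
# Pre-tournament ratings of Gary Hua's seven opponents, from the problem statement
opp_ratings <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)

# Sum of the ratings divided by the number of games played, rounded to an integer
avg_opp <- round(sum(opp_ratings) / length(opp_ratings))
print(avg_opp)
```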

A step-by-step solution is provided below.

  1. Before we begin, let’s load the required libraries: stringr and hash.
## Warning: package 'stringr' was built under R version 3.6.3
## hash-2.2.6.1 provided by Decision Patterns
  1. Read the file into the program. Please note that the file couldn’t be read directly from the site below, which was given in the Blackboard assignment.

    https://bbhosted.cuny.edu/bbcswebdav/pid-42265232-dt-content-rid-347468182_1/courses/SPS01_DATA_607_01_1199_1/tournamentinfo.txt

It was possible to download the file, however. So I downloaded it to my folder and uploaded it to my GitHub. The link below is for the raw file:

https://raw.githubusercontent.com/ShovanBiswas/DATA607/master/Week04-Project1/tournamentinfo.txt

I read the file from there.

##                                                                                          V1
## 1 -----------------------------------------------------------------------------------------
## 2                                                                                     Pair 
## 3                                                                                     Num  
## 4 -----------------------------------------------------------------------------------------
## 5                                                                                        1 
## 6                                                                                       ON 
##                                  V2    V3    V4    V5    V6    V7    V8    V9
## 1                                                                            
## 2  Player Name                      Total Round Round Round Round Round Round
## 3  USCF ID / Rtg (Pre->Post)         Pts    1     2     3     4     5     6  
## 4                                                                            
## 5  GARY HUA                         6.0   W  39 W  21 W  18 W  14 W   7 D  12
## 6  15445895 / R: 1794   ->1817      N:2   W     B     W     B     W     B    
##     V10 V11
## 1        NA
## 2 Round  NA
## 3   7    NA
## 4        NA
## 5 D   4  NA
## 6 W      NA
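
The read step can be sketched roughly as follows. This is only an illustration under assumptions: the sample lines are typed by hand to resemble the file's layout (and are narrower than the real file), and read.table with sep = "|" is one way to get V1, V2, ... columns like those shown above, not necessarily the call I used. Passing the raw GitHub URL in place of the text argument would be the analogous call for the real file.

```r
# Hypothetical sample lines, hand-typed to resemble the layout of tournamentinfo.txt
sample_lines <- c(
  "---------------------------------------------------------",
  " Pair | Player Name                     |Total|Round|Round|",
  " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |",
  "---------------------------------------------------------",
  "    1 | GARY HUA                        |6.0  |W  39|W  21|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |"
)

# Split each line on "|"; fill = TRUE pads the short (dashed) lines with blanks
raw.df <- read.table(text = sample_lines, sep = "|", header = FALSE,
                     fill = TRUE, stringsAsFactors = FALSE,
                     quote = "", comment.char = "")

print(head(raw.df))
```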
  1. The given file has two record types: one holds the player’s pair number, name, total points and round results, and the other holds the state, USCF ID and ratings. I’ll read them separately and display the first few records of each.
##        V1                                V2    V3    V4    V5    V6    V7    V8
## 5      1   GARY HUA                         6.0   W  39 W  21 W  18 W  14 W   7
## 8      2   DAKSHESH DARURI                  6.0   W  63 W  58 L   4 W  17 W  16
## 11     3   ADITYA BAJAJ                     6.0   L   8 W  61 W  25 W  21 W  11
##       V9   V10 V11
## 5  D  12 D   4  NA
## 8  W  20 W   7  NA
## 11 W  13 W  12  NA
##        V1                                V2    V3    V4    V5    V6    V7    V8
## 6     ON   15445895 / R: 1794   ->1817      N:2   W     B     W     B     W    
## 9     MI   14598900 / R: 1553   ->1663      N:2   B     W     B     W     B    
## 12    MI   14959604 / R: 1384   ->1640      N:2   W     B     W     B     W    
##       V9   V10 V11
## 6  B     W      NA
## 9  W     B      NA
## 12 B     W      NA
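
The row numbers in the output above (5, 8, 11 and 6, 9, 12) suggest that the two record types alternate with a stride of three rows. A minimal sketch of that selection, assuming a toy raw.df of my own making and the hypothetical names rec1 and rec2:

```r
# Toy stand-in for the parsed file: rows 1-4 are header lines, then each player
# takes a name row, a state row, and a dashed separator row
raw.df <- data.frame(
  V1 = c("----", " Pair ", " Num  ", "----",
         "    1 ", "   ON ", "----",
         "    2 ", "   MI ", "----"),
  V2 = c("", " Player Name ", " USCF ID / Rtg (Pre->Post) ", "",
         " GARY HUA ", " 15445895 / R: 1794   ->1817 ", "",
         " DAKSHESH DARURI ", " 14598900 / R: 1553   ->1663 ", ""),
  stringsAsFactors = FALSE
)

name_rows  <- seq(5, nrow(raw.df), by = 3)  # first record type: Id, name, rounds
state_rows <- seq(6, nrow(raw.df), by = 3)  # second record type: state, ratings

rec1 <- raw.df[name_rows, ]
rec2 <- raw.df[state_rows, ]

print(rec1)
print(rec2)
```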
  1. Creating a dataframe with the relevant columns of both record types, thereby flattening them into one row per player. This will be the base dataframe, which I’ll process.
  1. Provisioning an additional column to hold the average pre-rating of each player’s opponents, of whom there are at most 7. It holds placeholder values for now and will be computed in a later step.
  1. Adding column names to the dataframe.
##   Id                             Name Points R1 R2 R3 R4 R5 R6 R7 St Pre_rtg
## 1  1 GARY HUA                            6.0 39 21 18 14  7 12  4 ON    1794
## 2  2 DAKSHESH DARURI                     6.0 63 58  4 17 16 20  7 MI    1553
## 3  3 ADITYA BAJAJ                        6.0  8 61 25 21 11 13 12 MI    1384
## 4  4 PATRICK H SCHILLING                 5.5 23 28  2 26  5 19  1 MI    1716
## 5  5 HANSHI ZUO                          5.5 45 37 12 13  4 14 17 MI    1655
## 6  6 HANSEN SONG                         5.0 34 29 11 35 10 27 21 OH    1686
##   Opp_av_pre_rtg
## 1              1
## 2              0
## 3              1
## 4              0
## 5              1
## 6              0
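
The three steps above can be sketched as follows. This is a base R illustration under assumptions: rec1 and rec2 are hand-made two-player stand-ins, only two round columns are shown (R3 through R7 would follow the same pattern), the helper functions opp_id and pre_rating are hypothetical, and the names are trimmed here even though the output above keeps its padding.

```r
# Hand-made stand-ins for the two record types (two players, two rounds)
rec1 <- data.frame(
  V1 = c("    1 ", "    2 "),
  V2 = c(" GARY HUA                        ", " DAKSHESH DARURI                 "),
  V3 = c("6.0  ", "6.0  "),
  V4 = c("W  39", "W  63"),
  V5 = c("W  21", "W  58"),
  stringsAsFactors = FALSE
)
rec2 <- data.frame(
  V1 = c("   ON ", "   MI "),
  V2 = c(" 15445895 / R: 1794   ->1817     ", " 14598900 / R: 1553   ->1663     "),
  stringsAsFactors = FALSE
)

# Opponent Id: the digits in a round cell such as "W  39"; no digits gives NA
opp_id <- function(x) as.integer(gsub("[^0-9]", "", x))

# Pre-rating: the first run of digits after "R:"
pre_rating <- function(x) as.integer(sub(".*R: *([0-9]+).*", "\\1", x))

# One row per player, with a placeholder for the opponents' average
tbl.df <- data.frame(
  Id             = as.integer(trimws(rec1$V1)),
  Name           = trimws(rec1$V2),
  Points         = as.numeric(trimws(rec1$V3)),
  R1             = opp_id(rec1$V4),
  R2             = opp_id(rec1$V5),
  St             = trimws(rec2$V1),
  Pre_rtg        = pre_rating(rec2$V2),
  Opp_av_pre_rtg = 0,
  stringsAsFactors = FALSE
)

print(tbl.df)
```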
  1. Each player played at most 7 opponents, whose Ids are stored in the columns R1 through R7. In some cases there is no data for a round. In such cases I assume zero as the opponent’s pre-rating: since the opponent’s Id doesn’t exist, there is no pre-rating to look up. Those cases were stored as NA in the dataframe. In this step, I’ll replace each NA with -1; the reason is explained in the next step.
##   Id                             Name Points R1 R2 R3 R4 R5 R6 R7 St Pre_rtg
## 1  1 GARY HUA                            6.0 39 21 18 14  7 12  4 ON    1794
## 2  2 DAKSHESH DARURI                     6.0 63 58  4 17 16 20  7 MI    1553
## 3  3 ADITYA BAJAJ                        6.0  8 61 25 21 11 13 12 MI    1384
## 4  4 PATRICK H SCHILLING                 5.5 23 28  2 26  5 19  1 MI    1716
## 5  5 HANSHI ZUO                          5.5 45 37 12 13  4 14 17 MI    1655
## 6  6 HANSEN SONG                         5.0 34 29 11 35 10 27 21 OH    1686
##   Opp_av_pre_rtg
## 1              1
## 2              0
## 3              1
## 4              0
## 5              1
## 6              0
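
A minimal sketch of the NA replacement, assuming a toy table with only two round columns (the values are made up for illustration):

```r
# Toy table: NA marks a round with no opponent
tbl.df <- data.frame(
  Id = 1:3,
  R1 = c(2L, 1L, NA),
  R2 = c(3L, NA, 1L)
)

round_cols <- c("R1", "R2")

# Replace every NA in the round columns with the sentinel value -1
tbl.df[round_cols][is.na(tbl.df[round_cols])] <- -1L

print(tbl.df)
```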
  1. This explanation is important. There are 64 players, with Ids from 1 through 64, and incidentally they are sorted. So, in the present condition, the 45th player’s pre-rating can be accessed with tbl.df$Pre_rtg[45]. But in a more general situation, the Ids may not run from 1 through 64; they could be something like "A3452" or "A3456". The player in the 45th row could have the Id "A3452", and the player in the 11th row could have the Id "A3456". Furthermore, the data could be unsorted. If so, the pre-rating of the player in the 45th row would have to be looked up by key, as in tbl.df$Pre_rtg["A3452"], rather than by position. To take care of such general situations, I created a dictionary (using hash) in this step, mapping each player’s Id to that player’s pre-rating. Earlier I replaced the NA cases with -1, so in the dictionary I’ll also create one (key:value) pair, ("-1":0). Please note that hash stores keys as character type.
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 1794 1553 1384 1716 1655 1686 1649 1641 1411 1365 1712 1663 1666 1610 1220 1604 
##   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
## 1629 1600 1564 1595 1563 1555 1363 1229 1745 1579 1552 1507 1602 1522 1494 1441 
##   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
## 1449 1399 1438 1355  980 1423 1436 1348 1403 1332 1283 1199 1242  377 1362 1382 
##   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
## 1291 1056 1011  935 1393 1270 1186 1153 1092  917  853  967  955 1530 1175 1163 
##   -1 
##    0
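
To illustrate the idea of looking ratings up by key rather than by row position, here is a minimal sketch. It swaps in a base R named vector as a stand-in for the hash dictionary, so that it runs without extra packages; the Ids are the hypothetical ones from the explanation above, and rating_of is a name I made up.

```r
# Hypothetical unsorted table with non-numeric Ids
tbl.df <- data.frame(
  Id      = c("A3456", "A3452", "A1000"),
  Pre_rtg = c(1712, 1242, 1794),
  stringsAsFactors = FALSE
)

# Key-to-rating lookup; the keys are character strings
rating_of <- setNames(tbl.df$Pre_rtg, tbl.df$Id)

# Sentinel key for rounds without an opponent maps to a rating of 0
rating_of["-1"] <- 0

# Lookup by key works regardless of the row order
print(rating_of[["A3452"]])
print(rating_of[[as.character(-1)]])
```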
  1. In this step, I’ll run through the entire dataframe, record by record, and compute the average of the opponents’ pre-ratings. Note that I am not accessing the players’ pre-ratings directly by the natural index, but through the dictionary’s keys.
  1. Subsetting the relevant columns from the dataframe into final_tbl.
##                                Name St Points Pre_rtg Opp_av_pre_rtg
## 1  GARY HUA                         ON    6.0    1794           1605
## 2  DAKSHESH DARURI                  MI    6.0    1553           1469
## 3  ADITYA BAJAJ                     MI    6.0    1384           1564
## 4  PATRICK H SCHILLING              MI    5.5    1716           1574
## 5  HANSHI ZUO                       MI    5.5    1655           1501
## 6  HANSEN SONG                      OH    5.0    1686           1519
## 7  GARY DEE SWATHELL                MI    5.0    1649           1372
## 8  EZEKIEL HOUGHTON                 MI    5.0    1641           1468
## 9  STEFANO LEE                      ON    5.0    1411           1523
## 10 ANVIT RAO                        MI    5.0    1365           1554
## 11 CAMERON WILLIAM MC LEMAN         MI    4.5    1712           1468
## 12 KENNETH J TACK                   MI    4.5    1663           1291
## 13 TORRANCE HENRY JR                MI    4.5    1666           1498
## 14 BRADLEY SHAW                     MI    4.5    1610           1515
## 15 ZACHARY JAMES HOUGHTON           MI    4.5    1220           1484
## 16 MIKE NIKITIN                     MI    4.0    1604            990
## 17 RONALD GRZEGORCZYK               MI    4.0    1629           1499
## 18 DAVID SUNDEEN                    MI    4.0    1600           1480
## 19 DIPANKAR ROY                     MI    4.0    1564           1426
## 20 JASON ZHENG                      MI    4.0    1595           1411
## 21 DINH DANG BUI                    ON    4.0    1563           1470
## 22 EUGENE L MCCLURE                 MI    4.0    1555           1115
## 23 ALAN BUI                         ON    4.0    1363           1214
## 24 MICHAEL R ALDRICH                MI    4.0    1229           1357
## 25 LOREN SCHWIEBERT                 MI    3.5    1745           1363
## 26 MAX ZHU                          ON    3.5    1579           1507
## 27 GAURAV GIDWANI                   MI    3.5    1552           1047
## 28 SOFIA ADINA STANESCU-BELLU       MI    3.5    1507           1522
## 29 CHIEDOZIE OKORIE                 MI    3.5    1602           1126
## 30 GEORGE AVERY JONES               ON    3.5    1522           1144
## 31 RISHI SHETTY                     MI    3.5    1494           1260
## 32 JOSHUA PHILIP MATHEWS            ON    3.5    1441           1379
## 33 JADE GE                          MI    3.5    1449           1277
## 34 MICHAEL JEFFERY THOMAS           MI    3.5    1399           1375
## 35 JOSHUA DAVID LEE                 MI    3.5    1438           1150
## 36 SIDDHARTH JHA                    MI    3.5    1355           1190
## 37 AMIYATOSH PWNANANDAM             MI    3.5     980            989
## 38 BRIAN LIU                        MI    3.0    1423           1319
## 39 JOEL R HENDON                    MI    3.0    1436           1430
## 40 FOREST ZHANG                     MI    3.0    1348           1391
## 41 KYLE WILLIAM MURPHY              MI    3.0    1403            713
## 42 JARED GE                         MI    3.0    1332           1150
## 43 ROBERT GLEN VASEY                MI    3.0    1283           1107
## 44 JUSTIN D SCHILLING               MI    3.0    1199           1137
## 45 DEREK YAN                        MI    3.0    1242           1152
## 46 JACOB ALEXANDER LAVALLEY         MI    3.0     377           1358
## 47 ERIC WRIGHT                      MI    2.5    1362           1392
## 48 DANIEL KHAIN                     MI    2.5    1382            968
## 49 MICHAEL J MARTIN                 MI    2.5    1291            918
## 50 SHIVAM JHA                       MI    2.5    1056           1111
## 51 TEJAS AYYAGARI                   MI    2.5    1011           1356
## 52 ETHAN GUO                        MI    2.5     935           1495
## 53 JOSE C YBARRA                    MI    2.0    1393            577
## 54 LARRY HODGE                      MI    2.0    1270           1034
## 55 ALEX KONG                        MI    2.0    1186           1205
## 56 MARISA RICCI                     MI    2.0    1153           1010
## 57 MICHAEL LU                       MI    2.0    1092           1168
## 58 VIRAJ MOHILE                     MI    2.0     917           1192
## 59 SEAN M MC CORMICK                MI    2.0     853           1131
## 60 JULIA SHEN                       MI    1.5     967            950
## 61 JEZZEL FARKAS                    ON    1.5     955           1327
## 62 ASHWIN BALAJI                    MI    1.0    1530            169
## 63 THOMAS JOSEPH HOSMER             MI    1.0    1175            964
## 64 BEN LI                           MI    1.0    1163           1263
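
The averaging and subsetting steps above can be sketched as follows, under stated assumptions: a toy four-player, two-round table with made-up values, a named vector standing in for the hash dictionary, and the sum divided by the number of rounds. Averages in the table above that fall below the lowest pre-rating of 377 (for example 169 for player 62) suggest that rounds without an opponent count as 0 in the average.

```r
# Toy table: -1 marks a round without an opponent
tbl.df <- data.frame(
  Id      = 1:4,
  Name    = c("P1", "P2", "P3", "P4"),
  St      = c("ON", "MI", "MI", "OH"),
  Points  = c(2.0, 1.0, 1.0, 0.0),
  Pre_rtg = c(1500, 1400, 1300, 1200),
  R1      = c(2, 1, 4, 3),
  R2      = c(3, -1, 1, -1),
  stringsAsFactors = FALSE
)

# Id-to-rating lookup with the sentinel key "-1" mapped to 0
rating_of <- c(setNames(tbl.df$Pre_rtg, tbl.df$Id), "-1" = 0)

round_cols <- c("R1", "R2")

# For each record, look up the opponents' ratings by key and average them
tbl.df$Opp_av_pre_rtg <- vapply(seq_len(nrow(tbl.df)), function(i) {
  keys <- as.character(unlist(tbl.df[i, round_cols]))
  round(sum(rating_of[keys]) / length(round_cols))
}, numeric(1))

# Keep only the columns needed for the CSV
final_tbl <- tbl.df[, c("Name", "St", "Points", "Pre_rtg", "Opp_av_pre_rtg")]

print(final_tbl)
```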
  1. Writing final_tbl as a CSV file.
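
A minimal sketch of the write step, assuming a one-row final_tbl taken from the first line of the table above; the output file name and the use of a temporary directory are hypothetical choices of mine.

```r
# One-row stand-in for final_tbl, using the first player's values
final_tbl <- data.frame(
  Name = "GARY HUA", St = "ON", Points = 6.0,
  Pre_rtg = 1794, Opp_av_pre_rtg = 1605,
  stringsAsFactors = FALSE
)

# Hypothetical output path inside a temporary directory
out_file <- file.path(tempdir(), "tournament_results.csv")

# row.names = FALSE keeps R's row numbers out of the file
write.csv(final_tbl, out_file, row.names = FALSE)

# Read the file back to confirm that it round-trips
check <- read.csv(out_file, stringsAsFactors = FALSE)
print(check)
```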

Marker: 607-04_p