607 Regex Project 1

1. Read the tournament information text file

library(stringr)
getwd()
## [1] "/Users/Raghu"
# Read the tournament info file; it contains no commas, so each line lands in the single column V1
tour <- read.csv("/Users/Raghu/tournamentinfo.txt", header = FALSE)
#Display the first 10 lines
head(tour,10)
##                                                                                            V1
## 1   -----------------------------------------------------------------------------------------
## 2   Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 3   Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 4   -----------------------------------------------------------------------------------------
## 5       1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 6      ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 7   -----------------------------------------------------------------------------------------
## 8       2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|
## 9      MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |
## 10  -----------------------------------------------------------------------------------------
#Display the last 10 lines
tail(tour,10)
##                                                                                            V1
## 187 -----------------------------------------------------------------------------------------
## 188    62 | ASHWIN BALAJI                   |1.0  |W  55|U    |U    |U    |U    |U    |U    |
## 189    MI | 15219542 / R: 1530   ->1535     |     |B    |     |     |     |     |     |     |
## 190 -----------------------------------------------------------------------------------------
## 191    63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |
## 192    MI | 15057092 / R: 1175   ->1125     |     |W    |B    |W    |B    |B    |     |     |
## 193 -----------------------------------------------------------------------------------------
## 194    64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|
## 195    MI | 15006561 / R: 1163   ->1112     |     |B    |W    |W    |B    |W    |B    |B    |
## 196 -----------------------------------------------------------------------------------------
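read.csv() returns a one-column data frame. In R < 4.0 that column becomes a factor, which is why "131 Levels" appears in the printed output. A plain character vector may be easier to work with. The sketch below assumes the same file layout and uses readLines(), which returns one string per line. To keep it self-contained, it reads the first seven lines, reconstructed from the printed output, through textConnection() rather than from tournamentinfo.txt.

```r
# Hypothetical inline sample: the first seven lines of tournamentinfo.txt, as printed above
sample_txt <- c(
  "-----------------------------------------------------------------------------------------",
  " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| ",
  " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | ",
  "-----------------------------------------------------------------------------------------",
  "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |",
  "-----------------------------------------------------------------------------------------"
)

con <- textConnection(sample_txt)
# For the real file: readLines("/Users/Raghu/tournamentinfo.txt", warn = FALSE)
lines <- readLines(con, warn = FALSE)  # warn = FALSE: no warning if the last line lacks a newline
close(con)

class(lines)   # "character": no factor levels to carry around
length(lines)  # 7
```

With a character vector, the header removal in step 2 becomes `tour[-(1:4)]`, with no comma, because there is no column dimension.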

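2. Drop the header: remove the two column-name lines and the dashed separator lines directly above and below them (rows 1 to 4). Indexing the one-column data frame this way returns a plain vector (a factor here, as the Levels line shows).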

tour <- tour[-c(1:4),]
head(tour)
## [1]     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## [2]    ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## [3] -----------------------------------------------------------------------------------------
## [4]     2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|
## [5]    MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |
## [6] -----------------------------------------------------------------------------------------
## 131 Levels:     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4| ...
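Removing rows 1 to 4 by position only works if the header is exactly four lines long. An alternative, which is not what the script above does, is to filter by pattern. This drops the column-name lines and every all-dash separator, leaving player and rating lines alternating, so a stride of 2 rather than 3 would then separate them. The input below is a hypothetical, abbreviated excerpt of the file.

```r
# Abbreviated excerpt of tournamentinfo.txt (columns trimmed for space)
lines <- c(
  "-----------------------------------------",
  " Pair | Player Name          |Total|Round|",
  " Num  | USCF ID / Rtg (Pre->Post) | Pts |  1  |",
  "-----------------------------------------",
  "    1 | GARY HUA             |6.0  |W  39|",
  "   ON | 15445895 / R: 1794   ->1817 |N:2  |W    |",
  "-----------------------------------------"
)

is_separator <- grepl("^\\s*-+\\s*$", lines)            # lines made up only of dashes
is_header    <- grepl("^\\s*(Pair|Num)\\s*\\|", lines)  # the two column-name lines
body <- lines[!is_separator & !is_header]
body  # the player line followed by its rating line
```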

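3. Split the remaining lines into player lines (pair number, name, total points, round results) and rating lines (state, USCF ID, pre- and post-ratings). Each player takes three lines (player line, rating line, separator), so the player lines are every third element starting at 1 and the rating lines every third element starting at 2.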

player <- tour[seq(1, length(tour), 3)]
player
##  [1]     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
##  [2]     2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|
##  [3]     3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|
##  [4]     4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|
##  [5]     5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|
##  [6]     6 | HANSEN SONG                     |5.0  |W  34|D  29|L  11|W  35|D  10|W  27|W  21|
##  [7]     7 | GARY DEE SWATHELL               |5.0  |W  57|W  46|W  13|W  11|L   1|W   9|L   2|
##  [8]     8 | EZEKIEL HOUGHTON                |5.0  |W   3|W  32|L  14|L   9|W  47|W  28|W  19|
##  [9]     9 | STEFANO LEE                     |5.0  |W  25|L  18|W  59|W   8|W  26|L   7|W  20|
## [10]    10 | ANVIT RAO                       |5.0  |D  16|L  19|W  55|W  31|D   6|W  25|W  18|
## [11]    11 | CAMERON WILLIAM MC LEMAN        |4.5  |D  38|W  56|W   6|L   7|L   3|W  34|W  26|
## [12]    12 | KENNETH J TACK                  |4.5  |W  42|W  33|D   5|W  38|H    |D   1|L   3|
## [13]    13 | TORRANCE HENRY JR               |4.5  |W  36|W  27|L   7|D   5|W  33|L   3|W  32|
## [14]    14 | BRADLEY SHAW                    |4.5  |W  54|W  44|W   8|L   1|D  27|L   5|W  31|
## [15]    15 | ZACHARY JAMES HOUGHTON          |4.5  |D  19|L  16|W  30|L  22|W  54|W  33|W  38|
## [16]    16 | MIKE NIKITIN                    |4.0  |D  10|W  15|H    |W  39|L   2|W  36|U    |
## [17]    17 | RONALD GRZEGORCZYK              |4.0  |W  48|W  41|L  26|L   2|W  23|W  22|L   5|
## [18]    18 | DAVID SUNDEEN                   |4.0  |W  47|W   9|L   1|W  32|L  19|W  38|L  10|
## [19]    19 | DIPANKAR ROY                    |4.0  |D  15|W  10|W  52|D  28|W  18|L   4|L   8|
## [20]    20 | JASON ZHENG                     |4.0  |L  40|W  49|W  23|W  41|W  28|L   2|L   9|
## [21]    21 | DINH DANG BUI                   |4.0  |W  43|L   1|W  47|L   3|W  40|W  39|L   6|
## [22]    22 | EUGENE L MCCLURE                |4.0  |W  64|D  52|L  28|W  15|H    |L  17|W  40|
## [23]    23 | ALAN BUI                        |4.0  |L   4|W  43|L  20|W  58|L  17|W  37|W  46|
## [24]    24 | MICHAEL R ALDRICH               |4.0  |L  28|L  47|W  43|L  25|W  60|W  44|W  39|
## [25]    25 | LOREN SCHWIEBERT                |3.5  |L   9|W  53|L   3|W  24|D  34|L  10|W  47|
## [26]    26 | MAX ZHU                         |3.5  |W  49|W  40|W  17|L   4|L   9|D  32|L  11|
## [27]    27 | GAURAV GIDWANI                  |3.5  |W  51|L  13|W  46|W  37|D  14|L   6|U    |
## [28]    28 | SOFIA ADINA STANESCU-BELLU      |3.5  |W  24|D   4|W  22|D  19|L  20|L   8|D  36|
## [29]    29 | CHIEDOZIE OKORIE                |3.5  |W  50|D   6|L  38|L  34|W  52|W  48|U    |
## [30]    30 | GEORGE AVERY JONES              |3.5  |L  52|D  64|L  15|W  55|L  31|W  61|W  50|
## [31]    31 | RISHI SHETTY                    |3.5  |L  58|D  55|W  64|L  10|W  30|W  50|L  14|
## [32]    32 | JOSHUA PHILIP MATHEWS           |3.5  |W  61|L   8|W  44|L  18|W  51|D  26|L  13|
## [33]    33 | JADE GE                         |3.5  |W  60|L  12|W  50|D  36|L  13|L  15|W  51|
## [34]    34 | MICHAEL JEFFERY THOMAS          |3.5  |L   6|W  60|L  37|W  29|D  25|L  11|W  52|
## [35]    35 | JOSHUA DAVID LEE                |3.5  |L  46|L  38|W  56|L   6|W  57|D  52|W  48|
## [36]    36 | SIDDHARTH JHA                   |3.5  |L  13|W  57|W  51|D  33|H    |L  16|D  28|
## [37]    37 | AMIYATOSH PWNANANDAM            |3.5  |B    |L   5|W  34|L  27|H    |L  23|W  61|
## [38]    38 | BRIAN LIU                       |3.0  |D  11|W  35|W  29|L  12|H    |L  18|L  15|
## [39]    39 | JOEL R HENDON                   |3.0  |L   1|W  54|W  40|L  16|W  44|L  21|L  24|
## [40]    40 | FOREST ZHANG                    |3.0  |W  20|L  26|L  39|W  59|L  21|W  56|L  22|
## [41]    41 | KYLE WILLIAM MURPHY             |3.0  |W  59|L  17|W  58|L  20|X    |U    |U    |
## [42]    42 | JARED GE                        |3.0  |L  12|L  50|L  57|D  60|D  61|W  64|W  56|
## [43]    43 | ROBERT GLEN VASEY               |3.0  |L  21|L  23|L  24|W  63|W  59|L  46|W  55|
## [44]    44 | JUSTIN D SCHILLING              |3.0  |B    |L  14|L  32|W  53|L  39|L  24|W  59|
## [45]    45 | DEREK YAN                       |3.0  |L   5|L  51|D  60|L  56|W  63|D  55|W  58|
## [46]    46 | JACOB ALEXANDER LAVALLEY        |3.0  |W  35|L   7|L  27|L  50|W  64|W  43|L  23|
## [47]    47 | ERIC WRIGHT                     |2.5  |L  18|W  24|L  21|W  61|L   8|D  51|L  25|
## [48]    48 | DANIEL KHAIN                    |2.5  |L  17|W  63|H    |D  52|H    |L  29|L  35|
## [49]    49 | MICHAEL J MARTIN                |2.5  |L  26|L  20|D  63|D  64|W  58|H    |U    |
## [50]    50 | SHIVAM JHA                      |2.5  |L  29|W  42|L  33|W  46|H    |L  31|L  30|
## [51]    51 | TEJAS AYYAGARI                  |2.5  |L  27|W  45|L  36|W  57|L  32|D  47|L  33|
## [52]    52 | ETHAN GUO                       |2.5  |W  30|D  22|L  19|D  48|L  29|D  35|L  34|
## [53]    53 | JOSE C YBARRA                   |2.0  |H    |L  25|H    |L  44|U    |W  57|U    |
## [54]    54 | LARRY HODGE                     |2.0  |L  14|L  39|L  61|B    |L  15|L  59|W  64|
## [55]    55 | ALEX KONG                       |2.0  |L  62|D  31|L  10|L  30|B    |D  45|L  43|
## [56]    56 | MARISA RICCI                    |2.0  |H    |L  11|L  35|W  45|H    |L  40|L  42|
## [57]    57 | MICHAEL LU                      |2.0  |L   7|L  36|W  42|L  51|L  35|L  53|B    |
## [58]    58 | VIRAJ MOHILE                    |2.0  |W  31|L   2|L  41|L  23|L  49|B    |L  45|
## [59]    59 | SEAN M MC CORMICK               |2.0  |L  41|B    |L   9|L  40|L  43|W  54|L  44|
## [60]    60 | JULIA SHEN                      |1.5  |L  33|L  34|D  45|D  42|L  24|H    |U    |
## [61]    61 | JEZZEL FARKAS                   |1.5  |L  32|L   3|W  54|L  47|D  42|L  30|L  37|
## [62]    62 | ASHWIN BALAJI                   |1.0  |W  55|U    |U    |U    |U    |U    |U    |
## [63]    63 | THOMAS JOSEPH HOSMER            |1.0  |L   2|L  48|D  49|L  43|L  45|H    |U    |
## [64]    64 | BEN LI                          |1.0  |L  22|D  30|L  31|D  49|L  46|L  42|L  54|
## 131 Levels:     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4| ...
length(player)
## [1] 64
rating <- tour[seq(2, length(tour), 3)]
length(rating)
## [1] 64
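Taking every third line assumes the player/rating/separator rhythm never breaks. As a cross-check, the sketch below classifies each line by its first field instead. Player lines begin with a pair number and rating lines with a two-letter state or province code, each followed by a pipe. The sample lines are abbreviated from the output above.

```r
# Abbreviated sample of the body lines (header already removed)
body <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |",
  "-----------------------------------------------------------",
  "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|",
  "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |",
  "-----------------------------------------------------------"
)

player_lines <- body[grepl("^\\s*\\d+\\s*\\|", body)]      # e.g. "    1 | GARY HUA ..."
rating_lines <- body[grepl("^\\s*[A-Z]{2}\\s*\\|", body)]  # e.g. "   ON | 15445895 ..."
stopifnot(length(player_lines) == length(rating_lines))    # one rating line per player
```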

4. Extract each required field from the player and rating lines with a regular expression that matches its pattern.

pairNo <- as.integer(str_extract(player, "\\d+"))  # first run of digits: the pair number
Name <- str_trim(str_extract(player, "(\\w+\\s){2,3}"))  # first two or three words; longer or hyphenated names are truncated
Points <- as.numeric(str_extract(player, "\\d+\\.\\d+"))  # digits, a dot, digits: total points (e.g. 6.0)
Opponents <- str_extract_all(player, "\\d+(?=\\|)")  # opponent pair numbers: digits immediately followed by a pipe

Won <- str_count(player, "\\Q|W  \\E")    # rounds won over the board; byes (B) and forfeit wins (X) are not counted
Loose <- str_count(player, "\\Q|L  \\E")  # rounds lost
Draw <- str_count(player, "\\Q|D  \\E")   # rounds drawn

State <- str_extract(rating, "\\w+")  # first word on the rating line: the state/province code
# First 3- or 4-digit number with a non-digit on each side, i.e. the pre-rating
# (the 8-digit USCF ID cannot match); the outer str_extract keeps only its digits
Ratings <- as.integer(str_extract(str_extract(rating, "[^\\d]\\d{3,4}[^\\d]"), "\\d+"))
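The name pattern `(\\w+\\s){2,3}` keeps at most three words and stops at a hyphen. That is why some names in the table below are cut short (for example, CAMERON WILLIAM MC LEMAN appears as CAMERON WILLIAM MC). One alternative is to capture everything between the first and second pipes and trim it. The sketch below does this in base R rather than stringr, using capture groups with regmatches() and regexec(). The helpers grab(), parse_player() and parse_rating() are hypothetical and not part of the original script.

```r
# grab(): return capture group 1 of the first match of a Perl-style pattern, or NA
grab <- function(pattern, x) {
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
  if (length(m) >= 2) m[2] else NA_character_
}

parse_player <- function(line) {
  list(
    pair      = as.integer(grab("^\\s*(\\d+)\\s*\\|", line)),
    name      = trimws(grab("^[^|]*\\|([^|]*)\\|", line)),  # whole 2nd field, any number of words
    points    = as.numeric(grab("\\|(\\d+\\.\\d+)\\s*\\|", line)),
    opponents = as.integer(regmatches(line, gregexpr("\\d+(?=\\|)", line, perl = TRUE))[[1]])
  )
}

parse_rating <- function(line) {
  list(
    state = grab("^\\s*([A-Z]{2})\\s*\\|", line),
    pre   = as.integer(grab("R:\\s*(\\d+)", line))  # digits right after "R:", i.e. the pre-rating
  )
}

p <- parse_player("   11 | CAMERON WILLIAM MC LEMAN        |4.5  |D  38|W  56|W   6|L   7|L   3|W  34|W  26|")
r <- parse_rating("   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |")
p$name       # "CAMERON WILLIAM MC LEMAN"
p$opponents  # 38 56 6 7 3 34 26
r$pre        # 1794
```

Staying with stringr, the equivalent full-name extraction would be `str_trim(str_match(player, "^[^|]*\\|([^|]*)\\|")[, 2])`.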

5. Calculate each player's average opponent pre-rating, then combine the fields into one data frame.

mRating <- numeric(length(player))  # one slot per player
for (i in seq_along(player)) {
  opp <- as.numeric(Opponents[[pairNo[i]]])  # pair numbers of the opponents actually played
  mRating[i] <- round(mean(Ratings[opp]), digits = 0)  # Ratings is indexed by pair number
}
opData <- data.frame(Name, State, Points, Ratings, mRating, Won, Loose, Draw)

colnames(opData) <- c("Player's Name", "State", "Total Points", "Player's Pre-Rating", "Average Pre Chess Rating of Opponents", "Won", "Lost", "Draw")
knitr::kable(opData)
| Player's Name | State | Total Points | Player's Pre-Rating | Average Pre Chess Rating of Opponents | Won | Lost | Draw |
|:---|:---|---:|---:|---:|---:|---:|---:|
| GARY HUA | ON | 6.0 | 1794 | 1605 | 5 | 0 | 2 |
| DAKSHESH DARURI | MI | 6.0 | 1553 | 1469 | 6 | 1 | 0 |
| ADITYA BAJAJ | MI | 6.0 | 1384 | 1564 | 6 | 1 | 0 |
| PATRICK H SCHILLING | MI | 5.5 | 1716 | 1574 | 4 | 0 | 3 |
| HANSHI ZUO | MI | 5.5 | 1655 | 1501 | 4 | 0 | 3 |
| HANSEN SONG | OH | 5.0 | 1686 | 1519 | 4 | 1 | 2 |
| GARY DEE SWATHELL | MI | 5.0 | 1649 | 1372 | 5 | 2 | 0 |
| EZEKIEL HOUGHTON | MI | 5.0 | 1641 | 1468 | 5 | 2 | 0 |
| STEFANO LEE | ON | 5.0 | 1411 | 1523 | 5 | 2 | 0 |
| ANVIT RAO | MI | 5.0 | 1365 | 1554 | 4 | 1 | 2 |
| CAMERON WILLIAM MC | MI | 4.5 | 1712 | 1468 | 4 | 2 | 1 |
| KENNETH J TACK | MI | 4.5 | 1663 | 1506 | 3 | 1 | 2 |
| TORRANCE HENRY JR | MI | 4.5 | 1666 | 1498 | 4 | 2 | 1 |
| BRADLEY SHAW | MI | 4.5 | 1610 | 1515 | 4 | 2 | 1 |
| ZACHARY JAMES HOUGHTON | MI | 4.5 | 1220 | 1484 | 4 | 2 | 1 |
| MIKE NIKITIN | MI | 4.0 | 1604 | 1386 | 3 | 1 | 1 |
| RONALD GRZEGORCZYK | MI | 4.0 | 1629 | 1499 | 4 | 3 | 0 |
| DAVID SUNDEEN | MI | 4.0 | 1600 | 1480 | 4 | 3 | 0 |
| DIPANKAR ROY | MI | 4.0 | 1564 | 1426 | 3 | 2 | 2 |
| JASON ZHENG | MI | 4.0 | 1595 | 1411 | 4 | 3 | 0 |
| DINH DANG BUI | ON | 4.0 | 1563 | 1470 | 4 | 3 | 0 |
| EUGENE L MCCLURE | MI | 4.0 | 1555 | 1300 | 3 | 2 | 1 |
| ALAN BUI | ON | 4.0 | 1363 | 1214 | 4 | 3 | 0 |
| MICHAEL R ALDRICH | MI | 4.0 | 1229 | 1357 | 4 | 3 | 0 |
| LOREN SCHWIEBERT | MI | 3.5 | 1745 | 1363 | 3 | 3 | 1 |
| MAX ZHU | ON | 3.5 | 1579 | 1507 | 3 | 3 | 1 |
| GAURAV GIDWANI | MI | 3.5 | 1552 | 1222 | 3 | 2 | 1 |
| SOFIA ADINA | MI | 3.5 | 1507 | 1522 | 2 | 2 | 3 |
| CHIEDOZIE OKORIE | MI | 3.5 | 1602 | 1314 | 3 | 2 | 1 |
| GEORGE AVERY JONES | ON | 3.5 | 1522 | 1144 | 3 | 3 | 1 |
| RISHI SHETTY | MI | 3.5 | 1494 | 1260 | 3 | 3 | 1 |
| JOSHUA PHILIP MATHEWS | ON | 3.5 | 1441 | 1379 | 3 | 3 | 1 |
| JADE GE | MI | 3.5 | 1449 | 1277 | 3 | 3 | 1 |
| MICHAEL JEFFERY THOMAS | MI | 3.5 | 1399 | 1375 | 3 | 3 | 1 |
| JOSHUA DAVID LEE | MI | 3.5 | 1438 | 1150 | 3 | 3 | 1 |
| SIDDHARTH JHA | MI | 3.5 | 1355 | 1388 | 2 | 2 | 2 |
| AMIYATOSH PWNANANDAM | MI | 3.5 | 980 | 1385 | 2 | 3 | 0 |
| BRIAN LIU | MI | 3.0 | 1423 | 1539 | 2 | 3 | 1 |
| JOEL R HENDON | MI | 3.0 | 1436 | 1430 | 3 | 4 | 0 |
| FOREST ZHANG | MI | 3.0 | 1348 | 1391 | 3 | 4 | 0 |
| KYLE WILLIAM MURPHY | MI | 3.0 | 1403 | 1248 | 2 | 2 | 0 |
| JARED GE | MI | 3.0 | 1332 | 1150 | 2 | 3 | 2 |
| ROBERT GLEN VASEY | MI | 3.0 | 1283 | 1107 | 3 | 4 | 0 |
| JUSTIN D SCHILLING | MI | 3.0 | 1199 | 1327 | 2 | 4 | 0 |
| DEREK YAN | MI | 3.0 | 1242 | 1152 | 2 | 3 | 2 |
| JACOB ALEXANDER LAVALLEY | MI | 3.0 | 377 | 1358 | 3 | 4 | 0 |
| ERIC WRIGHT | MI | 2.5 | 1362 | 1392 | 2 | 4 | 1 |
| DANIEL KHAIN | MI | 2.5 | 1382 | 1356 | 1 | 3 | 1 |
| MICHAEL J MARTIN | MI | 2.5 | 1291 | 1286 | 1 | 2 | 2 |
| SHIVAM JHA | MI | 2.5 | 1056 | 1296 | 2 | 4 | 0 |
| TEJAS AYYAGARI | MI | 2.5 | 1011 | 1356 | 2 | 4 | 1 |
| ETHAN GUO | MI | 2.5 | 935 | 1495 | 1 | 3 | 3 |
| JOSE C YBARRA | MI | 2.0 | 1393 | 1345 | 1 | 2 | 0 |
| LARRY HODGE | MI | 2.0 | 1270 | 1206 | 1 | 5 | 0 |
| ALEX KONG | MI | 2.0 | 1186 | 1406 | 0 | 4 | 2 |
| MARISA RICCI | MI | 2.0 | 1153 | 1414 | 1 | 4 | 0 |
| MICHAEL LU | MI | 2.0 | 1092 | 1363 | 1 | 5 | 0 |
| VIRAJ MOHILE | MI | 2.0 | 917 | 1391 | 1 | 5 | 0 |
| SEAN M MC | MI | 2.0 | 853 | 1319 | 1 | 5 | 0 |
| JULIA SHEN | MI | 1.5 | 967 | 1330 | 0 | 3 | 2 |
| JEZZEL FARKAS | ON | 1.5 | 955 | 1327 | 1 | 5 | 1 |
| ASHWIN BALAJI | MI | 1.0 | 1530 | 1186 | 1 | 0 | 0 |
| THOMAS JOSEPH HOSMER | MI | 1.0 | 1175 | 1350 | 0 | 4 | 1 |
| BEN LI | MI | 1.0 | 1163 | 1263 | 0 | 5 | 2 |
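To check the Average Pre Chess Rating of Opponents column by hand, take GARY HUA. His opponents (pair numbers 39, 21, 18, 14, 7, 12 and 4) come from his player line, and their pre-ratings come from the table above. The sketch below also shows a loop-free alternative using vapply(). The helper avg_opp_rating() is hypothetical. It looks ratings up by name, whereas the script indexes Ratings by position, which works because pair numbers run from 1 to 64 in order.

```r
# Pre-ratings of GARY HUA's seven opponents, keyed by pair number (values from the table above)
pre_rating <- c("4" = 1716, "7" = 1649, "12" = 1663, "14" = 1610,
                "18" = 1600, "21" = 1563, "39" = 1436)

# Hypothetical helper: mean pre-rating of each player's opponents, rounded to a whole point
avg_opp_rating <- function(opponents, ratings) {
  vapply(opponents,
         function(opp) round(mean(ratings[as.character(opp)])),
         numeric(1))
}

gary_opponents <- list(c(39, 21, 18, 14, 7, 12, 4))
avg_opp_rating(gary_opponents, pre_rating)
# (1436 + 1563 + 1600 + 1610 + 1649 + 1663 + 1716) / 7 = 11237 / 7 = 1605.29, which rounds to 1605
```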

6. Write the data to a CSV file

write.csv(opData, "/Users/Raghu/tour.csv", row.names = FALSE)  # omit the unnamed row-number column
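By default, write.csv() writes the row names as an unnamed first column unless `row.names = FALSE` is set. On the way back in, read.csv() rewrites column names such as Player's Name into syntactic ones (Player.s.Name) unless `check.names = FALSE` is given. The round-trip sketch below uses a hypothetical two-row stand-in for opData, written to a temporary file.

```r
# Hypothetical two-row stand-in for opData, keeping non-syntactic column names
df <- data.frame(`Player's Name` = c("GARY HUA", "DAKSHESH DARURI"),
                 State = c("ON", "MI"),
                 `Total Points` = c(6.0, 6.0),
                 check.names = FALSE)

path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)      # no extra row-number column in the file
back <- read.csv(path, check.names = FALSE)  # keep "Player's Name" rather than "Player.s.Name"

identical(names(back), names(df))  # TRUE
```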