Instructions

In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players: Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents For the first player, the information would be: Gary Hua, ON, 6.0, 1794, 1605 1605 was calculated by using the pre-tournament opponents’ ratings of 1436, 1563, 1600, 1610, 1649, 1663, 1716, and dividing by the total number of games played.

If you have questions about the meaning of the data or the results, please post them on the discussion forum. Data science, like chess, is a game of back and forth…

The chess rating system (invented by a Minnesota statistician named Arpad Elo) has been used in many other contexts, including assessing relative strength of employment candidates by human resource departments.

You may substitute another text file (or set of text files, or data scraped from web pages) of similar or greater complexity, and create your own assignment and solution. You may work in a small team. All of your code should be in an R markdown file (and published to rpubs.com); with your data accessible for the person running the script.

Read data from URL file by using “read.delim” function.

I experienced problems using “read.table” function hense I used the “read.delim” function.

url <- 'https://raw.githubusercontent.com/dvillalobos/MSDA/master/607/Projects/Project1/tournamentinfo.txt'
my.data <- read.delim(url, header=FALSE, stringsAsFactors =FALSE )
head(my.data)
##                                                                                           V1
## 1  -----------------------------------------------------------------------------------------
## 2  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 3  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 4  -----------------------------------------------------------------------------------------
## 5      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 6     ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |

Clean line dividers

# Cleaning extra lines and eliminating Empty lines in between
head(my.data)
##                                                                                           V1
## 1  -----------------------------------------------------------------------------------------
## 2  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 3  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 4  -----------------------------------------------------------------------------------------
## 5      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 6     ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
split_data <- data.frame(str_replace_all(my.data$V1,"-",""))
head(split_data)
##                                                         str_replace_all.my.data.V1..........
## 1                                                                                           
## 2  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 3   Num  | USCF ID / Rtg (Pre>Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 4                                                                                           
## 5      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 6      ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
# Deleting empty lines
split_data <- data.frame(split_data[!apply(split_data == "", 1, all),])

Combining two consecutive rows into one column

# Need to define an empty new_dataframe
new_table <- data.frame(c())
# Combining two consecutive rows into one column
for (i in 1:dim(split_data)[1]){
  if (i %% 2 == 1) {
    Part1 <- rbind(new_table$Part1, as.character(split_data[i,1]))
    Part2 <- as.character(split_data[i+1,1])
    Combined <- data.frame(paste0(Part1, Part2))
    names(Combined) <- "Combined"
    new_table <- rbind(new_table, Combined)
  } 
}
head(new_table)
##                                                                                                                                                                              Combined
## 1  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round|  Num  | USCF ID / Rtg (Pre>Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 2       1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|   ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 3       2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|   MI | 14598900 / R: 1553   >1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |
## 4       3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|   MI | 14959604 / R: 1384   >1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 5       4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|   MI | 12616049 / R: 1716   >1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |
## 6       5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|   MI | 14601533 / R: 1655   >1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |

Spliting one more column for “|” separator

# Creating headers
Headers <- c("Pair","Player Name","Total","Round 1","Round 2","Round 3", "Round 4","Round 5","Round 6","Round 7","State","USCF ID / Rtg (Pre->Post)","Pts","1","2","3","4","5","6","7","Ave Pre Rating")
# Separating into columns
newer_table <- separate(data = new_table, col = Combined, into = Headers, sep = "\\|")
# Need to clear row containing all columns names
newer_table <- newer_table[-c(1), ]

head(newer_table)
##     Pair                       Player Name Total Round 1 Round 2 Round 3
## 2     1   GARY HUA                         6.0     W  39   W  21   W  18
## 3     2   DAKSHESH DARURI                  6.0     W  63   W  58   L   4
## 4     3   ADITYA BAJAJ                     6.0     L   8   W  61   W  25
## 5     4   PATRICK H SCHILLING              5.5     W  23   D  28   W   2
## 6     5   HANSHI ZUO                       5.5     W  45   W  37   D  12
## 7     6   HANSEN SONG                      5.0     W  34   D  29   L  11
##   Round 4 Round 5 Round 6 Round 7  State        USCF ID / Rtg (Pre->Post)
## 2   W  14   W   7   D  12   D   4    ON   15445895 / R: 1794   >1817     
## 3   W  17   W  16   W  20   W   7    MI   14598900 / R: 1553   >1663     
## 4   W  21   W  11   W  13   W  12    MI   14959604 / R: 1384   >1640     
## 5   W  26   D   5   W  19   D   1    MI   12616049 / R: 1716   >1744     
## 6   D  13   D   4   W  14   W  17    MI   14601533 / R: 1655   >1690     
## 7   W  35   D  10   W  27   W  21    OH   15055204 / R: 1686   >1687     
##     Pts     1     2     3     4     5     6     7 Ave Pre Rating
## 2 N:2   W     B     W     B     W     B     W                   
## 3 N:2   B     W     B     W     B     W     B                   
## 4 N:2   W     B     W     B     W     B     W                   
## 5 N:2   W     B     W     B     W     B     B                   
## 6 N:2   B     W     B     W     B     W     B                   
## 7 N:3   W     B     W     B     B     W     B

Spliting extra columns that were not splitted

# Extracting numerical values from "USCF ID / Rtg (Pre->Post)" unsplitted column
temp <- str_extract_all(newer_table$`USCF ID / Rtg (Pre->Post)`,"\\b\\d{1,}")
temp <- data.frame(as.character(temp))
# Separating the data frame from one couln to three different columns
temp <- separate(data = temp, col = as.character.temp., into = c("col1","col2","col3"), sep = ",")
kable(head(temp))
col1 col2 col3
c(“15445895” “1794” “1817”)
c(“14598900” “1553” “1663”)
c(“14959604” “1384” “1640”)
c(“12616049” “1716” “1744”)
c(“14601533” “1655” “1690”)
c(“15055204” “1686” “1687”)
# Temporary column vectors
col1 <- str_extract_all(temp$col1,"[[:digit:]]{1,}")
col2 <- str_extract_all(temp$col2,"[[:digit:]]{1,}")
col3 <- str_extract_all(temp$col3,"[[:digit:]]{1,}")

newer_table$`USCF ID` <- as.character(col1)
newer_table$`Pre Rating` <- as.character(col2)
newer_table$`Post Rating` <- as.character(col3)
head(newer_table)
##     Pair                       Player Name Total Round 1 Round 2 Round 3
## 2     1   GARY HUA                         6.0     W  39   W  21   W  18
## 3     2   DAKSHESH DARURI                  6.0     W  63   W  58   L   4
## 4     3   ADITYA BAJAJ                     6.0     L   8   W  61   W  25
## 5     4   PATRICK H SCHILLING              5.5     W  23   D  28   W   2
## 6     5   HANSHI ZUO                       5.5     W  45   W  37   D  12
## 7     6   HANSEN SONG                      5.0     W  34   D  29   L  11
##   Round 4 Round 5 Round 6 Round 7  State        USCF ID / Rtg (Pre->Post)
## 2   W  14   W   7   D  12   D   4    ON   15445895 / R: 1794   >1817     
## 3   W  17   W  16   W  20   W   7    MI   14598900 / R: 1553   >1663     
## 4   W  21   W  11   W  13   W  12    MI   14959604 / R: 1384   >1640     
## 5   W  26   D   5   W  19   D   1    MI   12616049 / R: 1716   >1744     
## 6   D  13   D   4   W  14   W  17    MI   14601533 / R: 1655   >1690     
## 7   W  35   D  10   W  27   W  21    OH   15055204 / R: 1686   >1687     
##     Pts     1     2     3     4     5     6     7 Ave Pre Rating  USCF ID
## 2 N:2   W     B     W     B     W     B     W                    15445895
## 3 N:2   B     W     B     W     B     W     B                    14598900
## 4 N:2   W     B     W     B     W     B     W                    14959604
## 5 N:2   W     B     W     B     W     B     B                    12616049
## 6 N:2   B     W     B     W     B     W     B                    14601533
## 7 N:3   W     B     W     B     B     W     B                    15055204
##   Pre Rating Post Rating
## 2       1794        1817
## 3       1553        1663
## 4       1384        1640
## 5       1716        1744
## 6       1655        1690
## 7       1686        1687

Separating needed columns to include in .csv file

csv.table <- subset(newer_table, select = c(1,2,11,22,3,23,24,21))
kable(head(csv.table))
Pair Player Name State USCF ID Total Pre Rating Post Rating Ave Pre Rating
2 1 GARY HUA ON 15445895 6.0 1794 1817
3 2 DAKSHESH DARURI MI 14598900 6.0 1553 1663
4 3 ADITYA BAJAJ MI 14959604 6.0 1384 1640
5 4 PATRICK H SCHILLING MI 12616049 5.5 1716 1744
6 5 HANSHI ZUO MI 14601533 5.5 1655 1690
7 6 HANSEN SONG OH 15055204 5.0 1686 1687

Calculating Ave Pre Rating

# Creating the opponent values from the unsplitted data frame
opponent1 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 1`,"[[:digit:]]{1,}")))
opponent2 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 2`,"[[:digit:]]{1,}")))
opponent3 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 3`,"[[:digit:]]{1,}")))
opponent4 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 4`,"[[:digit:]]{1,}")))
opponent5 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 5`,"[[:digit:]]{1,}")))
opponent6 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 6`,"[[:digit:]]{1,}")))
opponent7 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 7`,"[[:digit:]]{1,}")))

# Creeating Opponents data frame.
opponents <- cbind(newer_table$Pair, newer_table$`Player Name`, opponent1, opponent2, opponent3, opponent4, opponent5, opponent6, opponent7)
names(opponents) <- c("Pair","Player Name","Opp 1","Opp 2","Opp 3","Opp 4","Opp 5","Opp 6","Opp 7")

# Finding number of games played
for(i in 1:dim(opponents)[1]){
    opponents$NGames[i] <- 7 - as.numeric(sum(is.na(opponents[i,])))
}

# Reporting  table to view the opponents table
kable(opponents)
Pair Player Name Opp 1 Opp 2 Opp 3 Opp 4 Opp 5 Opp 6 Opp 7 NGames
1 GARY HUA 39 21 18 14 7 12 4 7
2 DAKSHESH DARURI 63 58 4 17 16 20 7 7
3 ADITYA BAJAJ 8 61 25 21 11 13 12 7
4 PATRICK H SCHILLING 23 28 2 26 5 19 1 7
5 HANSHI ZUO 45 37 12 13 4 14 17 7
6 HANSEN SONG 34 29 11 35 10 27 21 7
7 GARY DEE SWATHELL 57 46 13 11 1 9 2 7
8 EZEKIEL HOUGHTON 3 32 14 9 47 28 19 7
9 STEFANO LEE 25 18 59 8 26 7 20 7
10 ANVIT RAO 16 19 55 31 6 25 18 7
11 CAMERON WILLIAM MC LEMAN 38 56 6 7 3 34 26 7
12 KENNETH J TACK 42 33 5 38 NA 1 3 6
13 TORRANCE HENRY JR 36 27 7 5 33 3 32 7
14 BRADLEY SHAW 54 44 8 1 27 5 31 7
15 ZACHARY JAMES HOUGHTON 19 16 30 22 54 33 38 7
16 MIKE NIKITIN 10 15 NA 39 2 36 NA 5
17 RONALD GRZEGORCZYK 48 41 26 2 23 22 5 7
18 DAVID SUNDEEN 47 9 1 32 19 38 10 7
19 DIPANKAR ROY 15 10 52 28 18 4 8 7
20 JASON ZHENG 40 49 23 41 28 2 9 7
21 DINH DANG BUI 43 1 47 3 40 39 6 7
22 EUGENE L MCCLURE 64 52 28 15 NA 17 40 6
23 ALAN BUI 4 43 20 58 17 37 46 7
24 MICHAEL R ALDRICH 28 47 43 25 60 44 39 7
25 LOREN SCHWIEBERT 9 53 3 24 34 10 47 7
26 MAX ZHU 49 40 17 4 9 32 11 7
27 GAURAV GIDWANI 51 13 46 37 14 6 NA 6
28 SOFIA ADINA STANESCUBELLU 24 4 22 19 20 8 36 7
29 CHIEDOZIE OKORIE 50 6 38 34 52 48 NA 6
30 GEORGE AVERY JONES 52 64 15 55 31 61 50 7
31 RISHI SHETTY 58 55 64 10 30 50 14 7
32 JOSHUA PHILIP MATHEWS 61 8 44 18 51 26 13 7
33 JADE GE 60 12 50 36 13 15 51 7
34 MICHAEL JEFFERY THOMAS 6 60 37 29 25 11 52 7
35 JOSHUA DAVID LEE 46 38 56 6 57 52 48 7
36 SIDDHARTH JHA 13 57 51 33 NA 16 28 6
37 AMIYATOSH PWNANANDAM NA 5 34 27 NA 23 61 5
38 BRIAN LIU 11 35 29 12 NA 18 15 6
39 JOEL R HENDON 1 54 40 16 44 21 24 7
40 FOREST ZHANG 20 26 39 59 21 56 22 7
41 KYLE WILLIAM MURPHY 59 17 58 20 NA NA NA 4
42 JARED GE 12 50 57 60 61 64 56 7
43 ROBERT GLEN VASEY 21 23 24 63 59 46 55 7
44 JUSTIN D SCHILLING NA 14 32 53 39 24 59 6
45 DEREK YAN 5 51 60 56 63 55 58 7
46 JACOB ALEXANDER LAVALLEY 35 7 27 50 64 43 23 7
47 ERIC WRIGHT 18 24 21 61 8 51 25 7
48 DANIEL KHAIN 17 63 NA 52 NA 29 35 5
49 MICHAEL J MARTIN 26 20 63 64 58 NA NA 5
50 SHIVAM JHA 29 42 33 46 NA 31 30 6
51 TEJAS AYYAGARI 27 45 36 57 32 47 33 7
52 ETHAN GUO 30 22 19 48 29 35 34 7
53 JOSE C YBARRA NA 25 NA 44 NA 57 NA 3
54 LARRY HODGE 14 39 61 NA 15 59 64 6
55 ALEX KONG 62 31 10 30 NA 45 43 6
56 MARISA RICCI NA 11 35 45 NA 40 42 5
57 MICHAEL LU 7 36 42 51 35 53 NA 6
58 VIRAJ MOHILE 31 2 41 23 49 NA 45 6
59 SEAN M MC CORMICK 41 NA 9 40 43 54 44 6
60 JULIA SHEN 33 34 45 42 24 NA NA 5
61 JEZZEL FARKAS 32 3 54 47 42 30 37 7
62 ASHWIN BALAJI 55 NA NA NA NA NA NA 1
63 THOMAS JOSEPH HOSMER 2 48 49 43 45 NA NA 5
64 BEN LI 22 30 31 49 46 42 54 7
# Eliminating NA Cases in order to continue with our calculations, NA replaced by 0.
opponents[is.na(opponents)] <- as.numeric(0)

# Need to assign zero values in order to add accordingly and to avoid errors.
csv.table$`Ave Pre Rating` <- as.numeric(0)

# Procedure to calculate Average Pre-Rating for each player
for (k in 3:9){
  for (j in 1:dim(csv.table)[1]){
    for (i in 1:dim(csv.table)[1]){
      if (as.numeric(opponents[j,k]) == as.numeric(csv.table$Pair[i])){
        csv.table$`Ave Pre Rating`[j] <- as.numeric(csv.table$`Ave Pre Rating`[j]) + as.numeric(csv.table$`Pre Rating`[i])
      }
    }
  }
}

# Final Procedure to find each player's average based on the number of played games
csv.table$`Ave Pre Rating`  <- round(as.numeric(csv.table$`Ave Pre Rating`) / opponents$NGames,0)

Finalized table ready to export

kable(csv.table, row.names = FALSE)
Pair Player Name State USCF ID Total Pre Rating Post Rating Ave Pre Rating
1 GARY HUA ON 15445895 6.0 1794 1817 1605
2 DAKSHESH DARURI MI 14598900 6.0 1553 1663 1469
3 ADITYA BAJAJ MI 14959604 6.0 1384 1640 1564
4 PATRICK H SCHILLING MI 12616049 5.5 1716 1744 1574
5 HANSHI ZUO MI 14601533 5.5 1655 1690 1501
6 HANSEN SONG OH 15055204 5.0 1686 1687 1519
7 GARY DEE SWATHELL MI 11146376 5.0 1649 1673 1372
8 EZEKIEL HOUGHTON MI 15142253 5.0 1641 1657 1468
9 STEFANO LEE ON 14954524 5.0 1411 1564 1523
10 ANVIT RAO MI 14150362 5.0 1365 1544 1554
11 CAMERON WILLIAM MC LEMAN MI 12581589 4.5 1712 1696 1468
12 KENNETH J TACK MI 12681257 4.5 1663 1670 1506
13 TORRANCE HENRY JR MI 15082995 4.5 1666 1662 1498
14 BRADLEY SHAW MI 10131499 4.5 1610 1618 1515
15 ZACHARY JAMES HOUGHTON MI 15619130 4.5 1220 1416 1484
16 MIKE NIKITIN MI 10295068 4.0 1604 1613 1386
17 RONALD GRZEGORCZYK MI 10297702 4.0 1629 1610 1499
18 DAVID SUNDEEN MI 11342094 4.0 1600 1600 1480
19 DIPANKAR ROY MI 14862333 4.0 1564 1570 1426
20 JASON ZHENG MI 14529060 4.0 1595 1569 1411
21 DINH DANG BUI ON 15495066 4.0 1563 1562 1470
22 EUGENE L MCCLURE MI 12405534 4.0 1555 1529 1300
23 ALAN BUI ON 15030142 4.0 1363 1371 1214
24 MICHAEL R ALDRICH MI 13469010 4.0 1229 1300 1357
25 LOREN SCHWIEBERT MI 12486656 3.5 1745 1681 1363
26 MAX ZHU ON 15131520 3.5 1579 1564 1507
27 GAURAV GIDWANI MI 14476567 3.5 1552 1539 1222
28 SOFIA ADINA STANESCUBELLU MI 14882954 3.5 1507 1513 1522
29 CHIEDOZIE OKORIE MI 15323285 3.5 1602 1508 1314
30 GEORGE AVERY JONES ON 12577178 3.5 1522 1444 1144
31 RISHI SHETTY MI 15131618 3.5 1494 1444 1260
32 JOSHUA PHILIP MATHEWS ON 14073750 3.5 1441 1433 1379
33 JADE GE MI 14691842 3.5 1449 1421 1277
34 MICHAEL JEFFERY THOMAS MI 15051807 3.5 1399 1400 1375
35 JOSHUA DAVID LEE MI 14601397 3.5 1438 1392 1150
36 SIDDHARTH JHA MI 14773163 3.5 1355 1367 1388
37 AMIYATOSH PWNANANDAM MI 15489571 3.5 980 1077 1385
38 BRIAN LIU MI 15108523 3.0 1423 1439 1539
39 JOEL R HENDON MI 12923035 3.0 1436 1413 1430
40 FOREST ZHANG MI 14892710 3.0 1348 1346 1391
41 KYLE WILLIAM MURPHY MI 15761443 3.0 1403 1341 1248
42 JARED GE MI 14462326 3.0 1332 1256 1150
43 ROBERT GLEN VASEY MI 14101068 3.0 1283 1244 1107
44 JUSTIN D SCHILLING MI 15323504 3.0 1199 1199 1327
45 DEREK YAN MI 15372807 3.0 1242 1191 1152
46 JACOB ALEXANDER LAVALLEY MI 15490981 3.0 377 1076 1358
47 ERIC WRIGHT MI 12533115 2.5 1362 1341 1392
48 DANIEL KHAIN MI 14369165 2.5 1382 1335 1356
49 MICHAEL J MARTIN MI 12531685 2.5 1291 1259 1286
50 SHIVAM JHA MI 14773178 2.5 1056 1111 1296
51 TEJAS AYYAGARI MI 15205474 2.5 1011 1097 1356
52 ETHAN GUO MI 14918803 2.5 935 1092 1495
53 JOSE C YBARRA MI 12578849 2.0 1393 1359 1345
54 LARRY HODGE MI 12836773 2.0 1270 1200 1206
55 ALEX KONG MI 15412571 2.0 1186 1163 1406
56 MARISA RICCI MI 14679887 2.0 1153 1140 1414
57 MICHAEL LU MI 15113330 2.0 1092 1079 1363
58 VIRAJ MOHILE MI 14700365 2.0 917 941 1391
59 SEAN M MC CORMICK MI 12841036 2.0 853 878 1319
60 JULIA SHEN MI 14579262 1.5 967 984 1330
61 JEZZEL FARKAS ON 15771592 1.5 955 979 1327
62 ASHWIN BALAJI MI 15219542 1.0 1530 1535 1186
63 THOMAS JOSEPH HOSMER MI 15057092 1.0 1175 1125 1350
64 BEN LI MI 15006561 1.0 1163 1112 1263

Export csv file

#write.csv(csv.table, file = "Villalobos-tournamentInfo.csv")
write.table(csv.table, file = "Villalobos-tournamentInfo.csv",row.names=FALSE, na="",col.names=TRUE, sep=",")