Instructions

In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players: Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents For the first player, the information would be: Gary Hua, ON, 6.0, 1794, 1605 1605 was calculated by using the pre-tournament opponents’ ratings of 1436, 1563, 1600, 1610, 1649, 1663, 1716, and dividing by the total number of games played.

If you have questions about the meaning of the data or the results, please post them on the discussion forum. Data science, like chess, is a game of back and forth…

The chess rating system (invented by a Minnesota statistician named Arpad Elo) has been used in many other contexts, including assessing relative strength of employment candidates by human resource departments.

You may substitute another text file (or set of text files, or data scraped from web pages) of similar or greater complexity, and create your own assignment and solution. You may work in a small team. All of your code should be in an R markdown file (and published to rpubs.com); with your data accessible for the person running the script.

Read data from URL file by using “read.delim” function.

I experienced problems using “read.table” function hense I used the “read.delim” function.

url <- 'https://raw.githubusercontent.com/dvillalobos/MSDA/master/607/Projects/Project1/tournamentinfo.txt'
my.data <- read.delim(url, header=FALSE, stringsAsFactors =FALSE )
head(my.data)
##                                                                                           V1
## 1  -----------------------------------------------------------------------------------------
## 2  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 3  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 4  -----------------------------------------------------------------------------------------
## 5      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 6     ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |

Clean line dividers

# Cleaning extra lines and eliminating Empty lines in between
head(my.data)

##                                                                                           V1
## 1  -----------------------------------------------------------------------------------------
## 2  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 3  Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 4  -----------------------------------------------------------------------------------------
## 5      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 6     ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |

split_data <- data.frame(str_replace_all(my.data$V1,"-",""))
head(split_data)

##                                                         str_replace_all.my.data.V1..........
## 1                                                                                           
## 2  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## 3   Num  | USCF ID / Rtg (Pre>Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 4                                                                                           
## 5      1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
## 6      ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |

# Deleting empty lines
split_data <- data.frame(split_data[!apply(split_data == "", 1, all),])

Combining two consecutive rows into one column

# Need to define an empty new_dataframe
new_table <- data.frame(c())
# Combining two consecutive rows into one column
for (i in 1:dim(split_data)[1]){
  if (i %% 2 == 1) {
    Part1 <- rbind(new_table$Part1, as.character(split_data[i,1]))
    Part2 <- as.character(split_data[i+1,1])
    Combined <- data.frame(paste0(Part1, Part2))
    names(Combined) <- "Combined"
    new_table <- rbind(new_table, Combined)
  } 
}
head(new_table)

##                                                                                                                                                                              Combined
## 1  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round|  Num  | USCF ID / Rtg (Pre>Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## 2       1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|   ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 3       2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|   MI | 14598900 / R: 1553   >1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |
## 4       3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|   MI | 14959604 / R: 1384   >1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |
## 5       4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|   MI | 12616049 / R: 1716   >1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |
## 6       5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17|   MI | 14601533 / R: 1655   >1690     |N:2  |B    |W    |B    |W    |B    |W    |B    |

Spliting one more column for “|” separator

# Creating headers
Headers <- c("Pair","Player Name","Total","Round 1","Round 2","Round 3", "Round 4","Round 5","Round 6","Round 7","State","USCF ID / Rtg (Pre->Post)","Pts","1","2","3","4","5","6","7","Ave Pre Rating")
# Separating into columns
newer_table <- separate(data = new_table, col = Combined, into = Headers, sep = "\\|")
# Need to clear row containing all columns names
newer_table <- newer_table[-c(1), ]

head(newer_table)

##     Pair                       Player Name Total Round 1 Round 2 Round 3
## 2     1   GARY HUA                         6.0     W  39   W  21   W  18
## 3     2   DAKSHESH DARURI                  6.0     W  63   W  58   L   4
## 4     3   ADITYA BAJAJ                     6.0     L   8   W  61   W  25
## 5     4   PATRICK H SCHILLING              5.5     W  23   D  28   W   2
## 6     5   HANSHI ZUO                       5.5     W  45   W  37   D  12
## 7     6   HANSEN SONG                      5.0     W  34   D  29   L  11
##   Round 4 Round 5 Round 6 Round 7  State        USCF ID / Rtg (Pre->Post)
## 2   W  14   W   7   D  12   D   4    ON   15445895 / R: 1794   >1817     
## 3   W  17   W  16   W  20   W   7    MI   14598900 / R: 1553   >1663     
## 4   W  21   W  11   W  13   W  12    MI   14959604 / R: 1384   >1640     
## 5   W  26   D   5   W  19   D   1    MI   12616049 / R: 1716   >1744     
## 6   D  13   D   4   W  14   W  17    MI   14601533 / R: 1655   >1690     
## 7   W  35   D  10   W  27   W  21    OH   15055204 / R: 1686   >1687     
##     Pts     1     2     3     4     5     6     7 Ave Pre Rating
## 2 N:2   W     B     W     B     W     B     W                   
## 3 N:2   B     W     B     W     B     W     B                   
## 4 N:2   W     B     W     B     W     B     W                   
## 5 N:2   W     B     W     B     W     B     B                   
## 6 N:2   B     W     B     W     B     W     B                   
## 7 N:3   W     B     W     B     B     W     B

Spliting extra columns that were not splitted

# Extracting numerical values from "USCF ID / Rtg (Pre->Post)" unsplitted column
temp <- str_extract_all(newer_table$`USCF ID / Rtg (Pre->Post)`,"\\b\\d{1,}")
temp <- data.frame(as.character(temp))
# Separating the data frame from one couln to three different columns
temp <- separate(data = temp, col = as.character.temp., into = c("col1","col2","col3"), sep = ",")
kable(head(temp))

col1	col2	col3
c(“15445895”	“1794”	“1817”)
c(“14598900”	“1553”	“1663”)
c(“14959604”	“1384”	“1640”)
c(“12616049”	“1716”	“1744”)
c(“14601533”	“1655”	“1690”)
c(“15055204”	“1686”	“1687”)

# Temporary column vectors
col1 <- str_extract_all(temp$col1,"[[:digit:]]{1,}")
col2 <- str_extract_all(temp$col2,"[[:digit:]]{1,}")
col3 <- str_extract_all(temp$col3,"[[:digit:]]{1,}")

newer_table$`USCF ID` <- as.character(col1)
newer_table$`Pre Rating` <- as.character(col2)
newer_table$`Post Rating` <- as.character(col3)
head(newer_table)

##     Pair                       Player Name Total Round 1 Round 2 Round 3
## 2     1   GARY HUA                         6.0     W  39   W  21   W  18
## 3     2   DAKSHESH DARURI                  6.0     W  63   W  58   L   4
## 4     3   ADITYA BAJAJ                     6.0     L   8   W  61   W  25
## 5     4   PATRICK H SCHILLING              5.5     W  23   D  28   W   2
## 6     5   HANSHI ZUO                       5.5     W  45   W  37   D  12
## 7     6   HANSEN SONG                      5.0     W  34   D  29   L  11
##   Round 4 Round 5 Round 6 Round 7  State        USCF ID / Rtg (Pre->Post)
## 2   W  14   W   7   D  12   D   4    ON   15445895 / R: 1794   >1817     
## 3   W  17   W  16   W  20   W   7    MI   14598900 / R: 1553   >1663     
## 4   W  21   W  11   W  13   W  12    MI   14959604 / R: 1384   >1640     
## 5   W  26   D   5   W  19   D   1    MI   12616049 / R: 1716   >1744     
## 6   D  13   D   4   W  14   W  17    MI   14601533 / R: 1655   >1690     
## 7   W  35   D  10   W  27   W  21    OH   15055204 / R: 1686   >1687     
##     Pts     1     2     3     4     5     6     7 Ave Pre Rating  USCF ID
## 2 N:2   W     B     W     B     W     B     W                    15445895
## 3 N:2   B     W     B     W     B     W     B                    14598900
## 4 N:2   W     B     W     B     W     B     W                    14959604
## 5 N:2   W     B     W     B     W     B     B                    12616049
## 6 N:2   B     W     B     W     B     W     B                    14601533
## 7 N:3   W     B     W     B     B     W     B                    15055204
##   Pre Rating Post Rating
## 2       1794        1817
## 3       1553        1663
## 4       1384        1640
## 5       1716        1744
## 6       1655        1690
## 7       1686        1687

Separating needed columns to include in .csv file

csv.table <- subset(newer_table, select = c(1,2,11,22,3,23,24,21))
kable(head(csv.table))

	Pair	Player Name	State	USCF ID	Total	Pre Rating	Post Rating
2	1	GARY HUA	ON	15445895	6.0	1794	1817
3	2	DAKSHESH DARURI	MI	14598900	6.0	1553	1663
4	3	ADITYA BAJAJ	MI	14959604	6.0	1384	1640
5	4	PATRICK H SCHILLING	MI	12616049	5.5	1716	1744
6	5	HANSHI ZUO	MI	14601533	5.5	1655	1690
7	6	HANSEN SONG	OH	15055204	5.0	1686	1687

Calculating Ave Pre Rating

# Creating the opponent values from the unsplitted data frame
opponent1 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 1`,"[[:digit:]]{1,}")))
opponent2 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 2`,"[[:digit:]]{1,}")))
opponent3 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 3`,"[[:digit:]]{1,}")))
opponent4 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 4`,"[[:digit:]]{1,}")))
opponent5 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 5`,"[[:digit:]]{1,}")))
opponent6 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 6`,"[[:digit:]]{1,}")))
opponent7 <- data.frame(as.numeric(str_extract_all(newer_table$`Round 7`,"[[:digit:]]{1,}")))

# Creeating Opponents data frame.
opponents <- cbind(newer_table$Pair, newer_table$`Player Name`, opponent1, opponent2, opponent3, opponent4, opponent5, opponent6, opponent7)
names(opponents) <- c("Pair","Player Name","Opp 1","Opp 2","Opp 3","Opp 4","Opp 5","Opp 6","Opp 7")

# Finding number of games played
for(i in 1:dim(opponents)[1]){
    opponents$NGames[i] <- 7 - as.numeric(sum(is.na(opponents[i,])))
}

# Reporting  table to view the opponents table
kable(opponents)

Pair	Player Name	Opp 1	Opp 2	Opp 3	Opp 4	Opp 5	Opp 6	Opp 7	NGames
1	GARY HUA	39	21	18	14	7	12	4	7
2	DAKSHESH DARURI	63	58	4	17	16	20	7	7
3	ADITYA BAJAJ	8	61	25	21	11	13	12	7
4	PATRICK H SCHILLING	23	28	2	26	5	19	1	7
5	HANSHI ZUO	45	37	12	13	4	14	17	7
6	HANSEN SONG	34	29	11	35	10	27	21	7
7	GARY DEE SWATHELL	57	46	13	11	1	9	2	7
8	EZEKIEL HOUGHTON	3	32	14	9	47	28	19	7
9	STEFANO LEE	25	18	59	8	26	7	20	7
10	ANVIT RAO	16	19	55	31	6	25	18	7
11	CAMERON WILLIAM MC LEMAN	38	56	6	7	3	34	26	7
12	KENNETH J TACK	42	33	5	38	NA	1	3	6
13	TORRANCE HENRY JR	36	27	7	5	33	3	32	7
14	BRADLEY SHAW	54	44	8	1	27	5	31	7
15	ZACHARY JAMES HOUGHTON	19	16	30	22	54	33	38	7
16	MIKE NIKITIN	10	15	NA	39	2	36	NA	5
17	RONALD GRZEGORCZYK	48	41	26	2	23	22	5	7
18	DAVID SUNDEEN	47	9	1	32	19	38	10	7
19	DIPANKAR ROY	15	10	52	28	18	4	8	7
20	JASON ZHENG	40	49	23	41	28	2	9	7
21	DINH DANG BUI	43	1	47	3	40	39	6	7
22	EUGENE L MCCLURE	64	52	28	15	NA	17	40	6
23	ALAN BUI	4	43	20	58	17	37	46	7
24	MICHAEL R ALDRICH	28	47	43	25	60	44	39	7
25	LOREN SCHWIEBERT	9	53	3	24	34	10	47	7
26	MAX ZHU	49	40	17	4	9	32	11	7
27	GAURAV GIDWANI	51	13	46	37	14	6	NA	6
28	SOFIA ADINA STANESCUBELLU	24	4	22	19	20	8	36	7
29	CHIEDOZIE OKORIE	50	6	38	34	52	48	NA	6
30	GEORGE AVERY JONES	52	64	15	55	31	61	50	7
31	RISHI SHETTY	58	55	64	10	30	50	14	7
32	JOSHUA PHILIP MATHEWS	61	8	44	18	51	26	13	7
33	JADE GE	60	12	50	36	13	15	51	7
34	MICHAEL JEFFERY THOMAS	6	60	37	29	25	11	52	7
35	JOSHUA DAVID LEE	46	38	56	6	57	52	48	7
36	SIDDHARTH JHA	13	57	51	33	NA	16	28	6
37	AMIYATOSH PWNANANDAM	NA	5	34	27	NA	23	61	5
38	BRIAN LIU	11	35	29	12	NA	18	15	6
39	JOEL R HENDON	1	54	40	16	44	21	24	7
40	FOREST ZHANG	20	26	39	59	21	56	22	7
41	KYLE WILLIAM MURPHY	59	17	58	20	NA	NA	NA	4
42	JARED GE	12	50	57	60	61	64	56	7
43	ROBERT GLEN VASEY	21	23	24	63	59	46	55	7
44	JUSTIN D SCHILLING	NA	14	32	53	39	24	59	6
45	DEREK YAN	5	51	60	56	63	55	58	7
46	JACOB ALEXANDER LAVALLEY	35	7	27	50	64	43	23	7
47	ERIC WRIGHT	18	24	21	61	8	51	25	7
48	DANIEL KHAIN	17	63	NA	52	NA	29	35	5
49	MICHAEL J MARTIN	26	20	63	64	58	NA	NA	5
50	SHIVAM JHA	29	42	33	46	NA	31	30	6
51	TEJAS AYYAGARI	27	45	36	57	32	47	33	7
52	ETHAN GUO	30	22	19	48	29	35	34	7
53	JOSE C YBARRA	NA	25	NA	44	NA	57	NA	3
54	LARRY HODGE	14	39	61	NA	15	59	64	6
55	ALEX KONG	62	31	10	30	NA	45	43	6
56	MARISA RICCI	NA	11	35	45	NA	40	42	5
57	MICHAEL LU	7	36	42	51	35	53	NA	6
58	VIRAJ MOHILE	31	2	41	23	49	NA	45	6
59	SEAN M MC CORMICK	41	NA	9	40	43	54	44	6
60	JULIA SHEN	33	34	45	42	24	NA	NA	5
61	JEZZEL FARKAS	32	3	54	47	42	30	37	7
62	ASHWIN BALAJI	55	NA	NA	NA	NA	NA	NA	1
63	THOMAS JOSEPH HOSMER	2	48	49	43	45	NA	NA	5
64	BEN LI	22	30	31	49	46	42	54	7

# Eliminating NA Cases in order to continue with our calculations, NA replaced by 0.
opponents[is.na(opponents)] <- as.numeric(0)

# Need to assign zero values in order to add accordingly and to avoid errors.
csv.table$`Ave Pre Rating` <- as.numeric(0)

# Procedure to calculate Average Pre-Rating for each player
for (k in 3:9){
  for (j in 1:dim(csv.table)[1]){
    for (i in 1:dim(csv.table)[1]){
      if (as.numeric(opponents[j,k]) == as.numeric(csv.table$Pair[i])){
        csv.table$`Ave Pre Rating`[j] <- as.numeric(csv.table$`Ave Pre Rating`[j]) + as.numeric(csv.table$`Pre Rating`[i])
      }
    }
  }
}

# Final Procedure to find each player's average based on the number of played games
csv.table$`Ave Pre Rating`  <- round(as.numeric(csv.table$`Ave Pre Rating`) / opponents$NGames,0)

Finalized table ready to export

kable(csv.table, row.names = FALSE)

Pair	Player Name	State	USCF ID	Total	Pre Rating	Post Rating	Ave Pre Rating
1	GARY HUA	ON	15445895	6.0	1794	1817	1605
2	DAKSHESH DARURI	MI	14598900	6.0	1553	1663	1469
3	ADITYA BAJAJ	MI	14959604	6.0	1384	1640	1564
4	PATRICK H SCHILLING	MI	12616049	5.5	1716	1744	1574
5	HANSHI ZUO	MI	14601533	5.5	1655	1690	1501
6	HANSEN SONG	OH	15055204	5.0	1686	1687	1519
7	GARY DEE SWATHELL	MI	11146376	5.0	1649	1673	1372
8	EZEKIEL HOUGHTON	MI	15142253	5.0	1641	1657	1468
9	STEFANO LEE	ON	14954524	5.0	1411	1564	1523
10	ANVIT RAO	MI	14150362	5.0	1365	1544	1554
11	CAMERON WILLIAM MC LEMAN	MI	12581589	4.5	1712	1696	1468
12	KENNETH J TACK	MI	12681257	4.5	1663	1670	1506
13	TORRANCE HENRY JR	MI	15082995	4.5	1666	1662	1498
14	BRADLEY SHAW	MI	10131499	4.5	1610	1618	1515
15	ZACHARY JAMES HOUGHTON	MI	15619130	4.5	1220	1416	1484
16	MIKE NIKITIN	MI	10295068	4.0	1604	1613	1386
17	RONALD GRZEGORCZYK	MI	10297702	4.0	1629	1610	1499
18	DAVID SUNDEEN	MI	11342094	4.0	1600	1600	1480
19	DIPANKAR ROY	MI	14862333	4.0	1564	1570	1426
20	JASON ZHENG	MI	14529060	4.0	1595	1569	1411
21	DINH DANG BUI	ON	15495066	4.0	1563	1562	1470
22	EUGENE L MCCLURE	MI	12405534	4.0	1555	1529	1300
23	ALAN BUI	ON	15030142	4.0	1363	1371	1214
24	MICHAEL R ALDRICH	MI	13469010	4.0	1229	1300	1357
25	LOREN SCHWIEBERT	MI	12486656	3.5	1745	1681	1363
26	MAX ZHU	ON	15131520	3.5	1579	1564	1507
27	GAURAV GIDWANI	MI	14476567	3.5	1552	1539	1222
28	SOFIA ADINA STANESCUBELLU	MI	14882954	3.5	1507	1513	1522
29	CHIEDOZIE OKORIE	MI	15323285	3.5	1602	1508	1314
30	GEORGE AVERY JONES	ON	12577178	3.5	1522	1444	1144
31	RISHI SHETTY	MI	15131618	3.5	1494	1444	1260
32	JOSHUA PHILIP MATHEWS	ON	14073750	3.5	1441	1433	1379
33	JADE GE	MI	14691842	3.5	1449	1421	1277
34	MICHAEL JEFFERY THOMAS	MI	15051807	3.5	1399	1400	1375
35	JOSHUA DAVID LEE	MI	14601397	3.5	1438	1392	1150
36	SIDDHARTH JHA	MI	14773163	3.5	1355	1367	1388
37	AMIYATOSH PWNANANDAM	MI	15489571	3.5	980	1077	1385
38	BRIAN LIU	MI	15108523	3.0	1423	1439	1539
39	JOEL R HENDON	MI	12923035	3.0	1436	1413	1430
40	FOREST ZHANG	MI	14892710	3.0	1348	1346	1391
41	KYLE WILLIAM MURPHY	MI	15761443	3.0	1403	1341	1248
42	JARED GE	MI	14462326	3.0	1332	1256	1150
43	ROBERT GLEN VASEY	MI	14101068	3.0	1283	1244	1107
44	JUSTIN D SCHILLING	MI	15323504	3.0	1199	1199	1327
45	DEREK YAN	MI	15372807	3.0	1242	1191	1152
46	JACOB ALEXANDER LAVALLEY	MI	15490981	3.0	377	1076	1358
47	ERIC WRIGHT	MI	12533115	2.5	1362	1341	1392
48	DANIEL KHAIN	MI	14369165	2.5	1382	1335	1356
49	MICHAEL J MARTIN	MI	12531685	2.5	1291	1259	1286
50	SHIVAM JHA	MI	14773178	2.5	1056	1111	1296
51	TEJAS AYYAGARI	MI	15205474	2.5	1011	1097	1356
52	ETHAN GUO	MI	14918803	2.5	935	1092	1495
53	JOSE C YBARRA	MI	12578849	2.0	1393	1359	1345
54	LARRY HODGE	MI	12836773	2.0	1270	1200	1206
55	ALEX KONG	MI	15412571	2.0	1186	1163	1406
56	MARISA RICCI	MI	14679887	2.0	1153	1140	1414
57	MICHAEL LU	MI	15113330	2.0	1092	1079	1363
58	VIRAJ MOHILE	MI	14700365	2.0	917	941	1391
59	SEAN M MC CORMICK	MI	12841036	2.0	853	878	1319
60	JULIA SHEN	MI	14579262	1.5	967	984	1330
61	JEZZEL FARKAS	ON	15771592	1.5	955	979	1327
62	ASHWIN BALAJI	MI	15219542	1.0	1530	1535	1186
63	THOMAS JOSEPH HOSMER	MI	15057092	1.0	1175	1125	1350
64	BEN LI	MI	15006561	1.0	1163	1112	1263

Export csv file

#write.csv(csv.table, file = "Villalobos-tournamentInfo.csv")
write.table(csv.table, file = "Villalobos-tournamentInfo.csv",row.names=FALSE, na="",col.names=TRUE, sep=",")

Project 1

CUNY MSDA - DATA607

Duubar Villalobos Jimenez mydvtech@gmail.com

February 26, 2017