Introduction
In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could, for example, be imported into a SQL database) with the following information for all of the players:
Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents
For the first player, the information would be:
Gary Hua, ON, 6.0, 1794, 1605
1605 was calculated by adding the pre-tournament ratings of his seven opponents (1436, 1563, 1600, 1610, 1649, 1663, and 1716) and dividing the sum by the total number of games played: 11237 / 7 ≈ 1605.
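As a quick check of the arithmetic, the figure can be reproduced in R from the seven ratings listed above. The variable names are illustrative only.

```r
# Pre-tournament ratings of Gary Hua's seven opponents, as listed above
opponent_ratings <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)

# Average = sum of the ratings / number of games played
avg_opponent_rating <- sum(opponent_ratings) / length(opponent_ratings)

# 11237 / 7 = 1605.29, which rounds to 1605
round(avg_opponent_rating)
```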
The chess rating system (invented by Arpad Elo, a Hungarian-American physics professor) has been used in many other contexts, including assessing the relative strength of employment candidates by human resource departments.
Import text file into R
tourinfo <- read.table("tournamentinfo.txt", sep = "\n", header=TRUE)
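With `header=TRUE`, `read.table` uses the first line of the file as the column name. One alternative is to read the raw lines into a character vector. The sketch below is only an illustration: `sample_text` is a hypothetical stand-in for tournamentinfo.txt, and its spacing is assumed.

```r
# Hypothetical sample with the same shape as the first lines of the file
sample_text <- c(
  "-----------------------------------------",
  " Pair | Player Name                     |Total|",
  " Num  | USCF ID / Rtg (Pre->Post)       | Pts |",
  "-----------------------------------------",
  "    1 | GARY HUA                        |6.0  |",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |",
  "-----------------------------------------"
)

# readLines() returns one string per line; a text connection stands in for the file
con <- textConnection(sample_text)
raw_lines <- readLines(con)
close(con)

# Drop the dashed separator lines so that only content lines remain
content_lines <- raw_lines[!grepl("^-+$", raw_lines)]
length(content_lines)
```

With the real file, `readLines("tournamentinfo.txt")` would take the place of the text connection.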
Examine data
# Top of data
head(tourinfo, 10)
## X.........................................................................................
## 1 Pair | Player Name |Total|Round|Round|Round|Round|Round|Round|Round|
## 2 Num | USCF ID / Rtg (Pre->Post) | Pts | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
## 3 -----------------------------------------------------------------------------------------
## 4 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|
## 5 ON | 15445895 / R: 1794 ->1817 |N:2 |W |B |W |B |W |B |W |
## 6 -----------------------------------------------------------------------------------------
## 7 2 | DAKSHESH DARURI |6.0 |W 63|W 58|L 4|W 17|W 16|W 20|W 7|
## 8 MI | 14598900 / R: 1553 ->1663 |N:2 |B |W |B |W |B |W |B |
## 9 -----------------------------------------------------------------------------------------
## 10 3 | ADITYA BAJAJ |6.0 |L 8|W 61|W 25|W 21|W 11|W 13|W 12|
# End of data
tail(tourinfo, 10)
## X.........................................................................................
## 186 -----------------------------------------------------------------------------------------
## 187 62 | ASHWIN BALAJI |1.0 |W 55|U |U |U |U |U |U |
## 188 MI | 15219542 / R: 1530 ->1535 | |B | | | | | | |
## 189 -----------------------------------------------------------------------------------------
## 190 63 | THOMAS JOSEPH HOSMER |1.0 |L 2|L 48|D 49|L 43|L 45|H |U |
## 191 MI | 15057092 / R: 1175 ->1125 | |W |B |W |B |B | | |
## 192 -----------------------------------------------------------------------------------------
## 193 64 | BEN LI |1.0 |L 22|D 30|L 31|D 49|L 46|L 42|L 54|
## 194 MI | 15006561 / R: 1163 ->1112 | |B |W |W |B |W |B |B |
## 195 -----------------------------------------------------------------------------------------
# Structure of the data
str(tourinfo)
## 'data.frame': 195 obs. of 1 variable:
## $ X.........................................................................................: Factor w/ 131 levels " 1 | GARY HUA |6.0 |W 39|W 21|W 18|W 14|W 7|D 12|D 4|",..: 130 129 131 1 126 131 2 89 131 3 ...
# Dimensions
dim(tourinfo)
## [1] 195 1
# Missing values (is any entry NA?)
anyNA(tourinfo)
## [1] FALSE
# Data class
class(tourinfo)
## [1] "data.frame"
Extract player name
library(stringr)
player_name <- (str_trim(unlist(str_extract_all(unlist(tourinfo),
"([[:alpha:] ]-?){13,50}"))))[1:64]
head(player_name, 10)
## [1] "Player Name" "GARY HUA" "DAKSHESH DARURI"
## [4] "ADITYA BAJAJ" "PATRICK H SCHILLING" "HANSHI ZUO"
## [7] "HANSEN SONG" "GARY DEE SWATHELL" "EZEKIEL HOUGHTON"
## [10] "STEFANO LEE"
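The first match above is the column header "Player Name", so taking elements 1 to 64 keeps the header and appears to leave out the last player. One possible way around this is to take names only from lines that begin with a pair number. The sketch uses base R on a few sample lines; the `sample_lines` vector and its spacing are assumptions.

```r
sample_lines <- c(
  " Pair | Player Name                     |Total|",
  "    1 | GARY HUA                        |6.0  |",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |",
  "    2 | DAKSHESH DARURI                 |6.0  |",
  "   MI | 14598900 / R: 1553   ->1663     |N:2  |"
)

# Lines that hold a player name start with optional spaces, a pair number, and a bar
name_lines <- grep("^[[:space:]]*[0-9]+[[:space:]]*\\|", sample_lines, value = TRUE)

# The name is the second bar-separated field; trimws() removes the padding
fields <- strsplit(name_lines, "|", fixed = TRUE)
player_name_fixed <- trimws(sapply(fields, `[`, 2))
player_name_fixed
```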
Extract state
state <- str_trim(unlist(str_extract_all(unlist(tourinfo), "MI | ON | OH ")))
head(state, 10)
## [1] "ON" "MI" "MI" "MI" "MI" "OH" "MI" "MI" "ON" "MI"
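The pattern above lists the three codes that occur in this file. A less file-specific sketch could match any two-letter code instead, assuming that each player's second line begins with two capital letters followed by a bar. The sample lines and names below are illustrative.

```r
sample_lines <- c(
  "    1 | GARY HUA                        |6.0  |",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |",
  "    2 | DAKSHESH DARURI                 |6.0  |",
  "   MI | 14598900 / R: 1553   ->1663     |N:2  |"
)

# Match lines that start with two capital letters followed by a bar
state_pattern <- "^[[:space:]]*([[:upper:]]{2})[[:space:]]*\\|.*$"
state_lines <- grep(state_pattern, sample_lines, value = TRUE)

# Keep only the captured two-letter code
state_any <- sub(state_pattern, "\\1", state_lines)
state_any
```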
Extract pre-rating
# Extract each player's pre-rating
# Check the class of a numeric value such as 1794
class(1794)
## [1] "numeric"
pre_ratng <- str_replace_all(str_trim(unlist(str_extract_all(unlist(tourinfo),
"R: [[:digit:] ]*"))), "R: ", "")
head(pre_ratng, 10)
## [1] "1794" "1553" "1384" "1716" "1655" "1686" "1649" "1641" "1411" "1365"
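The values printed above are quoted, so `pre_ratng` holds character strings; `class(1794)` only describes the literal. A minimal sketch of converting the extracted strings before doing arithmetic, using the first three values from the output above (the variable names are illustrative):

```r
# First three pre-ratings as extracted above (character strings)
pre_rating_chr <- c("1794", "1553", "1384")
class(pre_rating_chr)

# Convert to numbers so that they can be averaged later
pre_rating_num <- as.numeric(pre_rating_chr)
class(pre_rating_num)
mean(pre_rating_num)
```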
Extract round
r <- str_sub(unlist(tourinfo), start = 48, end = 89) [c(seq(1, length(unlist(tourinfo)),
by = 3))]
head(r, 10)
## [1] "Round|Round|Round|Round|Round|Round|Round|"
## [2] "W 39|W 21|W 18|W 14|W 7|D 12|D 4|"
## [3] "W 63|W 58|L 4|W 17|W 16|W 20|W 7|"
## [4] "L 8|W 61|W 25|W 21|W 11|W 13|W 12|"
## [5] "W 23|D 28|W 2|W 26|D 5|W 19|D 1|"
## [6] "W 45|W 37|D 12|D 13|D 4|W 14|W 17|"
## [7] "W 34|D 29|L 11|W 35|D 10|W 27|W 21|"
## [8] "W 57|W 46|W 13|W 11|L 1|W 9|L 2|"
## [9] "W 3|W 32|L 14|L 9|W 47|W 28|W 19|"
## [10] "W 25|L 18|W 59|W 8|W 26|L 7|W 20|"
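The requested output also includes each player's total number of points, which sits in the field just before the round results. A possible sketch using base R is shown below. The `sample_lines` vector is an assumption that mirrors the rows shown earlier, with spacing retyped by hand.

```r
sample_lines <- c(
  "    1 | GARY HUA                        |6.0  |W  39|W  21|",
  "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |",
  "   62 | ASHWIN BALAJI                   |1.0  |W  55|U    |",
  "   MI | 15219542 / R: 1530   ->1535     |     |B    |     |"
)

# Player lines start with a pair number; the third bar-separated field is the total
player_lines <- grep("^[[:space:]]*[0-9]+[[:space:]]*\\|", sample_lines, value = TRUE)
fields <- strsplit(player_lines, "|", fixed = TRUE)
total_points <- as.numeric(trimws(sapply(fields, `[`, 3)))
total_points
```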
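Convert round results to a numeric matrix
The extracted values are character strings, so they are coerced with as.numeric() before being placed in a matrix.
# Extract the opponent pair numbers (runs of four spaces or digits)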
rating <- str_extract_all(r, "( |\\d){4}")
head(rating, 10)
## [[1]]
## character(0)
##
## [[2]]
## [1] " 39" " 21" " 18" " 14" " 7" " 12" " 4"
##
## [[3]]
## [1] " 63" " 58" " 4" " 17" " 16" " 20" " 7"
##
## [[4]]
## [1] " 8" " 61" " 25" " 21" " 11" " 13" " 12"
##
## [[5]]
## [1] " 23" " 28" " 2" " 26" " 5" " 19" " 1"
##
## [[6]]
## [1] " 45" " 37" " 12" " 13" " 4" " 14" " 17"
##
## [[7]]
## [1] " 34" " 29" " 11" " 35" " 10" " 27" " 21"
##
## [[8]]
## [1] " 57" " 46" " 13" " 11" " 1" " 9" " 2"
##
## [[9]]
## [1] " 3" " 32" " 14" " 9" " 47" " 28" " 19"
##
## [[10]]
## [1] " 25" " 18" " 59" " 8" " 26" " 7" " 20"
rating <- as.numeric(unlist(rating))
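Flattening with `unlist()` discards which player each opponent number belongs to. A sketch that keeps one numeric vector per player, and that copes with rounds that have no opponent (such as the `U` and `H` entries in the tail of the data), might look like this. The strings in `sample_rounds` are retyped from the output above.

```r
sample_rounds <- c(
  "W 39|W 21|W 18|W 14|W 7|D 12|D 4|",   # Gary Hua
  "W 55|U |U |U |U |U |U |",              # Ashwin Balaji
  "L 2|L 48|D 49|L 43|L 45|H |U |"        # Thomas Joseph Hosmer
)

# Pull out every run of digits, one list element per player
matches <- regmatches(sample_rounds, gregexpr("[0-9]+", sample_rounds))
opponents <- lapply(matches, as.numeric)
opponents

# Number of games with an opponent, per player
lengths(opponents)
```

Dividing by `lengths(opponents)` rather than by 7 keeps the average tied to the games actually played.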
# Place the first 64 opponent numbers in a 1 x 64 numeric matrix
ratng_convert <- matrix (rating, nrow = 1, ncol = 64)
head(ratng_convert, 10)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] 39 21 18 14 7 12 4 63 58 4 17 16 20
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## [1,] 7 8 61 25 21 11 13 12 23 28 2
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35]
## [1,] 26 5 19 1 45 37 12 13 4 14 17
## [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46]
## [1,] 34 29 11 35 10 27 21 57 46 13 11
## [,47] [,48] [,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57]
## [1,] 1 9 2 3 32 14 9 47 28 19 25
## [,58] [,59] [,60] [,61] [,62] [,63] [,64]
## [1,] 18 59 8 26 7 20 16
# Column means of a one-row matrix equal the values themselves
average <- colMeans((ratng_convert))
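`colMeans()` on a one-row matrix simply returns that row, so the values stored in `average` are opponent pair numbers rather than ratings. One way to obtain the average opponent pre-rating is to use each opponent number as an index into the vector of pre-ratings. The sketch below hard-codes the 64 pre-ratings and the first three players' opponents as they appear in this document's output; `pre_rating`, `opponents`, and `avg_opp_rating` are illustrative names.

```r
# Pre-ratings of players 1 to 64, in pair-number order
pre_rating <- c(
  1794, 1553, 1384, 1716, 1655, 1686, 1649, 1641, 1411, 1365,
  1712, 1663, 1666, 1610, 1220, 1604, 1629, 1600, 1564, 1595,
  1563, 1555, 1363, 1229, 1745, 1579, 1552, 1507, 1602, 1522,
  1494, 1441, 1449, 1399, 1438, 1355,  980, 1423, 1436, 1348,
  1403, 1332, 1283, 1199, 1242,  377, 1362, 1382, 1291, 1056,
  1011,  935, 1393, 1270, 1186, 1153, 1092,  917,  853,  967,
   955, 1530, 1175, 1163
)

# Opponent pair numbers for the first three players
opponents <- list(
  c(39, 21, 18, 14, 7, 12, 4),    # Gary Hua
  c(63, 58, 4, 17, 16, 20, 7),    # Dakshesh Daruri
  c(8, 61, 25, 21, 11, 13, 12)    # Aditya Bajaj
)

# Look up each opponent's pre-rating and average over the games played
avg_opp_rating <- sapply(opponents, function(ids) round(mean(pre_rating[ids])))
avg_opp_rating
```

For Gary Hua this gives 1605, the value stated in the introduction.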
Print data
data.frame(player_name, state, pre_ratng, average)
## player_name state pre_ratng average
## 1 Player Name ON 1794 39
## 2 GARY HUA MI 1553 21
## 3 DAKSHESH DARURI MI 1384 18
## 4 ADITYA BAJAJ MI 1716 14
## 5 PATRICK H SCHILLING MI 1655 7
## 6 HANSHI ZUO OH 1686 12
## 7 HANSEN SONG MI 1649 4
## 8 GARY DEE SWATHELL MI 1641 63
## 9 EZEKIEL HOUGHTON ON 1411 58
## 10 STEFANO LEE MI 1365 4
## 11 ANVIT RAO MI 1712 17
## 12 CAMERON WILLIAM MC LEMAN MI 1663 16
## 13 KENNETH J TACK MI 1666 20
## 14 TORRANCE HENRY JR MI 1610 7
## 15 BRADLEY SHAW MI 1220 8
## 16 ZACHARY JAMES HOUGHTON MI 1604 61
## 17 MIKE NIKITIN MI 1629 25
## 18 RONALD GRZEGORCZYK MI 1600 21
## 19 DAVID SUNDEEN MI 1564 11
## 20 DIPANKAR ROY MI 1595 13
## 21 JASON ZHENG ON 1563 12
## 22 DINH DANG BUI MI 1555 23
## 23 EUGENE L MCCLURE ON 1363 28
## 24 ALAN BUI MI 1229 2
## 25 MICHAEL R ALDRICH MI 1745 26
## 26 LOREN SCHWIEBERT ON 1579 5
## 27 MAX ZHU MI 1552 19
## 28 GAURAV GIDWANI MI 1507 1
## 29 SOFIA ADINA STANESCU-BELLU MI 1602 45
## 30 CHIEDOZIE OKORIE ON 1522 37
## 31 GEORGE AVERY JONES MI 1494 12
## 32 RISHI SHETTY ON 1441 13
## 33 JOSHUA PHILIP MATHEWS MI 1449 4
## 34 JADE GE MI 1399 14
## 35 MICHAEL JEFFERY THOMAS MI 1438 17
## 36 JOSHUA DAVID LEE MI 1355 34
## 37 SIDDHARTH JHA MI 980 29
## 38 AMIYATOSH PWNANANDAM MI 1423 11
## 39 BRIAN LIU MI 1436 35
## 40 JOEL R HENDON MI 1348 10
## 41 FOREST ZHANG MI 1403 27
## 42 KYLE WILLIAM MURPHY MI 1332 21
## 43 JARED GE MI 1283 57
## 44 ROBERT GLEN VASEY MI 1199 46
## 45 JUSTIN D SCHILLING MI 1242 13
## 46 DEREK YAN MI 377 11
## 47 JACOB ALEXANDER LAVALLEY MI 1362 1
## 48 ERIC WRIGHT MI 1382 9
## 49 DANIEL KHAIN MI 1291 2
## 50 MICHAEL J MARTIN MI 1056 3
## 51 SHIVAM JHA MI 1011 32
## 52 TEJAS AYYAGARI MI 935 14
## 53 ETHAN GUO MI 1393 9
## 54 JOSE C YBARRA MI 1270 47
## 55 LARRY HODGE MI 1186 28
## 56 ALEX KONG MI 1153 19
## 57 MARISA RICCI MI 1092 25
## 58 MICHAEL LU MI 917 18
## 59 VIRAJ MOHILE MI 853 59
## 60 SEAN M MC CORMICK MI 967 8
## 61 JULIA SHEN ON 955 26
## 62 JEZZEL FARKAS MI 1530 7
## 63 ASHWIN BALAJI MI 1175 20
## 64 THOMAS JOSEPH HOSMER MI 1163 16
Output file
touraverage <- data.frame(player_name, state, pre_ratng, average)
write.csv(touraverage, file="/users/sharonmorris/IS607/tournamentinfodone.txt")
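By default `write.csv()` also writes the row names as an unnamed first column. If the file is meant for import into a database, a sketch along these lines may be closer to the requested five-column layout. It uses a temporary file, a `.csv` extension, and a two-row sample data frame, all of which are assumptions rather than the setup used above.

```r
# Two sample rows in the requested column order.
# Gary Hua's row matches the introduction; the second average is
# computed from the pre-ratings listed in this document.
sample_out <- data.frame(
  player_name = c("GARY HUA", "DAKSHESH DARURI"),
  state = c("ON", "MI"),
  total_points = c(6.0, 6.0),
  pre_rating = c(1794, 1553),
  avg_opp_rating = c(1605, 1469),
  stringsAsFactors = FALSE
)

# Write without row names so that the file has exactly five columns
out_file <- tempfile(fileext = ".csv")
write.csv(sample_out, file = out_file, row.names = FALSE)

# Read the file back to confirm its shape
check <- read.csv(out_file, stringsAsFactors = FALSE)
dim(check)
```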