Download the file first and then read the txt file into R.
library(stringr)
library(RCurl)
## Loading required package: bitops
url = "https://raw.githubusercontent.com/cyadusha/tournamentinfo/master/tournamentinfo.txt"
x = getURL(url)
tournamentinfo = read.delim(file = textConnection(x), header = TRUE)
nrow(tournamentinfo)
## [1] 195
The tournament has 64 players, so we need a sequence of 64 row indices. The txt file read into R has 195 rows, but that does not mean there are 195 players: each player's record repeats every three rows, starting at row 4. The following command generates the indices of the rows that should be subsetted from the data.
subsetrows = seq(from = 4, to = 193, by = 3)
subsetrows
## [1] 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52
## [18] 55 58 61 64 67 70 73 76 79 82 85 88 91 94 97 100 103
## [35] 106 109 112 115 118 121 124 127 130 133 136 139 142 145 148 151 154
## [52] 157 160 163 166 169 172 175 178 181 184 187 190 193
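As a quick sanity check on the indexing above, the sketch below confirms that stepping from 4 to 193 by 3 gives exactly 64 indices, one per player. It uses only base R; the variable name `idx` is introduced here purely for illustration.

```r
# Row indices of the name lines: start at 4, step by 3, stop at 193
idx = seq(from = 4, to = 193, by = 3)

# (193 - 4) / 3 + 1 = 64, so there is one index per player
length(idx)

# The first and last indices match the printed sequence
range(idx)
```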
This is where we obtain the player names, total points, and the opponents’ identifications.
t1 = tournamentinfo[c(subsetrows[1:64]), ]
names = str_trim(unlist(str_extract(t1, "[[:alpha:]]+ ?[[:alpha:]]+ [[:alpha:]]+")))
# Escape the dot so that it matches only a literal decimal point
totalpoints = str_trim(unlist(str_extract(t1, "[[:digit:]]\\.[[:digit:]]")))
# Opponent ids are the digit runs that sit directly before a pipe, e.g. "39|"
opponents = str_extract_all(t1, "[[:digit:]]+(?=[|])")
opponents = lapply(opponents, as.numeric)
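To make the three extractions easier to follow, here is a minimal sketch that applies them to a single line. The line itself is an assumption: it is hand-written to mimic the presumed layout of a player row, and the opponent numbers in it are illustrative. The opponent pattern uses a lookahead to pick the digit runs that directly precede a pipe; the names `line`, `nm`, `pts` and `opp` are hypothetical.

```r
library(stringr)

# Hypothetical line that mimics the assumed layout of a player row
line = "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|"

# Name: two or three words made of letters
nm = str_extract(line, "[[:alpha:]]+ ?[[:alpha:]]+ [[:alpha:]]+")

# Total points: digit, literal dot, digit
pts = str_extract(line, "[[:digit:]]\\.[[:digit:]]")

# Opponent ids: digit runs that sit directly before a pipe
opp = as.numeric(str_extract_all(line, "[[:digit:]]+(?=[|])")[[1]])
```

The pair number at the start of the line is not picked up as an opponent because a space separates it from the pipe.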
The state names and the preratings are stored one row below each player's name.
subsetrows2 = subsetrows + 1
subsetrows2
## [1] 5 8 11 14 17 20 23 26 29 32 35 38 41 44 47 50 53
## [18] 56 59 62 65 68 71 74 77 80 83 86 89 92 95 98 101 104
## [35] 107 110 113 116 119 122 125 128 131 134 137 140 143 146 149 152 155
## [52] 158 161 164 167 170 173 176 179 182 185 188 191 194
This is where we obtain the state names and the preratings.
t2 = tournamentinfo[c(subsetrows2[1:64]), ]
states = str_trim(unlist(str_extract(t2, "[[:alpha:]]{2,}")))
prerating = str_trim(unlist(str_extract(t2, "R: ?.[[:digit:]]{3}|R: ?.[[:digit:]]{4}")))
prerating = str_sub(prerating, start = 4, end = 7)
prerating = as.numeric(prerating)
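The sketch below shows one possible alternative for pulling out the prerating: a capture group via `str_match`, which avoids the fixed character positions used by `str_sub`. The two input lines are assumptions written to mimic the presumed layout of the second row of a record (the second line, including its id number, is entirely made up), and the names `lines2`, `state` and `rating` are hypothetical.

```r
library(stringr)

# Hypothetical lines in the assumed layout of the second row of a record
lines2 = c("   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |",
           "   MI | 12345678 / R:  955P11 ->1012P18  |N:4  |B    |W    |")

# State: the first run of two or more letters on the line
state = str_extract(lines2, "[[:alpha:]]{2,}")

# Prerating: the digits that follow "R:" and any spaces (column 2 = first capture group)
rating = as.numeric(str_match(lines2, "R: *([[:digit:]]+)")[, 2])
```

Because the capture group stops at the first non-digit, a three-digit rating or a provisional suffix such as `P11` does not affect the result.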
We build a lookup table that pairs each player's identification number (1 to 64) with that player's prerating.
id = seq_along(t1)
t3 = data.frame(id, prerating)
Next, we look up the prerating of each opponent by matching the opponent's identification number in each round against the id column of t3.
opponentsmatching = lapply(opponents, function(x) {
  # For each opponent id of this player, return that opponent's prerating
  sapply(x, function(y) t3$prerating[t3$id == y])
})
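The lookup can be illustrated on a small table. This is a sketch with made-up ids and ratings (`toy`, `toyopponents`, `byid` and `byindex` are hypothetical names, not part of the tournament data). It also shows that, when the ids run 1, 2, 3, ... in row order, indexing by position gives the same result as matching on the id column.

```r
# Toy lookup table (hypothetical ids and ratings, not the tournament data)
toy = data.frame(id = 1:4, prerating = c(1500, 1600, 1450, 1700))

# Each list element holds the opponent ids faced by one player
toyopponents = list(c(2, 3), c(1, 4, 3), c(4))

# Explicit match on id, one opponent at a time
byid = lapply(toyopponents, function(x) {
  sapply(x, function(y) toy$prerating[toy$id == y])
})

# Because id runs 1, 2, 3, ... the id can also be used directly as a row index
byindex = lapply(toyopponents, function(x) toy$prerating[x])
```

The explicit match is the safer choice if the ids were ever not consecutive; the positional form is shorter when they are.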
Finally, we compute the average opponent rating for each player. Some players have no opponent recorded in certain rounds, so their opponent lists are shorter. For those players we average only over the opponents they actually faced.
averageopponentrating = round(sapply(opponentsmatching,mean), 0)
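A short sketch of the averaging step, using made-up ratings (`toyratings` and `toyaverage` are hypothetical names). Since `mean` is applied to each vector separately, the divisor is the number of opponents in that vector rather than the number of rounds.

```r
# Toy opponent ratings (hypothetical): the second player faced fewer opponents
toyratings = list(c(1400, 1600, 1500), c(1500, 1650))

# mean() is applied per vector, so each divisor is the number of opponents actually faced
toyaverage = round(sapply(toyratings, mean), 0)
toyaverage
```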
We collect all of the columns computed above into a single data frame of 5 columns and 64 rows.
tournamentdata = data.frame(names, states, totalpoints, prerating, averageopponentrating)
colnames(tournamentdata) = c("Name", "State", "Total Number of Points", "Pre-Rating", "Average Pre-Rating of Opponents")
head(tournamentdata)
## Name State Total Number of Points Pre-Rating
## 1 GARY HUA ON 6.0 1794
## 2 DAKSHESH DARURI MI 6.0 1553
## 3 ADITYA BAJAJ MI 6.0 1384
## 4 PATRICK H SCHILLING MI 5.5 1716
## 5 HANSHI ZUO MI 5.5 1655
## 6 HANSEN SONG OH 5.0 1686
## Average Pre-Rating of Opponents
## 1 1605
## 2 1469
## 3 1564
## 4 1574
## 5 1501
## 6 1519
Now we write the data into a .csv file.
write.table(tournamentdata, file = "tournamentdata.csv", sep = ",", row.names = FALSE)
Now we read the .csv file back into R to make sure the data was written correctly.
# Read from the same relative path that write.table() used above
tournamentdata = read.csv(file = "tournamentdata.csv", sep = ",", header = TRUE)
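One way to make the write-and-read check explicit is a round trip through a temporary file, sketched below. The small data frame reuses values from the rows printed above but is otherwise a stand-in for `tournamentdata`; `toy`, `path`, `back` and `roundtrip` are hypothetical names, and the short column names are chosen here so that `read.csv` does not alter them.

```r
# Toy data frame (values taken from rows 1 and 4 of the output shown above)
toy = data.frame(Name = c("GARY HUA", "PATRICK H SCHILLING"),
                 State = c("ON", "MI"),
                 Points = c(6.0, 5.5),
                 stringsAsFactors = FALSE)

# Write to a temporary file so nothing in the working directory is touched
path = tempfile(fileext = ".csv")
write.table(toy, file = path, sep = ",", row.names = FALSE)

# Read the file back and compare it with what was written
back = read.csv(path, header = TRUE, stringsAsFactors = FALSE)
roundtrip = isTRUE(all.equal(toy, back))
roundtrip
```

With the column names used in the report, which contain spaces and hyphens, `read.csv` would convert them to syntactic names unless `check.names = FALSE` is passed.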