All the tables, save the last, are made using df_print
and paged in the YAML
. These tables will automatically scroll off into the sunset. So, if you feel data is missing click on the traingle in the upper right corner and it will give you asscess to the hidden columns.
Also, I used code_folding with hdden in the YAML
as well, so the code can be rolled up to better read through data, graphics and tables. If you are missing code you think should be here, just click on the Code
button and if you want to clear it out to see the tables and text, roll it back up by clicking Hide
Using the fixed width method, I read this file directly from the Github into R, skipping the first four rows of headings and dashes after loading the needed packages
require(stringr)
require(DT)
url ="https://raw.githubusercontent.com/bpoulin-CUNY/Data607/master/chess.txt"
data<-read.fwf(url, c(6,35,6,6,6,6,6,6,6,6), skip = 4)
data[] <- (lapply(data[1:10], FUN = function(x) (gsub("\\|", "", x))) )
data
The file seems to have come in properly, but much grooming is still necessary. It is time to remove the pipe symbols “|” as well as the rows with nothing but dashes. Then create a counter to interate throgh rows the first and second rows of each player while skipping the third, then moving 3 rows forward and repeating this process to align all of the first players information on a single row of the data frame.
To facilitate this process I creadted an empty data frame of the needed dimensions and then used rbind
to attach the temporary row to the permanent frame from inside the loop. Once siftedData
, I subsetted to keep only the necessary columns and saved them into the data frame refindData.
data[] <- (lapply(data[1:10], FUN = function(x) (gsub("\\|", "", x))) )
rownames(data)<-NULL
iter<-1
siftedData<-data.frame(matrix(ncol = 20, nrow = 0))
while (iter<191){
temp = data.frame(cbind(as.vector(data[iter,]), as.vector(data[iter+1,])))
siftedData = rbind(siftedData, temp)
iter= iter +3
}
refinedData <- siftedData[,c("V1", "V2","V3","V4", "V5","V6","V7", "V8","V9","V10", "V1.1","V2.1")]
siftedData # with only necessary rows properly aligned
refinedData # with only necessary columns
Sorting out the unnecessary columns here the remaining columns are only what is needed to form the final data frame or do calculations for it.
Because only the integers from columns V4 through V1.1 are needed to create the indexes for the oppositions pre-contest averages, the letters and spaces were removed. To do this, I created a function using regular expressions and stringr
and passed in into lappy
columns 4 through 10. The resulting data is clean of letter characters.
numberExtractor<- function(x){
temp =vector('numeric')
for(i in 1:length(x)){dig=str_extract(x[i], "[[:digit:]]{1,3}"); temp=c(temp,dig)}
return(as.numeric(temp))}
rownames(refinedData)<-NULL
refinedData[,4:10] <- lapply(refinedData[,4:10], numberExtractor)
refinedData
In order to show the players pre-match scores as well as cross-calculate the average pre-match scores of each players opponents, it was necessary to pull the four numbers following the “R:” in column twelve. I accomplished this with the following function called preExtractor using regular expressions. I then applied it to each row in column twelve using lapply
With all the crude transformation taken care of and the aggregating phase about to begin, adding useful variable names also made sense.
preExtractor<- function(x){code=as.numeric(str_extract(x,"(?<=R: \\D)\\d{3}|(?<=R:\\D)\\d{4}"))
return(code)}
refinedData[,12] <- unlist(lapply(refinedData[,12], preExtractor))
colnames(refinedData)<- c("Player_ID", "Name", "Total_Points", "Round_1","Round_2","Round_3","Round_4","Round_5","Round_6","Round_7", "State", "Prerating")
refinedData$Total_Points <-as.numeric(refinedData$Total_Points)
refinedData
The easiest way to create the pre-match average of opponents for each player was to first create a cross table with the Player_ID’s and that players Pre_Match Score, and use it in an SQL-like fashion as a key and value. With this table and a bit of grooming to replace NA
values, in refinedData, with a non-score integer (allowing the iterator to act directly on the player ID instead of creating a more complicated loop) and converting the Player_ID’s in the SubstitutionFrame
to numerics so the match the numbers in the main table the scores can easily be dropped into the table using for loops and ifelse statements to replace NA values where they previously were.
substitutionFrame <- refinedData[,c("Player_ID","Prerating")]
substitutionFrame$Player_ID <- as.numeric(substitutionFrame$Player_ID)
refinedData[is.na(refinedData)] <- 2900
refinedData$Prerating[refinedData$Prerating==2900] <- NA
for(i in 1:length(refinedData$Round_1)){ refinedData$Round_1[i] = ifelse(i==2900 , NA ,substitutionFrame$Prerating[substitutionFrame$Player_ID==refinedData$Round_1[i]])}
for(i in 1:length(refinedData$Round_2)){ refinedData$Round_2[i] = ifelse(i==2900 , NA ,substitutionFrame$Prerating[substitutionFrame$Player_ID==refinedData$Round_2[i]])}
for(i in 1:length(refinedData$Round_3)){ refinedData$Round_3[i] = ifelse(i==2900 , NA ,substitutionFrame$Prerating[substitutionFrame$Player_ID==refinedData$Round_3[i]])}
for(i in 1:length(refinedData$Round_4)){ refinedData$Round_4[i] = ifelse(i==2900 , NA ,substitutionFrame$Prerating[substitutionFrame$Player_ID==refinedData$Round_4[i]])}
for(i in 1:length(refinedData$Round_5)){ refinedData$Round_5[i] = ifelse(i==2900 , NA ,substitutionFrame$Prerating[substitutionFrame$Player_ID==refinedData$Round_5[i]])}
for(i in 1:length(refinedData$Round_6)){ refinedData$Round_6[i] = ifelse(i==2900 , NA ,substitutionFrame$Prerating[substitutionFrame$Player_ID==refinedData$Round_6[i]])}
for(i in 1:length(refinedData$Round_7)){ refinedData$Round_7[i] = ifelse(i==2900 , NA ,substitutionFrame$Prerating[substitutionFrame$Player_ID==refinedData$Round_7[i]])}
refinedData
With the opposition scores inline with the player by round, it is a simple matter of extracting a vector of values for each player, taking their mean, and saving them into a column called Opponent_Average
which has place-holder NA’s.
refinedData$Opponent_Average <- NULL
for(r in 1:nrow(refinedData)){
vect<-as.vector(unlist(refinedData[r, 4:10])) # unlisting is needed to use mean across columns
refinedData$Opponent_Average[r]=round(mean(vect, na.rm = TRUE),0) #ignoring NA values and rounding to integers
}
refinedData
Creating the final table is simply a matter of subsetting using titles in the appropriate order for the final file.
Final_Chess <- refinedData[ ,c("Name", "State", "Total_Points", "Prerating", "Opponent_Average")]
This interactive JavaScript table allows you to sort by column, search by word, phrase or numeral yet is compact due to the pagination. The fields without values are the DT
way of presenting NA values.
#you will need to set the working directory to save the csv locally
#setwd("")
write.csv(siftedData, "final.csv")
That is a pretty basic handling of a fixed width text file with mixed character and numeric data variables.