December 9, 2017

NHL Points Leaders

In his second NHL season (2016-17), Connor McDavid had a breakout year, leading the league in points during the regular season. His 100 points were 11 more than the 89 recorded by each of the established veterans Sidney Crosby and Patrick Kane, who tied for the second-highest point total.

Image Credit: http://www.nhl.com.

Getting and Cleaning Data (Part 1)

We chose to analyze the game-by-game trends of the top six point leaders further. A "Game Log" for each of the six players was downloaded from https://www.hockey-reference.com/ and saved as its own comma-separated text file in a common subdirectory named "Game Logs". Each file summarizes the statistics the player compiled on a game-by-game basis. For this exercise, we are mainly interested in two variables for each player: "PTS" (the number of points the player recorded in that game) and "Date" (the date on which the game was played).
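To make the file layout concrete, here is a minimal sketch that writes a tiny stand-in game log and reads back only Date and PTS. The file name "Example Player.txt", the extra columns (Rk, G, A), the line above the column names, and every value are made up for illustration; real hockey-reference exports contain many more columns.

```r
# Hypothetical stand-in for one exported game log (all values are made up).
# Assumption: the first line sits above the real column names, which is why
# read.csv() is called with skip = 1.
logDir <- file.path(tempdir(), "Game Logs")
dir.create(logDir, showWarnings = FALSE)
logFile <- file.path(logDir, "Example Player.txt")
writeLines(c(",,Scoring,,",
             "Rk,Date,G,A,PTS",
             "1,2016-10-12,1,1,2",
             "2,2016-10-14,0,1,1"),
           logFile)

# Read the file, then keep only the two variables used in this analysis
oneLog <- read.csv(logFile, header = TRUE, skip = 1, stringsAsFactors = FALSE)
oneLog <- oneLog[, c("Date", "PTS")]
# "YYYY-MM-DD" strings parse with as.Date()'s default format
oneLog$Date <- as.Date(oneLog$Date)

str(oneLog)
```

Repeating these steps for every file in the directory, and tagging each result with the player's name, is what the loading code in Part 2 does.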

Getting and Cleaning Data (Part 2)

library(dplyr)

# Each comma-separated file in "Game Logs" holds one player's game log;
# the file name (minus ".txt") is used as the player's name.
logFiles <- list.files("Game Logs", pattern = "\\.txt$")

readGameLog <- function(fileName) {
        # skip = 1 skips the first line, which sits above the column names
        gameLog <- read.csv(file.path("Game Logs", fileName),
                            header = TRUE, skip = 1,
                            stringsAsFactors = FALSE)
        gameLog <- select(gameLog, Date, PTS)
        gameLog <- mutate(gameLog,
                          Date = as.Date(Date),
                          # running point total through each game
                          cumulativePoints = cumsum(PTS),
                          # escaped dot, anchored at the end: strip ".txt" only
                          Player = sub("\\.txt$", "", fileName))
        gameLog
}

# Read every file and stack the per-player logs into one long data frame
fullGameLog <- bind_rows(lapply(logFiles, readGameLog))
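One design alternative, sketched below with made-up data for two hypothetical players ("Player A" and "Player B"), is to compute the running total after the logs are stacked: group by Player and sort by Date first, so the cumulative sum does not depend on the row order inside each file. The last running total per player then equals that player's season point total, which gives a quick check against the league totals quoted above (100 for McDavid, for example).

```r
library(dplyr)

# Made-up three-game logs for two hypothetical players, already stacked
# into one long table (the same shape as fullGameLog).
toyLog <- data.frame(
  Player = rep(c("Player A", "Player B"), each = 3),
  Date   = as.Date(c("2016-10-12", "2016-10-14", "2016-10-15",
                     "2016-10-13", "2016-10-15", "2016-10-18")),
  PTS    = c(2L, 0L, 1L, 1L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Running total within each player, with games in date order
toyLog <- toyLog %>%
  group_by(Player) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(cumulativePoints = cumsum(PTS)) %>%
  ungroup()

# Season totals: the last running total per player equals sum(PTS)
seasonTotals <- toyLog %>%
  group_by(Player) %>%
  summarise(totalPoints = last(cumulativePoints))

seasonTotals
```

Sorting within groups costs little for six players' worth of games and protects the cumulative curves from a file that happens to list games out of order.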

Plotly Graphic