Loading the tidyverse and the data, and setting the seed
# Load the tidyverse (dplyr, ggplot2, readr, etc.)
library(tidyverse)
# Load the data
nhl_draft <- read_csv("nhldraft.csv")
# Standardize inconsistent position labels so that equivalent
# positions are grouped together
nhl_draft <- nhl_draft |>
  mutate(position = case_when(
    position == "C RW" ~ "C/RW",
    position == "Centr" ~ "C",
    position == "C; LW" ~ "C/LW",
    TRUE ~ position
  ))
# Set the seed for reproducibility
set.seed(1)
For my final project, I wanted to look at this data set from the point of view of a prospective general manager of a National Hockey League (NHL) team. The data for this project comes from Kaggle user “MATT OP” and his data set NHL Draft Hockey Player Data (1963-2022). The data set tracks a variety of information about each drafted player, including goals, assists, team, draft year, and overall pick.
When I first thought about this project, I considered looking at it from a positional standpoint, debating whether or not I should compare goalie groups. But since this project is supposed to have a potential real-life impact, I decided to look at it from a draft standpoint.
In any sport that has a draft, the number 1 overall pick tends to be one of the best prospects. For many teams, this could mean adding a potential star player to their roster. The 1st overall pick is interesting because teams will either build around that player or use that player as another piece that fits their existing play style. Thus, I thought it would be interesting to look at whether the 1st overall picks of each draft turned out to be good players in the league.
To start, I filtered the data down to the first overall picks and saved them as their own data set.
# Keep only the first overall pick from each draft
first_pick <- nhl_draft |>
  filter(overall_pick == 1)
Then, I wanted to figure out what an NHL first overall pick looks like, so I broke the picks down by three categories: nationality, position, and team.
# Count first overall picks by nationality and plot them, smallest to largest
first_pick |>
  group_by(nationality) |>
  summarize(count = n()) |>
  ggplot(aes(x = reorder(nationality, count), y = count, fill = nationality)) +
  geom_col() +
  geom_text(aes(label = count), vjust = -0.3) +
  labs(title = "Number of 1st Overall Picks by Nationality",
       x = "Nationality", y = "Number of players")
Looking at nationality, Canada accounts for the large majority of first overall picks, with 42 Canadian players drafted number 1 overall. With North America being the hub of hockey and the home of the National Hockey League, it is no surprise that 50 of the 60 (83%) players drafted number 1 overall in the past 60 years were from Canada or the U.S. This is important because it means the top prospects have overwhelmingly grown up in North America rather than in other countries, likely due to the popularity of hockey on the continent. The other countries on the list, such as Russia (“RU”) and Sweden (“SE”), also make sense given their cold climates.
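The percentages in this paragraph come straight from counts, so they are easy to double-check. Below is a minimal sketch in R; `share_pct()` is a hypothetical helper introduced here for illustration, and the counts plugged in are the ones quoted above (42 Canadian picks, 50 from Canada and the U.S., 60 drafts in total).

```r
# Hypothetical helper: percentage of a total, rounded to one decimal place
share_pct <- function(count, total) {
  round(100 * count / total, 1)
}

total_first_picks <- 60   # one first overall pick per draft, 1963-2022
canada <- 42              # Canadian first overall picks (from the chart)
canada_and_us <- 50       # Canadian plus American first overall picks

share_pct(canada, total_first_picks)         # 70
share_pct(canada_and_us, total_first_picks)  # 83.3
canada_and_us - canada                       # 8 American first overall picks
```

Applied to the full data, the same helper could be combined with dplyr's `count()`, for example `first_pick |> count(nationality) |> mutate(pct = share_pct(n, sum(n)))`, to get every nationality's share at once.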
# Count first overall picks by position (dropping missing positions)
first_pick |>
  drop_na(position) |>
  group_by(position) |>
  summarize(count = n()) |>
  ggplot(aes(x = reorder(position, count), y = count, fill = position)) +
  geom_col() +
  coord_flip() +
  geom_text(aes(label = count), hjust = -0.3) +
  labs(title = "Number of 1st Overall Picks by Position",
       x = "Position", y = "Number of players")
For position, center is the most common position among 1st overall picks, with 24 of the 60 (40%) players listed as centers. This also makes sense: the center plays in the middle of the action and of the rink, taking faceoffs and supporting play in both the offensive and defensive zones. Centers are important in building chemistry between linemates and tend to set the pace of the team's play.
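Because the cleaning step keeps dual positions such as "C/RW" and "C/LW" as their own labels, the count of 24 centers presumably covers only players listed as exactly "C". A broader count that also includes dual-listed centers can be sketched with a regular expression; `is_center()` and the position vector below are hypothetical and only mimic the label format produced by the `case_when()` step.

```r
# Hypothetical helper: TRUE when a position string lists C as one of its
# slash-separated positions (e.g. "C", "C/RW", "C/LW"), FALSE otherwise
is_center <- function(position) {
  grepl("(^|/)C($|/)", position)
}

# Hypothetical position labels, in the same format as the cleaned data
positions <- c("C", "C/RW", "D", "LW", "C/LW", "G", "RW")

is_center(positions)       # TRUE TRUE FALSE FALSE TRUE FALSE FALSE
sum(is_center(positions))  # 3
```

On the project data, `sum(is_center(first_pick$position))` would give the broader count; `grepl()` returns `FALSE` for missing positions, so they are simply not counted.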
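# Count first overall picks by drafting team, coloring bars by pick count
first_pick |>
  group_by(team) |>
  summarize(count = n()) |>
  ggplot(aes(x = reorder(team, desc(count)), y = count, fill = factor(count))) +
  geom_col() +
  coord_flip() +
  geom_text(aes(label = count), hjust = -0.3) +
  labs(title = "Number of 1st Overall Picks by Team",
       x = "Team", y = "Number of picks") +
  # The fill encodes the number of picks, so the legend is titled accordingly
  scale_fill_brewer(name = "Picks")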
Looking at the graph for team, over the past 60 years only 25 of the 44 teams in NHL history (about 57%) have drafted first overall. The Montreal Canadiens have had the most 1st overall picks, with 6. The Atlanta Thrashers, Quebec Nordiques, and Minnesota North Stars are the three teams that had number 1 picks but no longer exist (all three franchises relocated). Some teams have had more 1st overall picks than others, but the counts in the graph show that it is difficult to get the 1st overall pick multiple times.
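Two numbers in this paragraph can be computed directly: the share of franchises that have ever picked first (25 of 44) and how many teams have picked first more than once. The sketch below uses base R's `table()` on a made-up vector of team names; "Team A", "Team B", and "Team C" are hypothetical placeholders, not real results.

```r
# Hypothetical drafting teams for a handful of first overall picks
teams <- c("Team A", "Team B", "Team A", "Team C", "Team B", "Team A")

picks_per_team <- table(teams)      # number of first overall picks per team
picks_per_team
length(picks_per_team)              # 3 distinct teams
sum(picks_per_team > 1)             # 2 teams picked first more than once

# Share of all franchises that have ever picked first, using the text's counts
round(100 * 25 / 44, 1)             # 56.8
```

With the real data, `table(first_pick$team)` would produce the same kind of per-team counts that the bar chart shows.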
After breaking down what a number 1 overall pick looks like, I wanted to look at how first overall picks have performed in the NHL. I looked at four statistics: goals, assists, points, and plus/minus.
# Career goals for each first overall pick, by draft year
first_pick |>
  ggplot(aes(x = year, y = goals)) +
  geom_col() +
  labs(title = "Goals by the First Overall Pick Over Time",
       x = "Draft year", y = "Career goals")
Looking at goals, we can see that the number of goals each first overall pick has contributed varies widely. While you would expect 1st overall picks to be high scorers, certain eras turned out better than others. The 1980s were the best era for goals, with most of that decade's picks scoring over 200 goals in their careers. Several of the 1960s first overall picks have no goals recorded, since not all of them went on to play in the NHL, and the 2010s have been the weakest era so far. That is partly because these are career totals, so recent picks have had fewer seasons to accumulate goals, and partly because younger prospects have to earn their playing time and may spend a couple of years in the minor leagues before reaching the NHL.
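Comparing "eras" is easier with an explicit decade column than by eyeballing individual bars. The sketch below bins draft years into decades with integer division and summarizes each decade with the median, which is less sensitive to a single superstar than the mean. The years and goal totals are made-up values for illustration, not rows from the data set.

```r
# Hypothetical draft years and career goal totals (NA = no NHL goals recorded)
year  <- c(1965, 1968, 1981, 1984, 1987, 2012, 2015, 2019)
goals <- c(NA,   120,  300,  450,  250,  180,   90,   20)

# Bin each draft year into its decade, e.g. 1984 -> 1980
decade <- (year %/% 10) * 10

# Typical career goals per era, ignoring missing totals
tapply(goals, decade, median, na.rm = TRUE)
# Number of picks per era with no goals recorded
tapply(is.na(goals), decade, sum)
```

In the tidyverse style used above, the equivalent would be `first_pick |> mutate(decade = (year %/% 10) * 10) |> group_by(decade) |> summarize(median_goals = median(goals, na.rm = TRUE))`.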
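# Career assists for each first overall pick, by draft year
first_pick |>
  ggplot(aes(x = year, y = assists)) +
  geom_col() +
  labs(title = "Assists by the First Overall Pick Over Time",
       x = "Draft year", y = "Career assists")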
For assists, the data tells a similar story. However, this time the number 1 overall picks from the 1970s show much more variation in assists than the 1980s picks, whose assist totals were more similar to one another. This is notable because the 1970s picks had more assists in general than picks from the other eras.
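The "variation" described above can be put into numbers by measuring how spread out each decade's assist totals are. This sketch uses the standard deviation and the range (maximum minus minimum) per decade; both the choice of those two measures and the assist values themselves are illustrative assumptions, not what the original analysis computed.

```r
# Hypothetical draft years and career assist totals
year    <- c(1971, 1974, 1977, 1981, 1984, 1987)
assists <- c(150,  900,  400,  500,  560,  530)

decade <- (year %/% 10) * 10

# Standard deviation: how far assist totals typically sit from the decade mean
tapply(assists, decade, sd)
# Range: gap between the lowest and highest assist totals in each decade
tapply(assists, decade, function(x) diff(range(x)))
```

A decade with a large spread but a high average, like the hypothetical 1970s values here, is exactly the "volatile but productive" pattern the paragraph describes.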
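# Career points for each first overall pick, by draft year
first_pick |>
  ggplot(aes(x = year, y = points)) +
  geom_col() +
  labs(title = "Points by the First Overall Pick Over Time",
       x = "Draft year", y = "Career points")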
Points are one of my favorite hockey metrics because a player's points are the sum of their goals and assists, which tells a fuller story of how a player positively impacts the offense. The 1980s first overall picks had the most players over 500 points, and given how the league has changed over time, that surprises me. I never thought of the 80s as a golden era of number 1 draft picks, but with how well those players scored, I can believe it.
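Since points are defined as goals plus assists, the points column can be checked against the other two, and the "over 500 points" comparison can be tallied by decade. The career totals and draft years below are hypothetical, chosen only to show the mechanics.

```r
# Hypothetical career totals for four first overall picks
goals   <- c(250, 420, 120, 30)
assists <- c(300, 500, 200, 45)
year    <- c(1981, 1985, 2004, 2016)

# In hockey, a point is either a goal or an assist
points <- goals + assists
points                                     # 550 920 320 75

# Number of picks per decade who topped 500 career points
decade <- (year %/% 10) * 10
tapply(points > 500, decade, sum)          # 1980: 2, 2000: 0, 2010: 0
```

On the project data, `all(first_pick$points == first_pick$goals + first_pick$assists, na.rm = TRUE)` would check whether the data set's `points` column follows the same definition.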
# Career plus/minus for each first overall pick, by draft year
first_pick |>
  ggplot(aes(x = year, y = plus_minus)) +
  geom_col() +
  labs(title = "+/- by the First Overall Pick Over Time",
       x = "Draft year", y = "Career +/-")
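Finally, a common way to look at player performance is a player's plus/minus (+/-). A player's plus/minus is the number of even-strength and shorthanded goals their team scores while the player is on the ice, minus the number of even-strength and shorthanded goals the opposing team scores while the player is on the ice; power-play goals do not count in either direction. Plus/minus is often complemented by the Corsi statistic, which measures a player's shot-attempt differential: shot attempts (shots on goal, missed shots, and blocked shots) for the player's team minus those against it while the player is on the ice. However, this data set does not include shot attempts, so we cannot calculate Corsi for this analysis.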
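To make these two definitions concrete, here is a simplified sketch that computes plus/minus from a made-up log of goals scored while one player was on the ice, and Corsi from made-up shot-attempt counts. The functions `plus_minus()` and `corsi()` are hypothetical helpers, and the plus/minus version ignores special cases in the official scoring rules.

```r
# Hypothetical on-ice goal log for one player. `strength` is from the player's
# team's point of view: "EV" = even strength, "PP" = power play, "SH" = shorthanded
goal_for <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
strength <- c("EV", "PP", "EV", "SH", "SH", "PP", "EV")

plus_minus <- function(goal_for, strength) {
  # +1 for each goal for at even strength or while shorthanded
  # (the team's own power-play goals are excluded)
  plus  <- sum(goal_for & strength %in% c("EV", "SH"))
  # -1 for each goal against at even strength or while on the power play
  # (power-play goals against, i.e. while shorthanded, are excluded)
  minus <- sum(!goal_for & strength %in% c("EV", "PP"))
  plus - minus
}

plus_minus(goal_for, strength)   # 3 - 2 = 1

# Corsi: shot attempts for minus shot attempts against while on the ice,
# where a shot attempt is a shot on goal, a missed shot, or a blocked shot
corsi <- function(sog_for, missed_for, blocked_for,
                  sog_against, missed_against, blocked_against) {
  (sog_for + missed_for + blocked_for) -
    (sog_against + missed_against + blocked_against)
}

corsi(12, 5, 4, 10, 6, 3)        # 21 - 19 = 2
```

Corsi is usually reported at even strength; the sketch leaves that filtering to the caller, and it cannot be applied to the draft data set because it has no shot-attempt columns.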
After looking at all of the statistics, I found that this project headed in a different direction than I initially expected. This data, sadly, only covers each player's statistics from their time in the NHL. This means we can only see the impact each player had in the league, not what made them a top prospect before the draft.
For future research, I believe this data raises plenty of ideas worth exploring. Future questions include “What aspects of a player's statistics make him a number 1 draft pick?”, “Do certain teams tend to favor certain player archetypes?”, and “Does a draft pick fit into a team's existing play style, or does the team build around him?”
I hope you enjoyed reading about my final project and its findings. I enjoyed working on this project, and even though I didn't find what I was looking for, I liked exploring the data and hope you did too.