Nolan_Assignment 1

Introduction

I decided to use a data set called nhl_elo, which contains FiveThirtyEight's predictions for NHL games. The data set dates back to the start of the National Hockey League, beginning with the 1917-18 season. The forecasts predict who will win each game and, ultimately, the Stanley Cup. The article linked below describes the calculations behind each of the variables in the data set.

Link to article: https://fivethirtyeight.com/methodology/how-our-nhl-predictions-work/
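
To give a rough sense of how a rating-based forecast turns two team ratings into a win probability, here is a minimal sketch of the standard Elo expected-score formula. It is not taken from the linked article: the function name `elo_win_prob`, the `home_advantage` parameter, and the example ratings are all assumptions made for illustration, and any adjustments FiveThirtyEight applies on top of the basic formula are not reproduced here.

```r
# Standard Elo expected-score formula (logistic curve, base 10, scale 400).
# `home_advantage` is a hypothetical parameter added for illustration; its
# default of 0 means no adjustment is applied.
elo_win_prob <- function(home_rating, away_rating, home_advantage = 0) {
  rating_diff <- (home_rating + home_advantage) - away_rating
  1 / (1 + 10^(-rating_diff / 400))
}

# Made-up ratings: the home team is rated 100 points above the away team
home_prob <- elo_win_prob(1600, 1500)
away_prob <- 1 - home_prob

print(c(home = home_prob, away = away_prob))
```

Two equally rated teams get a probability of 0.5 each, and the probability for the higher-rated team grows as the rating gap widens.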

library(RCurl)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Download the raw CSV from GitHub as text, then parse it into a data frame
x <- getURL("https://raw.githubusercontent.com/arinolan/Assignment-1-data-607/main/nhl_elo.csv")
data <- read.csv(text = x)

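# Keep games after the 2020 season in which the home team was a clear favorite
# (win probability above 0.6), and drop the columns not needed here
newdata <- subset(data,
                  home_team_winprob > 0.6 & season > 2020,
                  select = -c(neutral, home_team_pregame_rating, away_team_pregame_rating, game_importance_rating,
                              game_overall_rating, home_team_postgame_rating, away_team_postgame_rating, status))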

df <- newdata

# Replace the overtime codes with descriptive labels
df$ot[df$ot == 'OT'] <- 'Overtime'
df$ot[df$ot == 'SO'] <- 'Shootout'
df$ot[df$ot == '3OT'] <- 'Triple Overtime'
df$ot[df$ot == ''] <- 'No Overtime'

df$playoff[df$playoff==1] <- 'Playoff'
df$playoff[df$playoff==0] <- 'Regular Season'
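
The recoding above can also be written with a lookup table, which keeps the codes and their labels side by side. The sketch below runs on a small made-up data frame rather than the real data; the names `toy`, `ot_codes`, `ot_labels`, `ot_label`, and `playoff_label` are hypothetical and exist only for this illustration.

```r
# Toy data standing in for the real data frame (values are made up)
toy <- data.frame(
  ot = c("", "OT", "SO", "3OT", ""),
  playoff = c(0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Parallel vectors: each code lines up with the label at the same position
ot_codes  <- c("", "OT", "SO", "3OT")
ot_labels <- c("No Overtime", "Overtime", "Shootout", "Triple Overtime")

# match() returns the position of each code in ot_codes;
# a code that is not listed becomes NA instead of being silently kept
toy$ot_label <- ot_labels[match(toy$ot, ot_codes)]

# Two-way recode of the playoff flag
toy$playoff_label <- ifelse(toy$playoff == 1, "Playoff", "Regular Season")

print(toy)
```

Writing the labels to new columns leaves the original codes untouched, and an NA in `ot_label` flags any code the table does not cover.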

summary(df$away_team_winprob)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2017  0.3028  0.3394  0.3352  0.3736  0.3999

summary(df$away_team_expected_points)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5666  0.7585  0.8270  0.8186  0.8902  0.9390

summary(df$away_team_score)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   2.000   2.532   4.000   8.000

2022-09-03

Conclusion and Findings