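# Packages: httr (GET), jsonlite (fromJSON), magrittr (extract2), tidyverse, psych (describe).
library(httr)
library(jsonlite)
library(magrittr)
library(tidyverse)
library(psych)

# Pull the full 2015-2016 regular season schedule and keep the table of games.
games.data <- GET('https://api.mysportsfeeds.com/v1.2/pull/nhl/2015-2016-regular/full_game_schedule.json',
                  authenticate('njpsy', 'asdfasdf'),
                  add_headers('Content-Type'='application/json', 'Accept-Encoding'='gzip')) %>%
  content(as='text', encoding='UTF-8') %>%
  fromJSON(flatten=TRUE) %>%
  extract2(1) %>%
  extract2(2) %>%
  as_tibble()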
# Unique game dates (column 6), rewritten from YYYY-MM-DD to YYYYMMDD for the 'fordate' parameter.
game.dates <- games.data %>%
  select(6) %>%
  mutate(date=str_replace(date, '(\\d+)-(\\d+)-(\\d+)', '\\1\\2\\3')) %>%
  unique()
# Fetch the scoreboard for one date (YYYYMMDD), pausing 1 second before each request.
API.Query <- function(date){
  Sys.sleep(1)
  GET('https://api.mysportsfeeds.com/v1.2/pull/nhl/2015-2016-regular/scoreboard.json',
      authenticate('njpsy', 'asdfasdf'),
      add_headers('Content-Type'='application/json', 'Accept-Encoding'='gzip'),
      query=list('fordate'=date)) %>%
    content(as='text', encoding='UTF-8') %>%
    fromJSON(flatten=TRUE) %>%
    extract2(1) %>%
    extract2(2) %>%
    as_tibble()
}
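# Query every game date, then reshape to one row per game and period.
box.data <- game.dates$date %>%
  map_df(~API.Query(.)) %>%
  select(8, 24) %>%
  unnest() %>%
  # Goals in a period = away goals + home goals.
  mutate(goals = as.numeric(awayScore) + as.numeric(homeScore)) %>%
  select(-c(3:6)) %>%
  # setNames() assigns these names by position: game.ID -> id, @number -> period, goals -> goals.
  setNames(c('id', 'period', 'goals')) %>%
  # Keep the three regulation periods only.
  filter(period %in% 1:3) %>%
  as_tibble()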
In the NHL (National Hockey League), is there a relationship between the period of the game and the number of goals scored in that period by both teams combined?
Each case is one period of one game in the 2015-2016 regular hockey season and records the number of goals scored in that period. There are 1230 games, which is the entire population for that season, so the three regulation periods give 3 × 1230 = 3690 cases. My “super population” will be all post-lockout NHL games (2013-2014 season forward).
Raw data is collected by mysportsfeeds.com, a service created and supported by fans that maintains an API covering all major sports. I will query the API and collect the required data.
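As a rough sketch of how a per-date query can be prepared, the snippet below converts a schedule date to the `YYYYMMDD` form and builds the scoreboard URL without sending a request. The helper names `to.fordate` and `scoreboard.url` and the environment variable names `MSF_USER` and `MSF_PASS` are assumptions made for illustration; only the endpoint and the `fordate` parameter come from the code in this report.

```r
library(httr)

# Hypothetical environment variable names; keeping credentials out of the script is optional.
msf.user <- Sys.getenv('MSF_USER')
msf.pass <- Sys.getenv('MSF_PASS')

# Convert a schedule date (YYYY-MM-DD) to the YYYYMMDD form used for 'fordate'.
to.fordate <- function(date) format(as.Date(date), '%Y%m%d')

# Build (but do not send) the scoreboard URL for one date.
scoreboard.url <- function(date) {
  modify_url('https://api.mysportsfeeds.com/v1.2/pull/nhl/2015-2016-regular/scoreboard.json',
             query = list(fordate = to.fordate(date)))
}

example.url <- scoreboard.url('2015-10-07')
```

The resulting URL could then be passed to `GET()` together with `authenticate(msf.user, msf.pass)`.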
This is an observational study.
The raw data is collected by mysportsfeeds.com; I will query, clean and store the relevant data myself.
The response variable is the number of goals scored in the period and is numerical.
The explanatory variable is the period of the game and is categorical (1st, 2nd or 3rd).
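With a numerical response and a categorical explanatory variable, a grouped summary is one compact way to look at the response by level. The sketch below uses a small made-up table (`toy`, not NHL data) with the same `id`, `period` and `goals` columns, and is only meant to illustrate the shape of the computation.

```r
library(dplyr)

# Made-up example rows (not NHL data): two games, three periods each.
toy <- tibble(
  id     = rep(c('g1', 'g2'), each = 3),
  period = rep(c('1', '2', '3'), times = 2),
  goals  = c(1, 2, 0, 3, 1, 2)
)

# Treat period as a categorical factor and summarise goals within each level.
by.period <- toy %>%
  mutate(period = factor(period, levels = c('1', '2', '3'))) %>%
  group_by(period) %>%
  summarise(n = n(), mean = mean(goals), sd = sd(goals))
```

Applied to the real data, the same pipeline would return one row per period.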
box.data %>%
  filter(period == '1') %>%
  select(goals) %>%
  describe()
##       vars    n mean   sd median trimmed  mad min max range skew kurtosis
## goals    1 1230 1.45 1.19      1    1.34 1.48   0   7     7 0.76     0.43
##         se
## goals 0.03
box.data %>%
  filter(period == '2') %>%
  select(goals) %>%
  describe()
##       vars    n mean   sd median trimmed  mad min max range skew kurtosis
## goals    1 1230 1.83 1.29      2    1.75 1.48   0   7     7 0.55     0.11
##         se
## goals 0.04
box.data %>%
  filter(period == '3') %>%
  select(goals) %>%
  describe()
##       vars    n mean   sd median trimmed  mad min max range skew kurtosis
## goals    1 1230 1.91 1.25      2    1.85 1.48   0   6     6 0.46    -0.03
##         se
## goals 0.04
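The `se` column in the summaries above is the standard error of the mean, sd / sqrt(n). As a quick check, the sketch below recomputes it from the reported `sd` and `n` values; the variable names are illustrative only.

```r
# Values copied from the describe() output for periods 1, 2 and 3.
n   <- 1230
sds <- c(`1` = 1.19, `2` = 1.29, `3` = 1.25)

# Standard error of the mean for each period.
se <- sds / sqrt(n)
round(se, 2)
```

The rounded values are 0.03, 0.04 and 0.04, which agree with the `se` column reported for each period.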
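# Bar heights are proportions within each period: after_stat(prop) is computed per group,
# and fill=period places each period in its own group (needs ggplot2 3.3.0 or later).
ggplot(box.data) +
  geom_bar(aes(x=goals, y=after_stat(prop), fill=period)) +
  facet_wrap(~period, nrow=3, ncol=1) +
  labs(x='Goals',
       y='Proportion of Games',
       fill='Period',
       title='Proportion of Games by Goals Scored in Each Period')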
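The proportions behind a faceted bar chart like this one can also be computed explicitly, which makes it easy to confirm that they sum to 1 within each period. This is a minimal sketch on a made-up table (`toy`, not NHL data); the column name `prop` is an illustrative choice.

```r
library(dplyr)

# Made-up example rows (not NHL data): one row per game and period.
toy <- tibble(
  period = c('1', '1', '1', '1', '2', '2', '2', '2'),
  goals  = c(0, 1, 1, 2, 1, 2, 2, 2)
)

# Share of games with each goal count, computed within each period.
props <- toy %>%
  count(period, goals) %>%
  group_by(period) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Each period's proportions should add up to 1.
check <- props %>%
  group_by(period) %>%
  summarise(total = sum(prop))
```

A table like `props` could then be drawn with `geom_col(aes(x=goals, y=prop))` and the same `facet_wrap(~period)` call.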