This homework assignment is focused on Time Series and includes Exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from the Hyndman text.
options(repos = list(CRAN="http://cran.rstudio.com/"))
knitr::opts_chunk$set(echo = TRUE)
install.packages("USgas")
##
## The downloaded binary packages are in
## /var/folders/4l/182nghl547v3mxtj6p_9dgph0000gn/T//Rtmp6K08Ut/downloaded_packages
library(USgas)
install.packages("fpp3")
##
## The downloaded binary packages are in
## /var/folders/4l/182nghl547v3mxtj6p_9dgph0000gn/T//Rtmp6K08Ut/downloaded_packages
library(fpp3)
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.3
## ✔ dplyr 1.1.3 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.0 ✔ feasts 0.3.1
## ✔ lubridate 1.9.2 ✔ fable 0.3.3
## ✔ ggplot2 3.4.2 ✔ fabletools 0.3.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
#library(tidyverse)
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
?aus_production
?pelt
?gafa_stock
?vic_elec
aus_production %>% autoplot(Bricks)
## Warning: Removed 20 rows containing missing values (`geom_line()`).
pelt %>% autoplot(Lynx)
gafa_stock %>% autoplot(Close)
vic_elec %>% autoplot(Demand)
vic_elec %>%
autoplot(Demand) +
labs(title = "Total Electricity Demand in MWh",
subtitle = "Victoria, Australia",
y = "Demand (MWh)",
x = "Time (30-min)")
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
gafa_stock %>% filter(Symbol == 'AAPL') %>% slice(which.max(Close))
## # A tsibble: 1 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
gafa_stock %>% filter(Symbol == 'GOOG') %>% slice(which.max(Close))
## # A tsibble: 1 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
gafa_stock %>% filter(Symbol == 'AMZN') %>% slice(which.max(Close))
## # A tsibble: 1 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
gafa_stock %>% filter(Symbol == 'FB') %>% slice(which.max(Close))
## # A tsibble: 1 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
tute1 <- readr::read_csv("Homework #1/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mytimeseries <- tute1 %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index=Quarter)
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x=Quarter,y = value, color = name)) +
geom_line() +
facet_grid(name ~ ., scale = "free_y")
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x=Quarter,y = value, color = name)) +
geom_line()
When you don’t include facet_grid() in the last chart, each of the
series are included in the same chart, and identified using a color and
legend.
The USgas package contains data on the demand for natural gas in the US.
us_total %>%
as_tsibble(index=year,
key=state)
## # A tsibble: 1,266 x 3 [1Y]
## # Key: state [53]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
## 7 2003 Alabama 350345
## 8 2004 Alabama 382367
## 9 2005 Alabama 353156
## 10 2006 Alabama 391093
## # ℹ 1,256 more rows
us_total %>%
filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts",
"Connecticut", "Rhode Island")) %>%
rename("value" = "y") %>%
ggplot(aes(x = year, y = value, color=state)) +
geom_line() +
facet_grid(state ~ ., scales="free_y")
us_total %>%
filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts",
"Connecticut", "Rhode Island")) %>%
rename("value" = "y") %>%
ggplot(aes(x = year, y = value, color=state)) +
geom_line()
tourism_data <- readxl::read_excel("Homework #1/tourism.xlsx")
tourism_data <- tourism_data %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index=Quarter,
key = c("Region", "State", "Purpose"))
as_tibble(tourism_data) %>%
group_by(Region, Purpose) %>%
summarize(avg_trips = mean(Trips)) %>%
arrange(desc(avg_trips), by_group=TRUE)
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
## # A tibble: 304 × 3
## # Groups: Region [76]
## Region Purpose avg_trips
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
## 2 Melbourne Visiting 619.
## 3 Sydney Business 602.
## 4 North Coast NSW Holiday 588.
## 5 Sydney Holiday 550.
## 6 Gold Coast Holiday 528.
## 7 Melbourne Holiday 507.
## 8 South Coast Holiday 495.
## 9 Brisbane Visiting 493.
## 10 Melbourne Business 478.
## # ℹ 294 more rows
The Combination of Sydney and Visiting had the maximum number of overnight trips on average.
as_tibble(tourism_data) %>%
group_by(Quarter, State) %>%
summarize(`Total Trips` = sum(Trips)) %>%
tsibble(index=Quarter,
key = State)
## `summarise()` has grouped output by 'Quarter'. You can override using the
## `.groups` argument.
## # A tsibble: 640 x 3 [1Q]
## # Key: State [8]
## Quarter State `Total Trips`
## <qtr> <chr> <dbl>
## 1 1998 Q1 ACT 551.
## 2 1998 Q2 ACT 416.
## 3 1998 Q3 ACT 436.
## 4 1998 Q4 ACT 450.
## 5 1999 Q1 ACT 379.
## 6 1999 Q2 ACT 558.
## 7 1999 Q3 ACT 449.
## 8 1999 Q4 ACT 595.
## 9 2000 Q1 ACT 600.
## 10 2000 Q2 ACT 557.
## # ℹ 630 more rows
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?
US Employment
us_employment %>%
filter(Title == "Total Private") %>%
autoplot(Employed) +
labs(title = "Time Series of US Employment",
subtitle = "Total Private")
us_employment %>%
filter(Title == "Total Private") %>%
gg_season(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
gg_subseries(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
gg_lag(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
ACF(Employed) %>%
autoplot()
Looking at the plots of the US Employment data, we see that there has
been an overall positive trends during the period covered by the time
series. There isn’t any indication in the data that the data is
seasonal, as the data appears to have the same characteristics across
each of the months. When looking at both the lag and autocorrelation
plots, it seems as though the overall data is heavily driven by the
overall macro trend - with each data point being highly correlated to
the prior/previous data points in the time series.
When looking at the longer time series, you can see there are a number of years in which there’s a pullback in the data. However, these peaks and troughs appear to occur at such irregular intervals that it becomes hard to define them as cyclical periods and/or to define the duration of such cycles if they do exist.
aus_production %>%
autoplot(Bricks) +
labs(title = "Time Series of Australian Brick Production")
## Warning: Removed 20 rows containing missing values (`geom_line()`).
aus_production %>%
gg_season(Bricks)
## Warning: Removed 20 rows containing missing values (`geom_line()`).
aus_production %>%
gg_subseries(Bricks)
## Warning: Removed 5 rows containing missing values (`geom_line()`).
aus_production %>%
gg_lag(Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production %>%
ACF(Bricks) %>%
autoplot()
The data has exhibited a strong positive trend up until the peak in
1980, and has sense been operating at a monthly negative trend. For the
most part the data has demonstrated seasonal behavior, with the
production seeing it’s peack in Q3 and it’s nadir in Q1. Additionally
the data seems to suggest the presence of cyclicality.
pelt %>%
autoplot(Hare) +
labs(title = "Time Series of Snowshoe Hare Trading by Hudson Bay Company",
subtitle = "1845 - 1935")
#pelt %>% gg_season(Hare, period="year")
pelt %>%
gg_subseries(Hare)
pelt %>%
gg_lag(Hare)
pelt %>%
ACF(Hare) %>%
autoplot()
When looking at the Snowshoe Hare data, we were unable to view
seasonality trends given that the data was not broken down into
intra-year seasonal periods (e.g. days, weeks, months, quarter, etc.).
However, while there is no evidence of an overall trends, the data is
definitely cyclcal. When looking at the ACF plot, we see that the
cyclicality appears to take place over a 10 year period - with the data
seeing a peak everry 10 years, and the data is also seeing troughts
5-years later, that are occurring every 10-years as well.
PBS %>%
filter(ATC2 == "H02") %>%
autoplot(Cost) +
labs(title = "Time Series of Australian Monthly Medicare Prescription Cost",
subtitle = "Limited to H02 Anatomical Therapeutic Chemical Index")
PBS %>%
filter(ATC2 == "H02") %>%
gg_season(Cost)
PBS %>%
filter(ATC2 == "H02") %>%
gg_subseries(Cost)
#PBS %>%
# filter(ATC2 == "H02") %>%
# gg_lag(Cost)
PBS %>%
filter(ATC2 == "H02") %>%
ACF(Cost) %>%
autoplot()
us_gasoline %>%
autoplot(Barrels) +
labs(title = "TIme Series of US Finished Motor Gasoline Supplied")
us_gasoline %>%
gg_season(Barrels)
us_gasoline %>%
gg_subseries(Barrels)
us_gasoline %>%
gg_lag(Barrels)
us_gasoline %>%
ACF(Barrels) %>%
autoplot()
The general trend was positive up until the early 2000s, before declining and then rising again. However, it appears that we are at the beginnings of a decline.