library(fpp3)
library(tsibble)
library(dplyr)
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, and Demand from vic_elec.
Use ? (or help()) to find out about the data in each series.
help("aus_production")
glimpse(aus_production)
## Rows: 218
## Columns: 7
## $ Quarter <qtr> 1956 Q1, 1956 Q2, 1956 Q3, 1956 Q4, 1957 Q1, 1957 Q2, 1957…
## $ Beer <dbl> 284, 213, 227, 308, 262, 228, 236, 320, 272, 233, 237, 313…
## $ Tobacco <dbl> 5225, 5178, 5297, 5681, 5577, 5651, 5317, 6152, 5758, 5641…
## $ Bricks <dbl> 189, 204, 208, 197, 187, 214, 227, 222, 199, 229, 249, 234…
## $ Cement <dbl> 465, 532, 561, 570, 529, 604, 603, 582, 554, 620, 646, 637…
## $ Electricity <dbl> 3923, 4436, 4806, 4418, 4339, 4811, 5259, 4735, 4608, 5196…
## $ Gas <dbl> 5, 6, 7, 6, 5, 7, 7, 6, 5, 7, 8, 6, 5, 7, 8, 6, 6, 8, 8, 7…
help("pelt")
glimpse(pelt)
## Rows: 91
## Columns: 3
## $ Year <dbl> 1845, 1846, 1847, 1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855,…
## $ Hare <dbl> 19580, 19600, 19610, 11990, 28040, 58000, 74600, 75090, 88480, 61…
## $ Lynx <dbl> 30090, 45150, 49150, 39520, 21230, 8420, 5560, 5080, 10170, 19600…
help("gafa_stock")
glimpse(gafa_stock)
## Rows: 5,032
## Columns: 8
## Key: Symbol [4]
## $ Symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAP…
## $ Date <date> 2014-01-02, 2014-01-03, 2014-01-06, 2014-01-07, 2014-01-08,…
## $ Open <dbl> 79.38286, 78.98000, 76.77857, 77.76000, 76.97285, 78.11429, …
## $ High <dbl> 79.57571, 79.10000, 78.11429, 77.99429, 77.93714, 78.12286, …
## $ Low <dbl> 78.86000, 77.20428, 76.22857, 76.84571, 76.95571, 76.47857, …
## $ Close <dbl> 79.01857, 77.28286, 77.70428, 77.14857, 77.63715, 76.64571, …
## $ Adj_Close <dbl> 66.96433, 65.49342, 65.85053, 65.37959, 65.79363, 64.95345, …
## $ Volume <dbl> 58671200, 98116900, 103152700, 79302300, 64632400, 69787200,…
help("vic_elec")
glimpse(vic_elec)
## Rows: 52,608
## Columns: 5
## $ Time <dttm> 2012-01-01 00:00:00, 2012-01-01 00:30:00, 2012-01-01 01:0…
## $ Demand <dbl> 4382.825, 4263.366, 4048.966, 3877.563, 4036.230, 3865.597…
## $ Temperature <dbl> 21.40, 21.05, 20.70, 20.55, 20.40, 20.25, 20.10, 19.60, 19…
## $ Date <date> 2012-01-01, 2012-01-01, 2012-01-01, 2012-01-01, 2012-01-0…
## $ Holiday <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
What is the time interval of each series?
aus_production is recorded at a quarterly interval.
pelt is recorded at a yearly interval, from 1845 to 1935.
gafa_stock is recorded at a daily interval (trading days only), from 2014 to 2018.
vic_elec is recorded at a half-hourly interval, from 2012-01-01 00:00 to 2014-12-31 23:30.
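These intervals can also be checked programmatically; a quick sketch (my addition) using tsibble's interval():
interval(aus_production)  # quarterly
interval(pelt)            # yearly
interval(gafa_stock)      # reported as irregular, since only trading days appear
interval(vic_elec)        # 30 minutes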
Use autoplot() to produce a time plot of each
series.
aus_production %>% autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
pelt %>% autoplot(Lynx)
gafa_stock %>% autoplot(Close)
vic_elec %>% autoplot(Demand) +
labs(title = "Electricity Demand in Victoria",
x = "Time",
y = "Demand (MW)")
Use filter() to find what days corresponded to the
peak closing price for each of the four stocks in
gafa_stock.
gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
select(Symbol, Date, Close)
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG   2018-07-26 1268.
Download the file tute1.csv from the book website, open it in Excel
(or some other spreadsheet application), and review its contents. You
should find four columns of information. Columns B through D each
contain a quarterly series, labelled Sales, AdBudget and GDP. Sales
contains the quarterly sales for a small company over the period
1981-2005. AdBudget is the advertising budget and GDP is the gross
domestic product. All series have been adjusted for inflation.
The code below departs slightly from the book's version, since I could not get the data to load otherwise.
tute1 <- read.csv("C:/Users/rbron/Downloads/tute1 (1).csv")
b. Convert the data to a time series
tute1_ts <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)
c. Construct time series plots of each of the three series
tute1_ts |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
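As a side check (my own, not from the book): dropping facet_grid() draws all three series on one shared y-axis, which makes their relative scales directly comparable but flattens the series with smaller ranges.
tute1_ts |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()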
The USgas package contains data on the demand for natural gas in the US.
a. Install the USgas package.
# install.packages("USgas")
b. Create a tsibble from us_total with year as the index and state as the key.
library(USgas)
us_tsibble <- us_total %>%
  as_tsibble(index = year, key = state)
c. Plot the annual natural gas consumption by state for the New England area (Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
library(ggplot2)
# Define the New England states
new_england_states <- c("Maine", "Vermont", "New Hampshire", "Massachusetts",
"Connecticut", "Rhode Island")
# Filter the tsibble for only those states
new_england_gas <- us_tsibble %>%
filter(state %in% new_england_states)
# Plot
ggplot(new_england_gas, aes(x = year, y = y, color = state)) +
geom_line() +
labs(
title = "Annual Natural Gas Consumption in New England States",
x = "Year",
y = "Natural Gas Consumption (Million Cubic Feet)",
color = "State"
) +
theme_minimal()
a. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
library(readxl)
tourism <- read_excel("C:/Users/rbron/Downloads/tourism.xlsx")
tourism
b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
tourism_ts <- tourism %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(
key = c(Region, State, Purpose),
index = Quarter
)
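As a sanity check (my own, not part of the exercise), the result should line up with the tourism object shipped with the tsibble package:
# Both comparisons should come back TRUE if the conversion matches the packaged data
identical(key_vars(tourism_ts), key_vars(tsibble::tourism))
all.equal(as_tibble(tourism_ts), as_tibble(tsibble::tourism))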
c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
tourism %>%
group_by(Region, Purpose) %>%
summarise(
avg_trips = mean(Trips, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(avg_trips)) %>%
slice(1)
## # A tibble: 1 × 3
## Region Purpose avg_trips
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
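Equivalently (a stylistic alternative, not the book's code), dplyr's slice_max() collapses the arrange()/slice() pair:
tourism %>%
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips, na.rm = TRUE), .groups = "drop") %>%
  slice_max(avg_trips, n = 1)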
d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
tourism_state <- tourism %>%
mutate(Quarter = yearquarter(Quarter)) %>%
group_by(State, Quarter) %>%
summarise(
Trips = sum(Trips, na.rm = TRUE),
.groups = "drop"
) %>%
as_tsibble(
key = State,
index = Quarter
)
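A quick visual check of the result (my addition): autoplot() draws one line per State key.
tourism_state %>%
  autoplot(Trips)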
The aus_arrivals data set comprises quarterly
international arrivals to Australia from Japan, New Zealand, UK and the
US.
Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries.
library(tsibble)
library(feasts)
data(aus_arrivals)
Plot 1: Overall comparison
autoplot(aus_arrivals, Arrivals) +
facet_wrap(~ Origin, scales = "free_y")
Overall interpretation: New Zealand has the largest arrivals; the US and the UK grow over time; Japan shows a decline in arrivals after the 1990s.
Plot 2: Seasonality
gg_season(aus_arrivals, Arrivals) +
facet_wrap(~ Origin)
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Seasonality interpretation: There is strong seasonality in all four series, with the first quarter (the Australian summer) seeing the highest arrivals; Japan's trend weakens over time.
Plot 3: Subseries comparison
gg_subseries(aus_arrivals, Arrivals) +
facet_wrap(~ Origin)
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Interpretation: Quarter 1 has the highest arrivals overall; the US and UK show a steady increase, while Japan's quarterly means decline in every quarter.
There were three unusual observations. First, arrivals dropped during 2008-2009 due to the global financial crisis. Second, Japanese arrivals show a long-term declining pattern. Third, there are occasional extreme first-quarter peaks, especially for New Zealand.
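To look at the 2008-2009 dip more closely, a short follow-up of my own using filter_index():
aus_arrivals |>
  filter_index("2006 Q1" ~ "2010 Q4") |>  # window around the financial crisis
  autoplot(Arrivals)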
Monthly Australian retail data is provided in
aus_retail. Select one of the time series as follows (but
choose your own seed value):
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
Explore your chosen retail time series using the following functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() |> autoplot().
autoplot(myseries, Turnover)
Autoplot interpretation: Turnover trends upward as the years go on.
gg_season(myseries, Turnover)
Seasonality interpretation: Turnover increases through the year, with a larger mid-year spike around July/August and very large spikes in November/December.
gg_subseries(myseries, Turnover)
Subseries interpretation: This matches the seasonal plot, with a smaller spike in July/August and the largest spike in December.
gg_lag(myseries, Turnover, lags = 1:4)
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Lag Interpretation: The lag plots show strong positive autocorrelation, with turnover in one period closely related to turnover in previous periods, especially at lag 1. The colored monthly patterns indicate strong seasonality, with higher turnover toward the end of the year and lower values early in the year, and no obvious unusual observations.
myseries %>%
ACF(Turnover) %>%
autoplot()
ACF interpretation: The ACF shows very strong positive autocorrelation at short lags, indicating high persistence in turnover over time. There are clear seasonal spikes at lag 12 (and its multiples), confirming strong annual seasonality, with no evidence of unusual or irregular behaviour.
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? Pulling the interpretations above together: the series has a clear upward trend and strong annual seasonality peaking late in the year, with no obvious cyclic pattern in the plots.
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF(), and explore features from the following time series: "Total Private" Employed from us_employment, Bricks from aus_production, Hare from pelt, "H02" Cost from PBS, and Barrels from us_gasoline.
us_employment %>%
filter(Title == "Total Private") %>%
autoplot(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
gg_season(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
gg_subseries(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
gg_lag(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
ACF(Employed) %>%
autoplot()
aus_production %>%
autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_season(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_subseries(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_lag(Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production %>%
ACF(Bricks) %>%
autoplot()
Note that gg_season() and gg_subseries() are skipped for Hare: pelt is annual data, so there is no seasonal period for those plots to display.
pelt %>%
  autoplot(Hare)
pelt %>%
gg_lag(Hare)
pelt %>%
ACF(Hare) %>%
autoplot()
PBS %>%
filter(ATC2 == "H02", Type == "Safety net", Concession == "General") %>%
autoplot(Cost)
PBS %>%
filter(ATC2 == "H02", Type == "Safety net", Concession == "General") %>%
gg_season(Cost)
PBS %>%
filter(ATC2 == "H02", Type == "Safety net", Concession == "General") %>%
gg_subseries(Cost)
PBS %>%
filter(ATC2 == "H02", Type == "Safety net", Concession == "General") %>%
gg_lag(Cost)
PBS %>%
filter(ATC2 == "H02", Type == "Safety net", Concession == "General") %>%
ACF(Cost) %>%
autoplot()
us_gasoline %>%
autoplot(Barrels)
us_gasoline %>%
gg_season(Barrels)
us_gasoline %>%
gg_subseries(Barrels)
us_gasoline %>%
gg_lag(Barrels)
us_gasoline %>%
ACF(Barrels) %>%
autoplot()
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?
The following time plots and ACF plots correspond to four different time series. Your task is to match each time plot in the first row with one of the ACF plots in the second row.
The aus_livestock data contains the monthly total
number of pigs slaughtered in Victoria, Australia, from Jul 1972 to Dec
2018. Use filter() to extract pig slaughters in Victoria
between 1990 and 1995. Use autoplot() and
ACF() for this data. How do they differ from white noise?
If a longer period of data is used, what difference does it make to the
ACF?
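No code accompanied this question, so here is a minimal sketch of the steps it asks for (my own; aus_livestock is keyed by Animal and State, with monthly Count):
vic_pigs <- aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria") |>
  filter_index("1990" ~ "1995")          # keep 1990-1995 only
vic_pigs |> autoplot(Count)              # time plot
vic_pigs |> ACF(Count) |> autoplot()     # sample autocorrelations
Unlike white noise, the ACF should show several significant spikes; with a longer sample, the ACF estimates are less variable and the significance bounds narrow.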
Use the following code to compute the daily changes in Google closing stock prices.
dgoog <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) >= 2018) |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE) |>
  mutate(diff = difference(Close))
Why was it necessary to re-index the tsibble? Stock prices exist only on trading days, so the Date index has gaps at weekends and holidays; re-indexing by trading_day gives a regular, gap-free index, which is what lag-based functions such as difference() expect.
Plot these differences and their ACF.
Do the changes in the stock prices look like white noise?
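A minimal sketch for these last two parts (my code, not the book's):
dgoog |> autoplot(diff)                  # daily changes in the closing price
dgoog |> ACF(diff) |> autoplot()         # ACF of the daily changes
If the changes behave like white noise, nearly all ACF spikes should fall inside the blue significance bounds.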