library(fpp3)
library(stringr)
Please submit exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from the Hyndman online Forecasting book. Please submit both your Rpubs link as well as attach the .pdf file with your code.
Explore the following four time series: Bricks
from
aus_production
, Lynx
from pelt
,
Close
from gafa_stock
, Demand
from vic_elec
.
?
(or help()
) to find out about the
data in each series.help(aus_production)
class(aus_production)
help(pelt)
class(pelt)
help(gafa_stock)
class(gafa_stock)
help(vic_elec)
class(vic_elec)
Time Series | Interval |
---|---|
aus_production |
Quarterly |
pelt |
Annual |
gafa_stock |
Daily |
vic_elec |
1/2 hour |
I used the following to help find the interval of each series.
index(aus_production)
index(pelt)
index(gafa_stock)
head(gafa_stock)
index(vic_elec)
head(vic_elec)
autoplot()
to produce a time plot of each
series.aus_production |> autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
pelt |> autoplot(Lynx)
gafa_stock |> autoplot(Close)
vic_elec |> autoplot(Demand)
vic_elec |>
autoplot(Demand) +
labs(
title = "Electricity Demand in Australia",
x = "Time (1/2 Hour Intervals)",
y = "Demand"
)
Use filter()
to find what days corresponded to the peak
closing price for each of the four stocks in
gafa_stock
.
gafa_stock |>
group_by(Symbol) |>
filter(Close == max(Close)) |>
select(Symbol, Date, Close)
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- readr::read_csv("data/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(tute1)
## # A tibble: 6 × 4
## Quarter Sales AdBudget GDP
## <date> <dbl> <dbl> <dbl>
## 1 1981-03-01 1020. 659. 252.
## 2 1981-06-01 889. 589 291.
## 3 1981-09-01 795 512. 291.
## 4 1981-12-01 1004. 614. 292.
## 5 1982-03-01 1058. 647. 279.
## 6 1982-06-01 944. 602 254
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
Check what happens when you don’t include
facet_grid()
.
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
“AdBudget”, “GDP” and “Sales” are not facet’ed off into their own tiles. Rather, they are presented on a unified grid. Actually, this view is really nice for side by side comparison, though each appears to be in league of its own.
USgas
package.install.packages("USgas")
library(USgas)
head(us_total)
## year state y
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
us_total
with year as the index
and state as the key.ts_us_total <- us_total |>
as_tsibble(index = year, key = c(state))
ts_us_total |>
filter(state %in% c(
"Maine",
"Vermont",
"New Hampshire",
"Massachusetts",
"Connecticut",
"Rhode Island"
)) |>
ggplot(mapping = aes(x = year, y = y, color = state)) +
geom_line() +
geom_point() +
labs(
title = "Gas Consumption by State",
subtitle = "New England",
x = "Year",
y = "Volume (Million Cubic Feet)"
)
tourism <- readxl::read_excel("./data/tourism.xlsx")
tourism |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter, key = c(State, Region, Purpose))
## # A tsibble: 24,320 x 5 [1Q]
## # Key: State, Region, Purpose [304]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Canberra ACT Business 150.
## 2 1998 Q2 Canberra ACT Business 99.9
## 3 1998 Q3 Canberra ACT Business 130.
## 4 1998 Q4 Canberra ACT Business 102.
## 5 1999 Q1 Canberra ACT Business 95.5
## 6 1999 Q2 Canberra ACT Business 229.
## 7 1999 Q3 Canberra ACT Business 109.
## 8 1999 Q4 Canberra ACT Business 159.
## 9 2000 Q1 Canberra ACT Business 105.
## 10 2000 Q2 Canberra ACT Business 202.
## # ℹ 24,310 more rows
tourism |>
group_by(Region, Purpose) |>
summarize(
TripAverage = mean(Trips),
.groups = "keep"
) |>
arrange(desc(TripAverage)) |>
head(n = 1)
## # A tibble: 1 × 3
## # Groups: Region, Purpose [1]
## Region Purpose TripAverage
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
tourism |>
group_by(State, Quarter) |>
summarize(
TripTotal = sum(Trips),
.groups = "keep"
) |>
mutate(
Quarter = yearquarter(Quarter)
) |>
as_tsibble(index = Quarter, key = c(State))
## # A tsibble: 640 x 3 [1Q]
## # Key: State [8]
## # Groups: State @ Quarter [640]
## State Quarter TripTotal
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
## 7 ACT 1999 Q3 449.
## 8 ACT 1999 Q4 595.
## 9 ACT 2000 Q1 600.
## 10 ACT 2000 Q2 557.
## # ℹ 630 more rows
Use the following graphics functions: autoplot()
,
gg_season()
, gg_subseries()
,
gg_lag()
, ACF()
and explore features from the
following time series: “Total Private” Employed
from
us_employment
, Bricks
from
aus_production
, Hare
from pelt
,
“H02” Cost
from PBS
, and Barrels
from us_gasoline
.
dump_ts <- function(ts, title) {
print(title)
cat("index: ")
print(index(ts))
cat("key: ")
print(key(ts))
print(head(ts))
}
autoplot_ts <- function(ts, variable, title) {
dump_ts(ts, title)
autoplot(ts, {{ variable }}) +
labs(
title = title,
x = "Time",
y = "Value"
)
ACF(ts, {{ variable }}) |>
autoplot() +
labs(title = str_c("ACF: ", title))
}
seasonal_plot <- function(ts, variable, title) {
gg_season(ts, {{ variable }}) +
labs(
title = str_c("Seasonal Plot: ", title),
x = "Time",
y = "Value"
)
}
Apply filtering to us_employment
and PBS
to
get the time series we want.
us_employment_tp <- us_employment |>
filter(Title == "Total Private")
PBS_H02 <- PBS |>
filter(ATC2 == "H02")
Let’s have a look at each one of these time series
autoplot_ts(us_employment_tp, Employed, "Total Private Employment in US")
## [1] "Total Private Employment in US"
## index: Month
## key: [[1]]
## Series_ID
##
## # A tsibble: 6 x 4 [1M]
## # Key: Series_ID [1]
## Month Series_ID Title Employed
## <mth> <chr> <chr> <dbl>
## 1 1939 Jan CEU0500000001 Total Private 25338
## 2 1939 Feb CEU0500000001 Total Private 25447
## 3 1939 Mar CEU0500000001 Total Private 25833
## 4 1939 Apr CEU0500000001 Total Private 25801
## 5 1939 May CEU0500000001 Total Private 26113
## 6 1939 Jun CEU0500000001 Total Private 26485
seasonal_plot(us_employment_tp, Employed, "Total Private Employment in US")
gg_subseries(us_employment_tp, Employed)
gg_lag(us_employment_tp, Employed)
autoplot_ts(aus_production, Bricks, "Brick Production in Australia")
## [1] "Brick Production in Australia"
## index: Quarter
## key: list()
## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
seasonal_plot(aus_production, Bricks, "Brick Production in Australia")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_subseries(aus_production, Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_lag(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).
autoplot_ts(pelt, Hare, "Hare Pelt Trading in Canada")
## [1] "Hare Pelt Trading in Canada"
## index: Year
## key: list()
## # A tsibble: 6 x 3 [1Y]
## Year Hare Lynx
## <dbl> <dbl> <dbl>
## 1 1845 19580 30090
## 2 1846 19600 45150
## 3 1847 19610 49150
## 4 1848 11990 39520
## 5 1849 28040 21230
## 6 1850 58000 8420
# Error: The data must contain at least one observation per seasonal period.
gg_subseries(pelt, Hare)
gg_lag(pelt, Hare)
autoplot_ts(PBS_H02, Cost, 'Medicare "H02" Cost in Australia')
## [1] "Medicare \"H02\" Cost in Australia"
## index: Month
## key: [[1]]
## Concession
##
## [[2]]
## Type
##
## [[3]]
## ATC1
##
## [[4]]
## ATC2
##
## # A tsibble: 6 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [1]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
## <mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1991 Jul Concessional Co-payme… H Systemic… H02 CORTICOS… 63261 317384
## 2 1991 Aug Concessional Co-payme… H Systemic… H02 CORTICOS… 53528 269891
## 3 1991 Sep Concessional Co-payme… H Systemic… H02 CORTICOS… 52822 269703
## 4 1991 Oct Concessional Co-payme… H Systemic… H02 CORTICOS… 54016 280418
## 5 1991 Nov Concessional Co-payme… H Systemic… H02 CORTICOS… 49281 268070
## 6 1991 Dec Concessional Co-payme… H Systemic… H02 CORTICOS… 51798 277139
seasonal_plot(PBS_H02, Cost, 'Medicare "H02" Cost in Australia')
gg_subseries(PBS_H02, Cost)
# gg_lag(PBS_H02, Cost)
autoplot_ts(us_gasoline, Barrels, "Gasoline Production in US")
## [1] "Gasoline Production in US"
## index: Week
## key: list()
## # A tsibble: 6 x 2 [1W]
## Week Barrels
## <week> <dbl>
## 1 1991 W06 6.62
## 2 1991 W07 6.43
## 3 1991 W08 6.58
## 4 1991 W09 7.22
## 5 1991 W10 6.88
## 6 1991 W11 6.95
# Let's zoom in and see if we can make sense of the seasonal/cyclic motion
us_gasoline |>
filter(
Week >= yearweek(as.Date("1992-01-01")) &
Week <= yearweek(as.Date("1993-01-01"))
) |>
autoplot(Barrels) +
labs(
title = "Gasoline Production in US (1992 - 1993)",
x = "Time",
y = "Barrels"
)
seasonal_plot(us_gasoline, Barrels, "Gasoline Production in US")
gg_subseries(us_gasoline, Barrels)
gg_lag(us_gasoline, Barrels)
Can you spot any seasonality, cyclicity and trend?
Time Series | Seasonality | Cyclicity | Trend |
---|---|---|---|
us_employment_tp$Employed |
Yearly | No | Steady increase |
aus_production$Bricks |
Yearly | It looks like there is a long term cycle of ~5 years. It’s more evident after ~1978. | Ascending to 1980, then descending |
pelt$Hare |
No | Yes, it approximates 10 years | No |
PBS_H02$Cost |
All but “General/Co-payments” have a clear annual period | No | Ascending |
us_gasoline$Barrels |
Yes, yearly | I can’t tell whether it’s noise or a cycle. It’s erratic, but always present. It cycles ~10/14 days | Steady increase between (1990, 2007) and then it falls and rises again between (2007, 2017) |
What do you learn about the series?
You learn the trends and tendencies of the past. I imagine this is the basis for predictions about the future. It’s going to be interesting to see it all come together. Specifically, we learn:
What can you say about the seasonal patterns?
us_employment_tp$Employed: The seasonal pattern is subtle
and a little elusive. It’s clear to see in the autoplot()
but difficult to discern in the gg_season()
plot. What can
be said about this pattern? I could imagine a few reasons for this
pattern:
aus_production$Bricks: The seasonal pattern is clear and consistent. It looks like the brick industry in Australia peaks in the 3rd quarter. I can only come up with one reason for this pattern, and that is the weather is nice in the 3rd quarter, so it’s a good time to build.
pelt$Hare: There is no seasonal pattern. The number of hares traded each year is erratic.
PBS_H02$Cost: The seasonal pattern is clear and consistent,
though its pattern varies depending on the Type
. Speaking
of which, I was having trouble understanding how autoplot
knew about the Type
, but I see that it is part of the
key
of PBS_H02
. Breaking it down by
Type
:
I wonder if the inverted patterns have to do with balancing budgets.
Can you identify any unusual years?
Yes: