library(fpp3)
library(tidyverse)
2.1 Explore the following four time series: Bricks from
aus_production, Lynx from pelt,
Close from gafa_stock, Demand
from vic_elec.
Use ? (or help()) to find out about the
data in each series.
Use autoplot() to produce a time plot of each
series.
?aus_production
?pelt
?gafa_stock
?vic_elec
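Besides the help pages, the time interval can be read from each tsibble directly. The following is a minimal sketch, assuming the fpp3 meta-package is installed so that tsibble's interval() and is_regular() are attached.

```r
library(fpp3)

# interval() (from tsibble, attached by fpp3) reports the spacing
# between consecutive observations of a tsibble.
interval(aus_production)  # quarterly data
interval(pelt)            # annual data
interval(vic_elec)        # half-hourly data

# gafa_stock only has rows for trading days, so it is stored as an
# irregular tsibble; is_regular() makes that explicit.
is_regular(gafa_stock)
```

This is only a cross-check of what the help pages say; the help pages remain the place to find units and variable descriptions.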
The time interval for each of the time series is described below:
aus_production: quarterly
pelt: annually
gafa_stock: daily. However, certain days, such as
weekends and holidays, are excluded.
vic_elec: half-hourly
aus_production |> autoplot(Bricks) + theme_bw()
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
pelt |> autoplot(Lynx) + theme_bw()
gafa_stock |> autoplot(Close) + theme_bw()
vic_elec |>
autoplot(Demand) +
labs(y = "Demand (MWh)",
title = "Half-Hourly Electricity Demand for Victoria, Australia, 2012-2014") +
theme_bw()
2.2 Use filter() to find what days corresponded to the
peak closing price for each of the four stocks in
gafa_stock.
gafa_stock |>
group_by(Symbol) |>
filter(Close == max(Close))
The filtered tsibble above contains one row per stock; the Date column of each row gives the day on which that stock reached its maximum closing price.
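To read the peak dates more easily, the same result can be reduced to a compact table. This is a sketch rather than part of the original answer: the name peak_close is hypothetical, and slice_max() is used as an alternative to the filter() call that the exercise asks for.

```r
library(fpp3)

peak_close <- gafa_stock |>
  as_tibble() |>               # a plain summary table is enough here
  group_by(Symbol) |>
  slice_max(Close, n = 1) |>   # keeps tied rows, if there are any
  ungroup() |>
  select(Symbol, Date, Close)

peak_close
```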
2.3 Download the file tute1.csv from the book website,
open it in Excel (or some other spreadsheet application), and review its
contents. You should find four columns of information. Columns B through
D each contain a quarterly series, labelled Sales, AdBudget and GDP.
Sales contains the quarterly sales for a small company over the period
1981-2005. AdBudget is the advertising budget and GDP is the gross
domestic product. All series have been adjusted for inflation.
tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
mytimeseries
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
Check what happens when you don’t include
facet_grid().
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
When facet_grid() is not included, all three
series are drawn in a single panel on a common y-axis, whereas
facet_grid() with scales = "free_y" gives each series its own panel
and its own y-axis scale while sharing the x-axis.
2.4 The USgas package contains data on the demand for
natural gas in the US.
Create a tsibble from us_total with year as the index
and state as the key.
library(USgas)
us_total <- us_total |>
as_tsibble(key = state, index = year)
us_total |> head(5)
us_total |>
filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts",
"Connecticut", "Rhode Island")) |>
autoplot() +
labs(x = "Year", y = "Consumption (MMCF)",
title = "Annual Total Natural Gas Consumption for New England Area\n in Millions of Cubic Feet") +
theme_bw()
## Plot variable not specified, automatically selected `.vars = y`
2.5
Download tourism.xlsx from the book website and read it
into R using readxl::read_excel().
Create a tsibble which is identical to the tourism
tsibble from the tsibble package.
Find what combination of Region and
Purpose had the maximum number of overnight trips on
average.
tourism_book <- readxl::read_excel("tourism.xlsx")
tourism_book <- tourism_book |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(key = c(Region, State, Purpose), index = Quarter)
tourism_book |>
arrange(desc(Trips)) |>
head(20)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `State`, `Purpose`, `Quarter` first.
Based on the sorted tsibble above, 16 of the 20 largest quarterly
values of overnight trips occurred in the Sydney region. The plot
below is therefore restricted to the Sydney
region.
tourism_book |>
filter(Region == "Sydney") |>
select(-Region, -State) |>
autoplot() +
labs(x = "Quarter", y = "Trips ('000s)",
title = "Quarterly Overnight Trips for Sydney Region Grouped by Business Purpose") +
theme_bw()
## Plot variable not specified, automatically selected `.vars = Trips`
Based on the above plot, the Sydney
region with the Visiting purpose has the most
overnight trips on average.
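The reading from the plot can be checked by computing the average directly for every combination of Region and Purpose. The sketch below makes two assumptions: it uses the built-in tourism tsibble (which the exercise says is identical to tourism_book) so that it runs without the Excel file, and the name avg_trips is hypothetical.

```r
library(fpp3)

avg_trips <- tourism |>
  as_tibble() |>                      # averaging over time, so drop the index
  group_by(Region, Purpose) |>
  summarise(mean_trips = mean(Trips), .groups = "drop") |>
  slice_max(mean_trips, n = 1)        # combination with the highest average

avg_trips
```

Converting to a tibble first matters: on a tsibble, summarise() keeps the time index and would return one row per quarter instead of one row per Region and Purpose.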
tourism_book |>
group_by(State) |>
summarize(state_trips = sum(Trips))
Above is the new tsibble that combines the Purposes and Regions, and has just the total trips by State.
2.8 Use the following graphics functions: autoplot(),
gg_season(), gg_subseries(),
gg_lag(), ACF() and explore features from the
following time series: “Total Private” Employed from
us_employment, Bricks from
aus_production, Hare from pelt,
“H02” Cost from PBS, and Barrels
from us_gasoline.
us_employment |>
summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
autoplot()
## Plot variable not specified, automatically selected `.vars = total_employed`
us_employment |>
summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
gg_season(total_employed, labels = "both") +
theme_bw()
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
us_employment |>
filter(Month >= yearmonth("1990 Jan")) |>
summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
gg_season(total_employed) +
theme_bw()
us_employment |>
summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
gg_subseries(total_employed) +
facet_wrap(~month(Month, label = TRUE), nrow = 3) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90))
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
us_employment |>
summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
gg_lag(total_employed, geom = "point", lags = 1:12) +
theme_bw()
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
us_employment |>
summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
ACF(total_employed, lag_max = 48) |>
autoplot() +
theme_bw()
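For the above plots, the total_employed variable
is the monthly sum of all values from the Employed column,
regardless of the Series ID. Because us_employment holds aggregate series
alongside their components, this sum is not the “Total Private” series
named in the exercise (the rows with Title == "Total Private"), and the
observations below describe the summed series. Looking at the above
plots, there appears to be a positive trend going back to 1939.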
Looking at the seasonal plot with all years' worth of data included, it is difficult to tell whether there is any seasonality. As a result, a second seasonal plot restricted to data since 1990 was included. In this plot, total employment rises from January to June, dips in July, stays relatively level until September, and then increases through December in most years. In most years, total employment then falls from December to the following January. This suggests that total_employed has annual seasonality, with local peaks in June and December.
Looking at the autoplot(), there appears to be
some cyclical behavior. For example, the total employment from 1992 to
2001 increases over time, but then begins to decrease in 2001 until
about 2003. The total employment begins increasing again until about
2008, where it begins to decrease again until about 2010.
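Finally, looking at the autoplot() and the
seasonal plot, there appears to be a large jump in
total_employed between 1989 and 1990. This is very unusual compared to
all other years, where the total typically decreases from December to
the following January. Because total_employed sums every series with
na.rm = TRUE, the jump may be an artifact of series whose records only
begin in 1990 rather than a real change in employment.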
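The exercise names the “Total Private” series specifically. A minimal sketch of selecting that single series is shown here, assuming it is identified by the Title column of us_employment; the name total_private is hypothetical.

```r
library(fpp3)

total_private <- us_employment |>
  filter(Title == "Total Private")   # keep one monthly series only

total_private |>
  autoplot(Employed) +
  theme_bw()
```

The same object can be passed to gg_season(), gg_subseries(), gg_lag() and ACF() with Employed as the variable, which avoids combining series that start at different dates.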
aus_production |>
autoplot(Bricks) +
theme_bw()
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production |>
gg_season(Bricks, labels = "both") +
theme_bw()
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text()`).
aus_production |>
gg_subseries(Bricks) +
theme_bw()
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production |>
gg_lag(Bricks, geom = "point") +
theme_bw()
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production |>
ACF(Bricks) |>
autoplot() +
theme_bw()
Looking at the
autoplot(), there was a positive
trend from about 1955 until about 1980. Since 1980, however, there has
been a negative trend. Additionally, the autoplot() does
appear to show some cyclicity. For example, starting in about 1991,
brick production generally increases until about 1995, and then
decreases over the next 3 quarters. Brick production then generally
increases until about 2000, when another decrease
occurs over the next quarter.
Looking at the ACF plot, the correlations
have local peaks every 4 lags (lag = 4, lag = 8,
etc.) and local troughs every 4 lags, starting at lag = 2.
This suggests annual seasonality in the quarterly data.
More specifically, when looking at
the seasonal plot, production generally increases from Q1 to Q2
every year, and then decreases starting in either Q2 or Q3 and into
Q4.
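The peaks and troughs described above can also be inspected numerically rather than read off the plot. The sketch below prints the first 12 autocorrelations; the names bricks_acf and lag_q are hypothetical, and lag_q relies on ACF() returning its rows in lag order starting at lag 1.

```r
library(fpp3)

bricks_acf <- aus_production |>
  ACF(Bricks, lag_max = 12) |>
  as_tibble() |>
  mutate(lag_q = row_number())   # lag in quarters: 1, 2, ..., 12

bricks_acf |>
  select(lag_q, acf)
```

Comparing the values at lags 4, 8 and 12 with their neighbours shows whether the seasonal spikes stand out from the slow decay caused by the trend.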
Looking again at the autoplot(), there appear to
be 2 unusual years: 1975 and 1982. In these years, there was a decrease
in production, similar to other years as described above regarding
cyclicity, but these 2 years have a much sharper decrease than any of
the other years.
pelt |>
autoplot(Hare) +
theme_bw()
pelt |>
gg_lag(Hare, geom = "point") +
theme_bw()
pelt |>
ACF(Hare, lag_max = 30) |>
autoplot() +
theme_bw()
Looking at the autoplot() above, there does not
appear to be a positive or negative trend. The data do appear to be
cyclical. For example, around 1918, there is a local minimum in
pelts traded, followed by an increase until about 1922,
where there is a local maximum. From 1922, the number of pelts traded
decreases until about 1928. These patterns suggest that the
number of hare pelts traded is cyclical.
Additionally, when looking at the ACF plot,
there is a strong negative correlation at about lag 5, recurring
roughly every 10 lags (years). The correlation then increases until about
lag 10, where there is a strong positive correlation. This pattern
repeats, with the strength of each peak and trough decreasing as the lag
grows. This further supports the conclusion that the data are cyclical.
Since the data are annual, there is no within-year pattern to
display, so neither a seasonal nor a subseries plot can be generated.
For the same reason, seasonality cannot be observed in this
data.
PBS |>
summarize(total_cost = sum(Cost, na.rm = TRUE)) |>
autoplot(total_cost) +
theme_bw()
PBS |>
summarize(total_cost = sum(Cost, na.rm = TRUE)) |>
gg_season(total_cost, labels = "both") +
theme_bw()
PBS |>
summarize(total_cost = sum(Cost, na.rm = TRUE)) |>
gg_subseries(total_cost) +
facet_wrap(~month(Month, label = TRUE), nrow = 3) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90))
PBS |>
summarize(total_cost = sum(Cost, na.rm = TRUE)) |>
gg_lag(total_cost, geom = "point", lags = 1:12) +
theme_bw()
PBS |>
summarize(total_cost = sum(Cost, na.rm = TRUE)) |>
ACF(total_cost, lag_max = 48) |>
autoplot() +
theme_bw()
For the above plots, the total_cost variable
is the monthly sum of all values from the Cost column,
regardless of Concession, Type, ATC1, and ATC2. This covers every drug
group in PBS, not only the “H02” group named in the exercise (the rows
with ATC2 == "H02"). Looking at the above
plots, there appears to be a positive trend going back to about
1992.
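The exercise names the “H02” cost series specifically. A minimal sketch of restricting PBS to that group is shown here, assuming the group is identified by the ATC2 column; the name h02 is hypothetical.

```r
library(fpp3)

h02 <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost))   # monthly total over the Concession and Type keys

h02 |>
  autoplot(Cost) +
  theme_bw()
```

On a tsibble, summarise() keeps the Month index, so the result is a single monthly series that can be passed to the other graphics functions unchanged.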
Looking at the seasonal plots above, there is generally a
decrease in total_cost from January-February, followed by a
generally increasing total_cost through December. For many
of the years, there’s also a sharp increase from November-December.
Additionally, when looking at the ACF plot, every 12 time
intervals, there is a small spike in correlation compared to the
previous time interval. This suggests that there is annual seasonality,
with a seasonal decrease in total_cost between
January-February, followed by a general increase the rest of the
year.
Looking at the autoplot(), there are no signs of
cyclicity.
Finally, looking at the seasonal plot, there
appear to be a few outliers. In 2008, the total_cost
decreased from February to March, the only year in which this occurred.
In 2001 and 2006, the total_cost decreased
from November to December, whereas in other years the
total_cost increased between these 2 months.
us_gasoline |>
autoplot(Barrels) +
theme_bw()
us_gasoline |>
gg_season(Barrels, labels = "both") +
theme_bw()
us_gasoline |>
gg_subseries(Barrels) +
facet_wrap(~week(Week), nrow = 4) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90))
us_gasoline |>
gg_lag(Barrels, geom = "point") +
theme_bw()
us_gasoline |>
ACF(Barrels, lag_max = 106) |>
autoplot() +
theme_bw()
Looking at the above plots, there appears to be a positive trend going back to about 1990.
Looking at the seasonal plot, the weekly gasoline
supplied generally increases from
January to June, and then generally decreases from June to December.
Additionally, when looking at the ACF plot, the correlation
generally decreases as the lag increases. However, at about lag 26
there is a local minimum, after which the correlation
increases to a local maximum at around lag 52. This pattern
repeats over the next 52 lags. These 2 plots suggest annual seasonality
in which the weekly gasoline supplied increases until about June, and then
decreases until December.
Looking at the plots, there does not appear to be any obvious cyclicity. Additionally, there does not appear to be any clear unusual years in the plots.