Explore the following time series:
Use ? (or help() ) to find out about
the data in each series
What is the time interval of each series?
Use autoplot() to produce a time plot of each
series
For the last plot, modify the axis labels and title
Bricks from aus_production
datasetThe help file for the aus_production dataset shows that
Bricks is a time series with quarterly estimates of clay
brick production in Australia.
help("aus_production")
The time interval of the series is 1 calendar quarter (ie, 3 months).
interval(aus_production)
## <interval[1]>
## [1] 1Q
is_regular(aus_production)
## [1] TRUE
The Bricks time series is plotted below.
autoplot(drop_na(aus_production), Bricks) +
ylab('Bricks produced (millions)') +
ggtitle('Clay Brick Production in Australia') +
guides(x = guide_axis(minor.ticks = TRUE),
y = guide_axis(minor.ticks = TRUE)) +
theme_classic() +
theme(
axis.title = element_text(face = 'bold'),
plot.title = element_text(face = 'bold')
)
Lynx from pelt datasetThe help file for the pelt dataset shows that
Lynx is a time series with the number of Canadian Lynx
pelts traded per year by the Hudson Bay Company from 1845 to 1935.
help(pelt)
The time interval of the series is 1 year.
interval(pelt)
## <interval[1]>
## [1] 1Y
is_regular(pelt)
## [1] TRUE
The Lynx time series is plotted below.
autoplot(pelt, Lynx) +
ylab('Pelts traded') +
ggtitle('Canadian Lynx pelts traded, 1845-1935') +
guides(x = guide_axis(minor.ticks = TRUE),
y = guide_axis(minor.ticks = TRUE)) +
theme_classic() +
theme(
axis.title = element_text(face = 'bold'),
plot.title = element_text(face = 'bold')
)
Close from gafa_stock
datasetThe help file for the gafa_stock dataset shows that
Close is a time series with the closing stock price (US
dollar) for Google, Amazon, Facebook, and Apple on irregular trading
days between 2014 and 2018.
help("gafa_stock")
The time interval of the series is irregular (ie, there is no fixed or predictable interval).
interval(gafa_stock)
## <interval[1]>
## [1] !
is_regular(gafa_stock)
## [1] FALSE
The Close time series is plotted below. There is a
discrepancy in the time range for the series in the documentation for
the gafa_stock dataset (see above), which states that the
time range is from 2014 to 2018, and the plot, which shows data from
2014 to 2019.
autoplot(gafa_stock, Close) +
labs(
x = 'Year',
y = 'Closing price (USD)',
caption = 'Abbreviations: AAPL, Apple; AMZN, Amazon; FB, Facebook; GOOG, Google.'
) +
ggtitle('Closing stock prices on irregular trading days, 2014-2019') +
guides(x = guide_axis(minor.ticks = TRUE),
y = guide_axis(minor.ticks = TRUE)) +
theme_classic() +
theme(
axis.title = element_text(face = 'bold'),
plot.title = element_text(face = 'bold'),
plot.caption = element_text(color = "gray", hjust = 0)
)
Demand from vic_elec
datasetThe help file for the vic_elec dataset shows that
Demand is a time series with the total operational
electricity demand (megawatt-hours) in half-hour intervals for Victoria,
Australia. The information does not specify the time frame for the
data.
help("vic_elec")
The time interval of the series is 30 minutes.
interval(vic_elec)
## <interval[1]>
## [1] 30m
is_regular(vic_elec)
## [1] TRUE
The Demand time series is plotted below.
autoplot(vic_elec, Demand) +
labs(
x = 'Year [30 min]',
y = 'Demand (MWh)'
) +
ggtitle('Operational electricity demand in Victoria, Australia') +
guides(x = guide_axis(minor.ticks = TRUE),
y = guide_axis(minor.ticks = TRUE)) +
theme_classic() +
theme(
axis.title = element_text(face = 'bold'),
plot.title = element_text(face = 'bold')
)
Use filter() to find what days corresponded to the peak
closing price for each of the four stocks in
gafa_stock.
The closing price for Apple, Amazon, Facebook, and Google peaked on October 3, 2018; September 4, 2018; July 25, 2018; and July 26, 2018, respectively.
gafa_stock %>%
group_by(Symbol) %>% # group dataset by company
filter(Close == max(Close)) %>% # filter rows with max closing price
select(Symbol, Date, Close)
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
tute1 <- read_csv('Data/tute1.csv', show_col_types = FALSE)
tute1
## # A tibble: 100 × 4
## Quarter Sales AdBudget GDP
## <date> <dbl> <dbl> <dbl>
## 1 1981-03-01 1020. 659. 252.
## 2 1981-06-01 889. 589 291.
## 3 1981-09-01 795 512. 291.
## 4 1981-12-01 1004. 614. 292.
## 5 1982-03-01 1058. 647. 279.
## 6 1982-06-01 944. 602 254
## 7 1982-09-01 778. 531. 296.
## 8 1982-12-01 932. 608. 272.
## 9 1983-03-01 996. 638. 260.
## 10 1983-06-01 908. 582. 280.
## # ℹ 90 more rows
tute1_ts <- tute1 %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter)
tute1_ts %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, color = name)) +
geom_line() +
facet_grid(name ~ ., scales = 'free_y')
Without facet_grid(), the plots are not faceted (ie,
each series is not separated into a separate plot). In addition, all
three plots share a single y-axis. In particular, this compresses the
GDP series (green line) and makes it harder to discern highs/lows (eg,
the peak at ~1991).
tute1_ts %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, color = name)) +
geom_line()
USgas package# Check whether USgas package is installed and, if not, install and load it
if (!require('USgas', character.only = TRUE)) {
install.packages('USgas')
library('USgas')
}
## Loading required package: USgas
us_total# Use year as the index and state as the key
us_total_ts <- USgas::us_total %>%
as_tibble(key = state, index = year)
First I examined the documentation for us_total to
understand the dataset and determine how to label the plot.
help(us_total)
The plots below show that states in New England had different trends in natural gas consumption between 1997 and 2019. However, comparison of consumption between states is difficult because consumption is not normalized by state population.
new_england <- c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island')
us_total_ts %>%
filter(state %in% new_england) %>%
ggplot(aes(x = year, y = y, color = state)) +
geom_line() +
facet_grid(state ~ ., scales = 'free_y') +
labs(
x = 'Year',
y = bquote(bold('Annual Consumption (' * ft^3 * ')')),
title = 'Natural Gas Consumption in New England, 1997-2019') +
scale_color_discrete('State') + # legend title
theme(
axis.title.x = element_text(face = 'bold'),
plot.title = element_text(face = 'bold')
)
a. Read tourism.xlsx into R using
readxl::read_excel()
tourism_df <- readxl::read_excel('Data/tourism.xlsx')
b. Create a tsibble which is identical to the tourism
tsibble from the tsibble package
# Create tsibble
tourism_ts <- tourism_df %>%
mutate(
Quarter = yearquarter(Quarter)
) %>%
as_tsibble(key = c(Region, State, Purpose), index = Quarter)
# Confirm that the manually created tsibble contains the same data as the
# one provided in the tsibble package
all.equal(tourism_ts, tsibble::tourism)
## [1] TRUE
c. Find what combination of Region and
Purpose had the maximum number of overnight trips on
average
The Sydney region with Visiting purpose had the most overnight trips on average.
tourism_ts %>%
group_by(Region, Purpose) %>%
mutate(
mean_trips = mean(Trips)
) %>%
ungroup() %>%
filter(mean_trips == max(mean_trips)) %>%
distinct(Region, Purpose)
## # A tibble: 1 × 2
## Region Purpose
## <chr> <chr>
## 1 Sydney Visiting
d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State
# Data wrangling is easier with tibble, so first convert
tourism_tbl <- as_tibble(tourism_ts)
tourism_tbl <- tourism_tbl %>%
group_by(State) %>%
mutate(
# sum all trips for each state (ie, for all purposes and regions)
total_trips = sum(Trips)
) %>%
# Omit columns no longer needed
select(-c(Region, Purpose, Trips))
# Create new tsibble from tibble
tourism_ts2 <- tourism_tbl %>%
# tsibble needs distinct rows
distinct() %>%
# Create new tsibble
as_tsibble(index = Quarter, key = State)
Use the following graphics functions: autoplot(),
gg_season(), gg_subseries(),
gg_lag(), ACF() and explore features from the
following time series: “Total Private” Employed from
us_employment, Bricks from
aus_production, Hare from pelt,
“HO2” Cost from PBS, and Barrels
from us_gasoline.
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?
Employed from
us_employmentThe autoplot shows that the total private employment in
the US trended upward from 1940 to 2020. There is a comparatively large
drop in the number employed around 2010, which may make it an “unusual”
year. However, the upward trend continues after that point.
The scale of the gg_season plot makes it difficult to
discern seasonality, but the gg_subseries and
gg_lag plots show seasonality (monthly period). The lag
plot also shows strong autocorrelation since the data are concentrated
on the diagonal.
The ACF plot shows higher autocorrelations for small
lags than large lags, which confirms the presence of trend. However, the
autocorrelations for seasonal lags are not larger than other lags, which
suggests the absence of seasonality. I’m not sure how to reconcile this
with the previous plots, unless I misinterpreted something.
None of the plots show evidence of cyclic behavior.
us_empl_tp <- us_employment %>%
filter(Title == 'Total Private')
autoplot(us_empl_tp, Employed)
ggtime::gg_season(us_empl_tp, Employed)
ggtime::gg_subseries(us_empl_tp, Employed)
ggtime::gg_lag(us_empl_tp, Employed)
ACF(us_empl_tp, Employed) %>%
autoplot()
Bricks from
aus_productionThe autoplot shows that brick production in Australia
trended upward from 1956 to 1974, then trended downward. There were
several large drops in production during the latter period (eg, 1975 and
1983), which may be considered unusual (relative to the magnitude of
other decreases).
The gg_season, gg_subseries, and
gg_lag plots all show seasonality (quarterly period). The
lack of linearity in the lag plot indicates the absence of
autocorrelation.
The ACF plot confirms the presence of trend and
seasonality. None of the plots show evidence of cyclic behavior.
autoplot(drop_na(aus_production), Bricks)
ggtime::gg_season(drop_na(aus_production), Bricks)
ggtime::gg_subseries(drop_na(aus_production), Bricks)
ggtime::gg_lag(drop_na(aus_production), Bricks)
ACF(drop_na(aus_production), Bricks) %>%
autoplot()
Hare from peltThe autoplot does not show any trend in the number of
snowshoe hare pelts traded between 1845 and 1935. However, there does
appear to be some cyclical behavior as most of the large peaks are
followed by dips in the number of pelts traded. This is consistent with
the cyclic, sinusoidal ACF plot as well as the time needed
for the hare population to recover.
The Hare time series does not include seasonal data, so
seasonality cannot be assessed.
autoplot(pelt, Hare)
gg_season is unable to plot the data because the
Hare time series interval is yearly and does not have a
seasonal period.
ggtime::gg_season(pelt, Hare)
## Error in `ggtime::gg_season()`:
## ! The data must contain at least one observation per seasonal period.
interval(pelt)
## <interval[1]>
## [1] 1Y
Similarly, gg_subseries cannot create facet plots
without seasonal data.
ggtime::gg_subseries(pelt, Hare)
ggtime::gg_lag(pelt, Hare)
ACF(pelt, Hare) %>%
autoplot()
Cost from PBSI was not able to analyze the HO2 cost time series because there is
no ATC2 index for “HO2” in the PBS
dataset.
sprintf("There are %d rows with ATC2 == 'HO2'", nrow(filter(PBS, ATC2 == 'HO2')))
## [1] "There are 0 rows with ATC2 == 'HO2'"
Barrels from
us_gasolineThe autoplot shows that US gasoline production trended
upward from 1991 to 2005, but there does not appear to be any trend
thereafter. There does not appear to be any particularly unusual
years.
The gg_season, gg_subseries, and
gg_lag plots show seasonality. The gg_season
plot suggests that the seasonality period is approximately 6 months,
such that production tends to be higher during summer, which is
consistent with common travel trends.
The concentration of data around the diagonal in the lag plot suggests the presence of autocorrelation.
The ACF plot indicates the presence of trend; however,
evidence for seasonality is not clear and appears to be weak. If
present, the periodicity seems shorter than that indicated by the
gg_season plot. I’m not sure how to reconcile the two
plots.
None of the plots show evidence of cyclic behavior.
autoplot(us_gasoline, Barrels)
ggtime::gg_season(us_gasoline, Barrels)
ggtime::gg_subseries(us_gasoline, Barrels)
ggtime::gg_lag(us_gasoline, Barrels)
ACF(us_gasoline, Barrels) %>%
autoplot()
sessionInfo()
## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.2
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] USgas_0.1.2 fable_0.5.0 fpp3_1.0.2 tsibbledata_0.4.1
## [5] tsibble_1.1.6 feasts_0.4.2 fabletools_0.5.1 readxl_1.4.5
## [9] ggtime_0.1.0 lubridate_1.9.4 forcats_1.0.1 stringr_1.6.0
## [13] dplyr_1.1.4 purrr_1.2.1 readr_2.1.6 tidyr_1.3.2
## [17] tibble_3.3.1 ggplot2_4.0.1 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.6 rappdirs_0.3.4 sass_0.4.10
## [4] generics_0.1.4 anytime_0.3.12 stringi_1.8.7
## [7] hms_1.1.4 digest_0.6.39 magrittr_2.0.4
## [10] evaluate_1.0.5 grid_4.5.2 timechange_0.3.0
## [13] RColorBrewer_1.1-3 fastmap_1.2.0 cellranger_1.1.0
## [16] jsonlite_2.0.0 viridisLite_0.4.2 scales_1.4.0
## [19] jquerylib_0.1.4 cli_3.6.5 crayon_1.5.3
## [22] rlang_1.1.7 bit64_4.6.0-1 ellipsis_0.3.2
## [25] withr_3.0.2 cachem_1.1.0 yaml_2.3.12
## [28] otel_0.2.0 parallel_4.5.2 tools_4.5.2
## [31] tzdb_0.5.0 vctrs_0.7.1 R6_2.6.1
## [34] lifecycle_1.0.5 bit_4.6.0 vroom_1.7.0
## [37] pkgconfig_2.0.3 pillar_1.11.1 bslib_0.10.0
## [40] gtable_0.3.6 glue_1.8.0 Rcpp_1.1.1
## [43] xfun_0.56 tidyselect_1.2.1 rstudioapi_0.18.0
## [46] knitr_1.51 farver_2.1.2 htmltools_0.5.9
## [49] labeling_0.4.3 rmarkdown_2.30 compiler_4.5.2
## [52] S7_0.2.1 distributional_0.6.0