Exercise 2.1

Explore the following time series:

Use ? (or help() ) to find out about the data in each series
What is the time interval of each series?
Use autoplot() to produce a time plot of each series
For the last plot, modify the axis labels and title

Time series 1. `Bricks` from `aus_production` dataset

The help file for the aus_production dataset shows that Bricks is a time series with quarterly estimates of clay brick production in Australia.

help("aus_production")

The time interval of the series is 1 calendar quarter (ie, 3 months).

interval(aus_production)

## <interval[1]>
## [1] 1Q

is_regular(aus_production)

## [1] TRUE

The Bricks time series is plotted below.

autoplot(drop_na(aus_production), Bricks) + 
  ylab('Bricks produced (millions)') +
  ggtitle('Clay Brick Production in Australia') +
  guides(x = guide_axis(minor.ticks = TRUE), 
         y = guide_axis(minor.ticks = TRUE)) +  
  theme_classic() +
  theme(
    axis.title = element_text(face = 'bold'),
    plot.title = element_text(face = 'bold')
  )

Time series 2. `Lynx` from `pelt` dataset

The help file for the pelt dataset shows that Lynx is a time series with the number of Canadian Lynx pelts traded per year by the Hudson Bay Company from 1845 to 1935.

help(pelt)

The time interval of the series is 1 year.

interval(pelt)

## <interval[1]>
## [1] 1Y

is_regular(pelt)

## [1] TRUE

The Lynx time series is plotted below.

autoplot(pelt, Lynx) + 
  ylab('Pelts traded') +
  ggtitle('Canadian Lynx pelts traded, 1845-1935') +
  guides(x = guide_axis(minor.ticks = TRUE), 
         y = guide_axis(minor.ticks = TRUE)) +  
  theme_classic() +
  theme(
    axis.title = element_text(face = 'bold'),
    plot.title = element_text(face = 'bold')
  )

Time series 3. `Close` from `gafa_stock` dataset

The help file for the gafa_stock dataset shows that Close is a time series with the closing stock price (US dollar) for Google, Amazon, Facebook, and Apple on irregular trading days between 2014 and 2018.

help("gafa_stock")

The time interval of the series is irregular (ie, there is no fixed or predictable interval).

interval(gafa_stock)

## <interval[1]>
## [1] !

is_regular(gafa_stock)

## [1] FALSE

The Close time series is plotted below. There is a discrepancy in the time range for the series in the documentation for the gafa_stock dataset (see above), which states that the time range is from 2014 to 2018, and the plot, which shows data from 2014 to 2019.

autoplot(gafa_stock, Close) + 
  labs(
    x = 'Year',
    y = 'Closing price (USD)',
    caption = 'Abbreviations: AAPL, Apple; AMZN, Amazon; FB, Facebook; GOOG, Google.'
  ) +
  ggtitle('Closing stock prices on irregular trading days, 2014-2019') +
  guides(x = guide_axis(minor.ticks = TRUE), 
         y = guide_axis(minor.ticks = TRUE)) +  
  theme_classic() +
  theme(
    axis.title = element_text(face = 'bold'),
    plot.title = element_text(face = 'bold'),
    plot.caption = element_text(color = "gray", hjust = 0)
  )

Time series 4. `Demand` from `vic_elec` dataset

The help file for the vic_elec dataset shows that Demand is a time series with the total operational electricity demand (megawatt-hours) in half-hour intervals for Victoria, Australia. The information does not specify the time frame for the data.

help("vic_elec")

The time interval of the series is 30 minutes.

interval(vic_elec)

## <interval[1]>
## [1] 30m

is_regular(vic_elec)

## [1] TRUE

The Demand time series is plotted below.

autoplot(vic_elec, Demand) + 
  labs(
    x = 'Year [30 min]',
    y = 'Demand (MWh)'
  ) +
  ggtitle('Operational electricity demand in Victoria, Australia') +
  guides(x = guide_axis(minor.ticks = TRUE), 
         y = guide_axis(minor.ticks = TRUE)) +
  theme_classic() +
  theme(
    axis.title = element_text(face = 'bold'),
    plot.title = element_text(face = 'bold')
  )

Exercise 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

The closing price for Apple, Amazon, Facebook, and Google peaked on October 3, 2018; September 4, 2018; July 25, 2018; and July 26, 2018, respectively.

gafa_stock %>%
  group_by(Symbol) %>%  # group dataset by company  
  filter(Close == max(Close)) %>%  # filter rows with max closing price
  select(Symbol, Date, Close)

## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AAPL   2018-10-03  232.
## 2 AMZN   2018-09-04 2040.
## 3 FB     2018-07-25  218.
## 4 GOOG   2018-07-26 1268.

Exercise 2.3

a. Read the data into R

tute1 <- read_csv('Data/tute1.csv', show_col_types = FALSE)
tute1

## # A tibble: 100 × 4
##    Quarter    Sales AdBudget   GDP
##    <date>     <dbl>    <dbl> <dbl>
##  1 1981-03-01 1020.     659.  252.
##  2 1981-06-01  889.     589   291.
##  3 1981-09-01  795      512.  291.
##  4 1981-12-01 1004.     614.  292.
##  5 1982-03-01 1058.     647.  279.
##  6 1982-06-01  944.     602   254 
##  7 1982-09-01  778.     531.  296.
##  8 1982-12-01  932.     608.  272.
##  9 1983-03-01  996.     638.  260.
## 10 1983-06-01  908.     582.  280.
## # ℹ 90 more rows

b. Convert the data to time series

tute1_ts <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

c. Construct time series plots

tute1_ts %>% 
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, color = name)) +
    geom_line() +
    facet_grid(name ~ ., scales = 'free_y')

Without facet_grid(), the plots are not faceted (ie, each series is not separated into a separate plot). In addition, all three plots share a single y-axis. In particular, this compresses the GDP series (green line) and makes it harder to discern highs/lows (eg, the peak at ~1991).

tute1_ts %>% 
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, color = name)) +
    geom_line()

Exercise 2.4

a. Install the `USgas` package

# Check whether USgas package is installed and, if not, install and load it
if (!require('USgas', character.only = TRUE)) {
  install.packages('USgas')
  library('USgas')
}

## Loading required package: USgas

b. Create a tsibble from `us_total`

# Use year as the index and state as the key
us_total_ts <- USgas::us_total %>%
  as_tibble(key = state, index = year)

c. Plot the annual natural gas consumption by state for New England

First I examined the documentation for us_total to understand the dataset and determine how to label the plot.

help(us_total)

The plots below show that states in New England had different trends in natural gas consumption between 1997 and 2019. However, comparison of consumption between states is difficult because consumption is not normalized by state population.

new_england <- c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island')

us_total_ts %>%
  filter(state %in% new_england) %>%
  ggplot(aes(x = year, y = y, color = state)) +
  geom_line() +
  facet_grid(state ~ ., scales = 'free_y') +
  labs(
    x = 'Year',
    y = bquote(bold('Annual Consumption (' * ft^3 * ')')),
    title = 'Natural Gas Consumption in New England, 1997-2019') +
  scale_color_discrete('State') +  # legend title
  theme(
    axis.title.x = element_text(face = 'bold'),
    plot.title = element_text(face = 'bold')
  )

Exercise 2.5

a. Read tourism.xlsx into R using readxl::read_excel()

tourism_df <- readxl::read_excel('Data/tourism.xlsx')

b. Create a tsibble which is identical to the tourism tsibble from the tsibble package

# Create tsibble
tourism_ts <- tourism_df %>%
  mutate(
    Quarter = yearquarter(Quarter)
  ) %>%
  as_tsibble(key = c(Region, State, Purpose), index = Quarter)

# Confirm that the manually created tsibble contains the same data as the 
# one provided in the tsibble package
all.equal(tourism_ts, tsibble::tourism)

## [1] TRUE

c. Find what combination of Region and Purpose had the maximum number of overnight trips on average

The Sydney region with Visiting purpose had the most overnight trips on average.

tourism_ts %>%
  group_by(Region, Purpose) %>%
  mutate(
    mean_trips = mean(Trips)
  ) %>%
  ungroup() %>%
  filter(mean_trips == max(mean_trips)) %>%
  distinct(Region, Purpose)

## # A tibble: 1 × 2
##   Region Purpose 
##   <chr>  <chr>   
## 1 Sydney Visiting

d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State

# Data wrangling is easier with tibble, so first convert
tourism_tbl <- as_tibble(tourism_ts)

tourism_tbl <- tourism_tbl %>%
  group_by(State) %>%
  mutate(
    # sum all trips for each state (ie, for all purposes and regions)
    total_trips = sum(Trips)
  ) %>%
  # Omit columns no longer needed
  select(-c(Region, Purpose, Trips))

# Create new tsibble from tibble
tourism_ts2 <- tourism_tbl %>%
  # tsibble needs distinct rows  
  distinct() %>%
  # Create new tsibble
  as_tsibble(index = Quarter, key = State)

Exercise 2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “HO2” Cost from PBS, and Barrels from us_gasoline.

Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?

Time series 1. “Total Private” `Employed` from `us_employment`

The autoplot shows that the total private employment in the US trended upward from 1940 to 2020. There is a comparatively large drop in the number employed around 2010, which may make it an “unusual” year. However, the upward trend continues after that point.

The scale of the gg_season plot makes it difficult to discern seasonality, but the gg_subseries and gg_lag plots show seasonality (monthly period). The lag plot also shows strong autocorrelation since the data are concentrated on the diagonal.

The ACF plot shows higher autocorrelations for small lags than large lags, which confirms the presence of trend. However, the autocorrelations for seasonal lags are not larger than other lags, which suggests the absence of seasonality. I’m not sure how to reconcile this with the previous plots, unless I misinterpreted something.

None of the plots show evidence of cyclic behavior.

us_empl_tp <- us_employment %>%
  filter(Title == 'Total Private')

autoplot(us_empl_tp, Employed)

ggtime::gg_season(us_empl_tp, Employed)

ggtime::gg_subseries(us_empl_tp, Employed)

ggtime::gg_lag(us_empl_tp, Employed)

ACF(us_empl_tp, Employed) %>%
  autoplot()

Time series 2. `Bricks` from `aus_production`

The autoplot shows that brick production in Australia trended upward from 1956 to 1974, then trended downward. There were several large drops in production during the latter period (eg, 1975 and 1983), which may be considered unusual (relative to the magnitude of other decreases).

The gg_season, gg_subseries, and gg_lag plots all show seasonality (quarterly period). The lack of linearity in the lag plot indicates the absence of autocorrelation.

The ACF plot confirms the presence of trend and seasonality. None of the plots show evidence of cyclic behavior.

autoplot(drop_na(aus_production), Bricks)

ggtime::gg_season(drop_na(aus_production), Bricks)

ggtime::gg_subseries(drop_na(aus_production), Bricks)

ggtime::gg_lag(drop_na(aus_production), Bricks)

ACF(drop_na(aus_production), Bricks) %>%
  autoplot()

Time series 3. `Hare` from `pelt`

The autoplot does not show any trend in the number of snowshoe hare pelts traded between 1845 and 1935. However, there does appear to be some cyclical behavior as most of the large peaks are followed by dips in the number of pelts traded. This is consistent with the cyclic, sinusoidal ACF plot as well as the time needed for the hare population to recover.

The Hare time series does not include seasonal data, so seasonality cannot be assessed.

autoplot(pelt, Hare)

gg_season is unable to plot the data because the Hare time series interval is yearly and does not have a seasonal period.

ggtime::gg_season(pelt, Hare)

## Error in `ggtime::gg_season()`:
## ! The data must contain at least one observation per seasonal period.

interval(pelt)

## <interval[1]>
## [1] 1Y

Similarly, gg_subseries cannot create facet plots without seasonal data.

ggtime::gg_subseries(pelt, Hare)

ggtime::gg_lag(pelt, Hare)

ACF(pelt, Hare) %>%
  autoplot()

Time series 4. “HO2” `Cost` from `PBS`

I was not able to analyze the HO2 cost time series because there is no ATC2 index for “HO2” in the PBS dataset.

sprintf("There are %d rows with ATC2 == 'HO2'", nrow(filter(PBS, ATC2 == 'HO2')))

## [1] "There are 0 rows with ATC2 == 'HO2'"

Time series 5. `Barrels` from `us_gasoline`

The autoplot shows that US gasoline production trended upward from 1991 to 2005, but there does not appear to be any trend thereafter. There does not appear to be any particularly unusual years.

The gg_season, gg_subseries, and gg_lag plots show seasonality. The gg_season plot suggests that the seasonality period is approximately 6 months, such that production tends to be higher during summer, which is consistent with common travel trends.

The concentration of data around the diagonal in the lag plot suggests the presence of autocorrelation.

The ACF plot indicates the presence of trend; however, evidence for seasonality is not clear and appears to be weak. If present, the periodicity seems shorter than that indicated by the gg_season plot. I’m not sure how to reconcile the two plots.

None of the plots show evidence of cyclic behavior.

autoplot(us_gasoline, Barrels)

ggtime::gg_season(us_gasoline, Barrels)

ggtime::gg_subseries(us_gasoline, Barrels)

ggtime::gg_lag(us_gasoline, Barrels)

ACF(us_gasoline, Barrels) %>%
  autoplot()

Session Details

sessionInfo()

## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.2
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] USgas_0.1.2       fable_0.5.0       fpp3_1.0.2        tsibbledata_0.4.1
##  [5] tsibble_1.1.6     feasts_0.4.2      fabletools_0.5.1  readxl_1.4.5     
##  [9] ggtime_0.1.0      lubridate_1.9.4   forcats_1.0.1     stringr_1.6.0    
## [13] dplyr_1.1.4       purrr_1.2.1       readr_2.1.6       tidyr_1.3.2      
## [17] tibble_3.3.1      ggplot2_4.0.1     tidyverse_2.0.0  
## 
## loaded via a namespace (and not attached):
##  [1] utf8_1.2.6           rappdirs_0.3.4       sass_0.4.10         
##  [4] generics_0.1.4       anytime_0.3.12       stringi_1.8.7       
##  [7] hms_1.1.4            digest_0.6.39        magrittr_2.0.4      
## [10] evaluate_1.0.5       grid_4.5.2           timechange_0.3.0    
## [13] RColorBrewer_1.1-3   fastmap_1.2.0        cellranger_1.1.0    
## [16] jsonlite_2.0.0       viridisLite_0.4.2    scales_1.4.0        
## [19] jquerylib_0.1.4      cli_3.6.5            crayon_1.5.3        
## [22] rlang_1.1.7          bit64_4.6.0-1        ellipsis_0.3.2      
## [25] withr_3.0.2          cachem_1.1.0         yaml_2.3.12         
## [28] otel_0.2.0           parallel_4.5.2       tools_4.5.2         
## [31] tzdb_0.5.0           vctrs_0.7.1          R6_2.6.1            
## [34] lifecycle_1.0.5      bit_4.6.0            vroom_1.7.0         
## [37] pkgconfig_2.0.3      pillar_1.11.1        bslib_0.10.0        
## [40] gtable_0.3.6         glue_1.8.0           Rcpp_1.1.1          
## [43] xfun_0.56            tidyselect_1.2.1     rstudioapi_0.18.0   
## [46] knitr_1.51           farver_2.1.2         htmltools_0.5.9     
## [49] labeling_0.4.3       rmarkdown_2.30       compiler_4.5.2      
## [52] S7_0.2.1             distributional_0.6.0

DATA624 Homework 1

Alexander Simon

2026-02-08

Exercise 2.1

Time series 1. `Bricks` from `aus_production` dataset

Time series 2. `Lynx` from `pelt` dataset

Time series 3. `Close` from `gafa_stock` dataset

Time series 4. `Demand` from `vic_elec` dataset

Exercise 2.2

Exercise 2.3

a. Read the data into R

b. Convert the data to time series

c. Construct time series plots

Exercise 2.4

a. Install the `USgas` package

b. Create a tsibble from `us_total`

c. Plot the annual natural gas consumption by state for New England

Exercise 2.5

Exercise 2.8

Time series 1. “Total Private” `Employed` from `us_employment`

Time series 2. `Bricks` from `aus_production`

Time series 3. `Hare` from `pelt`

Time series 4. “HO2” `Cost` from `PBS`

Time series 5. `Barrels` from `us_gasoline`

Session Details

DATA624 Homework 1

Alexander Simon

2026-02-08

Exercise 2.1

Time series 1. Bricks from aus_production dataset

Time series 2. Lynx from pelt dataset

Time series 3. Close from gafa_stock dataset

Time series 4. Demand from vic_elec dataset

Exercise 2.2

Exercise 2.3

a. Read the data into R

b. Convert the data to time series

c. Construct time series plots

Exercise 2.4

a. Install the USgas package

b. Create a tsibble from us_total

c. Plot the annual natural gas consumption by state for New England

Exercise 2.5

Exercise 2.8

Time series 1. “Total Private” Employed from us_employment

Time series 2. Bricks from aus_production

Time series 3. Hare from pelt

Time series 4. “HO2” Cost from PBS

Time series 5. Barrels from us_gasoline

Session Details

Time series 1. `Bricks` from `aus_production` dataset

Time series 2. `Lynx` from `pelt` dataset

Time series 3. `Close` from `gafa_stock` dataset

Time series 4. `Demand` from `vic_elec` dataset

a. Install the `USgas` package

b. Create a tsibble from `us_total`

Time series 1. “Total Private” `Employed` from `us_employment`

Time series 2. `Bricks` from `aus_production`

Time series 3. `Hare` from `pelt`

Time series 4. “HO2” `Cost` from `PBS`

Time series 5. `Barrels` from `us_gasoline`