library(fpp3)
library(tsibble)
library(dplyr)
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, and Demand from vic_elec.
Use ? (or help()) to find out about the data in each series.
help("aus_production")
glimpse(aus_production)
## Rows: 218
## Columns: 7
## $ Quarter <qtr> 1956 Q1, 1956 Q2, 1956 Q3, 1956 Q4, 1957 Q1, 1957 Q2, 1957…
## $ Beer <dbl> 284, 213, 227, 308, 262, 228, 236, 320, 272, 233, 237, 313…
## $ Tobacco <dbl> 5225, 5178, 5297, 5681, 5577, 5651, 5317, 6152, 5758, 5641…
## $ Bricks <dbl> 189, 204, 208, 197, 187, 214, 227, 222, 199, 229, 249, 234…
## $ Cement <dbl> 465, 532, 561, 570, 529, 604, 603, 582, 554, 620, 646, 637…
## $ Electricity <dbl> 3923, 4436, 4806, 4418, 4339, 4811, 5259, 4735, 4608, 5196…
## $ Gas <dbl> 5, 6, 7, 6, 5, 7, 7, 6, 5, 7, 8, 6, 5, 7, 8, 6, 6, 8, 8, 7…
help("pelt")
glimpse(pelt)
## Rows: 91
## Columns: 3
## $ Year <dbl> 1845, 1846, 1847, 1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855,…
## $ Hare <dbl> 19580, 19600, 19610, 11990, 28040, 58000, 74600, 75090, 88480, 61…
## $ Lynx <dbl> 30090, 45150, 49150, 39520, 21230, 8420, 5560, 5080, 10170, 19600…
help("gafa_stock")
glimpse(gafa_stock)
## Rows: 5,032
## Columns: 8
## Key: Symbol [4]
## $ Symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAP…
## $ Date <date> 2014-01-02, 2014-01-03, 2014-01-06, 2014-01-07, 2014-01-08,…
## $ Open <dbl> 79.38286, 78.98000, 76.77857, 77.76000, 76.97285, 78.11429, …
## $ High <dbl> 79.57571, 79.10000, 78.11429, 77.99429, 77.93714, 78.12286, …
## $ Low <dbl> 78.86000, 77.20428, 76.22857, 76.84571, 76.95571, 76.47857, …
## $ Close <dbl> 79.01857, 77.28286, 77.70428, 77.14857, 77.63715, 76.64571, …
## $ Adj_Close <dbl> 66.96433, 65.49342, 65.85053, 65.37959, 65.79363, 64.95345, …
## $ Volume <dbl> 58671200, 98116900, 103152700, 79302300, 64632400, 69787200,…
help("vic_elec")
glimpse(vic_elec)
## Rows: 52,608
## Columns: 5
## $ Time <dttm> 2012-01-01 00:00:00, 2012-01-01 00:30:00, 2012-01-01 01:0…
## $ Demand <dbl> 4382.825, 4263.366, 4048.966, 3877.563, 4036.230, 3865.597…
## $ Temperature <dbl> 21.40, 21.05, 20.70, 20.55, 20.40, 20.25, 20.10, 19.60, 19…
## $ Date <date> 2012-01-01, 2012-01-01, 2012-01-01, 2012-01-01, 2012-01-0…
## $ Holiday <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
What is the time interval of each series?
aus_production is recorded at a quarterly interval.
pelt is recorded at a yearly interval, from 1845 to 1935.
gafa_stock is recorded at a daily interval (trading days only), from 2014 to 2018.
vic_elec is recorded at a half-hourly interval, from 2012-01-01 00:00 to 2014-12-31 23:30.
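These intervals can also be checked programmatically; a quick sketch (my addition) using tsibble's interval():
interval(aus_production)  # quarterly
interval(pelt)            # yearly
interval(gafa_stock)      # reported as irregular, since only trading days appear
interval(vic_elec)        # 30 minutes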
Use autoplot() to produce a time plot of each
series.
aus_production %>% autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
pelt %>% autoplot(Lynx)
gafa_stock %>% autoplot(Close)
vic_elec %>% autoplot(Demand) +
labs(title = "Electricity Demand in Victoria",
x = "Time",
y = "Demand (MW)")
Use filter() to find what days corresponded to the
peak closing price for each of the four stocks in
gafa_stock.
gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
select(Symbol, Date, Close)
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG   2018-07-26 1268.
Download the file tute1.csv from the book website, open it in Excel
(or some other spreadsheet application), and review its contents. You
should find four columns of information. Columns B through D each
contain a quarterly series, labelled Sales, AdBudget and GDP. Sales
contains the quarterly sales for a small company over the period
1981-2005. AdBudget is the advertising budget and GDP is the gross
domestic product. All series have been adjusted for inflation.
The code below departs slightly from the book's version, since I could not get the data to load otherwise.
tute1 <- read.csv("C:/Users/rbron/Downloads/tute1 (1).csv")
b. Convert the data to a time series
tute1_ts <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)
c. Construct time series plots of each of the three series
tute1_ts |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
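As a side check (my own, not from the book): dropping facet_grid() draws all three series on one shared y-axis, which makes their relative scales directly comparable but flattens the series with smaller ranges.
tute1_ts |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()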
The USgas package contains data on the demand for natural gas in the US.
a. Install the USgas package.
# install.packages("USgas")
b. Create a tsibble from us_total with year as the index and state as the key.
library(USgas)
us_tsibble <- us_total %>%
  as_tsibble(index = year, key = state)
c. Plot the annual natural gas consumption by state for the New England area (Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
library(ggplot2)
# Define the New England states
new_england_states <- c("Maine", "Vermont", "New Hampshire", "Massachusetts",
"Connecticut", "Rhode Island")
# Filter the tsibble for only those states
new_england_gas <- us_tsibble %>%
filter(state %in% new_england_states)
# Plot
ggplot(new_england_gas, aes(x = year, y = y, color = state)) +
geom_line() +
labs(
title = "Annual Natural Gas Consumption in New England States",
x = "Year",
y = "Natural Gas Consumption (Million Cubic Feet)",
color = "State"
) +
theme_minimal()
a. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
library(readxl)
tourism <- read_excel("C:/Users/rbron/Downloads/tourism.xlsx")
tourism
b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
tourism_ts <- tourism %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(
key = c(Region, State, Purpose),
index = Quarter
)
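As a sanity check (my own, not part of the exercise), the result should line up with the tourism object shipped with the tsibble package:
# Both comparisons should come back TRUE if the conversion matches the packaged data
identical(key_vars(tourism_ts), key_vars(tsibble::tourism))
all.equal(as_tibble(tourism_ts), as_tibble(tsibble::tourism))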
c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
tourism %>%
group_by(Region, Purpose) %>%
summarise(
avg_trips = mean(Trips, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(avg_trips)) %>%
slice(1)
## # A tibble: 1 × 3
## Region Purpose avg_trips
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
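Equivalently (a stylistic alternative, not the book's code), dplyr's slice_max() collapses the arrange()/slice() pair:
tourism %>%
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips, na.rm = TRUE), .groups = "drop") %>%
  slice_max(avg_trips, n = 1)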
d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
tourism_state <- tourism %>%
mutate(Quarter = yearquarter(Quarter)) %>%
group_by(State, Quarter) %>%
summarise(
Trips = sum(Trips, na.rm = TRUE),
.groups = "drop"
) %>%
as_tsibble(
key = State,
index = Quarter
)
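A quick visual check of the result (my addition): autoplot() draws one line per State key.
tourism_state %>%
  autoplot(Trips)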
The aus_arrivals data set comprises quarterly
international arrivals to Australia from Japan, New Zealand, UK and the
US.
Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries.
library(tsibble)
library(feasts)
data(aus_arrivals)
Plot 1: Overall comparison
autoplot(aus_arrivals, Arrivals) +
facet_wrap(~ Origin, scales = "free_y")
Overall interpretation: New Zealand has the largest arrivals; the US and the UK grow over time; Japan shows a decline in arrivals after the 1990s.
Plot 2: Seasonality
gg_season(aus_arrivals, Arrivals) +
facet_wrap(~ Origin)
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Seasonality interpretation: There is strong seasonality in all four series, with the first quarter (the Australian summer) seeing the highest arrivals; Japan's trend weakens over time.
Plot 3: Subseries comparison
gg_subseries(aus_arrivals, Arrivals) +
facet_wrap(~ Origin)
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Interpretation: Quarter 1 has the highest arrivals overall; the US and UK show a steady increase, while Japan's quarterly means decline in every quarter.
There were three unusual observations. First, arrivals dropped during 2008-2009 due to the global financial crisis. Second, Japanese arrivals show a long-term declining pattern. Third, there are occasional extreme first-quarter peaks, especially for New Zealand.
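To look at the 2008-2009 dip more closely, a short follow-up of my own using filter_index():
aus_arrivals |>
  filter_index("2006 Q1" ~ "2010 Q4") |>  # window around the financial crisis
  autoplot(Arrivals)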
Monthly Australian retail data is provided in
aus_retail. Select one of the time series as follows (but
choose your own seed value):
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
Explore your chosen retail time series using the following functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() |> autoplot().
autoplot(myseries, Turnover)
Autoplot interpretation: Turnover trends upward as the years go on.
gg_season(myseries, Turnover)
Seasonality interpretation: Turnover increases through the year, with a larger mid-year spike around July/August and very large spikes in November/December.
gg_subseries(myseries, Turnover)
Subseries interpretation: This matches the seasonal plot, with a smaller spike in July/August and the largest spike in December.
gg_lag(myseries, Turnover, lags = 1:4)
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Lag Interpretation: The lag plots show strong positive autocorrelation, with turnover in one period closely related to turnover in previous periods, especially at lag 1. The colored monthly patterns indicate strong seasonality, with higher turnover toward the end of the year and lower values early in the year, and no obvious unusual observations.
myseries %>%
ACF(Turnover) %>%
autoplot()
ACF interpretation: The ACF shows very strong positive autocorrelation at short lags, indicating high persistence in turnover over time. There are clear seasonal spikes at lag 12 (and its multiples), confirming strong annual seasonality, with no evidence of unusual or irregular behaviour.
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? Pulling the interpretations above together: the series has a clear upward trend and strong annual seasonality peaking late in the year, with no obvious cyclic pattern in the plots.
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF(), and explore features from the following time series: "Total Private" Employed from us_employment, Bricks from aus_production, Hare from pelt, "H02" Cost from PBS, and Barrels from us_gasoline.
us_employment %>%
filter(Title == "Total Private") %>%
autoplot(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
gg_season(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
gg_subseries(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
gg_lag(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
ACF(Employed) %>%
autoplot()
aus_production %>%
autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_season(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_subseries(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_lag(Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production %>%
ACF(Bricks) %>%
autoplot()
Note that gg_season() and gg_subseries() are skipped for Hare: pelt is annual data, so there is no seasonal period for those plots to display.
pelt %>%
  autoplot(Hare)
pelt %>%
gg_lag(Hare)
pelt %>%
ACF(Hare) %>%
autoplot()
PBS %>%
filter(ATC2 == "H02", Type == "Safety net", Concession == "General") %>%
autoplot(Cost)
PBS %>%
filter(ATC2 == "H02", Type == "Safety net", Concession == "General") %>%
gg_season(Cost)
PBS %>%
filter(ATC2 == "H02", Type == "Safety net", Concession == "General") %>%
gg_subseries(Cost)
PBS %>%
filter(ATC2 == "H02", Type == "Safety net", Concession == "General") %>%
gg_lag(Cost)
PBS %>%
filter(ATC2 == "H02", Type == "Safety net", Concession == "General") %>%
ACF(Cost) %>%
autoplot()
us_gasoline %>%
autoplot(Barrels)
us_gasoline %>%
gg_season(Barrels)
us_gasoline %>%
gg_subseries(Barrels)
us_gasoline %>%
gg_lag(Barrels)
us_gasoline %>%
ACF(Barrels) %>%
autoplot()
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?
The following time plots and ACF plots correspond to four different time series. Your task is to match each time plot in the first row with one of the ACF plots in the second row.
The aus_livestock data contains the monthly total
number of pigs slaughtered in Victoria, Australia, from Jul 1972 to Dec
2018. Use filter() to extract pig slaughters in Victoria
between 1990 and 1995. Use autoplot() and
ACF() for this data. How do they differ from white noise?
If a longer period of data is used, what difference does it make to the
ACF?
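No code accompanied this question, so here is a minimal sketch of the steps it asks for (my own; aus_livestock is keyed by Animal and State, with monthly Count):
vic_pigs <- aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria") |>
  filter_index("1990" ~ "1995")          # keep 1990-1995 only
vic_pigs |> autoplot(Count)              # time plot
vic_pigs |> ACF(Count) |> autoplot()     # sample autocorrelations
Unlike white noise, the ACF should show several significant spikes; with a longer sample, the ACF estimates are less variable and the significance bounds narrow.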
Use the following code to compute the daily changes in Google closing stock prices.
dgoog <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) >= 2018) |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE) |>
  mutate(diff = difference(Close))
Why was it necessary to re-index the tsibble? Stock prices exist only on trading days, so the Date index has gaps at weekends and holidays; re-indexing by trading_day gives a regular, gap-free index, which is what lag-based functions such as difference() expect.
Plot these differences and their ACF.
Do the changes in the stock prices look like white noise?
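A minimal sketch for these last two parts (my code, not the book's):
dgoog |> autoplot(diff)                  # daily changes in the closing price
dgoog |> ACF(diff) |> autoplot()         # ACF of the daily changes
If the changes behave like white noise, nearly all ACF spikes should fall inside the blue significance bounds.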