Time Series

Required Libraries

library(fpp3)

Problem 2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

Use ? (or help()) to find out about the data in each series.
What is the time interval of each series?
Use autoplot() to produce a time plot of each series.
For the last plot, modify the axis labels and title.

aus_production: Bricks

The time interval for Bricks is quarterly.

bricks <-
  aus_production |> 
  select(Quarter, Bricks)

head(bricks)

bricks |> 
  autoplot(Bricks) +
  geom_point()

pelt: Lynx

The time interval for Lynx is yearly.

lynx <-
  pelt |> 
  select(Year, Lynx)

head(lynx)

lynx |> 
  autoplot(Lynx) +
  geom_point()

gafa_stock: Close

The time interval for Close is daily, but only trading days are recorded (no weekends or market holidays), so the series is irregular.

close <-
  gafa_stock |> 
  select(Date, Close)

head(close)

close |> 
  autoplot(Close)

vic_elec: Demand

The time interval for Demand is every half-hour.

demand <-
  vic_elec |> 
  select(Time, Demand)

head(demand)

demand |> 
  autoplot(Demand) +
  labs(title = 'Electricity Demand for Victoria, Australia',
       x = 'Time (30 Minutes)',
       y = 'Demand (MWh)')
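
The intervals stated above can also be checked programmatically rather than read off the help pages. The following is a minimal sketch, assuming fpp3 is attached; the tsibble:: prefix is used because lubridate also exports a function named interval().

```r
library(fpp3)

# Interval stored in each tsibble's metadata
bricks_interval <- tsibble::interval(aus_production)  # Bricks
lynx_interval   <- tsibble::interval(pelt)            # Lynx
demand_interval <- tsibble::interval(vic_elec)        # Demand

bricks_interval
lynx_interval
demand_interval

# gafa_stock only has rows for trading days, so also check regularity
stock_is_regular <- tsibble::is_regular(gafa_stock)
stock_is_regular
```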

Problem 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

# One row per stock: the date on which its closing price peaked
gafa_stock |> 
  group_by(Symbol) |>
  filter(Close == max(Close)) |> 
  select(Symbol, Date, Close)

Problem 2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

You can read the data into R with the following script:

tute1 <- readr::read_csv("tute1.csv")
head(tute1)

Convert the data to time series

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

head(mytimeseries)

Construct time series plots of each of the three series

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

Without facet_grid(), all three series are drawn in a single panel and share one y-axis, so the differences in level between the series dominate and the variation within each series is harder to see.

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

Problem 2.4

The USgas package contains data on the demand for natural gas in the US.

Install the USgas package.

# Install USgas only if it is missing, then load it in either case
if (!requireNamespace("USgas", quietly = TRUE)) {
  install.packages("USgas")
}
library(USgas)

Create a tsibble from us_total with year as the index and state as the key.

us_gas_time_series <-
  us_total |>
  as_tsibble(key = state, index = year)

head(us_gas_time_series)

Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

library(ggh4x)

us_gas_time_series |> 
  filter(state %in% c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island')) |>
  ggplot(aes(x = year, y = y, color = state)) +
  geom_point() +
  geom_line() +
  facet_grid(rows=vars(state), 
             scales = "free_y", 
             labeller = labeller(state = label_wrap_gen(10))) +
  theme_bw() + 
  theme(legend.position="none")
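
When filtering on several values at once, == and %in% behave differently: == recycles the right-hand vector and compares position by position, while %in% tests membership. The toy vectors below are hypothetical and only illustrate the difference.

```r
# Hypothetical data: six rows, two states of interest
states <- c("Maine", "Vermont", "Texas", "Maine", "Vermont", "Texas")
wanted <- c("Maine", "Vermont")

# `wanted` is recycled to length 6, so only rows that happen to line up match
eq_match <- states == wanted

# Membership test: every Maine or Vermont row matches
in_match <- states %in% wanted

sum(eq_match)  # 2
sum(in_match)  # 4
```

With ==, rows are silently dropped whenever the recycled position does not line up, which is why %in% is the safer choice inside filter().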

Problem 2.5

Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

library(readxl)

tourism <- 
  read_excel("tourism.xlsx")

head(tourism)

Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism_time_series <-
  tourism |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

head(tourism_time_series)

Find what combination of Region and Purpose had the maximum number of overnight trips on average.

top_region_purpose <-
  tourism_time_series |>
  as_tibble() |>  # drop the time index so the mean is taken over all quarters
  group_by(Region, Purpose) |>
  summarise(Average_Trips = mean(Trips), .groups = "drop") |>
  arrange(desc(Average_Trips))

head(top_region_purpose, 1)
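
One subtlety with this kind of aggregation is that summarise() on a tsibble keeps the time index as an implicit grouping, so an average "by Region and Purpose" is still computed separately for every quarter. The sketch below illustrates this; it assumes the built-in tsibble::tourism data is equivalent to the tsibble built from the Excel file, and the object names are illustrative only.

```r
library(fpp3)

# Built-in copy of the data, referenced explicitly (assumption: same content as the Excel file)
tourism_ts <- tsibble::tourism

# On a tsibble, summarise() still returns one row per Region, Purpose and Quarter
by_quarter <- tourism_ts |>
  group_by(Region, Purpose) |>
  summarise(Average_Trips = mean(Trips))

# Converting to a tibble first gives one row per Region and Purpose
overall <- tourism_ts |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(Average_Trips = mean(Trips), .groups = "drop")

nrow(by_quarter)
nrow(overall)
```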

Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

tourism_combined <-
  tourism |>
  group_by(Quarter, State) |>
  summarise(Total_Trips = sum(Trips), .groups = "drop") |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(key = State, index = Quarter)

head(tourism_combined)

Problem 2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?

us_employment: Monthly US Total Private Employed

The time plot shows a strong, positive overall trend from roughly 1940 to 2020.

private_employment <-
  us_employment |>
  filter(Title == "Total Private")

private_employment |>
  autoplot(Employed)

Plotting the current value \(y_t\) against the lagged value \(y_{t-k}\) for each lag \(k\), we again see a strong positive relationship in every panel, which is consistent with the trend.

private_employment |>
  gg_lag(Employed, geom = "point")

Another way to see this is the autocorrelation at each lag. The autocorrelations are all positive and decrease only slightly as the lag increases, which confirms the trend. They are also significantly different from zero, so the series is not white noise.
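
The autocorrelations described above can be computed and plotted with ACF() from feasts. This is a minimal sketch, assuming fpp3 is attached; emp_acf is an illustrative name.

```r
library(fpp3)

private_employment <-
  us_employment |>
  filter(Title == "Total Private")

# Autocorrelation of Employed at each lag, using the default number of lags
emp_acf <-
  private_employment |>
  ACF(Employed)

# Correlogram with significance bounds
autoplot(emp_acf)
```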

aus_production: Quarterly Bricks Production

Between 1960 and 1980 there was a strong positive trend, followed by a steep drop and then a recovery. After 1980 the series is cyclic: production grows and then falls every 5-7 years. Because these swings do not have a fixed period, they are cycles rather than seasonality.

brick_production <-
  aus_production |>
  select(Quarter, Bricks)

brick_production |>
  autoplot(Bricks)

Breaking down each year by quarter, brick production increases from Q1 to Q2/Q3 and then tapers off and drops by Q4. This shows a seasonal pattern with production peaking mid-year. Note that Q2 and Q3 are the cooler months in Australia, so the peak should not be attributed to warm weather.

brick_production |>
  gg_season(Bricks)

In the lag plots, lag 1 shows a strong positive linear relationship. As the lag increases, the points spread further from the diagonal, so the relationship weakens and the scatter becomes less uniform.

brick_production |>
  gg_lag(Bricks, geom = "point")

The autocorrelation confirms these findings: the autocorrelations decrease slowly as the lag increases, which indicates a trend, and the scalloped shape with a peak at every 4th lag indicates quarterly seasonality.
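
A correlogram for the Bricks series can be sketched as follows, assuming fpp3 is attached. The choice of lag_max = 20 (five years of quarters) is an assumption made here so that several seasonal peaks are visible.

```r
library(fpp3)

brick_production <-
  aus_production |>
  select(Quarter, Bricks)

# Bricks has missing values at the end of the sample; ACF()'s default
# na.action keeps the longest contiguous run of non-missing values
bricks_acf <-
  brick_production |>
  ACF(Bricks, lag_max = 20)

autoplot(bricks_acf)
```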

pelt: Annual Hare Trades

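We cannot see any clear trend throughout the years. The pattern is cyclic rather than seasonal, since annual data cannot show within-year seasonality: trades climb to a peak and then drop steeply, with roughly five years between a peak and the following trough.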

hare_trades <-
  pelt |>
  select(Year, Hare)

hare_trades |>
  autoplot(Hare)

The lag plots show no consistently strong relationship between the current value and its lags, which agrees with the absence of a trend noted earlier.

hare_trades |>
  gg_lag(Hare, geom = "point")

The autocorrelation plot shows the same peaks and troughs at roughly five-year spacing that we saw earlier, confirming the cyclic pattern.
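
The correlogram for Hare can be produced in the same way. This is a minimal sketch, assuming fpp3 is attached; lag_max = 20 is an assumption chosen so that more than one rise and fall is visible.

```r
library(fpp3)

hare_trades <-
  pelt |>
  select(Year, Hare)

# Autocorrelation of annual hare pelts traded, out to 20 years
hare_acf <-
  hare_trades |>
  ACF(Hare, lag_max = 20)

autoplot(hare_acf)
```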