library(fpp3)
library(tidyverse)

2.1 Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

?aus_production
?pelt
?gafa_stock
?vic_elec

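The time interval of each series is as follows: Bricks is quarterly, Lynx is annual, Close is recorded on trading days only (so the series is irregularly spaced), and Demand is half-hourly.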
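
The interval of each tsibble can also be read off in code rather than from the help pages. The following is a minimal sketch, assuming the fpp3 datasets are loaded as above; tsibble::interval() is called with its namespace because lubridate exports a function of the same name.

```r
library(fpp3)

# interval() reports the spacing that tsibble has recorded for the index
tsibble::interval(aus_production)
tsibble::interval(pelt)
tsibble::interval(vic_elec)

# gafa_stock only has rows for trading days, so it is stored as irregular
is_regular(gafa_stock)
```

The same information appears in square brackets in the header when a tsibble is printed.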

aus_production |> autoplot(Bricks) + theme_bw()
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

pelt |> autoplot(Lynx) + theme_bw()

gafa_stock |> autoplot(Close) + theme_bw()

vic_elec |>
  autoplot(Demand) +
  labs(y = "Demand (MWh)",
       title = "Half-Hourly Electricity Demand for Victoria, Australia, 2012-2014") +
  theme_bw()

2.2 Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock |>
  group_by(Symbol) |>
  filter(Close == max(Close))

Based on the above filtered tsibble, the maximum closing prices for each of the stocks are listed below:
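
To show only the dates and prices, the filtered result can be reduced to three columns. This is a sketch of one way to do it, not the only one; the name peak_days is introduced here for illustration.

```r
library(fpp3)

peak_days <- gafa_stock |>
  as_tibble() |>                  # a plain table is enough for this lookup
  group_by(Symbol) |>
  filter(Close == max(Close)) |>  # same filter as above, applied within each stock
  ungroup() |>
  select(Symbol, Date, Close)

peak_days
```

If a stock reached its peak close on more than one day, each of those days is returned.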

2.3 Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

  1. You can read the data into R with the following script:
tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
  2. Convert the data to time series
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

mytimeseries
  3. Construct time series plots of each of the three series
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y") 

Check what happens when you don’t include facet_grid().

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

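When facet_grid() is not included, all three series are drawn in a single panel with a shared y-axis, which can compress the variation in the series with smaller values. With facet_grid(name ~ ., scales = "free_y"), each series gets its own panel and its own y-axis scale.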

2.4 The USgas package contains data on the demand for natural gas in the US.

  1. Install the USgas package.
  2. Create a tsibble from us_total with year as the index and state as the key.
  3. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
library(USgas)
us_total <- us_total |>
  as_tsibble(key = state, index = year)
us_total |> head(5)
us_total |>
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts",
                      "Connecticut", "Rhode Island")) |>
  autoplot() +
  labs(x = "Year", y = "Consumption (MMCF)",
       title = "Annual Total Natural Gas Consumption for New England Area\n in Millions of Cubic Feet") +
  theme_bw()
## Plot variable not specified, automatically selected `.vars = y`

2.5

  1. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
  2. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
  3. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
  4. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
tourism_book <- readxl::read_excel("tourism.xlsx")

tourism_book <- tourism_book |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(key = c(Region, State, Purpose), index = Quarter)

tourism_book |>
  arrange(desc(Trips)) |>
  head(20)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `State`, `Purpose`, `Quarter` first.

Based on the tsibble above, 16 of the top 20 quarters in terms of overnight trips occurred in the Sydney region. As a result, the plot below was generated for the Sydney region only.

tourism_book |>
  filter(Region == "Sydney") |>
  select(-Region, -State) |>
  autoplot() +
  labs(x = "Quarter", y = "Trips ('000s)",
       title = "Quarterly Overnight Trips for Sydney Region by Purpose of Travel") +
  theme_bw()
## Plot variable not specified, automatically selected `.vars = Trips`

Based on the plot above, the combination of the Sydney region and the Visiting purpose has the most overnight trips on average.
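
Reading the answer off a plot of one region is indirect, so the average can also be computed for every Region and Purpose combination. The sketch below assumes the built-in tourism tsibble (attached by fpp3) is an acceptable stand-in for tourism_book; the names avg_trips and top_combo are introduced here for illustration.

```r
library(fpp3)

# Assumption: the built-in `tourism` tsibble stands in for tourism_book
avg_trips <- tourism |>
  as_tibble() |>                       # averaging over time, so drop the index
  group_by(Region, Purpose) |>
  summarise(mean_trips = mean(Trips), .groups = "drop")

# Keep the single combination with the highest average
top_combo <- avg_trips |>
  slice_max(mean_trips, n = 1, with_ties = FALSE)

top_combo
```

Converting to a tibble first also avoids the temporal-ordering warning that arrange() raises on a tsibble.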

tourism_book |>
  group_by(State) |>
  summarize(state_trips = sum(Trips))

Above is the new tsibble that combines the Purposes and Regions, and has just the total trips by State.
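
The structure of the aggregated result can be checked with the tsibble helpers key_vars() and index_var(). This is a small sketch, again assuming the built-in tourism tsibble stands in for tourism_book.

```r
library(fpp3)

# Assumption: the built-in `tourism` tsibble stands in for tourism_book
state_trips <- tourism |>
  group_by(State) |>
  summarise(state_trips = sum(Trips))

key_vars(state_trips)    # remaining key column(s)
index_var(state_trips)   # the time index is kept
```

Because summarise() on a tsibble keeps the index, the result has one row per State per Quarter rather than one row per State.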

2.8 Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

us_employment |>
  summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
  autoplot()
## Plot variable not specified, automatically selected `.vars = total_employed`

us_employment |>
  summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
  gg_season(total_employed, labels = "both") +
  theme_bw()
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

us_employment |>
  filter(Month >= yearmonth("1990 Jan")) |>
  summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
  gg_season(total_employed) +
  theme_bw()

us_employment |>
  summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
  gg_subseries(total_employed) +
  facet_wrap(~month(Month, label = TRUE), nrow = 3) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

us_employment |>
  summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
  gg_lag(total_employed, geom = "point", lags = 1:12) +
  theme_bw()
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

us_employment |>
  summarize(total_employed = sum(Employed, na.rm = TRUE)) |>
  ACF(total_employed, lag_max = 48) |>
  autoplot() +
  theme_bw()

For the above plots, the total_employed variable created is the sum of all values from the Employed column, regardless of the Series ID. Looking at the above plots, there appears to be a positive trend going back to 1939.

Looking at the seasonal plot with all years’ worth of data included, it is difficult to tell whether there is any seasonality. As a result, a second seasonal plot using only data since 1990 was included. Based on this new seasonal plot, employment rises from January to June and then dips in July. Employment then stays relatively level until September, after which it increases through December in most years. Finally, in most years, total employment decreases from December of one year to January of the next. This all suggests that total US employment has some seasonality, with local peaks in June and December.

Looking at the autoplot(), there appears to be some cyclical behavior. For example, the total employment from 1992 to 2001 increases over time, but then begins to decrease in 2001 until about 2003. The total employment begins increasing again until about 2008, where it begins to decrease again until about 2010.

Finally, looking at the autoplot() and the seasonal plot, there appears to be a large jump in total employment between 1989 and 1990. This is very unusual compared to all other years, in which total employment typically decreases from December of one year to January of the next.
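
The exercise asks specifically for the "Total Private" series, whereas the plots above sum Employed across every series in us_employment. Summing mixes aggregates with their sub-industries, and any series whose values only begin at a later date would enter the sum as a jump, so the 1989 to 1990 jump may be an artifact of the aggregation rather than a feature of employment. A minimal sketch of the single-series alternative follows; the names total_private and total_private_plot are introduced here for illustration.

```r
library(fpp3)

# Keep only the series whose Title is "Total Private"
total_private <- us_employment |>
  filter(Title == "Total Private")

n_keys(total_private)   # number of series left after filtering

total_private_plot <- total_private |>
  autoplot(Employed) +
  theme_bw()

total_private_plot
```

The same filtered tsibble can be passed to gg_season(), gg_subseries(), gg_lag() and ACF() in place of the summarised total.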

aus_production |>
  autoplot(Bricks) +
  theme_bw()
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production |>
  gg_season(Bricks, labels = "both") +
  theme_bw()
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text()`).

aus_production |>
  gg_subseries(Bricks) +
  theme_bw()
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production |>
  gg_lag(Bricks, geom = "point") +
  theme_bw()
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production |>
  ACF(Bricks) |>
  autoplot() +
  theme_bw()

Looking at the autoplot(), there was a positive trend from about 1955 until about 1980. Since 1980, however, there has been a negative trend. Additionally, the autoplot() does appear to show some cyclicity. For example, starting in about 1991, the brick production generally increases until about 1995, and then decreases over the next 3 quarters. The brick production then generally increases until about 2000, where another decrease in brick production occurs over the next quarter.

Looking at the ACF plot, the correlations have local peaks every 4 lags (lag = 4, lag = 8, etc.) and local troughs every 4 lags, starting at lag = 2. This suggests annual seasonality. More specifically, when looking at the seasonal plot, the production generally increases from Q1 to Q2 every year, and then decreases starting in either Q2 or Q3 and into Q4.

Looking again at the autoplot(), there appear to be 2 unusual years: 1975 and 1982. In these years, there was a decrease in production, similar to other years as described above regarding cyclicity, but these 2 years have a much sharper decrease than any of the other years.

pelt |>
  autoplot(Hare) +
  theme_bw()

pelt |>
  gg_lag(Hare, geom = "point") +
  theme_bw()

pelt |>
  ACF(Hare, lag_max = 30) |>
  autoplot() +
  theme_bw()

Looking at the autoplot() above, there does not appear to be a positive or negative trend. The data does appear to be cyclical. For example, around 1918, there is a local minimum in terms of pelts traded, followed by an increase in pelts traded until about 1922, where there is a local maximum. In 1922, the number of pelts traded begins to decrease until about 1928. These patterns suggest that the number of hare pelts traded is cyclical.

Additionally, when looking at the ACF plot, there appears to be a strong negative correlation every 10 or so lags, starting at about lag 5. The correlation then increases until about lag 10, where there is a strong positive correlation. This pattern continues, with the strength of each correlation decreasing as the lag increases. This further supports the conclusion that the data is cyclical.
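
The lags of the first trough and the following peak can be located numerically instead of being read off the plot. The sketch below uses base R's stats::acf() as a cross-check on the plotted ACF; the search windows (lags 1 to 15 for the trough, lags 6 to 15 for the peak) are assumptions chosen to bracket the pattern described above.

```r
library(fpp3)

# stats::acf() returns lag 0 first, so drop it to make position i equal lag i
hare_acf <- as.numeric(acf(pelt$Hare, lag.max = 30, plot = FALSE)$acf)[-1]

# Lag of the most negative autocorrelation within the first 15 lags
trough_lag <- which.min(hare_acf[1:15])

# Lag of the largest autocorrelation between lags 6 and 15
peak_lag <- which.max(hare_acf[6:15]) + 5

c(trough = trough_lag, peak = peak_lag)
```

The distance between successive peaks gives a rough estimate of the cycle length.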

Since the data is annual, it is not possible to generate either a seasonal or subseries plot, and no seasonality can be present in this data.

PBS |>
  summarize(total_cost = sum(Cost, na.rm = TRUE)) |>
  autoplot(total_cost) +
  theme_bw()

PBS |>
  summarize(total_cost = sum(Cost, na.rm = TRUE)) |>
  gg_season(total_cost, labels = "both") +
  theme_bw()

PBS |>
  summarize(total_cost = sum(Cost, na.rm = TRUE)) |>
  gg_subseries(total_cost) +
  facet_wrap(~month(Month, label = TRUE), nrow = 3) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))

PBS |>
  summarize(total_cost = sum(Cost, na.rm = TRUE)) |>
  gg_lag(total_cost, geom = "point", lags = 1:12) +
  theme_bw()

PBS |>
  summarize(total_cost = sum(Cost, na.rm = TRUE)) |>
  ACF(total_cost, lag_max = 48) |>
  autoplot() +
  theme_bw()

For the above plots, the total_cost variable created is the sum of all values from the Cost column, regardless of the Concession, Type, ATC1, and ATC2. Looking at the above plots, there appears to be a positive trend going back to about 1992.

Looking at the seasonal plots above, there is generally a decrease in total_cost from January-February, followed by a generally increasing total_cost through December. For many of the years, there’s also a sharp increase from November-December. Additionally, when looking at the ACF plot, every 12 time intervals, there is a small spike in correlation compared to the previous time interval. This suggests that there is annual seasonality, with a seasonal decrease in total_cost between January-February, followed by a general increase the rest of the year.

Looking at the autoplot(), there are no signs of any cyclicity.

Finally, looking at the seasonal plot, there appear to be a few outliers. In 2008, the total_cost decreased from February-March, the only year in which this occurred. In the years 2001 and 2006, the total_cost decreased between November-December, compared to other years where the total_cost increased between these 2 months.
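
The exercise asks for the "H02" Cost series, while the plots above sum Cost over every ATC group in PBS. A minimal sketch of restricting the data to H02 before summing over Concession and Type follows; the names h02 and h02_plot are introduced here for illustration, and the features described above may differ for this narrower series.

```r
library(fpp3)

# Keep only the H02 group, then total the cost across the remaining keys
h02 <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(total_cost = sum(Cost))

h02_plot <- h02 |>
  autoplot(total_cost) +
  theme_bw()

h02_plot
```

The h02 tsibble can be passed to gg_season(), gg_subseries(), gg_lag() and ACF() in the same way as the summarised total above.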

us_gasoline |>
  autoplot(Barrels) +
  theme_bw()

us_gasoline |>
  gg_season(Barrels, labels = "both") +
  theme_bw()

us_gasoline |>
  gg_subseries(Barrels) +
  facet_wrap(~week(Week), nrow = 4) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))

us_gasoline |>
  gg_lag(Barrels, geom = "point") +
  theme_bw()

us_gasoline |>
  ACF(Barrels, lag_max = 106) |>
  autoplot() +
  theme_bw()

Looking at the above plots, there appears to be a positive trend going back to about 1990.

Looking at the seasonal plot, the weekly gasoline supplied generally increases from January-June, and then generally decreases from June-December. Additionally, when looking at the ACF plot, the correlation generally decreases as the lag increases. However, at around lag 26 (26 weeks), there is a local minimum, after which the correlation increases to a local maximum at around lag 52. This pattern repeats over the next 52 lags. These 2 plots suggest annual seasonality in which the weekly gasoline supplied increases until about June, and then decreases until December.

Looking at the plots, there does not appear to be any obvious cyclicity. Additionally, there does not appear to be any clear unusual years in the plots.