library(tsibbledata)
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble    3.2.1     ✔ ggplot2   3.5.1
## ✔ dplyr     1.1.4     ✔ tsibble   1.1.6
## ✔ tidyr     1.3.1     ✔ feasts    0.4.2
## ✔ lubridate 1.9.4     ✔ fable     0.4.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

2.1 Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

2.1a Use ? (or help()) to find out about the data in each series.

?aus_production
?pelt
?gafa_stock
?vic_elec

2.1b What is the time interval of each series?

  • aus_production: quarterly.
  • pelt: annual (from 1845-1935).
  • gafa_stock: daily (on irregular trading days from 2014-2018).
  • vic_elec: half-hourly.
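
The intervals listed above can also be checked in code rather than read off the help pages. The following is a minimal sketch, assuming the tsibble and tsibbledata packages are installed; the `series` list is just a convenience name introduced here.

```r
library(tsibble)
library(tsibbledata)

# Collect the four data sets under their names (the list is only a convenience).
series <- list(
  aus_production = aus_production,
  pelt           = pelt,
  gafa_stock     = gafa_stock,
  vic_elec       = vic_elec
)

# interval() returns the spacing tsibble inferred from the index column;
# is_regular() says whether the observations are treated as evenly spaced.
for (nm in names(series)) {
  cat(nm, ": interval = ", format(tsibble::interval(series[[nm]])),
      ", regular = ", is_regular(series[[nm]]), "\n", sep = "")
}
```

gafa_stock should be the only one reported as irregular, which matches the note about trading days.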

2.1c Use autoplot() to produce a time plot of each series.

# Bricks - aus_production
autoplot(aus_production, Bricks) 
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Lynx - pelt
autoplot(pelt, Lynx)

# Close - gafa_stock
autoplot(gafa_stock, Close) 

# Demand - vic_elec
autoplot(vic_elec, Demand)

  • Bricks shows an overall upward trend (with some sharp drops) until just after 1980, when it starts declining. The ups and downs seem to repeat every year, suggesting a seasonal pattern.
  • Lynx shows a clear rise and fall roughly every 10 years, suggesting a cyclical pattern.
  • The stock closing prices also show an upward trend overall (with a bit of a downturn towards the end of 2018). I don’t see any clear seasonality or cycle.
  • Demand shows very strong seasonality, with clear spikes around the beginning and middle of each year (the summer and winter months in Victoria).

2.1d For the last plot, modify the axis labels and title.

autoplot(vic_elec, Demand) +
  labs(title = "Electricity Demand in Victoria, Australia",
       y = "Demand (MWh)", x = "Time (30 mins)") 

2.2 Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  arrange(Symbol, Date)

2.3 Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

2.3a You can read the data into R with the following script:

tute1 <- readr::read_csv("/Users/aaliyahmjh/Downloads/tute1.csv", show_col_types = FALSE)
head(tute1)

2.3b Convert the data to time series

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

2.3c Construct time series plots of each of the three series

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

## Check what happens when you don’t include facet_grid().

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() 

With facet_grid(), each variable is plotted in its own panel, so the rises, falls and overall pattern of each series are much easier to see. However, we can't compare the variables directly, since each panel has its own scale.

If the patterns (trend, seasonality, etc.) matter most for a specific use case, facet_grid() should be used. If the goal is to compare the levels or magnitudes of the variables, facet_grid() should be removed, at the risk of losing the detail in the smaller series when they share an axis with much larger ones.

2.4 The USgas package contains data on the demand for natural gas in the US.

2.4a Install the USgas package.

install.packages("USgas")
## 
## The downloaded binary packages are in
##  /var/folders/nz/h7z329n55nxfs2dv7hmbhc400000gn/T//Rtmp1lwx3G/downloaded_packages
library(USgas)
?us_total

2.4b Create a tsibble from us_total with year as the index and state as the key.

us_gas <- us_total %>%
  as_tsibble(index = year, key = state)

2.4c Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

new_england_gas <- us_gas %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire",
                      "Massachusetts", "Connecticut", "Rhode Island"))
autoplot(new_england_gas, y) +
  labs(title = "New England Annual Natural Gas Consumption",
       x = "Year",
       y = "Million Cubic Feet")  +
  theme_minimal()

autoplot(new_england_gas, y) +
  labs(title = "New England Annual Natural Gas Consumption",
       x = "Year",
       y = "Million Cubic Feet")  +
  facet_wrap(~state, scales = "free_y") +
  theme_minimal()

Examining the states together and individually shows that:

  • Massachusetts consistently consumes the most gas and has an overall upward trend.
  • Connecticut is also trending upward and comes close to Massachusetts, especially after a sharp spike around 2010.
  • New Hampshire, Rhode Island and Vermont consume much less gas than Massachusetts and Connecticut. New Hampshire shows very slight growth with fluctuations, Rhode Island shows a steady increase after big dips in the early 2000s, and Vermont shows very clear growth in the 2010s, although its consumption is still significantly lower than that of the other New England states.
  • Maine is also on the lower end of consumption and is the only state showing a steady decline in recent years.

2.5

2.5a Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

tourism_xl <- readxl::read_excel("/Users/aaliyahmjh/Downloads/tourism.xlsx")

2.5b Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism <- tourism_xl %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

head(tourism)

2.5c Find what combination of Region and Purpose had the maximum number of overnight trips on average.

Sydney/Visiting had the maximum number of overnight trips on average (about 747 thousand trips per quarter). Note that summarise() on a tsibble keeps the Quarter index, so the data has to be converted to a tibble first in order to average over time. Without that step the top row is the largest single-quarter value instead (Melbourne/Visiting, ~985.28).

tourism %>%
  as_tibble() %>%   # drop the index so the mean is taken over all quarters
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(avg_trips))
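
The difference between summarising a tsibble and summarising a plain tibble is easy to see on a small made-up data set. This is only an illustrative sketch: the `toy` data and its values are invented and are not part of the exercise.

```r
library(tsibble)
library(dplyr)

# Made-up data: two regions observed over the same four quarters.
toy <- tibble(
  Quarter = yearquarter(rep(c("2020 Q1", "2020 Q2", "2020 Q3", "2020 Q4"), times = 2)),
  Region  = rep(c("A", "B"), each = 4),
  Trips   = c(1, 2, 3, 10, 5, 5, 5, 5)
) %>%
  as_tsibble(index = Quarter, key = Region)

# On a tsibble, summarise() keeps the index: one row per Region per Quarter,
# so each "mean" is just the single value for that quarter.
per_quarter <- toy %>%
  group_by(Region) %>%
  summarise(avg = mean(Trips))

# On a tibble, the same call collapses over time: one row per Region.
over_time <- toy %>%
  as_tibble() %>%
  group_by(Region) %>%
  summarise(avg = mean(Trips))

nrow(per_quarter)
nrow(over_time)
```

In this toy data, region A has the largest single quarter (10) but region B has the larger average over time (5 versus 4), so sorting `per_quarter` and sorting `over_time` give different "winners".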

2.5d Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

tourism_state <- tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips, na.rm = TRUE))

tourism_state

# Rank the states by total trips over the whole period. Drop the tsibble
# structure first so that summarise() also sums across quarters.
tourism_state %>%
  as_tibble() %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips)) %>%
  arrange(desc(Trips))

New South Wales is the state with the most total trips.

2.8 Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

2.8a Can you spot any seasonality, cyclicity and trend?

2.8b What do you learn about the series?

2.8c What can you say about the seasonal patterns?

2.8d Can you identify any unusual years?

head(us_employment)
?us_employment
us_emp <- us_employment %>%
  filter(Title == "Total Private")

autoplot(us_emp, Employed)

gg_season(us_emp, Employed)
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

gg_subseries(us_emp, Employed)
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

gg_lag(us_emp, Employed)
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

us_emp %>% ACF(Employed) %>% autoplot()

Total Private Employment

  • 2.8a: The series shows a strong long-term upward trend with annual seasonality (a pattern of ups and downs that repeats every year) and occasional cyclical declines, such as during 2008–2009.
  • 2.8b: I learned that employment has grown steadily over the years but was hit noticeably by the recession in 2008–2009.
  • 2.8c: Seasonal patterns are stable, with employment consistently peaking around mid-year and dipping at the start and end of each year.
  • 2.8d: Unusual years include 2008–2009, when the Global Financial Crisis caused employment to drop sharply below its usual trend.

autoplot(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_season(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_subseries(aus_production, Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_lag(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production %>% ACF(Bricks) %>% autoplot()

Brick Production

  • 2.8a: The series shows strong growth until around 1980, followed by a long-term decline.
    • There are clear seasonal patterns and cycles, with repeated ups and downs throughout as well as recurring peaks (around Q2 and Q3) and dips (around Q1 and Q4).
  • 2.8b: I learned that brick production grew overall until around 1980, when it began declining.
    • This was perhaps due to increased use of other construction materials or a general decline in Australian manufacturing.
  • 2.8c: The seasonal patterns are stable, with production typically peaking around the middle of the year and dipping at the beginning and end of each year.
  • 2.8d: Unusual years include the early 1980s and early 1990s, when brick production dropped significantly.

autoplot(pelt, Hare)

gg_lag(pelt, Hare)

pelt %>% ACF(Hare) %>% autoplot()

# gg_season() needs a seasonal period within the year, and the annual pelt data has none.
# gg_season(pelt, Hare)

# gg_subseries() won't work either: with annual data there are no repeating sub-periods within a year.
# gg_subseries(pelt, Hare)

Hare (Pelt data)

  • 2.8a: The series shows neither seasonality (this is annual data) nor a definite long-term trend, but it shows a clear cyclical pattern with spikes and dips about every decade.
  • 2.8b: I learned that the number of Snowshoe Hare pelts traded fluctuates quite dramatically, with extreme peaks above 150,000 followed by harsh crashes to almost zero, showing irregular cycles rather than steady growth.
  • 2.8c: There are no seasonal patterns because the data is annual, so the changes seen were likely not due to any calendar-driven effects.
  • 2.8d: Unusual years include extreme peaks in the 1860s and 1880s and harsh crashes in the 1870s and 1890s. These were among the most extreme values, so they stood out to me.
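
The "about every decade" reading of the cycle can be roughly checked from the autocorrelations. The sketch below uses base R's acf() rather than feasts' ACF(), and `cycle_length()` is a hypothetical helper written for this illustration: it takes the lag with the largest autocorrelation after the autocorrelations first turn negative, which is only a crude estimate of the cycle length.

```r
library(tsibble)
library(tsibbledata)

# Hypothetical helper: crude cycle-length estimate from the sample ACF.
cycle_length <- function(x, lag.max = 20) {
  # Autocorrelations at lags 1..lag.max (drop lag 0, which is always 1).
  r <- as.vector(acf(x, lag.max = lag.max, plot = FALSE)$acf)[-1]
  # Skip the initial run of positive values caused by short-term persistence.
  first_neg <- which(r < 0)[1]
  if (is.na(first_neg)) return(NA_integer_)
  later <- seq(first_neg, length(r))
  # Lag of the highest autocorrelation after the first negative one.
  later[which.max(r[later])]
}

cycle_length(pelt$Hare)
```

A value near 10 would be consistent with the cycle seen in the time plot and the ACF plot; a very different value would suggest the cycles are too irregular for this simple rule.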
?PBS

pbs_h02_filt <- PBS %>%
  filter(ATC2 == "H02")
head(pbs_h02_filt)
# Sum Cost over all H02 series to get a single monthly total
pbs_h02 <- pbs_h02_filt %>%
  summarise(Cost = sum(Cost))

autoplot(pbs_h02, Cost)

gg_season(pbs_h02, Cost)

gg_subseries(pbs_h02, Cost)

gg_lag(pbs_h02, Cost)

pbs_h02 %>% ACF(Cost) %>% autoplot()

H02 Cost (PBS)

  • 2.8a: The series shows an overall upward trend with a clear seasonal pattern peaking towards the end of each year.
  • 2.8b: I learned that the cost of scripts has risen steadily over time, that the seasonal cycle repeats consistently, and that each month's cost is closely related to the costs in the months just before it.
  • 2.8c: Costs are typically lower during the earlier part of the year (with a consistent sharp drop in February) and steadily increase until peaking around December. This is a stable seasonal pattern.
  • 2.8d: 2005-2006 are the only years that stick out to me as slightly “unusual”, because the costs peaked higher than in the surrounding years.

?us_gasoline
autoplot(us_gasoline, Barrels)

gg_season(us_gasoline, Barrels)

gg_subseries(us_gasoline, Barrels)

gg_lag(us_gasoline, Barrels)

us_gasoline %>% ACF(Barrels) %>% autoplot()

Barrels (US Gasoline)