Exercises: 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book.

Exercise 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

head(global_economy)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

index(global_economy)

## Year

global_economy <- global_economy %>% 
  mutate(GDP_per_capita = GDP / Population)
global_economy %>%
  autoplot(GDP_per_capita, show.legend =  FALSE) + 
  labs(title = "GDP per Capita Per Country Over Time")

## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

top_countries <- global_economy %>% 
  filter(Year == max(Year, na.rm = TRUE)) %>%
  arrange(desc(GDP_per_capita)) %>%
  slice_head(n=10)

top_countries

## # A tsibble: 10 x 10 [1Y]
## # Key:       Country [10]
##    Country          Code   Year     GDP Growth   CPI Imports Exports Population
##    <fct>            <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Luxembourg       LUX    2017 6.24e10   2.30 111.    194.    230.      599449
##  2 Macao SAR, China MAC    2017 5.04e10   9.10 136.     32.0    79.4     622567
##  3 Switzerland      CHE    2017 6.79e11   1.09  98.3    53.9    65.0    8466017
##  4 Norway           NOR    2017 3.99e11   1.92 115.     33.1    35.5    5282223
##  5 Iceland          ISL    2017 2.39e10   3.64 122.     42.8    47.0     341284
##  6 Ireland          IRL    2017 3.34e11   7.80 105.     87.9   120.     4813608
##  7 Qatar            QAT    2017 1.67e11   1.58 116.     37.3    51.0    2639211
##  8 United States    USA    2017 1.94e13   2.27 112.     NA      NA    325719178
##  9 North America    NAC    2017 2.10e13   2.35  NA      NA      NA    362492702
## 10 Singapore        SGP    2017 3.24e11   3.62 113.    149.    173.     5612253
## # ℹ 1 more variable: GDP_per_capita <dbl>

global_economy %>%
  filter(Country %in% top_countries$Country) %>%
  autoplot(GDP_per_capita, show.legend =  TRUE) + 
  labs(title = "GDP per Capita Per Country Over Time", 
       subtitle = "Only showing the top 10 countries as of 2017")

## Warning: Removed 32 rows containing missing values or values outside the scale range
## (`geom_line()`).

Luxembourg has the highest GDP per capita as of the most recent date in the data (2017).

top_per_year <- global_economy %>%
  index_by(Year) %>%
  slice_max(order_by = GDP_per_capita, n = 1)
   
top_per_year %>%
  ggplot(aes(x = Year, y=GDP_per_capita, fill=Country)) +
  geom_bar(stat="identity")   + 
  labs(title="Top County Each Year by GDP Per Capita")

In the past, Monaco has regularly had the highest GDP per Capita. Liechtenstein broke through as the highest in 2016 and 2014.

Exercise 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

help(aus_production)

## starting httpd help server ... done

help(global_economy)
help(aus_livestock)
help(vic_elec)

Graphing the data

global_economy

United States GDP

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP) +
  labs(title = "United States GDP Over Time", x = "Year", y="GDP")

A transformation does not seem appropriate for this graph.

aus_livestock

Slaughter of Victorian “Bulls, bullocks and steers”.

index(aus_livestock)

## Month

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>%
  autoplot(Count)  +
  labs(title = "Slaughter of Victorian “Bulls, bullocks and steers", x = "Year-Month", y="# Slaughterd")

vic_elec

Victorian Electricity Demand

head(vic_elec)

## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
##   Time                Demand Temperature Date       Holiday
##   <dttm>               <dbl>       <dbl> <date>     <lgl>  
## 1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
## 2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
## 3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
## 4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
## 5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
## 6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE

index(vic_elec)

## Time

vic_elec %>% 
  autoplot(Demand)  +
  labs(title = "Victorian Electricity Demand", x = "Time (30min)", y="Electricity Demand")

This view of the Victorian electricity demand is very dense and noisy. I thought to sum it up to the daily view, but I doubt I’ll get much more out of that, so I’ll sum it up to the weekly view instead.

vic_elec %>%  
  mutate(Year_Week = yearweek(Date)) %>%
  index_by(Year_Week) %>%
  summarize(Weekly_Demand = sum(Demand)) %>%
  autoplot(Weekly_Demand)  +
  labs(title = "Weekly Victorian Electricity Demand", x = "Time (Weekly)", y="Electricity Demand")

Now we can see that electricity demands hover in between 1250000 and 2000000 units each week, and that early 2014 saw a big spike in particular.

aus_production

Gas production

aus_production %>%   
  autoplot(Gas)  +
  labs(title = "Australian Gas Production by Quarter", x = "Quarter", y="Gas Production")

The Australian gas production over time graph is increasing variance as time goes on. This makes it a good fit to apply box_cox().

# Referenced sec. 3.1 to get the lambda
lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%   
  autoplot(box_cox(Gas, lambda))  +
  labs(title = paste0("Transformed Australian Gas Production by Quarter. Lambda = ", round(lambda,2))
       , x = "Quarter", y="Gas Production")

With the box-cox transformation, the variance now looks more consistent over time (rather than increasing over time). We can see that around the 1970s, the gas production so a significant uptick.

Exercise 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

head(canadian_gas)

## # A tsibble: 6 x 2 [1M]
##      Month Volume
##      <mth>  <dbl>
## 1 1960 Jan   1.43
## 2 1960 Feb   1.31
## 3 1960 Mar   1.40
## 4 1960 Apr   1.17
## 5 1960 May   1.12
## 6 1960 Jun   1.01

canadian_gas %>%   
  autoplot(Volume)  +
  labs(title = "Canadian Gas Production by Year-Month", x = "Year-Month", y="Gas Production")

canadian_gas %>%  
  gg_subseries(Volume) +
  labs(title = "Canadian Gas Production Segmented by Month")

## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

For the Canadian gas production, the variance does not have an easy-to-follow pattern of increasing or decreasing. This makes it hard for box-cox to do anything.

Exercise 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

Quick recap of what this was

Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

set.seed(12345678) myseries <- aus_retail |> filter(Series ID == sample(aus_retail$Series ID,1)) Explore your chosen retail time series using the following functions:

autoplot(), gg_season(), gg_subseries(), gg_lag(),

ACF() |> autoplot()

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

head(aus_retail)

## # A tsibble: 6 x 5 [1M]
## # Key:       State, Industry [1]
##   State                        Industry            `Series ID`    Month Turnover
##   <chr>                        <chr>               <chr>          <mth>    <dbl>
## 1 Australian Capital Territory Cafes, restaurants… A3349849A   1982 Apr      4.4
## 2 Australian Capital Territory Cafes, restaurants… A3349849A   1982 May      3.4
## 3 Australian Capital Territory Cafes, restaurants… A3349849A   1982 Jun      3.6
## 4 Australian Capital Territory Cafes, restaurants… A3349849A   1982 Jul      4  
## 5 Australian Capital Territory Cafes, restaurants… A3349849A   1982 Aug      3.6
## 6 Australian Capital Territory Cafes, restaurants… A3349849A   1982 Sep      4.2

aus_retail %>%
  filter(`Series ID` == "A3349561R") %>%
  autoplot(Turnover, show.legend = FALSE) + 
  labs(title = "AUS Retail Series A3349561R: Turnover Over Time")

aus_retail_lambda <- aus_retail %>%
  filter(`Series ID` == "A3349561R") %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
 
aus_retail %>%
  filter(`Series ID` == "A3349561R") %>%     
  autoplot(box_cox(Turnover, aus_retail_lambda))  +
  labs(title = paste0("Transformed AUS Retail Series A3349561R: Turnover Over Time. Lambda = ", round(aus_retail_lambda,2)),
       y = "Transformed Turnover")

Pulling a random series of the aus_retail data, I would use a lambda of 0.13 for the Box-Cox transformation.

Exercise 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Finding the Box-Cox transformation

aus_production

Tobacco

aus_production %>% 
  autoplot(Tobacco, show.legend = TRUE) + 
  labs(title = "AUS Tobacco Production over Time", 
       y = "Tobacco Production")

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

tobacco_lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>% 
  autoplot(box_cox(Tobacco, tobacco_lambda), show.legend = TRUE) + 
  labs(title = paste0("Transformed AUS Tobacco Production over Time. Lambda = ", round(tobacco_lambda, 2)), 
       y = "Transformed Tobacco Production")

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Lambda is very close to 1, so next to no transformation is happening here to present the data better.

ansett

Economy class passengers between Melbourne and Sydney

ansett %>% 
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  autoplot(Passengers, show.legend = TRUE) + 
  labs(title = "Economy class passengers between MEL-SYD over Time", 
       y = "# Passengers")

passenger_lambda <- ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

ansett %>% 
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  autoplot(box_cox(Passengers, passenger_lambda), show.legend = TRUE) + 
  labs(title = 
         paste0("Transformed Economy class passengers MEL-SYD over Time. Lambda = ",
                round(passenger_lambda, 2)), 
       y = "Transformed # Passengers")

This lambda value is 2, indicating that a squared transformation is being used to show represent the data better.

pedestrian

Pedestrian counts at Southern Cross Station

pedestrian %>% 
  filter(Sensor == "Southern Cross Station"  ) %>%
  autoplot(Count, show.legend = TRUE) + 
  labs(title = "Pedestrian counts at Southern Cross Station over Time", 
       y = "# Pedestrians")

pedestrian_lambda <- pedestrian %>%
  filter(Sensor == "Southern Cross Station"  ) %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

pedestrian %>% 
  filter(Sensor == "Southern Cross Station") %>%
  autoplot(box_cox(Count, pedestrian_lambda), show.legend = TRUE) + 
  labs(title = 
         paste0("Transformed Pedestrian counts at Southern Cross Station over Time. Lambda = ",
                round(pedestrian_lambda, 2)), 
       y = "Transformed # Pedestrians")

The pedestrian count data is simply too noisy at the hourly level. Re-eggregating at the weekly level.

pedestrian %>% 
  filter(Sensor == "Southern Cross Station"  ) %>%
  mutate(Year_Week = yearweek(Date)) %>%
  index_by(Year_Week) %>%
  summarize(Count = sum(Count)) %>%
  autoplot(Count, show.legend = TRUE) + 
  labs(title = "Pedestrian counts at Southern Cross Station by Week", 
       y = "# Pedestrians")

pedestrian_lambda <- pedestrian %>%
  filter(Sensor == "Southern Cross Station"  ) %>%
  mutate(Year_Week = yearweek(Date)) %>%
  index_by(Year_Week) %>%
  summarize(Count = sum(Count)) %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

pedestrian %>% 
  filter(Sensor == "Southern Cross Station"  ) %>%
  mutate(Year_Week = yearweek(Date)) %>%
  index_by(Year_Week) %>%
  summarize(Count = sum(Count)) %>%
  autoplot(box_cox(Count, pedestrian_lambda), show.legend = TRUE) + 
  labs(title = 
         paste0("Transformed Pedestrian counts at Southern Cross Station by Week. Lambda = ",
                round(pedestrian_lambda, 2)), 
       y = "Transformed # Pedestrians")

Lambda is aroun -0.1, so the transformation is a bit of an inverse relationship.

Exercise 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>%
  autoplot(Gas)

yes, it appears that there are seasonal fluctuations with a high point mid year. It also looks like it is trending up.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part a?

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of AUS gas production")

## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).

Yes, using classical multiplicative decomposition, we see a positive trend up, and seasonal peaks mid year.

Compute and plot the seasonally adjusted data.

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>% 
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Persons (thousands)",
       title = "Total employment in US retail") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas[10,]$Gas <- gas[10,]$Gas + 300

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>% 
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Persons (thousands)",
       title = "Total employment in US retail") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

The outlier has altered the seasonal adjustment and the trend of the data pretty substantially. Both are very different in window of time before and after the outlier

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas[10,]$Gas <- gas[10,]$Gas - 300 # undo previous outlier

gas[2,]$Gas <- gas[2,]$Gas + 300 # place new outlier near beginning

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>% 
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Persons (thousands)",
       title = "Total employment in US retail") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

gas[2,]$Gas <- gas[2,]$Gas - 300 # undo outlier

gas[19,]$Gas <- gas[19,]$Gas + 300 # new outlier near end

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>% 
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Persons (thousands)",
       title = "Total employment in US retail") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

When the outlier is near the beginning or end, it seems to have a less drastic impact on the seasonally adjusted line or the trend line as a whole. However, with the outlier at the end, one might come to misleading conclusions when inferring what the data will look like in the future.

Exercise 3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

aus_retail %>%
  filter(`Series ID` == "A3349561R") %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components() %>%
  autoplot() +
  labs(title = "Decomposition of Series A3349561R AUS Retail Turnover using X-11.")

Yes, with the use of X-11, I can see some irregular spiked in a couple of places. An example of one is around the start of 2005, which I otherwise would not have seen because it was masked by the seasonal spike around that time.

Exercise 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Figure 3.19 Figure 3.20

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

We see a relatively steady, positive trend over the time frame (though it looks to be losing a little bit of traction after 1990, around the 1991/1992 recession point). We can also see outliers in the remainder section around this point in time (addressed in part b. of this question). When we look at the seasonal breakdown, we can also see that the March spike has slowly lost amplitude over the years, whereas the September and December ones have gained amplitude.

b. Is the recession of 1991/1992 visible in the estimated components?

Yes, it is very visible. The remainder section fo the decomposition shows a very clear drop between 1991 and 1992.A huge decrease in the number of persons in the civilian labour force would make sense at the time of a recession.

Homework 2

Catherine Dube

2026-02-15

Exercise 3.1

Exercise 3.2

Graphing the data

global_economy

aus_livestock

vic_elec

aus_production

Exercise 3.3

Exercise 3.4

Exercise 3.5

Finding the Box-Cox transformation

aus_production

ansett

pedestrian

Exercise 3.7

Exercise 3.8

Exercise 3.9