Homework 2

library(fpp3)

Exercises: 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy

## # A tsibble: 15,150 x 9 [1Y]
## # Key:       Country [263]
##    Country     Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
##  2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
##  3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
##  4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
##  5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
##  6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
##  7 Afghanistan AFG    1966 1399999967.     NA    NA   18.6     8.57   10152331
##  8 Afghanistan AFG    1967 1673333418.     NA    NA   14.2     6.77   10372630
##  9 Afghanistan AFG    1968 1373333367.     NA    NA   15.2     8.90   10604346
## 10 Afghanistan AFG    1969 1408888922.     NA    NA   15.0    10.1    10854428
## # i 15,140 more rows

global_economy %>%  
  mutate(GDP_Per_Capita=GDP/Population) -> nge 

nge %>%  
  autoplot(GDP_Per_Capita, show.legend=F)

## Warning: Removed 3242 rows containing missing values (`geom_line()`).

Assuming the question is asking about the line that has had the highest y-axis position for the longest time, we can find out which country it corresponds to by plotting with the condition GDP_Per_Capita > $150,000.

nge %>% 
  filter(GDP_Per_Capita>150000) %>% 
  autoplot(GDP_Per_Capita)

Monaco has the highest GDP per capita.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

United States GDP from global_economy.

global_economy %>% 
  filter(Country=="United States") %>% 
  autoplot(GDP) +
  labs(title="United States GDP, 1960-2017", y="$USD")

This one doesn’t need a transformation.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>% 
  filter(Animal=="Bulls, bullocks and steers", State=="Victoria") -> bulls
bulls %>% 
  autoplot()

## Plot variable not specified, automatically selected `.vars = Count`

lambda <- bulls %>% 
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

bulls %>% 
  autoplot(box_cox(Count, 5))

Using a value of lambda=5 for a Box-Cox transformation, the plot appears more homogenous.

Victorian Electricity Demand from vic_elec.

vic_elec %>% 
  autoplot(Demand)

lambda <- vic_elec %>% 
  features(Demand, features = guerrero) %>%
  pull(lambda_guerrero)

vic_elec %>% 
  autoplot(-1/Demand)

Using an inverse transformation has slightly lessened the variation.

Gas production from aus_production.

aus_production %>% 
  select(Quarter, Gas) %>%
  autoplot(Gas)

The line has different variation over time.

aus_production %>% 
  select(Quarter, Gas) %>%
  autoplot(log(Gas))

Now the graph is more homogenous in variation.

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>% autoplot()

## Plot variable not specified, automatically selected `.vars = Volume`

lambda <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)
canadian_gas |>
  autoplot(box_cox(Volume, lambda))

The variation in the line doesn’t differ much over time.

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

set.seed(50)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries

## # A tsibble: 441 x 5 [1M]
## # Key:       State, Industry [1]
##    State      Industry                           `Series ID`    Month Turnover
##    <chr>      <chr>                              <chr>          <mth>    <dbl>
##  1 Queensland Other recreational goods retailing A3349480L   1982 Apr     11.1
##  2 Queensland Other recreational goods retailing A3349480L   1982 May     11.7
##  3 Queensland Other recreational goods retailing A3349480L   1982 Jun     11.5
##  4 Queensland Other recreational goods retailing A3349480L   1982 Jul     13.1
##  5 Queensland Other recreational goods retailing A3349480L   1982 Aug     13  
##  6 Queensland Other recreational goods retailing A3349480L   1982 Sep     13  
##  7 Queensland Other recreational goods retailing A3349480L   1982 Oct     12  
##  8 Queensland Other recreational goods retailing A3349480L   1982 Nov     13.2
##  9 Queensland Other recreational goods retailing A3349480L   1982 Dec     16.2
## 10 Queensland Other recreational goods retailing A3349480L   1983 Jan     12  
## # i 431 more rows

myseries %>% 
  autoplot(Turnover)

lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

myseries %>% 
  autoplot(box_cox(Turnover, lambda))

A lambda value of -0.20 removes some variaion. There is still a difference between the first part of the plot and the middle/end.

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco

aus_production %>% 
  autoplot(Tobacco)

## Warning: Removed 24 rows containing missing values (`geom_line()`).

lambda <- aus_production |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)

aus_production %>% autoplot(box_cox(Tobacco, lambda))

## Warning: Removed 24 rows containing missing values (`geom_line()`).

Economy class passengers between Melbourne and Sydney from ansett

ansett %>% 
  filter(Class=="Economy", Airports=="MEL-SYD") -> economy

economy %>% autoplot(Passengers)

lambda <- economy |>
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)

economy %>% autoplot(box_cox(Passengers, 6) )

Pedestrian counts at Southern Cross Station from pedestrian

ped <- pedestrian %>% 
  filter(Sensor=="Southern Cross Station")

ped %>% autoplot(Count)

lambda <- ped %>% features(Count, features=guerrero) %>% pull(lambda_guerrero)

ped %>% autoplot(box_cox(Count, lambda))

3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>% autoplot()

## Plot variable not specified, automatically selected `.vars = Gas`

The trend is increasing. There is a trough every year in the first quarter. The peaks appear to be around the middle of each year, around the end of the second quarter.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() %>% 
  select(trend)

## # A tsibble: 20 x 2 [1Q]
##    trend Quarter
##    <dbl>   <qtr>
##  1   NA  2005 Q3
##  2   NA  2005 Q4
##  3  200. 2006 Q1
##  4  204. 2006 Q2
##  5  207  2006 Q3
##  6  210. 2006 Q4
##  7  213  2007 Q1
##  8  216. 2007 Q2
##  9  219. 2007 Q3
## 10  219. 2007 Q4
## 11  219. 2008 Q1
## 12  219  2008 Q2
## 13  219  2008 Q3
## 14  220. 2008 Q4
## 15  222. 2009 Q1
## 16  223. 2009 Q2
## 17  225. 2009 Q3
## 18  226  2009 Q4
## 19   NA  2010 Q1
## 20   NA  2010 Q2

Do the results support the graphical interpretation from part a? Yes, the indices are increasing which aligns with what we saw in the plot from part a.
Compute and plot the seasonally adjusted data.

gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() %>% 
  select(seasonal) %>% 
  autoplot()

## Plot variable not specified, automatically selected `.vars = seasonal`

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas[3,1] <- gas[3,1] + 500
gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() %>% 
  select(seasonal) %>% 
  autoplot()

## Plot variable not specified, automatically selected `.vars = seasonal`

The outlier changed the shape of the seasonally adjusted data. Now peaks are at the first quarter of each year, not troughs. The troughs are in two places: between the first and second quarters, and during the third quarter.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Part e had an outlier towards the beginning, so we can try to put it at the end.

gas <- tail(aus_production, 5*4) |> select(Gas)
gas[19,1] <- gas[19,1] + 500
gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() %>% 
  select(seasonal) %>% 
  autoplot()

## Plot variable not specified, automatically selected `.vars = seasonal`

It looks more similar in shape to the original time series. The peaks and troughs have shifted about a quarter.

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

library(seasonal)

Original

myseries %>% autoplot(Turnover)

Decomposed

set.seed(50)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

x11_dcmp <- myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()

autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Retail Turnover")

It’s easier to see the trend, how it increases for years before becoming unstable later. I also notice the change in seasonality between 2000 and 2010.

3.9

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The trend has the same scale as the original time series. The seasonal component graph shows that the distance between the peaks and troughs increases over time. The irregular graph reveals that the decrease seen in the original series is a severe outlier. It is shown as a precipitous drop.

Is the recession of 1991/1992 visible in the estimated components? Yes, particularly from March to August.

Homework 2

Tora Mullings

2023-09-17

3.1

3.2

3.3

3.4

3.5

3.7

3.8

3.9