Homework 2

library(tidyverse)
library(fpp3)
library(seasonal)

3.1 Exercises

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Monaco has the highest GDP per capita, peaking in 2014 in the table below. Monaco and Liechtenstein have the highest GDP per capita from 1985 to 2017. Prior to 1985, the differences in GDP per capita between countries were not as great.

global_economy %>%
  autoplot(GDP / Population, show.legend = FALSE) +
  labs(title = "GDP per capita", y = "$US")

global_economy %>%
  mutate(gdp_percapita = GDP/Population) %>%
  arrange(desc(gdp_percapita))

## # A tsibble: 15,150 x 10 [1Y]
## # Key:       Country [263]
##    Country       Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>         <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Monaco        MCO    2014 7060236168.  7.18     NA      NA      NA      38132
##  2 Monaco        MCO    2008 6476490406.  0.732    NA      NA      NA      35853
##  3 Liechtenstein LIE    2014 6657170923. NA        NA      NA      NA      37127
##  4 Liechtenstein LIE    2013 6391735894. NA        NA      NA      NA      36834
##  5 Monaco        MCO    2013 6553372278.  9.57     NA      NA      NA      37971
##  6 Monaco        MCO    2016 6468252212.  3.21     NA      NA      NA      38499
##  7 Liechtenstein LIE    2015 6268391521. NA        NA      NA      NA      37403
##  8 Monaco        MCO    2007 5867916781. 14.4      NA      NA      NA      35111
##  9 Liechtenstein LIE    2016 6214633651. NA        NA      NA      NA      37666
## 10 Monaco        MCO    2015 6258178995.  4.94     NA      NA      NA      38307
## # ℹ 15,140 more rows
## # ℹ 1 more variable: gdp_percapita <dbl>
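
To see how the leader changes over time, one option is to pick the country with the highest GDP per capita in each year and count how often each country leads. This is only a sketch and was not part of the original answer; `top_by_year` is a name introduced here, and rows with a missing GDP or population are dropped.

```r
library(fpp3)

# Country with the highest GDP per capita in each year
top_by_year <- global_economy %>%
  as_tibble() %>%
  mutate(gdp_percapita = GDP / Population) %>%
  filter(!is.na(gdp_percapita)) %>%
  group_by(Year) %>%
  slice_max(gdp_percapita, n = 1) %>%
  ungroup() %>%
  select(Year, Country, gdp_percapita)

# Number of years in which each country was at the top
top_by_year %>%
  count(Country, sort = TRUE)
```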

3.2 Exercises

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

-United States GDP from global_economy.

A transformation is not needed: the series is close to a smooth line, and a transformation would not change what the plot shows.

global_economy %>%
  filter(Country == "United States") %>%
  autoplot() +
  labs(title = "United States GDP")

## Plot variable not specified, automatically selected `.vars = GDP`

-Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

No transformation is applied, as no appropriate adjustment makes sense. The estimated lambda would be negative.

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>%
  autoplot(Count) +
  labs(title = "Bulls, Bullocks and steers from Victoria")
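
The answer above refers to a lambda value without showing how it was obtained. One way to estimate it, assuming the Guerrero method used in the later exercises, is sketched below; `victorian_bulls` and `lambda_bulls` are names introduced here for illustration.

```r
library(fpp3)

victorian_bulls <- aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria")

# Guerrero estimate of the Box-Cox lambda for the Count series
lambda_bulls <- victorian_bulls %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

lambda_bulls
```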

-Victorian Electricity Demand from vic_elec.

Aggregating the half-hourly data to monthly totals makes the dips and increases in electricity demand easier to see.
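
A minimal sketch of a monthly aggregation follows. It assumes monthly totals (rather than averages) are wanted; `vic_elec_monthly` is a name introduced here.

```r
library(fpp3)

# Sum the half-hourly demand within each calendar month
vic_elec_monthly <- vic_elec %>%
  index_by(Month = yearmonth(Time)) %>%
  summarise(Demand = sum(Demand))

vic_elec_monthly %>%
  autoplot(Demand) +
  labs(title = "Monthly Victorian electricity demand", y = "MWh")
```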

-Gas production from aus_production.

After a Box-Cox transformation of the gas production, the variation is more even across the series.
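
The transformation could be produced along these lines. This is a sketch that assumes lambda is chosen with the Guerrero method; `lambda_gas` is a name introduced here.

```r
library(fpp3)

# Estimate lambda with the Guerrero method
lambda_gas <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

# Plot the Box-Cox transformed series
aus_production %>%
  autoplot(box_cox(Gas, lambda_gas)) +
  labs(title = "Box-Cox transformed gas production",
       y = "Transformed gas production")
```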

3.3 Exercises

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

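The Box-Cox transformation is unhelpful for the canadian_gas data because it does not give us any additional information: the variation after the transformation is almost the same as before. A useful transformation would have made the variation more consistent across the series. Here the seasonal variation grows and then shrinks again instead of changing steadily with the level, which a single power transformation cannot correct.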
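
One way to check this is to plot the original and the transformed series and compare them. The sketch below assumes the Guerrero estimate of lambda; `lambda_cg` is a name introduced here.

```r
library(fpp3)

# Guerrero estimate of lambda for the Volume series
lambda_cg <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

# Original series
canadian_gas %>%
  autoplot(Volume) +
  labs(title = "Canadian gas production")

# Transformed series, for comparison with the plot above
canadian_gas %>%
  autoplot(box_cox(Volume, lambda_cg)) +
  labs(title = "Box-Cox transformed Canadian gas production")
```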

3.4 Exercises

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

A Box-Cox transformation with a lambda of about 0.08 would be best, as the variation is then spread evenly across the series.

## # A tibble: 1 × 3
##   State              Industry                                    lambda_guerrero
##   <chr>              <chr>                                                 <dbl>
## 1 Northern Territory Clothing, footwear and personal accessory …          0.0830
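
The code behind this output is not shown. A sketch that should produce a table of the same shape is given below. It assumes `myseries` is the Northern Territory clothing series named in the output, and it matches on the start of the industry name because the name is truncated there.

```r
library(fpp3)

# Assumption: the retail series is the one named in the output above
myseries <- aus_retail %>%
  filter(State == "Northern Territory",
         startsWith(Industry, "Clothing, footwear and personal accessory"))

# Guerrero estimate of the Box-Cox lambda for Turnover
myseries %>%
  features(Turnover, features = guerrero)
```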

3.5 Exercises

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.926

For Tobacco, the Box-Cox transformation did not change the chart much, which is expected because the lambda is close to 1. The appropriate lambda to stabilise the variance is 0.926.
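
A sketch of how the Tobacco lambda can be estimated is shown here. Dropping the missing values first is a precaution added for this example; `lambda_tobacco` is a name introduced here.

```r
library(fpp3)

# Guerrero estimate of lambda for Tobacco, ignoring missing quarters
lambda_tobacco <- aus_production %>%
  filter(!is.na(Tobacco)) %>%
  features(Tobacco, features = guerrero)

lambda_tobacco
```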

For the Melbourne to Sydney economy class passengers, the effect of the transformation was a bit clearer, using a lambda of 2.
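
The passenger series can be selected and checked in the same way. This is a sketch; `mel_syd_economy` is a name introduced here.

```r
library(fpp3)

# Economy class passengers on the Melbourne-Sydney route
mel_syd_economy <- ansett %>%
  filter(Class == "Economy", Airports == "MEL-SYD")

# Guerrero estimate of lambda for the Passengers series
mel_syd_economy %>%
  features(Passengers, features = guerrero)
```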

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1          -0.111

The Southern Cross Station pedestrian counts were aggregated to weekly totals before applying the Box-Cox transformation. The estimated lambda was negative, which is a concern, so I used a positive value of 2 instead.
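
A sketch of the weekly aggregation and the lambda estimate follows. It assumes weekly totals were used; `ped_weekly` and `lambda_ped` are names introduced here.

```r
library(fpp3)

# Aggregate the hourly counts at Southern Cross Station to weekly totals
ped_weekly <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  index_by(Week = yearweek(Date)) %>%
  summarise(Count = sum(Count))

# Guerrero estimate of lambda for the weekly series
lambda_ped <- ped_weekly %>%
  features(Count, features = guerrero)

lambda_ped
```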

3.7 Exercises

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

There is both a seasonal fluctuation and a trend-cycle, with the trend slowly moving upwards.

gas %>%
  autoplot(Gas) +
  labs(title = "Gas Production")

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() + xlab("Quarter") +
  ggtitle("Classical multiplicative decomposition of Gas Production")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

Do the results support the graphical interpretation from part a?

Yes, the results support the graphical interpretation from part a: the decomposition shows a seasonal fluctuation and an upward trend in the data.

Compute and plot the seasonally adjusted data.

gas2 <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) 

components(gas2) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y = season_adjust), colour = "#D55E00") +
  ggtitle("Seasonally adjusted Gas Production")
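
In a multiplicative decomposition, the seasonally adjusted series is the observed series divided by the seasonal component. The sketch below recomputes it by hand as a check against the `season_adjust` column; `gas_components` and `manual_adjust` are names introduced here.

```r
library(fpp3)

gas <- tail(aus_production, 5 * 4) %>% select(Gas)

gas_components <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

# Seasonally adjusted = observed / seasonal (multiplicative case)
manual_adjust <- gas_components$Gas / gas_components$seasonal

# Compare with the column returned by components()
all.equal(manual_adjust, gas_components$season_adjust)
```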

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

The outlier shows up as a spike in the data, and the results differ from the previous ones, which were smoother across the series.

# Create an outlier by adding 300 to the observation whose value is 192
gas3 <- gas %>%
  mutate(Gas = ifelse(Gas == 192, Gas + 300, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas3) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y = season_adjust), colour = "#D55E00") +
  ggtitle("Seasonally adjusted Gas Production with an outlier")

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

There is a significant difference in the seasonally adjusted data with the outlier: the outlier causes a spike in the seasonally adjusted series, which was otherwise more consistent across the board.
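
To compare positions directly, the outlier can be moved near the end of the series and the decomposition rerun. In a classical decomposition the trend is a centred moving average that is undefined for the first and last few observations, so an outlier near the end enters fewer trend estimates than one in the middle. The sketch below is a hypothetical variant, not part of the original answer; placing the outlier on the second-last observation is an arbitrary choice, and `gas_end` is a name introduced here.

```r
library(fpp3)

gas <- tail(aus_production, 5 * 4) %>% select(Gas)

# Hypothetical variant: add 300 to the second-last observation
gas_end <- gas %>%
  mutate(Gas = if_else(row_number() == n() - 1, Gas + 300, Gas))

gas_end %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y = season_adjust), colour = "#D55E00") +
  ggtitle("Seasonally adjusted Gas Production, outlier near the end")
```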

3.8 Exercises

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()

autoplot(x11_dcmp) +
  labs(title = "Decomposition of total Retail data using X-11.")

The X-11 decomposition shows irregularity between 1990 and 2000 and from 2010 to 2020 that was not apparent in the original Turnover data.

3.9 Exercises

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The STL decomposition shows a large irregularity around 1992. January and August tend to have the lowest average labour force, and December the highest. July has increased a little in more recent years, while March has been declining over the years. There is an upward trend in the data.

Is the recession of 1991/1992 visible in the estimated components?

The recession is visible in the estimated components: there is a big decrease in the remainder chart during that period.