DATA624_Assignment2

Objective

Assignment 2 involves answering questions 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8, 3.9 from the textbook Forecasting: principles and practice by Rob J Hyndman and George Athanasopoulos.

Exercise 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

GDP per capita for each country

global_economy_ts <- global_economy

global_economy_ts %>%
   fill_gaps(GDP = NA, Population = NA)

## # A tsibble: 15,150 x 9 [1Y]
## # Key:       Country [263]
##    Country     Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
##  2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
##  3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
##  4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
##  5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
##  6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
##  7 Afghanistan AFG    1966 1399999967.     NA    NA   18.6     8.57   10152331
##  8 Afghanistan AFG    1967 1673333418.     NA    NA   14.2     6.77   10372630
##  9 Afghanistan AFG    1968 1373333367.     NA    NA   15.2     8.90   10604346
## 10 Afghanistan AFG    1969 1408888922.     NA    NA   15.0    10.1    10854428
## # ℹ 15,140 more rows

global_economy_ts %>%
  mutate(GDP_per_capita = GDP/Population) %>%
  autoplot(GDP_per_capita)

## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

Highest GDP per capita

Monaco has the highest GDP per capita.

global_economy_ts %>%
 mutate(GDP_per_capita = GDP / Population) %>%
  select(Country, Year, GDP, GDP_per_capita) %>%
  arrange(desc(GDP_per_capita))

## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.

## # A tsibble: 15,150 x 4 [1Y]
## # Key:       Country [263]
##    Country        Year         GDP GDP_per_capita
##    <fct>         <dbl>       <dbl>          <dbl>
##  1 Monaco         2014 7060236168.        185153.
##  2 Monaco         2008 6476490406.        180640.
##  3 Liechtenstein  2014 6657170923.        179308.
##  4 Liechtenstein  2013 6391735894.        173528.
##  5 Monaco         2013 6553372278.        172589.
##  6 Monaco         2016 6468252212.        168011.
##  7 Liechtenstein  2015 6268391521.        167591.
##  8 Monaco         2007 5867916781.        167125.
##  9 Liechtenstein  2016 6214633651.        164993.
## 10 Monaco         2015 6258178995.        163369.
## # ℹ 15,140 more rows

Change over time

We can see an overall growth trend. There appears some cyclic declines that happens around 1985, 2000 and 2010.

ge_monaco <- global_economy_ts %>%
  filter(Country == "Monaco") %>%
  mutate(GDP_per_capita = GDP/Population)

autoplot(ge_monaco, GDP_per_capita)

## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).

Exercise 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect. - United States GDP from global_economy - Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock - Victorian Electricity Demand from vic_elec - Gas production from aus_production

United States GDP from global_economy

I opted use a log_10 transformation because to reduce the exponential pattern. When using the transformation, we can more clearly see a linear relationship.

ge_USA <- global_economy %>%
  filter(Country == "United States") 
autoplot(ge_USA, GDP)

ge_USA <- ge_USA %>%
  mutate(log10_GDP = log10(GDP))

autoplot(ge_USA, log10_GDP)

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock

I used a Box-Cox transformation to address the fluctuations.

bbs_aus_livestock <- aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria")
autoplot(bbs_aus_livestock,Count)

lambda_bbs <- bbs_aus_livestock %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

bbs_aus_livestock %>%
  autoplot(box_cox(Count, lambda_bbs))

Victorian Electricity Demand from vic_elec

Similarly, I tried to use the Box-Cox transformation with this dataset. Unfortunately, it doesn’t reduce the large spikes.

vic_elec %>%
autoplot(Demand)

vic_lambda <- vic_elec %>%
  features(Demand, features = guerrero) %>%
  pull(lambda_guerrero)

vic_elec %>%
  autoplot(box_cox(Demand, vic_lambda))

Gas production from aus_production

No transformations are needed.

aus_production %>%
  autoplot(Gas)

Exercise 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

It’s not useful in the case because the Box-Cox transformation is used towards data that is non-linear and exhibits high variance.

In the chart we can observe a linear relationship, some seasonality and an upward trend.

canadian_gas %>%
  autoplot(Volume)

Exercise 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

Looking at the chart, there is evidence of both growing trend and increasing variance. To stablize the variance we can utilize the Box-Cox transformation. The ideal value or parameter for our transformation will be found using Guerrero’s method. Using this, it determines that our lambda will be 0.3.

myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  autoplot(Turnover)

lambda <- myseries %>%
    features(Turnover, features = guerrero) %>%
    pull(lambda_guerrero)
lambda

## [1] 0.1375013

# plot with lambda transformation
myseries %>%
  autoplot(box_cox(Turnover, lambda))

Exercise 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.

Tobacco from aus_production
Economy class passengers between Melbourne and Sydney from ansett
Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco from aus_production

The appropriate transformation found through the guerrero method is 0.9264636.

aus_production %>%
  autoplot(Tobacco)

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

lambda_t <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)
lambda_t

## [1] 0.9264636

aus_production %>%
  autoplot(box_cox(Tobacco,lambda_t))

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Economy class passengers between Melbourne and Sydney from ansett

The appropriate transformation found through the guerrero method is 1.999927.

econ_mel_syd <- ansett %>%
  filter(Airports == "MEL-SYD",Class == "Economy")
autoplot(econ_mel_syd,Passengers)

lambda_mel <- econ_mel_syd %>%
  features(Passengers, guerrero) %>%
  pull(lambda_guerrero)
lambda_mel

## [1] 1.999927

econ_mel_syd %>%
  autoplot(box_cox(Passengers, lambda_mel))

Pedestrian counts at Southern Cross Station from pedestrian.

The appropriate transformation found through the guerrero method is -0.2501616.

scs_ped <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") 
autoplot(scs_ped, Count)

lambda_scs <- scs_ped %>%
  features(Count, guerrero) %>%
  pull(lambda_guerrero)
lambda_scs

## [1] -0.2501616

scs_ped %>%
  autoplot(box_cox(Count, lambda_scs))

Exercise 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
Do the results support the graphical interpretation from part a?
Compute and plot the seasonally adjusted data.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Part A

The graph shows a clear consistent seasonal pattern. The series is consistent and does not show a clear upward or downward trend.

gas %>%
  autoplot(Gas)

### Part B

decomp_gas <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

autoplot(decomp_gas)

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

Part C

The classical decomposition supports the earlier interpretation in that it keeps most of the same shape with of the chart in part A. The seasonal components are less pronounced when decomposition is performed however. It’s still stable and keeps the same patterns.

Part D

decomp_gas %>%
  select(Quarter, season_adjust) %>%
  autoplot(season_adjust)

Part E

When inflating an observation by 300, it subdues the seasonality of the data. In other words, it distorts the time series patterns that is present.

gas_outlier <- gas
gas_outlier$Gas[1] <- gas_outlier$Gas[1] + 300

decomp_gas_outlier <- gas_outlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

decomp_gas_outlier %>%
  select(Quarter, season_adjust) %>%
  autoplot(season_adjust)

Part F

With the outlier at the end, it’s difficult to take note of any patterns. When the outlier is in the middle, it distorts any existing patterns.

gas_outlier_end <- gas
gas_outlier_end$Gas[nrow(gas_outlier_end) - 1] <- gas_outlier_end$Gas[nrow(gas_outlier_end) - 1] + 300

decomp_gas_outlier_end <- gas_outlier_end %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

decomp_gas_outlier_end %>%
  select(Quarter, season_adjust) %>%
  autoplot(season_adjust)

gas_outlier_middle <- gas
middle_index <- round(nrow(gas_outlier_middle) / 2)
gas_outlier_middle$Gas[middle_index] <- gas_outlier_middle$Gas[middle_index] + 300

decomp_gas_outlier_middle <- gas_outlier_middle %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

decomp_gas_outlier_middle %>%
  select(Quarter, season_adjust) %>%
  autoplot(season_adjust)

Exercise 3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

I noticed more outliers around the early 2000’s after using the X-11 decomposition.

myseries %>%
  autoplot(Turnover)

decomp_myseries <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()

autoplot(decomp_myseries)

Exercise 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
Is the recession of 1991/1992 visible in the estimated components?

When observing for each part of the decomposition, we can see there is a general upward trend in the labor force, showing there is long-term growth. There are consistent fluctuations visible, a sign of seasonality. Looking at the remainder section, we can see that there are irregularities in the data.

The recession of 1991/1992 is visible in the estimated components. There is a noticeable dip around this time seen in the remainder section, which indicates a slow down in growth of the labor force.