This document contains the homework problems for the Data 624 course. Link: https://otexts.com/fpp3/graphics-exercises.html

Homework 2: Decomposition

Instructions

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from Forecasting: Principles and Practice book. Please submit both your Rpubs link as well as attach the .rmd file with your code.

Packages

if (!require('fpp3')) (install.packages('fpp3'))
if (!require('latex2exp')) (install.packages('latex2exp'))
if (!require('dplyr')) (install.packages('dplyr'))
if (!require('forecast')) (install.packages('forecast'))
if (!require('seasonal')) (install.packages('seasonal'))

Exercises

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
global_economy %>%
  autoplot(GDP/Population, show.legend =  FALSE) +
    labs(title = "GDP per Capita",
       subtitle = "1960-2017",
       x = "Year",
       y = "GDP per capita")

global_economy <- global_economy %>% 
  mutate(GDP_per_capita = GDP/Population) 

global_economy %>%
  filter(GDP_per_capita > 100000) %>%
  autoplot(GDP_per_capita) +
  labs(title= "GDP per capita",
       subtitle = "1960-2017",
       y = "USD")

z <- global_economy %>%
  group_by(Country, GDP, Population) %>%
  summarise(GD_PoP = GDP/Population) %>% 
  arrange(desc(GD_PoP))
head(z)
## # A tsibble: 6 x 5 [1Y]
## # Key:       Country, GDP, Population [6]
## # Groups:    Country, GDP [6]
##   Country               GDP Population  Year  GD_PoP
##   <fct>               <dbl>      <dbl> <dbl>   <dbl>
## 1 Monaco        7060236168.      38132  2014 185153.
## 2 Monaco        6476490406.      35853  2008 180640.
## 3 Liechtenstein 6657170923.      37127  2014 179308.
## 4 Liechtenstein 6391735894.      36834  2013 173528.
## 5 Monaco        6553372278.      37971  2013 172589.
## 6 Monaco        6468252212.      38499  2016 168011.
global_economy %>%
  tsibble(key = Code, index = Year)%>%
  filter(Country=="Monaco") %>%
  autoplot(GDP/Population)

Monaco has the highest GDP per capita.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

  • United States GDP from global_economy.
  • Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
  • Victorian Electricity Demand from vic_elec.
  • Gas production from aus_production.

United States GDP from global_economy

global_economy %>% 
  filter(Country == "United States") %>% 
  autoplot() +
  labs(title= "United States GDP",
       subtitle = "1960-2017",
       y = "USD")

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  autoplot(Count) +
  labs(title= "Slaughter of Victorian “Bulls, bulls and steers [Monthly]", y = "The Total Count")

The trend has come down over time in the last 40 years where there is downward trend in slaughter of Victorian bulls, bulls, and steers. Interesting to see peculiar trend during 2015 where there was reversal of trend again.

Victorian Electricity Demand from vic_elec.

head(vic_elec)
## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
##   Time                Demand Temperature Date       Holiday
##   <dttm>               <dbl>       <dbl> <date>     <lgl>  
## 1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
## 2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
## 3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
## 4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
## 5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
## 6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE
autoplot(vic_elec)+
    labs(title= "Electricity Demand [30 Minute]", 
         subtitle= "Victoria, Australia",
         y = "MW")

when viewing the vic_elec demand plot we witness seasonality. The spikes seen are likely summer or winter [extreme months].

vic_elec %>%
    index_by(Date) %>%
    summarise(Demand = sum(Demand)) -> daily_demand

daily_demand %>% autoplot(Demand) +
    labs(title= "Daily Electricity Demand", 
         subtitle= "Victoria, Australia",
         y = "MW")

There is spike in the first few months of each year and another smaller peak towards the middle of each year.

Gas production from aus_production

head(aus_production)
## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7
autoplot(aus_production, Gas) +
  labs(title= "Australian Gas Production")

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title= "Australian Gas Production",
       subtitle = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))+
    theme_replace()+
  geom_line(col = "#69b3a2")

Using the Guerrero feature, an optimal value of lambda = 0.12 was obtained. As seen above, the transformation has minimized the seasonal variation across the whole series and has provided an almost consistent throughout. Guerrero’s (1993) method to select the lambda which minimises the coefficient of variation for subseries of x.

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

head(canadian_gas)
## # A tsibble: 6 x 2 [1M]
##      Month Volume
##      <mth>  <dbl>
## 1 1960 Jan   1.43
## 2 1960 Feb   1.31
## 3 1960 Mar   1.40
## 4 1960 Apr   1.17
## 5 1960 May   1.12
## 6 1960 Jun   1.01
canadian_gas %>%
  autoplot(Volume) +
  labs(title = "Canadian Gas Production")

canadian_gas%>%
  summarise(Volume = sum(Volume)) %>% 
  gg_subseries(Volume)

It doesn’t look like the variation increases/decreases with the level of the series.

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

set.seed(123)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

head(myseries)
## # A tsibble: 6 x 5 [1M]
## # Key:       State, Industry [1]
##   State    Industry                  `Series ID`    Month Turnover
##   <chr>    <chr>                     <chr>          <mth>    <dbl>
## 1 Victoria Household goods retailing A3349643V   1982 Apr     173.
## 2 Victoria Household goods retailing A3349643V   1982 May     180.
## 3 Victoria Household goods retailing A3349643V   1982 Jun     167.
## 4 Victoria Household goods retailing A3349643V   1982 Jul     174.
## 5 Victoria Household goods retailing A3349643V   1982 Aug     178.
## 6 Victoria Household goods retailing A3349643V   1982 Sep     180.
autoplot(myseries, Turnover)

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
myseries %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "",
       title = paste("Transformation with lambda = ", round(lambda,2)))

The above shows the transformed retail turn over with the \(\lambda\) parameter of 0.22 chosen using the Guerrero method. The transformation shows a more tamed and consistent amplitude in Turnover throughout in comparison to the original plot of the Turnover.

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.

  • Tobacco from aus_production
  • Economy class passengers between Melbourne and Sydney from ansett
  • Pedestrian counts at Southern Cross Station from pedestrian

Tobacco from aus_production

head(aus_production)
## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7
autoplot(aus_production, (Tobacco)) +
  labs(title = "Tobacco Production")

aus_tobacco <- aus_production %>% 
  select(Quarter, Tobacco)

lambda <- aus_tobacco %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

aus_tobacco %>%
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Tobacco Production with $\\lambda$ = ",
         round(lambda,2))))

Given our \(\lambda\) is near 1 with value of 0.93, there is no substantive transformation resulting from the Box-Cox transformation.

Economy class passengers between Melbourne and Sydney from ansett

eco_mel_syd <- ansett %>% 
  filter(Class == "Economy", Airports == "MEL-SYD")
eco_mel_syd %>% autoplot(Passengers)

lambda <- eco_mel_syd %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)
eco_mel_syd %>%
  autoplot(box_cox(Passengers, lambda)) +
  labs(y = "",
       title = paste("Box-Cox Transformation with lambda = ", round(lambda,2)))

Pedestrian counts at Southern Cross Station from pedestrian

head(pedestrian)
## # A tsibble: 6 x 5 [1h] <Australia/Melbourne>
## # Key:       Sensor [1]
##   Sensor         Date_Time           Date        Time Count
##   <chr>          <dttm>              <date>     <int> <int>
## 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630
## 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826
## 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567
## 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264
## 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139
## 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01     5    77

View unique Sensor categories:

unique(pedestrian$Sensor)
## [1] "Birrarung Marr"                "Bourke Street Mall (North)"   
## [3] "QV Market-Elizabeth St (West)" "Southern Cross Station"
scs_pedestrian <- pedestrian %>%
  filter(Sensor == "Southern Cross Station")
sct_count <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  group_by(Sensor) %>%
  index_by(Week = yearweek(Date_Time)) %>%
  summarise(Count = sum(Count))
sct_count %>% autoplot(Count)

lambda <- sct_count %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)
sct_count %>%
  autoplot(box_cox(Count, lambda)) +
  labs(y = "",
       title = paste("Box-Cox Transformation with lambda = ", round(lambda,2)))

### 3.7 Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% select(Gas)
  1. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
head(gas)
## # A tsibble: 6 x 2 [1Q]
##     Gas Quarter
##   <dbl>   <qtr>
## 1   221 2005 Q3
## 2   180 2005 Q4
## 3   171 2006 Q1
## 4   224 2006 Q2
## 5   233 2006 Q3
## 6   192 2006 Q4
gas <- tail(aus_production, 5*4) %>% select(Gas)

autoplot(gas, Gas)+
  labs(title = "Quarterly Australian Gas Production",
       subtitle = "Q3 2005 - Q2 2010")

We can observe there is seasonality as evident by a decrease around Q1 and increase around Q3. There is also an upward trend-cycle in gas production.

  1. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of total AU Gas Production")

  1. Do the results support the graphical interpretation from part a?
    Yes, one can clearly see the upward trend and the seasonality in the decomposition components.

  2. Compute and plot the seasonally adjusted data.

class_decomp <- gas %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components()

class_decomp %>% autoplot() +
  labs(title = "Classical multiplicative decomposition of Australia
                  Gas Production")

as_tsibble(class_decomp) %>%
  autoplot(season_adjust) +
  labs(title = "Seasonally Adjusted Data")

  1. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas %>%
  mutate(Gas = if_else(Quarter==yearquarter("2007Q2"), Gas + 300, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(season_adjust) +
  labs(title = 'Seasonally Adjusted Data with 300 added to "2007 Q2"')

Interestingly a new seasonal pattern emerged post adding the outlier. It did have a great impact on the overal trend though.

  1. Does it make any difference if the outlier is near the end rather than in the middle of the time series?
#change one observation to be an outlier
gas_outlier_2 <- gas
gas_outlier_2$Gas[10] <- gas_outlier_2$Gas[10] + 1000

#recompute the seasonally adjusted data

# STL decomposition
dcmp_2 <- gas_outlier_2 %>%
  model(stl = STL(Gas))

#Compute and plot the seasonally adjusted data
components(dcmp_2) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas Production",
       title = "Seasonally Adjusted Australian Gas Production",
       subtitle = "Purposely skewed with one outlier")

Adding the outlier to the end vs. adding it to the middle of the series have small difference from one another. The outlier in both cases eliminates the original increasing trend shown in 3.7.d.

3.8

Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp)+
  labs(title =
    "Decomposition of Australian Department Stores Turnover using X-11.")

Seasonality was much stronger or more variable during the years of 1982 to around 1990, which is quite the opposite of what was previously observed. There appears to be an outlier in terms of retail data when it is usually towards the end of the year that you see bigger turnover.

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

  1. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

Based on the plots for the number of persons in the civilian labor force in Australia, it presents an overall increasing trend over time, and the seasonal plot says it contains certain amount of seasonal fluctuations.However, the growth does not appear to be a straight line, as an obvious drop shows at around 1992 ~ 1993; the remainder plot also reflected this drop as an outlier pops downward. We could acknowledge from such a situation that there was a large decrease in labor force during that time, and the causation might be national recession or the revolution of gender distribution needed by the labor market.

  1. Is the recession of 1991/1992 visible in the estimated components?
    we noticed a large drop or dip around 1992, which also reflects an outlier pops downward on the remainder plot, during a recession, we expect a huge lost in the population of labor foce; hence the recessio of 1991/1992 is quite visible in the estimated components.