Data 624 HW 2

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)
library(tsibble)
## 
## Attaching package: 'tsibble'
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union
library(DAAG)
## Loading required package: lattice
library(tsibbledata)
library(feasts)
## Loading required package: fabletools
library(ggfortify)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:tsibble':
## 
##     interval
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(fpp3)
## -- Attaching packages -------------------------------------------- fpp3 0.4.0 --
## v fable 0.3.1
## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date()        masks base::date()
## x dplyr::filter()          masks stats::filter()
## x kableExtra::group_rows() masks dplyr::group_rows()
## x tsibble::intersect()     masks base::intersect()
## x lubridate::interval()    masks tsibble::interval()
## x dplyr::lag()             masks stats::lag()
## x tsibble::setdiff()       masks base::setdiff()
## x tsibble::union()         masks base::union()

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy%>% autoplot(.vars=GDP) + theme(legend.position = "none")
kable(global_economy %>%as_tibble() %>% mutate(decade =floor(Year/10)) %>%  group_by(decade,Country) %>% summarise(DecadeGDP = sum(GDP)) %>% top_n(10))
## `summarise()` has grouped output by 'decade'. You can override using the `.groups` argument.
## Selecting by DecadeGDP
decade Country DecadeGDP
196 Euro area 3.954402e+12
196 Europe & Central Asia 6.460620e+12
196 European Union 5.455547e+12
196 High income 1.559466e+13
196 IDA & IBRD total 4.174282e+12
196 North America 7.965154e+12
196 OECD members 1.541092e+13
196 Post-demographic dividend 1.507652e+13
196 United States 7.418900e+12
196 World 1.923056e+13
197 Euro area 1.433248e+13
197 Europe & Central Asia 2.182553e+13
197 European Union 1.843057e+13
197 High income 4.693969e+13
197 IDA & IBRD total 1.217686e+13
197 North America 1.879375e+13
197 OECD members 4.616374e+13
197 Post-demographic dividend 4.511374e+13
197 United States 1.714465e+13
197 World 5.788691e+13
198 Euro area 3.187818e+13
198 Europe & Central Asia 4.944313e+13
198 European Union 4.174850e+13
198 High income 1.164679e+14
198 IDA & IBRD total 2.819051e+13
198 North America 4.566057e+13
198 OECD members 1.139212e+14
198 Post-demographic dividend 1.117309e+14
198 United States 4.181471e+13
198 World 1.422897e+14
199 East Asia & Pacific 6.819211e+13
199 Euro area 6.768658e+13
199 Europe & Central Asia 9.989157e+13
199 European Union 8.800957e+13
199 High income 2.383632e+14
199 North America 8.217094e+13
199 OECD members 2.332657e+14
199 Post-demographic dividend 2.267205e+14
199 United States 7.600297e+13
199 World 2.834186e+14
200 East Asia & Pacific 1.041071e+14
200 Europe & Central Asia 1.601217e+14
200 European Union 1.373385e+14
200 High income 3.634766e+14
200 IDA & IBRD total 1.086855e+14
200 North America 1.373044e+14
200 OECD members 3.578679e+14
200 Post-demographic dividend 3.417142e+14
200 United States 1.262343e+14
200 World 4.651848e+14
201 East Asia & Pacific 1.691152e+14
201 Europe & Central Asia 1.755569e+14
201 High income 3.947413e+14
201 IBRD only 2.034018e+14
201 IDA & IBRD total 2.186140e+14
201 Low & middle income 2.075814e+14
201 Middle income 2.034943e+14
201 OECD members 3.825172e+14
201 Post-demographic dividend 3.626871e+14
201 World 6.022147e+14

In the 1960s We see a number of high income OECD groupings. As time goes on we see other groupings; this is polluted by a lot of groupings. Euro Area European Union Europe & Central Asia all coexist despite quite a lot of overlap. We do are able though to see the rise of East Asia in general, and China in particular.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

  1. United States GDP from global_economy.

The obvious transformation here is inflation (or trying to keep the dollars “real”)

global_economy %>% filter(Code=="USA") %>% mutate(RealGDP = GDP/CPI *100 ) %>% select(RealGDP,GDP) %>% autoplot(vars(RealGDP,GDP))

Real GDP has more distinct recessions than untransformed GDP, it also has a narrower range, showing that some monetary improvement is just inflation.

  1. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
aus_livestock  %>%filter(Animal=='Bulls, bullocks and steers')%>%  filter(State=='Victoria') %>% mutate(year =  year(Month))%>%filter(year>1976) %>%left_join(austpop, by ="year")%>%fill_gaps()%>%fill(Vic)%>%autoplot(Count/(Vic*1000))

aus_livestock  %>%filter(Animal=='Bulls, bullocks and steers')%>%filter(State=='Victoria')%>%mutate(Year = year(Month))%>%filter(Year>1976) %>% autoplot(Count)

We adjust by population (decenially) and dropping the partial year we begin on. This tends to moderate recent swings.

  1. Victorian Electricity Demand from vic_elec.
lambda <- vic_elec %>%
  features(Demand, features = guerrero) %>%
  pull(lambda_guerrero)
vic_elec %>%
  autoplot(vars(box_cox(Demand, lambda), Demand))

Running guerrero produces a near 0 lambda which unsurprisingly doesn’t meaningfully modify the plot

vic_elec_two <- vic_elec %>%
  mutate(
    `21-MA` = slider::slide_dbl(Demand, mean,
                .before = 10, .after = 10, .complete = TRUE)
  )
vic_elec_two %>% autoplot(vars(Demand,`21-MA`))
## Warning: Removed 10 row(s) containing missing values (geom_path).

A moving average starts to allow a certain shape to be seen, but realistically without know ing what to focus on (intraday or seasonal it is hard to come up with an actual working scale)

  1. Gas production from aus_production.
aus_production  %>% mutate(Year = year(Quarter)) %>% filter(Year<2010)%>% index_by(Year)%>%group_by(Year)%>%summarise(Gas = sum(Gas))%>%left_join(global_economy %>% filter(Code=="AUS"), by ="Year")%>%autoplot(vars(Gas/Population,Gas))
## Warning: Removed 4 row(s) containing missing values (geom_path).

We again transform based on population which tends to draw attention to the ramp up in the 70s of gas use, while showing a good portion of the post 1982 timeframe has a strong population growth component.

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

First we examine the data

canadian_gas %>% autoplot
## Plot variable not specified, automatically selected `.vars = Volume`

Immediately we see that it bulges in the middle.This will present problems. For fun we use guerrero to see what it would pick.

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "",
       title = paste0(
         "Volume boxCoxTransformed  lambda  = ",
         round(lambda,2)))

This has not resolved the issue of seasonality in the data.

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

set.seed(12345678-4321)
myseries <- aus_retail %>% filter(`Series ID` == sample(aus_retail$`Series ID`,1))

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
myseries %>% autoplot(vars(box_cox(Turnover, lambda), Turnover)) +
  labs(y = "",
       title = paste0(
         "Volume boxCoxTransformed  lambda  = ",
         round(lambda,2)))

It has a transformation of -.03 using guerrero which does seem to control the trend nicely.

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

lambda <-aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(vars(box_cox(Tobacco, lambda),Tobacco)) +
 labs(y = "",
       title = paste0(
         "Tobacco boxCoxTransformed  lambda  = ",
         round(lambda,2)))
## Warning: Removed 24 row(s) containing missing values (geom_path).

Tobacco should probably be further population weighted

ourseries<-ansett %>% filter(Airports  == 'MEL-SYD' & Class == 'Economy')
lambda <-ourseries %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)
ourseries %>%
  autoplot(vars(box_cox(Passengers, lambda),Passengers)) +
 labs(y = "",
       title = paste0(
         "Passengers boxCoxTransformed  lambda  = ",
         round(lambda,2)))

So 2.

ourseries<-pedestrian %>% filter(Sensor == "Southern Cross Station")
lambda <-ourseries %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)
ourseries %>%
  autoplot(vars(box_cox(Count, lambda),Count)) +
 labs(y = "",
       title = paste0(
         "Pedestrians boxCoxTransformed  lambda  = ",
         round(lambda,2)))

Again we see the issue with dense data fields

##3.7 Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas%>%autoplot(.vars = Gas)

Q1 seems to be a minima for any particular year followed by two quarters of sharply increasing demand with a peak in q3. Trend in general is increasing. Some hints of compression (where the minima is not as far below the maxima)

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas %>% model(  classical_decomposition(Gas, type ="multiplicative") )  %>%  components() %>%  autoplot() 
## Warning: Removed 2 row(s) containing missing values (geom_path).

Do the results support the graphical interpretation from part a? Compute and plot the seasonally adjusted data.

Broadly yes, all is in accordance

gas %>% model(  classical_decomposition(Gas, type ="multiplicative") )  %>%  components() %>%  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2")  

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas[[1]][10]<-gas[[1]][10]+300
gas %>% model(  classical_decomposition(Gas, type ="multiplicative") )  %>%  components() %>%  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2")  

The single outlier seems to have properly broken the seasonal adjustment.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Previous we did middle(ish) so let’s look at the end.

gas <- tail(aus_production, 5*4) %>% select(Gas)
gas[[1]][20]<-gas[[1]][20]+300
gas %>% model(  classical_decomposition(Gas, type ="multiplicative") )  %>%  components() %>%  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2")  

As expected this has a much less deleterious effect.

##3.8 Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp)

We do see a very high irregular pulse around 2000, this may be related to measurement changes. We also see a lessening of the seasonal trend in the early 1990s.

##3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995. Figure 3.19: Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Seasonal component from the decomposition shown in the previous figure. Figure 3.20: Seasonal component from the decomposition shown in the previous figure.

  1. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

Looking at the trend we see a small plateau for a few years from 91-92 followed by a return to growth. The seasonal data shows a gradually increasing then rapidly fading February bump, while other components of the seasonal adjustment remain more constant. We see a sharp negative remainder from 91-92 (possibly related to the plateau above). We see August has become a worse month continously, while other months have quite variation and direction changes.

  1. Is the recession of 1991/1992 visible in the estimated components?

Yes, as mentioned above it is very visible in the remainder, and visible in the trend.