Data 624 HW 2

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggplot2)
library(tsibble)

## 
## Attaching package: 'tsibble'

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union

library(DAAG)

## Loading required package: lattice

library(tsibbledata)
library(feasts)

## Loading required package: fabletools

library(ggfortify)
library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:tsibble':
## 
##     interval

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(fpp3)

## -- Attaching packages -------------------------------------------- fpp3 0.4.0 --

## v fable 0.3.1

## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date()        masks base::date()
## x dplyr::filter()          masks stats::filter()
## x kableExtra::group_rows() masks dplyr::group_rows()
## x tsibble::intersect()     masks base::intersect()
## x lubridate::interval()    masks tsibble::interval()
## x dplyr::lag()             masks stats::lag()
## x tsibble::setdiff()       masks base::setdiff()
## x tsibble::union()         masks base::union()

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy%>% autoplot(.vars=GDP) + theme(legend.position = "none")

kable(global_economy %>%as_tibble() %>% mutate(decade =floor(Year/10)) %>%  group_by(decade,Country) %>% summarise(DecadeGDP = sum(GDP)) %>% top_n(10))

## `summarise()` has grouped output by 'decade'. You can override using the `.groups` argument.

## Selecting by DecadeGDP

decade	Country	DecadeGDP
196	Euro area	3.954402e+12
196	Europe & Central Asia	6.460620e+12
196	European Union	5.455547e+12
196	High income	1.559466e+13
196	IDA & IBRD total	4.174282e+12
196	North America	7.965154e+12
196	OECD members	1.541092e+13
196	Post-demographic dividend	1.507652e+13
196	United States	7.418900e+12
196	World	1.923056e+13
197	Euro area	1.433248e+13
197	Europe & Central Asia	2.182553e+13
197	European Union	1.843057e+13
197	High income	4.693969e+13
197	IDA & IBRD total	1.217686e+13
197	North America	1.879375e+13
197	OECD members	4.616374e+13
197	Post-demographic dividend	4.511374e+13
197	United States	1.714465e+13
197	World	5.788691e+13
198	Euro area	3.187818e+13
198	Europe & Central Asia	4.944313e+13
198	European Union	4.174850e+13
198	High income	1.164679e+14
198	IDA & IBRD total	2.819051e+13
198	North America	4.566057e+13
198	OECD members	1.139212e+14
198	Post-demographic dividend	1.117309e+14
198	United States	4.181471e+13
198	World	1.422897e+14
199	East Asia & Pacific	6.819211e+13
199	Euro area	6.768658e+13
199	Europe & Central Asia	9.989157e+13
199	European Union	8.800957e+13
199	High income	2.383632e+14
199	North America	8.217094e+13
199	OECD members	2.332657e+14
199	Post-demographic dividend	2.267205e+14
199	United States	7.600297e+13
199	World	2.834186e+14
200	East Asia & Pacific	1.041071e+14
200	Europe & Central Asia	1.601217e+14
200	European Union	1.373385e+14
200	High income	3.634766e+14
200	IDA & IBRD total	1.086855e+14
200	North America	1.373044e+14
200	OECD members	3.578679e+14
200	Post-demographic dividend	3.417142e+14
200	United States	1.262343e+14
200	World	4.651848e+14
201	East Asia & Pacific	1.691152e+14
201	Europe & Central Asia	1.755569e+14
201	High income	3.947413e+14
201	IBRD only	2.034018e+14
201	IDA & IBRD total	2.186140e+14
201	Low & middle income	2.075814e+14
201	Middle income	2.034943e+14
201	OECD members	3.825172e+14
201	Post-demographic dividend	3.626871e+14
201	World	6.022147e+14

In the 1960s We see a number of high income OECD groupings. As time goes on we see other groupings; this is polluted by a lot of groupings. Euro Area European Union Europe & Central Asia all coexist despite quite a lot of overlap. We do are able though to see the rise of East Asia in general, and China in particular.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

The obvious transformation here is inflation (or trying to keep the dollars “real”)

global_economy %>% filter(Code=="USA") %>% mutate(RealGDP = GDP/CPI *100 ) %>% select(RealGDP,GDP) %>% autoplot(vars(RealGDP,GDP))

Real GDP has more distinct recessions than untransformed GDP, it also has a narrower range, showing that some monetary improvement is just inflation.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock  %>%filter(Animal=='Bulls, bullocks and steers')%>%  filter(State=='Victoria') %>% mutate(year =  year(Month))%>%filter(year>1976) %>%left_join(austpop, by ="year")%>%fill_gaps()%>%fill(Vic)%>%autoplot(Count/(Vic*1000))

aus_livestock  %>%filter(Animal=='Bulls, bullocks and steers')%>%filter(State=='Victoria')%>%mutate(Year = year(Month))%>%filter(Year>1976) %>% autoplot(Count)

We adjust by population (decenially) and dropping the partial year we begin on. This tends to moderate recent swings.

Victorian Electricity Demand from vic_elec.

lambda <- vic_elec %>%
  features(Demand, features = guerrero) %>%
  pull(lambda_guerrero)
vic_elec %>%
  autoplot(vars(box_cox(Demand, lambda), Demand))

Running guerrero produces a near 0 lambda which unsurprisingly doesn’t meaningfully modify the plot

vic_elec_two <- vic_elec %>%
  mutate(
    `21-MA` = slider::slide_dbl(Demand, mean,
                .before = 10, .after = 10, .complete = TRUE)
  )
vic_elec_two %>% autoplot(vars(Demand,`21-MA`))

## Warning: Removed 10 row(s) containing missing values (geom_path).

A moving average starts to allow a certain shape to be seen, but realistically without know ing what to focus on (intraday or seasonal it is hard to come up with an actual working scale)

Gas production from aus_production.

aus_production  %>% mutate(Year = year(Quarter)) %>% filter(Year<2010)%>% index_by(Year)%>%group_by(Year)%>%summarise(Gas = sum(Gas))%>%left_join(global_economy %>% filter(Code=="AUS"), by ="Year")%>%autoplot(vars(Gas/Population,Gas))

## Warning: Removed 4 row(s) containing missing values (geom_path).

We again transform based on population which tends to draw attention to the ramp up in the 70s of gas use, while showing a good portion of the post 1982 timeframe has a strong population growth component.

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

First we examine the data

canadian_gas %>% autoplot

## Plot variable not specified, automatically selected `.vars = Volume`

Immediately we see that it bulges in the middle.This will present problems. For fun we use guerrero to see what it would pick.

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "",
       title = paste0(
         "Volume boxCoxTransformed  lambda  = ",
         round(lambda,2)))

This has not resolved the issue of seasonality in the data.

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

set.seed(12345678-4321)
myseries <- aus_retail %>% filter(`Series ID` == sample(aus_retail$`Series ID`,1))

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
myseries %>% autoplot(vars(box_cox(Turnover, lambda), Turnover)) +
  labs(y = "",
       title = paste0(
         "Volume boxCoxTransformed  lambda  = ",
         round(lambda,2)))

It has a transformation of -.03 using guerrero which does seem to control the trend nicely.

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

lambda <-aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(vars(box_cox(Tobacco, lambda),Tobacco)) +
 labs(y = "",
       title = paste0(
         "Tobacco boxCoxTransformed  lambda  = ",
         round(lambda,2)))

## Warning: Removed 24 row(s) containing missing values (geom_path).

Tobacco should probably be further population weighted

ourseries<-ansett %>% filter(Airports  == 'MEL-SYD' & Class == 'Economy')
lambda <-ourseries %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)
ourseries %>%
  autoplot(vars(box_cox(Passengers, lambda),Passengers)) +
 labs(y = "",
       title = paste0(
         "Passengers boxCoxTransformed  lambda  = ",
         round(lambda,2)))

So 2.

ourseries<-pedestrian %>% filter(Sensor == "Southern Cross Station")
lambda <-ourseries %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)
ourseries %>%
  autoplot(vars(box_cox(Count, lambda),Count)) +
 labs(y = "",
       title = paste0(
         "Pedestrians boxCoxTransformed  lambda  = ",
         round(lambda,2)))

Again we see the issue with dense data fields

##3.7 Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas%>%autoplot(.vars = Gas)

Q1 seems to be a minima for any particular year followed by two quarters of sharply increasing demand with a peak in q3. Trend in general is increasing. Some hints of compression (where the minima is not as far below the maxima)

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas %>% model(  classical_decomposition(Gas, type ="multiplicative") )  %>%  components() %>%  autoplot()

## Warning: Removed 2 row(s) containing missing values (geom_path).

Do the results support the graphical interpretation from part a? Compute and plot the seasonally adjusted data.

Broadly yes, all is in accordance

gas %>% model(  classical_decomposition(Gas, type ="multiplicative") )  %>%  components() %>%  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2")

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas[[1]][10]<-gas[[1]][10]+300
gas %>% model(  classical_decomposition(Gas, type ="multiplicative") )  %>%  components() %>%  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2")

The single outlier seems to have properly broken the seasonal adjustment.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Previous we did middle(ish) so let’s look at the end.

gas <- tail(aus_production, 5*4) %>% select(Gas)
gas[[1]][20]<-gas[[1]][20]+300
gas %>% model(  classical_decomposition(Gas, type ="multiplicative") )  %>%  components() %>%  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2")

As expected this has a much less deleterious effect.

##3.8 Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp)

We do see a very high irregular pulse around 2000, this may be related to measurement changes. We also see a lessening of the seasonal trend in the early 1990s.

##3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995. Figure 3.19: Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Seasonal component from the decomposition shown in the previous figure. Figure 3.20: Seasonal component from the decomposition shown in the previous figure.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

Looking at the trend we see a small plateau for a few years from 91-92 followed by a return to growth. The seasonal data shows a gradually increasing then rapidly fading February bump, while other components of the seasonal adjustment remain more constant. We see a sharp negative remainder from 91-92 (possibly related to the plateau above). We see August has become a worse month continously, while other months have quite variation and direction changes.

Is the recession of 1991/1992 visible in the estimated components?

Yes, as mentioned above it is very visible in the remainder, and visible in the trend.

Data 624 HW 2

Scott Reed

9/8/2021

Data 624 HW 2

3.1

3.2

3.3

3.4

3.5