library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.4 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
library(tsibble)
##
## Attaching package: 'tsibble'
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(DAAG)
## Loading required package: lattice
library(tsibbledata)
library(feasts)
## Loading required package: fabletools
library(ggfortify)
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:tsibble':
##
## interval
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(fpp3)
## -- Attaching packages -------------------------------------------- fpp3 0.4.0 --
## v fable 0.3.1
## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date() masks base::date()
## x dplyr::filter() masks stats::filter()
## x kableExtra::group_rows() masks dplyr::group_rows()
## x tsibble::intersect() masks base::intersect()
## x lubridate::interval() masks tsibble::interval()
## x dplyr::lag() masks stats::lag()
## x tsibble::setdiff() masks base::setdiff()
## x tsibble::union() masks base::union()
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
global_economy%>% autoplot(.vars=GDP) + theme(legend.position = "none")
kable(global_economy %>%as_tibble() %>% mutate(decade =floor(Year/10)) %>% group_by(decade,Country) %>% summarise(DecadeGDP = sum(GDP)) %>% top_n(10))
## `summarise()` has grouped output by 'decade'. You can override using the `.groups` argument.
## Selecting by DecadeGDP
| decade | Country | DecadeGDP |
|---|---|---|
| 196 | Euro area | 3.954402e+12 |
| 196 | Europe & Central Asia | 6.460620e+12 |
| 196 | European Union | 5.455547e+12 |
| 196 | High income | 1.559466e+13 |
| 196 | IDA & IBRD total | 4.174282e+12 |
| 196 | North America | 7.965154e+12 |
| 196 | OECD members | 1.541092e+13 |
| 196 | Post-demographic dividend | 1.507652e+13 |
| 196 | United States | 7.418900e+12 |
| 196 | World | 1.923056e+13 |
| 197 | Euro area | 1.433248e+13 |
| 197 | Europe & Central Asia | 2.182553e+13 |
| 197 | European Union | 1.843057e+13 |
| 197 | High income | 4.693969e+13 |
| 197 | IDA & IBRD total | 1.217686e+13 |
| 197 | North America | 1.879375e+13 |
| 197 | OECD members | 4.616374e+13 |
| 197 | Post-demographic dividend | 4.511374e+13 |
| 197 | United States | 1.714465e+13 |
| 197 | World | 5.788691e+13 |
| 198 | Euro area | 3.187818e+13 |
| 198 | Europe & Central Asia | 4.944313e+13 |
| 198 | European Union | 4.174850e+13 |
| 198 | High income | 1.164679e+14 |
| 198 | IDA & IBRD total | 2.819051e+13 |
| 198 | North America | 4.566057e+13 |
| 198 | OECD members | 1.139212e+14 |
| 198 | Post-demographic dividend | 1.117309e+14 |
| 198 | United States | 4.181471e+13 |
| 198 | World | 1.422897e+14 |
| 199 | East Asia & Pacific | 6.819211e+13 |
| 199 | Euro area | 6.768658e+13 |
| 199 | Europe & Central Asia | 9.989157e+13 |
| 199 | European Union | 8.800957e+13 |
| 199 | High income | 2.383632e+14 |
| 199 | North America | 8.217094e+13 |
| 199 | OECD members | 2.332657e+14 |
| 199 | Post-demographic dividend | 2.267205e+14 |
| 199 | United States | 7.600297e+13 |
| 199 | World | 2.834186e+14 |
| 200 | East Asia & Pacific | 1.041071e+14 |
| 200 | Europe & Central Asia | 1.601217e+14 |
| 200 | European Union | 1.373385e+14 |
| 200 | High income | 3.634766e+14 |
| 200 | IDA & IBRD total | 1.086855e+14 |
| 200 | North America | 1.373044e+14 |
| 200 | OECD members | 3.578679e+14 |
| 200 | Post-demographic dividend | 3.417142e+14 |
| 200 | United States | 1.262343e+14 |
| 200 | World | 4.651848e+14 |
| 201 | East Asia & Pacific | 1.691152e+14 |
| 201 | Europe & Central Asia | 1.755569e+14 |
| 201 | High income | 3.947413e+14 |
| 201 | IBRD only | 2.034018e+14 |
| 201 | IDA & IBRD total | 2.186140e+14 |
| 201 | Low & middle income | 2.075814e+14 |
| 201 | Middle income | 2.034943e+14 |
| 201 | OECD members | 3.825172e+14 |
| 201 | Post-demographic dividend | 3.626871e+14 |
| 201 | World | 6.022147e+14 |
In the 1960s We see a number of high income OECD groupings. As time goes on we see other groupings; this is polluted by a lot of groupings. Euro Area European Union Europe & Central Asia all coexist despite quite a lot of overlap. We do are able though to see the rise of East Asia in general, and China in particular.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
The obvious transformation here is inflation (or trying to keep the dollars “real”)
global_economy %>% filter(Code=="USA") %>% mutate(RealGDP = GDP/CPI *100 ) %>% select(RealGDP,GDP) %>% autoplot(vars(RealGDP,GDP))
Real GDP has more distinct recessions than untransformed GDP, it also has a narrower range, showing that some monetary improvement is just inflation.
aus_livestock %>%filter(Animal=='Bulls, bullocks and steers')%>% filter(State=='Victoria') %>% mutate(year = year(Month))%>%filter(year>1976) %>%left_join(austpop, by ="year")%>%fill_gaps()%>%fill(Vic)%>%autoplot(Count/(Vic*1000))
aus_livestock %>%filter(Animal=='Bulls, bullocks and steers')%>%filter(State=='Victoria')%>%mutate(Year = year(Month))%>%filter(Year>1976) %>% autoplot(Count)
We adjust by population (decenially) and dropping the partial year we begin on. This tends to moderate recent swings.
lambda <- vic_elec %>%
features(Demand, features = guerrero) %>%
pull(lambda_guerrero)
vic_elec %>%
autoplot(vars(box_cox(Demand, lambda), Demand))
Running guerrero produces a near 0 lambda which unsurprisingly doesn’t meaningfully modify the plot
vic_elec_two <- vic_elec %>%
mutate(
`21-MA` = slider::slide_dbl(Demand, mean,
.before = 10, .after = 10, .complete = TRUE)
)
vic_elec_two %>% autoplot(vars(Demand,`21-MA`))
## Warning: Removed 10 row(s) containing missing values (geom_path).
A moving average starts to allow a certain shape to be seen, but realistically without know ing what to focus on (intraday or seasonal it is hard to come up with an actual working scale)
aus_production %>% mutate(Year = year(Quarter)) %>% filter(Year<2010)%>% index_by(Year)%>%group_by(Year)%>%summarise(Gas = sum(Gas))%>%left_join(global_economy %>% filter(Code=="AUS"), by ="Year")%>%autoplot(vars(Gas/Population,Gas))
## Warning: Removed 4 row(s) containing missing values (geom_path).
We again transform based on population which tends to draw attention to the ramp up in the 70s of gas use, while showing a good portion of the post 1982 timeframe has a strong population growth component.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
First we examine the data
canadian_gas %>% autoplot
## Plot variable not specified, automatically selected `.vars = Volume`
Immediately we see that it bulges in the middle.This will present problems. For fun we use guerrero to see what it would pick.
lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume, lambda)) +
labs(y = "",
title = paste0(
"Volume boxCoxTransformed lambda = ",
round(lambda,2)))
This has not resolved the issue of seasonality in the data.
What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
set.seed(12345678-4321)
myseries <- aus_retail %>% filter(`Series ID` == sample(aus_retail$`Series ID`,1))
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
myseries %>% autoplot(vars(box_cox(Turnover, lambda), Turnover)) +
labs(y = "",
title = paste0(
"Volume boxCoxTransformed lambda = ",
round(lambda,2)))
It has a transformation of -.03 using guerrero which does seem to control the trend nicely.
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
lambda <-aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(vars(box_cox(Tobacco, lambda),Tobacco)) +
labs(y = "",
title = paste0(
"Tobacco boxCoxTransformed lambda = ",
round(lambda,2)))
## Warning: Removed 24 row(s) containing missing values (geom_path).
Tobacco should probably be further population weighted
ourseries<-ansett %>% filter(Airports == 'MEL-SYD' & Class == 'Economy')
lambda <-ourseries %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
ourseries %>%
autoplot(vars(box_cox(Passengers, lambda),Passengers)) +
labs(y = "",
title = paste0(
"Passengers boxCoxTransformed lambda = ",
round(lambda,2)))
So 2.
ourseries<-pedestrian %>% filter(Sensor == "Southern Cross Station")
lambda <-ourseries %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
ourseries %>%
autoplot(vars(box_cox(Count, lambda),Count)) +
labs(y = "",
title = paste0(
"Pedestrians boxCoxTransformed lambda = ",
round(lambda,2)))
Again we see the issue with dense data fields
##3.7 Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) %>% select(Gas)
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
gas%>%autoplot(.vars = Gas)
Q1 seems to be a minima for any particular year followed by two quarters of sharply increasing demand with a peak in q3. Trend in general is increasing. Some hints of compression (where the minima is not as far below the maxima)
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
gas %>% model( classical_decomposition(Gas, type ="multiplicative") ) %>% components() %>% autoplot()
## Warning: Removed 2 row(s) containing missing values (geom_path).
Do the results support the graphical interpretation from part a? Compute and plot the seasonally adjusted data.
Broadly yes, all is in accordance
gas %>% model( classical_decomposition(Gas, type ="multiplicative") ) %>% components() %>% as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2")
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas[[1]][10]<-gas[[1]][10]+300
gas %>% model( classical_decomposition(Gas, type ="multiplicative") ) %>% components() %>% as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2")
The single outlier seems to have properly broken the seasonal adjustment.
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
Previous we did middle(ish) so let’s look at the end.
gas <- tail(aus_production, 5*4) %>% select(Gas)
gas[[1]][20]<-gas[[1]][20]+300
gas %>% model( classical_decomposition(Gas, type ="multiplicative") ) %>% components() %>% as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2")
As expected this has a much less deleterious effect.
##3.8 Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp)
We do see a very high irregular pulse around 2000, this may be related to measurement changes. We also see a lessening of the seasonal trend in the early 1990s.
##3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995. Figure 3.19: Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Seasonal component from the decomposition shown in the previous figure. Figure 3.20: Seasonal component from the decomposition shown in the previous figure.
Looking at the trend we see a small plateau for a few years from 91-92 followed by a return to growth. The seasonal data shows a gradually increasing then rapidly fading February bump, while other components of the seasonal adjustment remain more constant. We see a sharp negative remainder from 91-92 (possibly related to the plateau above). We see August has become a worse month continously, while other months have quite variation and direction changes.
Yes, as mentioned above it is very visible in the remainder, and visible in the trend.