Exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the Hyndman online Forecasting book.
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
library(fpp3)

# Keep only rows where both GDP and Population are available
gdp_per_capita <- global_economy %>%
  drop_na(GDP, Population)

# Plot GDP per capita for every country (legend hidden because there are too many countries)
gdp_per_capita %>%
  autoplot(GDP / Population) +
  labs(title = "GDP per capita", y = "$US") +
  theme(legend.position = "none")

# Find the observation with the highest GDP per capita
gdp_per_capita %>%
  mutate(gpc = GDP / Population) %>%
  filter(gpc == max(gpc))

I did not remove the aggregate rows that do not belong to a specific country, on the assumption that their GDP per capita is never the maximum. Monaco has the highest GDP per capita. Its GDP per capita trends upward from 1970 to 2017, with dips around 1985, 2000, 2010 and 2017. The cycles appear to get shorter over this period.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
# Helper: keep the rows of `s` where key column `k` equals `f_value`, then plot `d`
simple_plot <- function(s, k, f_value, d) {
  s %>%
    filter({{ k }} == f_value) %>%
    autoplot({{ d }}) +
    labs(title = paste(f_value))
}

# Plots before transformation
simple_plot(global_economy, Country, "United States", GDP)
aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers") %>%
  simple_plot(State, "Victoria", Count)
autoplot(vic_elec, Demand)
autoplot(aus_production, Gas)

# Carry out transformation
simple_plot(global_economy, Country, "United States", GDP / Population)

# Average the monthly slaughter counts by year
aus <- aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers") %>%
  group_by_key() %>%
  index_by(Year = year(Month)) %>%
  summarise(AVGCount = mean(Count))
simple_plot(aus, State, "Victoria", AVGCount)

# Average the half-hourly demand by day
elec <- vic_elec %>%
  index_by(Date) %>%
  summarise(DailyDemand = mean(Demand))
autoplot(elec, DailyDemand)

# Box-Cox transformation with lambda chosen by the guerrero feature
library(latex2exp)
lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda, 2))))

1. After dividing by population, the US GDP per capita looks more linear than the total GDP did.
2. For the slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock, the overall decreasing trend is easier to see after averaging the monthly counts by year.
3. For Victorian electricity demand, the daily averages look about as noisy as the original half-hourly data.
4. With lambda equal to 0.11 in the Box-Cox transformation, the transformed gas production sits at a higher level from about 1970 Q1 onwards.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
autoplot(canadian_gas, Volume)

# Try a range of Box-Cox parameters
lambdas <- c(-1, -0.5, 0, 0.5, 1)
for (lam in lambdas) {
  print(
    autoplot(canadian_gas, box_cox(Volume, lambda = lam)) +
      labs(title = latex2exp::TeX(paste0("Canadian Gas with $\\lambda$ = ", round(lam, 2))))
  )
}

If the data show variation that increases or decreases with the level of the series, then a transformation can be useful. In the Canadian gas data, however, the size of the seasonal variation does not grow or shrink steadily with the level of the series. None of the five lambda values above changes the size of the seasonal variation much, so the Box-Cox transformation is not helpful here and does not make the forecasting model simpler.
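For reference, the Box-Cox family being tried above can be written as follows. This is the form for strictly positive data; the `box_cox()` helper in fpp3 is assumed to reduce to it when all values are positive.

```latex
w_t =
\begin{cases}
  \log(y_t), & \lambda = 0, \\[4pt]
  \dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0.
\end{cases}
```

Because the transformation is monotone in the level $y_t$, it can only compress or stretch variation as a function of the level. It cannot even out variation that changes for reasons unrelated to the level.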
What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
set.seed(12000)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Choose lambda with the guerrero feature
lambda2 <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda2)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Australian Monthly Retail Data with $\\lambda$ = ",
         round(lambda2, 2))))

The guerrero feature selects a lambda of 0.41, which is close to a square-root transformation (lambda = 0.5).
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
# Find an appropriate Box-Cox transformation for these cases
simple_lamb <- function(s1, k1) {
  s1 %>%
    features({{ k1 }}, features = guerrero) %>%
    pull(lambda_guerrero)
}

# 1. Tobacco from aus_production
simple_lamb(aus_production, Tobacco)
## [1] 0.9264636

# 2. Economy class passengers between Melbourne and Sydney from ansett
aus_air <- ansett %>%
  filter(Class == "Economy",
         Airports == "SYD-MEL" |
           Airports == "MEL-SYD")
simple_lamb(aus_air, Passengers)
## [1] 1.999927

# 3. Pedestrian counts at Southern Cross Station from pedestrian
ped <- pedestrian %>%
  filter(Sensor == "Southern Cross Station")
simple_lamb(ped, Count)
## [1] -0.2501616

The numbers above are the guerrero lambda values for the Box-Cox transformation in each of the three cases.
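To get a feel for what lambdas of this size do, here is a minimal base-R sketch of the Box-Cox family for positive data. The helper `box_cox_sketch()` and the values in `y` are hypothetical and only for illustration; they are not part of fpp3 and do not use the series above.

```r
# Illustrative Box-Cox helper for strictly positive data (not the fpp3 implementation)
box_cox_sketch <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Hypothetical values spanning several orders of magnitude
y <- c(1, 10, 100, 1000)

# Rounded versions of the three lambdas reported above
for (lam in c(0.93, 2, -0.25)) {
  cat("lambda =", lam, ":", round(box_cox_sketch(y, lam), 2), "\n")
}
```

A lambda near 1 leaves the shape of the series almost unchanged (lambda = 1 only subtracts 1), a lambda near 2 stretches large values, and a negative lambda compresses them more strongly than a log would.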
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5 * 4) %>%
  select(Gas)

# Plot the time series (a).
autoplot(gas, Gas)

# Use classical_decomposition with type = "multiplicative" to calculate the trend-cycle and seasonal indices.
gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Gas data from aus_production")

# Compute and plot the seasonally adjusted data.
gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot(season_adjust) +
  labs(title = "Seasonally adjusted Gas data from aus_production")

# Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
# The observation equal to 194 (mid-series) becomes 494
gas_o <- gas %>%
  mutate(Gas = ifelse(Gas == 194, Gas + 300, Gas))
gas_o %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot(season_adjust) +
  labs(title = "Seasonally adjusted Gas data from aus_production with an outlier in the middle")

# Does it make any difference if the outlier is near the end rather than in the middle of the time series?
# The observation equal to 236 (the last quarter) becomes 536
gas_o2 <- gas %>%
  mutate(Gas = ifelse(Gas == 236, Gas + 300, Gas))
gas_o2 %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot(season_adjust) +
  labs(title = "Seasonally adjusted Gas data from aus_production with an outlier near the end")

Can you identify seasonal fluctuations and/or a trend-cycle? There are both seasonal fluctuations every year and a trend-cycle in the plot.
Do the results support the graphical interpretation from part a? The results support the graphical interpretation from part a. However, the trend-cycle estimate is missing at the beginning and end of the series, because the centred moving average cannot be computed there.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? It changes the whole seasonal pattern and flattens the seasonally adjusted and original trend plots.
Does it make any difference if the outlier is near the end rather than in the middle of the time series? Near the end, the outlier does not change the whole seasonal pattern, but it still flattens the seasonally adjusted and original trend plots.
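The difference between a mid-series and an end-of-series outlier can be checked numerically with a small sketch. It uses base R's `decompose()`, which is assumed here to follow the same classical method, and a hypothetical synthetic quarterly series rather than the Gas data; the names `add_outlier()` and `seasonal_indices()` are made up for this illustration.

```r
# Hypothetical quarterly series: a linear trend multiplied by a fixed seasonal pattern
seasonal <- c(0.8, 1.0, 1.2, 1.0)
trend <- seq(200, 238, length.out = 20)
y <- ts(trend * rep(seasonal, 5), frequency = 4, start = c(2006, 1))

# Add `size` to the observation at position `pos`
add_outlier <- function(x, pos, size = 300) {
  x[pos] <- x[pos] + size
  x
}

# Seasonal indices from a classical multiplicative decomposition
seasonal_indices <- function(x) decompose(x, type = "multiplicative")$figure

clean <- seasonal_indices(y)
mid <- seasonal_indices(add_outlier(y, 10))  # outlier in the middle
end <- seasonal_indices(add_outlier(y, 20))  # outlier in the last quarter

print(round(rbind(clean, mid, end), 3))
```

In this synthetic setup the mid-series outlier moves the seasonal indices much further than the end-of-series one. This is largely because the centred moving average gives no trend estimate for the last two quarters, so the final observation's detrended value is left out of the seasonal averages.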
Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
# myseries is computed above
# X_13ARIMA_SEATS() needs the seasonal package to be installed
myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components() %>%
  autoplot() +
  labs(title = "Decomposition of Turnover of myseries using X-11.")

The relative scale of the seasonal component decreases over time. There is a spike between 2000 and 2001, and a trough around 1993.
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
The trend of the civilian labour force in Australia rises steadily from about 6500 to 9000 between 1978 and 1995. The seasonal component varies within a range of about 200 each year, and this variation increases slightly over time. The seasonal peaks fall in March and December, and the troughs in January and August. The remainder shows a sharp drop in 1991; apart from that, its variation is usually less than 100.
Is the recession of 1991/1992 visible in the estimated components? Yes, it’s very visible with a sharp drop.